Continuous Data Collection and When to Stop

One of the first personal projects I worked on started at the beginning of the pandemic, when there was a lot of talk about groceries, scarcity, and scares about prices going up due to supply chain problems and places shutting down to keep the spread minimal.

This was before I had really thought about where else I could get data from or the kind of projects that would normally be worked on in a professional setting. I set out to collect my own data, as I’d seen so many other people do.

But when the pandemic continues and we’re closing in on three years… Is there a time to stop? Do I cut it off at a specific date, and use only data within specific parameters to fit my questions?

I will admit that I attempted to stop gathering grocery data in July and August, when I was doing the bulk of the work on the project. I would have a set timeline and it would be easy to pore over the data with that in place, then I could make clean visualizations based on it.

Then, receipts would continue to migrate over to my desk, and I’d continue to enter them. Not just that, but every week held some new, interesting thing about what we were getting. We’d massively overshoot our estimates at the checkout, or we’d undershoot and wonder what we missed. Especially with specific dates and events in mind, certain things would be out of season, or availability would be worse due to supply chain issues or conflict.

There was always something new to consider, and it got harder to stop collecting data, when every little bit of information felt precious.

I think in this case, the answer is to continue to collect the data as long as it makes sense, and then to use a specific subset of dates for whatever timeline I want to look at.
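As a rough sketch of that idea: assuming each receipt is stored as a record with a purchase date and a total, narrowing down to one window might look something like this. The field names, the `receipts_between` helper, and the sample values are all made up for illustration and are not my real data.

```python
from datetime import date

# Hypothetical receipt records; field names and values are placeholders.
receipts = [
    {"date": date(2020, 3, 21), "store": "Store A", "total": 84.10},
    {"date": date(2021, 7, 3), "store": "Store B", "total": 132.45},
    {"date": date(2022, 8, 14), "store": "Store A", "total": 151.02},
]


def receipts_between(records, start, end):
    """Return the records whose date falls within [start, end], inclusive."""
    return [r for r in records if start <= r["date"] <= end]


# Keep collecting everything, but analyze only the window of interest.
first_year = receipts_between(receipts, date(2020, 3, 1), date(2021, 2, 28))
```

The full list never gets trimmed, so a different question later just means a different pair of dates.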

I’ve seen some pretty interesting visualizations over the years, most notably from reddit.com/r/dataisbeautiful, and a few that stand out to me are sleep trends from FitBits, and someone’s personal bathroom diary that was very rudimentary. Like groceries, they’re just a constant in everyone’s lives, so collecting that data can lead to some interesting insights, especially done over a long period of time.

Like I said, when I started this data collection, it was with the idea that eventually the pandemic would end. That was March of 2020, when things were only just ramping up. Since then it’s been a constant wave of ups and downs, cancellations and going back in person.

Then, you have other circumstances: elections, war, supply chain issues. Even though I started collecting out of pandemic interest, I now also have a record across all these other global events.

The other thing is, I don’t know what to do with this project now. I realize I’ve done a lot of visualization, but I could still run the data through SQL and get useful insights that way as well.
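To give a sense of what the SQL route could look like, here is a minimal sketch using Python’s built-in `sqlite3` module with an in-memory database. The `receipts` table, its column names, and the sample rows are assumptions for the example, not my actual schema or numbers.

```python
import sqlite3

# In-memory database with a hypothetical receipts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (purchase_date TEXT, store TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?)",
    [
        ("2022-07-02", "Store A", 100.0),
        ("2022-07-16", "Store B", 50.0),
        ("2022-08-06", "Store A", 120.0),
    ],
)

# Number of trips and total spent per calendar month.
monthly_totals = conn.execute(
    """
    SELECT strftime('%Y-%m', purchase_date) AS month,
           COUNT(*) AS trips,
           SUM(total) AS spent
    FROM receipts
    GROUP BY month
    ORDER BY month
    """
).fetchall()
conn.close()
```

Storing the dates as ISO text (`YYYY-MM-DD`) is what lets `strftime` group them by month here.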

What is the end goal for a continuous collection process like this? Do I take what I have now and create a presentation and maybe even a video about this, with a set timeline? Two years in, 2 1/2 years, maybe wait until the 3rd anniversary of the initial shutdown in my area and do it then?

I think I actually might want to make a sort of financial compass for people in our area who are unsure what to do, or for people we know who are starting out on their own and don’t have any kind of benchmarks for cost for budgeting.

From that perspective, I have a few directions to go in. I know at the end of it all I do want to create a presentation and a video, and get really into the nitty gritty. Then I want to talk about the uses for the information. Maybe I could do modeling or create a basic budget based on that information.

Probably I’ll go with the budget. I like modeling okay, but in the end it isn’t a priority for me since my focus is on analysis.
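For the budget, one simple starting point might be a monthly grocery benchmark. The sketch below adds up receipts per month and takes the median of those monthly totals. Choosing the median (so one unusually big month doesn’t skew things) is my assumption for this example, and the `monthly_benchmark` helper and the numbers are invented.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (month, receipt total) pairs; values are placeholders.
spending = [
    ("2022-05", 90.0),
    ("2022-05", 110.0),
    ("2022-06", 150.0),
    ("2022-06", 70.0),
    ("2022-07", 260.0),
]


def monthly_benchmark(entries):
    """Sum receipts per month, then return the median monthly total."""
    totals = defaultdict(float)
    for month, amount in entries:
        totals[month] += amount
    return median(totals.values())


benchmark = monthly_benchmark(spending)
```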

This post was a bit of a “think out loud” kind of thing for me, so I appreciate you sticking around for it. I wanted to get back to my projects since this blog is supposed to be a portfolio, but also to give a look inside how I problem solve and work out issues I might be facing.

As always, I wish you the swiftest, easiest cleaning for your data, and I’ll catch you in the next post!
