Learning/Portfolio Spotlight: BigQuery, Big Answers

The Google Data Analytics Professional Cert has been a godsend for a lot of reasons, not least of which is showing me new tools and how to use them, or reminding me about the possibilities of older ones I hadn’t used in a while.

I was a little wary of BigQuery at first, but now I can happily say it’s one of my favorite tools to use.

I will admit I struggled a little bit in the beginning. Not knowing that you have to enable the BigQuery API first will do that to you, but after some searching around and fixing the issue, I was good to go.

Full disclosure: I’m on the 90-day trial at the moment, and I’m not sure whether my personal projects will rack up any significant cost. There is a sandbox option, but as far as I’m aware, you can’t create new projects with it. The sandbox has other limitations as well, but it’s totally free, and if all you’re going to do is practice SQL queries, it’s a fabulous tool.

Of the SQL platforms I’ve used (MS SQL Server 2014 and SQLite), BigQuery feels the easiest to set up and get started with. Again, the UI differences are a bit sticky if you’re following the Google Cert courses, but otherwise it feels simple.

This is the dataset I’ve been working with in BigQuery lately: https://www.kaggle.com/datasets/tysonpo/ice-cream-dataset

One thing I will say: when I added the reviews and the brand data from this dataset, it gave me an error, something about quotes not being matched. When I looked up the error, the advice was to enable “quoted newlines” in the options when you create the table.
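If you ever load data with the Python client instead of the console, the same option exists on the load job config. Here’s a rough sketch, assuming credentials are already set up and using made-up dataset/table/file names (this is a cloud config fragment, so it won’t run without a GCP project):

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials and project are configured

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,          # skip the CSV header row
    autodetect=True,              # let BigQuery infer the schema
    allow_quoted_newlines=True,   # the fix: review text contains newlines inside quoted fields
)

# "ice_cream.reviews" and "reviews.csv" are placeholder names for this example
with open("reviews.csv", "rb") as f:
    load_job = client.load_table_from_file(f, "ice_cream.reviews", job_config=job_config)
load_job.result()  # block until the load finishes
```

The console checkbox and this flag do the same thing; without it, BigQuery treats each newline as a new row and the quotes stop matching up.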

The Ice Cream dataset I linked is what I was using for the Ice Cream Code project I was working on. I’m re-appropriating it for the Google Cert Capstone Project, which I will unfortunately be using R for. That’s a whole other post.

Coming from Kaggle, the CSVs don’t really require a lot of cleaning. I checked for nulls, got a good idea of the shape and contents of the data, and then did what I could to make sure nothing would screw with the analysis.
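For the null check itself, one handy pattern is that COUNT(*) counts every row while COUNT(column) skips nulls, so the difference is the number of missing values. A small sketch using SQLite with made-up sample rows (the brand/stars column names are assumptions based on my queries below; the same SQL works in BigQuery):

```python
import sqlite3

# In-memory database with a toy version of the reviews table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (brand TEXT, stars INTEGER)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [("bj", 5), ("bj", None), ("talenti", 4)],
)

# COUNT(*) counts all rows; COUNT(stars) skips NULLs,
# so the difference is the number of missing ratings.
row = conn.execute(
    "SELECT COUNT(*) - COUNT(stars) AS null_stars FROM reviews"
).fetchone()
print(row[0])  # 1
```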

One thing I did want to mention was my process for putting together queries, something I’ll reiterate in the project’s blog post. After I get an idea of what the data looks like, how big the dataset is, what kind of columns there are, I identify the primary and foreign keys, and come up with a full battery of questions. They act as a starting point, and help me feel a little more confident when I go in to execute the queries.

It also gives me time to consider what each question entails and how I could achieve an answer with SQL.

If you’re hesitant to try your hand at SQL outside of learning, I’d recommend doing this. Get your thoughts together, figure out how you might achieve it, then put it in the query editor and see what comes out. If it isn’t what you’re expecting, figure out what you need to change.

It’s a great starting point, and helps you get a better idea of working with SQL all on your own.

Here are some of the queries I came up with, from the original draft to the finished product that worked for what I wanted:

SELECT COUNT(rating)
FROM reviews
GROUP BY brand

End result:

SELECT COUNT(stars)
FROM ice_cream.reviews
WHERE brand = 'brand_name'

I’ll admit, though, there is something frustrating about the final result, because my hope was to get a total count of stars for each brand without having to go through each one individually.

While “GROUP BY brand” did work, it didn’t show me the names of the brands originally. I know there’s got to be a better way, there usually is, but I went the easy, longer route this time.
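For what it’s worth, the usual way to get both is to put the grouping column in the SELECT list alongside the aggregate. A quick sketch in SQLite with made-up sample rows (the brand/stars names come from my queries above; the same SQL works in BigQuery):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (brand TEXT, stars INTEGER)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [("bj", 5), ("bj", 4), ("talenti", 3)],
)

# Selecting the grouping column next to the aggregate shows
# each brand's name alongside its review count in one query,
# instead of running a separate WHERE query per brand.
rows = conn.execute(
    "SELECT brand, COUNT(stars) AS review_count "
    "FROM reviews GROUP BY brand ORDER BY brand"
).fetchall()
print(rows)  # [('bj', 2), ('talenti', 1)]
```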

Once I got answers to all of the questions I had written down originally (taking some out when I realized they didn’t fit the business problem), I took some time to go through everything I had found out. Compiling that analysis helped me figure out even more specific questions that would help with the business problem.

Being a perfectionist at heart is hard when you want to work on your skill with literally anything, let alone a language like SQL, which feels relatively intuitive but has a lot of nuances to it. I’ve spent a lot of time learning SQL, but getting into it like this feels easy and more fun than I initially expected.

BigQuery’s upload process feels simple and easy, and getting used to it, and to the platform in general, makes me want to use the tool to expand my analysis of other projects I’d considered mostly finished.

I don’t think I could do much with the Frog analysis, mostly since it’s small, but the grocery analysis could definitely benefit from it. And a few others that I just got stuck on.

I always loved SQL and knew it was there for analysis, but really working with it on personal projects has shown me the gap that’s been there. It was a missing piece in my toolkit, one I’d spent so much time wondering about in the first place: how do you analyze categorical and qualitative data with a programming language?

The answer is SQL. Load it up and get to poking and asking the data questions. It may not be as pretty as a visualization in R or Python, but it’s a huge step toward understanding what’s there.

Thanks for joining me on this journey, as always, and I wish you all the best with your data cleaning and analysis!
