Beginner’s Guide: Personal Projects – Building a Portfolio

One of the hardest things for me at the beginning of my journey with data, back when I was working in a different field, was that I thought all of my projects and work had to be innovative. Even before that, when I flirted with the idea of going into technical writing, I still thought I had to be fresh and original.

Let me tell you, it was liberating to realize I didn’t need to collect all that data on my own, or come up with some mind-blowing, wild questions or problems to solve.

At the end of the day, a project and a portfolio are about two things: How you learn, and your work process. No one needs anything crazy or new or incredible. You can do just a basic time series analysis on, like, shrimp or something.

What matters is that you’re putting in that work, and you’re coming back to it, enough to finish it and show something for it: a written report, a blog post, a presentation done in Canva or PowerPoint.

Before I get too deep into my own blog, I want to share another one that I saw weeks ago that made me feel pretty solid in my choice to make this blog portfolio.

Analysts Should Have Portfolios

GitHub works just as well as a portfolio if you aren’t someone who does a lot of writing. If you do like to write, a couple of options off the top of my head are WordPress (this platform) and Medium, which leans more toward technical blogging.

You can also create your own website if that’s more your speed and you have experience with at least HTML and CSS. There’s a lot that goes into web design.

I am still in the development phase of this part, as you can see. I have a page ready to be published with some of the visualizations I’ve created, but I only have so many I’m proud of and feel confident putting on a page with a brief description.

I wanted to give you some options for where to start if you’re feeling a little overwhelmed by all the possibilities, and if, like me, you can’t think small when it comes to data projects.

Titanic Dataset

This one is referenced a lot in code samples, in the Python Graph Gallery specifically and in other code examples in general. You can poke around at both categorical and numerical data, and there are some fun things you can do with it just for analysis.
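
To make "categorical and numerical" a little more concrete, here is a minimal sketch using only the Python standard library. The rows are toy values I made up for illustration, not real Titanic records, and the column names (`sex`, `pclass`, `age`, `survived`) are an assumption based on commonly shared versions of the dataset. It groups by a categorical column and averages a numerical one.

```python
from collections import defaultdict
from statistics import mean

# Toy rows invented for illustration; these are NOT real Titanic records.
# Column names are an assumption modeled on commonly shared versions of the dataset.
rows = [
    {"sex": "female", "pclass": 1, "age": 29.0, "survived": 1},
    {"sex": "male", "pclass": 3, "age": 22.0, "survived": 0},
    {"sex": "female", "pclass": 3, "age": 26.0, "survived": 1},
    {"sex": "male", "pclass": 1, "age": 54.0, "survived": 0},
    {"sex": "male", "pclass": 2, "age": 35.0, "survived": 1},
    {"sex": "female", "pclass": 2, "age": 14.0, "survived": 0},
]


def survival_rate_by(rows, column):
    """Group rows by a categorical column and return the share that survived."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row["survived"])
    return {key: mean(values) for key, values in groups.items()}


def average(rows, column):
    """Average a numerical column across all rows."""
    return mean(row[column] for row in rows)


rate_by_sex = survival_rate_by(rows, "sex")       # categorical grouping
rate_by_class = survival_rate_by(rows, "pclass")  # ticket class as a category
average_age = average(rows, "age")                # numerical summary

print(rate_by_sex)
print(rate_by_class)
print(average_age)
```

With the real dataset you would most likely load the file with a library such as pandas instead of typing rows by hand, but the questions you ask stay the same.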

Datasets for Machine Learning

I’m going a bit broader for this one, but once you click through you’ll see why. I wanted to show a dataset that might be better for a machine learning approach, and came across this repository.

What I love about this repository is that, for each dataset, it shows you what you could do with it, how big it is, and when it was published.

Admittedly, I looked for time series data because, to me, it’s easy to use for both analysis and ML algorithms, including splitting it into training and testing sets.
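
Here is a rough sketch of what the testing/training part can look like for a time series. The numbers are made up, and the function names (`chronological_split`, `naive_forecast`) are hypothetical helpers, not from any library. The one idea it shows is that time series are usually split in order, with the most recent values held out for testing, instead of being shuffled.

```python
def chronological_split(series, test_fraction=0.25):
    """Split a series in time order: earlier values train, later values test."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    split_at = int(len(series) * (1 - test_fraction))
    return series[:split_at], series[split_at:]


def naive_forecast(train, horizon):
    """Baseline model: repeat the last observed value for every future step."""
    return [train[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average size of the miss, ignoring direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented monthly values, purely for illustration.
monthly_values = [10, 12, 13, 15, 14, 16, 18, 20]

train, test = chronological_split(monthly_values)
predictions = naive_forecast(train, horizon=len(test))
error = mean_absolute_error(test, predictions)

print(train, test, predictions, error)
```

A baseline this simple is mostly useful as a yardstick: if a fancier model can't beat "repeat the last value," that's worth writing about in the project too.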

Data Engineering Projects

I am, very unfortunately, out of my depth with data engineering, but I wanted to give a wide breadth of options for the three categories I had essentially broken most data roles into.

The thing that intimidates me about even poking my nose into this goes back to some of the hangups I had during the residency program. There are at least three separate pieces of software or programming you’re going to need to have working together, and that can be a challenge on its own.

You need a dataset to start with that you can automate a cleaning process for, then you need cloud storage for the cleaned data to land in, and then you need a platform that will spit out visualizations.

I am very tentatively going to put this YouTube video here for you to get an idea of what that looks like, and how to start. Seattle Data Guy shows a pretty decent breakdown of how to get into a project and what to do without taking too long to explain things.
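
To show the shape of those three pieces without any accounts or setup, here is a toy sketch that runs entirely on your own machine. It makes some loud assumptions: the CSV text is invented, an in-memory SQLite database stands in for cloud storage, and the final query just produces the summary a visualization tool would read. A real data engineering project would swap each piece for an actual service.

```python
import csv
import io
import sqlite3

# Invented raw data with the kind of mess a cleaning step has to handle:
# inconsistent capitalization, stray spaces, and a missing amount.
RAW_CSV = """date,category,amount
2024-01-03,groceries,42.10
2024-01-03,Groceries ,
2024-01-05,transport,12.50
2024-01-09,groceries,18.40
"""


def clean(raw_text):
    """Step 1: drop rows with no amount and normalize category names."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows where the amount is missing
        category = row["category"].strip().lower()
        cleaned.append((row["date"], category, float(amount)))
    return cleaned


def store(rows, connection):
    """Step 2: load cleaned rows into a table (SQLite stands in for the cloud)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS spending (date TEXT, category TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO spending VALUES (?, ?, ?)", rows)
    connection.commit()


def totals_by_category(connection):
    """Step 3: the summary a dashboard or chart would be built from."""
    query = (
        "SELECT category, ROUND(SUM(amount), 2) FROM spending "
        "GROUP BY category ORDER BY category"
    )
    return connection.execute(query).fetchall()


connection = sqlite3.connect(":memory:")
store(clean(RAW_CSV), connection)
summary = totals_by_category(connection)
print(summary)
```

The part that makes this "engineering" and not a one-off analysis is that each step is a function you could schedule and rerun whenever new data shows up.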

Upon closer inspection, he does have a lot of information about data engineering in general and the channel may be a good resource overall for data engineering.

Run Wild – Finding Any Data Out in the Open

When you feel a bit more confident and want to get into other data, or if you’re already comfortable and want more info on where to get datasets, you have a lot of options to wade through.

Popular places to find datasets:

  1. kaggle.com
  2. Google Dataset Search
  3. data.gov
  4. UCI Machine Learning Repository
  5. Earth Data
  6. WorldData.ai
  7. CERN Open Data Portal
  8. r/datasets on Reddit

If you’re into sports, look at official league websites (NHL, MLB, NBA). They often have so much data collected that you could really do anything.

baseballsavant.com is also an option, as that’s where I went for my baseball data. To that end, there’s always bound to be a website for things you’re interested in. Searching Google is really going to be your best bet for that.

I also wanted to mention that there’s always the option of web scraping, which I’m personally both interested in and scared of at the same time. It means pulling data out of a website’s pages (or, where the site offers one, requesting it through an API, an Application Programming Interface), then taking the uncleaned data you get back and figuring out which parts of it you actually want.
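
If it helps make scraping less scary, here is a small sketch of the "figure out what you want" step using Python's built-in `html.parser`. The HTML is a made-up snippet stored in a string, so nothing is downloaded, and `TableParser` is a hypothetical class written for this example. A real page would be messier, and you should check a site's terms before scraping it.

```python
from html.parser import HTMLParser

# Invented HTML standing in for a page you would normally download.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Hits</th></tr>
  <tr><td>Player A</td><td>12</td></tr>
  <tr><td>Player B</td><td>7</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag in ("td", "th") and self._current_row is not None:
            self._in_cell = True
            self._current_row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._current_row is not None:
            self.rows.append(self._current_row)
            self._current_row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that sits inside a cell; ignore whitespace between tags.
        if self._in_cell and self._current_row:
            self._current_row[-1] += data.strip()


parser = TableParser()
parser.feed(SAMPLE_HTML)

header, *records = parser.rows
hits = {name: int(count) for name, count in records}
print(header, hits)
```

Everything comes back as text, which is why the last line converts the counts to integers. That conversion is a small taste of the cleaning work scraped data usually needs.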

I do want to recommend collecting data of your own, that is, data that you generate. That could be counting how many times you go to the bathroom in a month, saving your grocery receipts (as I did), or even just recording a mood tracker like you might put in a bullet journal. There’s also getting the data from your Spotify or from your messages or email.
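
As a tiny example of how little you need to get going with your own data, here is a sketch that averages a mood log by weekday. The entries and the 1-to-5 scale are invented for illustration, and `average_mood_by_weekday` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Invented entries: (ISO date, mood on an assumed 1-to-5 scale).
mood_log = [
    ("2024-03-04", 3),
    ("2024-03-05", 4),
    ("2024-03-06", 2),
    ("2024-03-11", 5),
    ("2024-03-12", 4),
]


def average_mood_by_weekday(entries):
    """Group mood scores by day of the week and average each group."""
    by_day = defaultdict(list)
    for iso_date, score in entries:
        day_name = WEEKDAYS[date.fromisoformat(iso_date).weekday()]
        by_day[day_name].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}


averages = average_mood_by_weekday(mood_log)
print(averages)
```

The same grouping idea works for receipts, listening history, or anything else you can write down as a date plus a value.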

There is data everywhere and I have by no means covered it all. I know I’ve missed things here and there, but my hope is just to get you started, and you’ll find more than I’ve mentioned.

Overall, my hope with this whole series is that you’ll find where you want to be, start somewhere, and define your journey on your own. I’m still exploring myself, but I’ve found what I’m happiest doing, barring data engineering, which I’m too intimidated to work on just yet.

For me, getting a job in data analysis isn’t the end goal. It’s just expanding my repertoire. It’s another step in the journey, and I’ll keep going in a direction that makes sense for me. But the beginning of this is the most important part, and sometimes it can be the most fraught.

I hope that you’re able to find the place where you want to be in this data space. There’s always room for you here. I’m proud of you for making this jump, and for doing the work to transition. This is no easy feat, but I’m in your corner!

I think I’m going to do a wrap-up post as well that will cover any miscellaneous things that I’ve missed, so look forward to that. My posting frequency will go back to once a week, so expect the next post around Wednesday, and I’ll see you then!
