ETL, FTL

Written in

by

ETL, or Extract, Transform, and Load, is the process of uploading and cleaning data before deriving insights and analysis from it. A callback to last week’s post: I had done this before, but never knew there was a name for it. Another hazard of learning independently.

This is normally considered the ETL Pipeline, because it’s what data has to undergo before you can really do anything with it.

One of the things the books and courses I’ve been part of has said about this process over the years is that it is about 70% of what data analysts and data scientists do with their time, because it’s so extensive.

Also, part of my sign off tagline being “I hope your data cleaning is swift” is referring to this. Usually it doesn’t tend to be an easy process.

Part of that is due to discrepancies and inconsistencies between pieces of data that’s been wrangled. You have to make sure that everything is in one uniform format before it can be loaded into the database, whichever one you’re using.

One of the courses I’m learning from at the moment is from 2015, so it has me using SQL Server 2014 and Visual Studio 2013. The process of getting the database set up and ready for loading feels a little complicated, because there’s so many parts, but I’m really curious to see how other databases handle this.

One of the bigger questions I had when I was just starting out on this path years ago was, “How do you get all of this info into a database?” I learned how to add things to a database, but I didn’t know how to import full CSV files. One of my novice questions has been answered– and I kind of dislike that it took so long to find an answer. I think that’s more due to the fact that I didn’t know where to look.

The problem I’m having now with this post is the same problem I had with finding information on the ETL pipeline. It isn’t something that comes with a lot of description, because it varies so much depending on your dataset, database, and what you need to do. There’s really not much else to say.

You get your data, you make it look uniform and nice and ensure that it won’t create issues for you later, and then you load it into a database, or data warehouse.

I haven’t added a CSV file to an already set up data warehouse but I assume it includes an extra emphasis in the Transform phase where you need to make sure that you have the same columns and categories as the others, and you have something to base it off of.

It is not lost on me how amusing it is that it’s approximately 70% of the job of data analysis, and yet there’s so little to be seen explaining it. Again, it’s all dependent on what you use, but overall the first two steps are pretty similar across the board.

I am starting a cohort this week, so I’ll hopefully have something fun to post about my learning journey!

Allow me to have an emphasis today on my sign off– I hope your data cleaning is as swift and easy as can be, and I’ll see you next time.

Tags

Leave a comment