Introduction
Many “How to Data Science” courses and articles, including my own, tend to highlight fundamental skills like statistics, math, and programming. Recently, however, I noticed through my own experience that these fundamentals can be hard to translate into the practical skills that make you employable.
So I wanted to put together a list of the practical skills that do.
The first four skills are pivotal for any data scientist, regardless of what you specialize in. Skills 5–10 are also important, but how much you use them will depend on your specialization.
For example, if you’re more statistically grounded, you might spend more time on inferential statistics. If you’re more interested in text analytics, you might spend more time learning NLP, and if you’re interested in decision science, you might focus on explanatory modeling. You get the point.
With that said, let’s dive into what I believe are the 10 most practical data science skills:
1. Writing SQL Queries & Building Data Pipelines
Learning how to write robust SQL queries and schedule them on a workflow management platform like Airflow will make you extremely desirable as a data scientist, which is why it’s point #1.
Why? There are many reasons:
- Flexibility: companies like data scientists who can do more than just model data. Companies LOVE full-stack data scientists. If you’re able to step in and help build core data pipelines, you’ll be able to improve the insights that are gathered, build stronger reports, and ultimately make everyone’s lives easier.
- Independence: there will be instances where you need a table or view for a model or a data science project that does not exist. Being able to write robust pipelines for your projects instead of relying on data analysts or data engineers will save you time and make you more valuable.
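To make this concrete, here is a minimal sketch of one pipeline step: a query that rebuilds a summary table from raw data and can be rerun safely. It uses Python's built-in sqlite3 module with an in-memory database so it runs anywhere. The table names (orders, daily_revenue), the columns, and the function name are all made up for illustration. In a real setup, a scheduler such as Airflow would call a step like this on a schedule against your warehouse.

```python
import sqlite3


def build_daily_revenue(conn: sqlite3.Connection) -> None:
    """Rebuild the (hypothetical) daily_revenue table from raw orders."""
    with conn:  # the connection context manager commits when the block succeeds
        # Dropping and recreating the table keeps the step safe to rerun.
        conn.execute("DROP TABLE IF EXISTS daily_revenue")
        conn.execute(
            """
            CREATE TABLE daily_revenue AS
            SELECT order_date,
                   COUNT(*)    AS n_orders,
                   SUM(amount) AS revenue
            FROM orders
            WHERE amount IS NOT NULL  -- skip rows with a missing amount
            GROUP BY order_date
            """
        )


# Made-up sample data standing in for a raw source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 10.0),
        (2, "2024-01-01", 5.5),
        (3, "2024-01-02", 20.0),
        (4, "2024-01-02", None),
    ],
)

build_daily_revenue(conn)
build_daily_revenue(conn)  # running it twice gives the same table

rows = conn.execute(
    "SELECT order_date, n_orders, revenue FROM daily_revenue ORDER BY order_date"
).fetchall()
print(rows)
```

The drop-and-recreate pattern is only one way to make a step rerunnable. On large tables you would more likely rebuild a single date partition or use an upsert, but the idea is the same: a scheduled query should produce the same result no matter how many times it runs.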
2. Data Wrangling / Feature Engineering
Whether you’re building models, exploring new features to build, or performing deep dives, you’ll need to know how to wrangle data.
Data wrangling means cleaning and transforming raw data from one format into another that is easier to analyze.
Feature Engineering is a form of data wrangling but specifically refers to extracting features from raw data.
It doesn’t really matter whether you use Python or SQL, but you should be able to manipulate your data however you like (within the limits of what is possible, of course).
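As a rough illustration of the difference, the sketch below first wrangles some raw records (parsing strings into proper types and handling a missing value) and then engineers a few features from them. The field names (signup_ts, spend) and the chosen features are assumptions made up for this example, and it sticks to the Python standard library. In practice you might do the same thing in pandas or SQL.

```python
import math
from datetime import datetime

# Made-up raw records, as they might arrive from a CSV export: everything is a string.
raw_rows = [
    {"user_id": "u1", "signup_ts": "2024-03-02 14:30:00", "spend": "120.50"},
    {"user_id": "u2", "signup_ts": "2024-03-04 09:05:00", "spend": ""},
]


def engineer_features(row: dict) -> dict:
    """Turn one raw record into model-ready features."""
    # Wrangling: parse strings into proper types and handle the missing value.
    ts = datetime.strptime(row["signup_ts"], "%Y-%m-%d %H:%M:%S")
    spend_missing = row["spend"] == ""
    spend = 0.0 if spend_missing else float(row["spend"])

    # Feature engineering: derive new columns from the cleaned values.
    return {
        "user_id": row["user_id"],
        "signup_hour": ts.hour,
        "signup_is_weekend": ts.weekday() >= 5,  # Monday is 0, so 5 and 6 are the weekend
        "spend_missing": spend_missing,
        "log_spend": math.log1p(spend),  # log(1 + x) compresses large values
    }


features = [engineer_features(row) for row in raw_rows]
for feature_row in features:
    print(feature_row)
```

Keeping a spend_missing flag next to the filled-in value is one common choice, since it lets a model tell a true zero apart from a missing value. Whether that is worth doing depends on your data.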