
The art of learning data science is not always straightforward. The article below shares how to overcome the occasional plateau throughout the process. CK

This article, written by Ken Jee, originally appeared in Towards Data Science on March 20, 2020.

Many people think of learning data science as a linear process. In truth, it is a messy progression that is closer to a series of plateaus.

Perceived vs Realistic Data Science Learning Curve

Luckily, there is already a pretty good generalized learning model that explains how these plateaus work. This model was created in the 1970s by Noel Burch and was expanded on by Robert Greene in his book Mastery. According to this framework, we go through five learning stages: Unconscious Incompetence, Conscious Incompetence, Conscious Competence, Unconscious Competence, and Mastery. Learning data science follows this process as well.

Learning Stages According to Noel Burch & Robert Greene

In this article, I explain what each of these stages looks like in the frame of data science. I talk about my experience at each stage and also include some of the quickest ways I’ve found to get “unstuck” from one to the next.

At Stage 1, Unconscious Incompetence, you are just starting to get into the data science field. You have little to no coding or statistics experience. This stage is generally characterized by overconfidence. Many people think that the field can be learned after taking a course or two.

This is generally how I felt when coming out of college. Armed with a couple of economics courses, I felt that I was ready to take on the world of sports analytics. Learning data science seemed like the next practical step. At a high level, the field seemed simple: you just needed to find trends in data and draw insights from them. Simple, right?

Needless to say, I was completely wrong. After starting to learn programming, I was already hopelessly lost. I started to put together all of the different concepts that I needed to learn and the task went from easy to extremely daunting.

To move from Stage 1 to Stage 2, you need to develop an understanding of what data science is and which programming and math concepts it requires. I recommend watching YouTube videos and dabbling in some Python programming to reach this understanding.

Once you start feeling overwhelmed by how big the worlds of programming and math are, you have reached Stage 2. This is a stage where many people get stuck or even quit. Here, you learn how deep the data science well goes. There is so much to learn, and it is not clear where to start.

The key to getting through this stage is breaking data science into small steps. You also just need to get started somewhere. At the most basic level, for data science, you need to know some programming (Python or R) and simple statistics. Find some places online to get introduced to these fields. I recommend taking a free programming or data science course (the kaggle.com micro-courses are my favorite).

What made me feel less overwhelmed during this stage was thinking about why I wanted to learn this field. I had a specific project in mind. I wanted to build a model that would improve my results in the daily fantasy sports that I was playing. If I focused only on the skills necessary to build that, I could make data science seem much smaller and more manageable.

To get to the next stage, I recommend homing in on a specific problem to narrow what you need to learn. It is easier to understand what is needed to complete a single project than to learn the whole field of data science. Doing a small project is also less intimidating than “learning Python” or “learning statistics” because these goals are so broad and vague. In the early stages, you really only need to know the basics of these two fields to be able to practice data science. If you knock out a few of these small projects, you will be well on your way.

I also recommend reviewing other people’s code on Kaggle. You likely won’t understand it at all at first, so don’t panic. Over time, it will start to make sense, and just seeing a lot of code is a good start to this process. Make a list of the terms, packages, and algorithms that you see but don’t understand. Each day, research a few of these concepts and try to make sense of them. You will be shocked at how far you have come in a few weeks.

At Stage 3, Conscious Competence, you have done a few different projects in which you learned how to implement specific algorithms. You now have some code that you can reference!

When I was at this stage, I collected all of the code snippets that I used regularly and put them into one master document. Instead of trying to remember how to do everything, I could just reference this Frankenstein document. This made it possible to go through significantly more projects at a faster clip. It may feel like cheating, but I think that it makes more sense to focus on implementation over syntax at this point. You should do as many projects as you can where you apply all of the different algorithms that you found in your research.
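As an illustration only, here is a minimal sketch of what one entry in such a snippet file might look like. It assumes Python and the standard library; the file name `snippets.py` and the function name `describe` are hypothetical and are not taken from my actual document.

```python
# snippets.py: a hypothetical personal snippet file (illustrative only)
import statistics


def describe(values):
    """Return basic summary statistics for a list of numbers."""
    if len(values) < 2:
        # The sample standard deviation is undefined for fewer than two points.
        raise ValueError("need at least two values")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


if __name__ == "__main__":
    # Example: summarize a small list of made-up scores.
    print(describe([12.5, 18.0, 9.5, 22.0, 15.0]))
```

The point of keeping helpers like this in one place is that a new project can start by copying a working function instead of looking up the syntax again.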

After applying these concepts, you should begin to understand how they work as well. During this stage, I had started grad school. In my courses, we were required to code most of these algorithms from scratch. I think that this is good practice for all data scientists. While I have some PTSD from coding a neural net in MATLAB, it was definitely worthwhile. From coding the algorithms, you begin to understand the inputs, constraints, and limitations of the various techniques.
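To give a sense of what coding an algorithm from scratch can look like, here is a minimal sketch of simple linear regression fit by ordinary least squares in plain Python. It is a much smaller example than a neural net and is not code from my coursework; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two points")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and of the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    if sxx == 0:
        # A constraint you only notice when writing it yourself:
        # with no variation in x, the slope cannot be estimated.
        raise ValueError("all x values are identical; slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Points that lie exactly on y = 2x + 1.
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```

Writing even this much by hand surfaces the inputs (paired lists of equal length) and a limitation (the fit breaks down when every x value is the same) that a library call would otherwise hide.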

There is no secret to getting from Stage 3 to Stage 4. What gets you past the threshold is reps: constant practice.

When you reach Stage 4, you know what to do when confronted with a problem. You don’t have to reference your monster of a code snippet library anymore, and you can start focusing on optimizing your problem-solving approaches. Here, you blend subject area expertise and business intuition with your work to create the best possible solutions to your challenges.

I believe that this is where the true art of data science begins to take place. Instead of focusing on how to solve a problem, you are looking to create an elegant and sustainable solution. You spend more time with feature engineering, model tuning, and putting projects into production. You also work more closely with business stakeholders to make sure that the service you provide is having maximal impact.

I believe that I still fluctuate between Stage 3 and Stage 4 quite frequently. Hopefully I will feel this state of “flow” in my role more often going forward.

At Stage 5, Mastery, you have come close to mastering some area of the field. In data science, I don’t really think someone can master the whole discipline. You are able to push the boundaries by discovering new algorithms or novel approaches to solving problems.

This stage is elusive, and I think that very few people have reached this summit. I would argue that most of the people who fall into this category gravitate towards academia and are more focused on research than business implementation.

I hope that this framework will help you evaluate your own data science learning journey in a new light. I also hope that it gives you a solid roadmap to level up your data science knowledge. From my experience, learning data science is a long process but an enjoyable one.
