Hands on Career Path Modelling Using Markov Chain, with Python
Professionally speaking, I'm a very weird guy: I work as a Software/Machine Learning Engineer in a startup, I have a Master's Degree in Physics and I'm about to defend my dissertation for my PhD in Aerospace and Mechanical Engineering. During my ever-changing career, two things stayed the same: my love for science and my passion for coding.
A beautiful way to mix science and coding is modeling. What I mean by that is that, in order to describe the world, you make reasonable assumptions that approximate reality to some degree. Based on these assumptions, you can simulate a given process. The simulation will give you results that stem from the original assumptions but that weren't exactly predictable before running the simulation itself.
For example, let's say that we are trying to figure out how many cows can fit inside a fence. A pretty bizarre assumption that a physicist would make is the following:
"Let's consider a square-shaped cow"
Meaning that we approximate the shape of a cow as a square. Then we approximate the fence as a bigger square (this assumption is far more reasonable).
Now, let's add a little bit of salt to the problem, and let's consider that the cow can have a stochastic size, meaning that the side of the square is not the same for every cow: the side length of each square is drawn from a Gaussian distribution with mean = 2 and variance = 1. Let's assume the fence has side L = 10.
To solve this, we have to do a very simple Monte Carlo simulation. We keep adding square-shaped cows inside the 10×10 square fence until we can't fit any more of them. We repeat this process N=1000 times and take the average number of cows.
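As a rough illustration of the recipe above, here is a minimal sketch. It rests on two assumptions of mine that are not part of the original problem statement: a cow "fits" as long as the total area of the cows stays below the fence area (so the actual geometry of packing squares is ignored), and non-positive side lengths drawn from the Gaussian are simply redrawn. The function names are hypothetical.

```python
import random


def cows_that_fit(fence_side=10.0, mean=2.0, std=1.0, rng=random):
    """Add square cows until the next one no longer fits (area-based approximation)."""
    fence_area = fence_side ** 2
    used_area = 0.0
    n_cows = 0
    while True:
        side = rng.gauss(mean, std)  # variance = 1 means std = 1
        if side <= 0:
            continue  # assumption: a cow with non-positive size is redrawn
        if used_area + side ** 2 > fence_area:
            return n_cows  # this cow does not fit, so we stop here
        used_area += side ** 2
        n_cows += 1


def average_cows(n_runs=1000, seed=0):
    """Repeat the experiment n_runs times and average the number of cows."""
    rng = random.Random(seed)
    return sum(cows_that_fit(rng=rng) for _ in range(n_runs)) / n_runs


average = average_cows()
print(f"Average number of square cows in the fence: {average:.1f}")
```

A more faithful version would place each square at a position inside the fence and check for overlaps, which is a much harder packing problem. The area-only rule keeps the spirit of the joke.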
Now, it is obvious that this is a pretty rough way of computing how many cows can fit inside a fence, and it is actually a famous joke in the Physics community. Nonetheless, it perfectly describes the approach of modeling. Now, I love the heck out of Artificial Intelligence. I study and do AI every day for a living, and I've been doing that since 2019. One thing that I do disagree with about AI, though, is the idea that thanks to AI we don't need to model anymore, because we can just train a Machine Learning model instead. This is wrong (and a little sad). We need modeling, as modeling is the foundation for understanding the world around us.
Hopefully, through some square-shaped cows and compassion, I convinced you that modeling is cool.
In this blog post, I will use a very powerful modeling tool, named Markov Chains, to simulate the career development of an individual. We will start with a person who has just graduated, and we will give some probabilistic assumptions about what they will do in the future. By plugging these assumptions into a Markov Chain model, we will have a distribution of the career outcomes of individuals. Let's start!
0. Markov Chains introduction
If some of you guys have already read some of my articles (thank you, I love you, you are the best ❤), you already know that I love Markov Chains. If you are familiar with Markov Chains, you can safely skip this part.
If you aren't familiar with Markov Chains, in this article, I talk about them in general terms, and in my humble opinion, it is a good guide for someone who has never heard of Markov Chains before and wants to get the gist of it.
We don't need to know a lot about Markov Chains for this blog post, to be honest, as it is pretty code intensive but not very theory intensive, so I will just very quickly describe how they work. For brevity, I will describe them directly with the example of career development.
Let's consider a stochastic event E. This event can happen at times t = 0, 1, ..., n, and at each one of these times it can assume different values. For example, at t=2 (meaning after 2 years), the individual might be studying for their bachelor's. We write this in mathematical terms with the expression:

$$E(t=2) = \text{Bachelor's}$$
Now, what is the probability that a person is still studying after 5 years, given that at year 2 they were studying for their bachelor's degree? The Markov Chain assumption tells us that, to know the state at a given year, it is only necessary to know what happened the year before. For example, the following expression:

$$P(E(t=2) = \text{Bachelor's} \mid E(t=1) = \text{Bachelor's}) = 0.8$$
tells us that if a person is studying for their bachelor's degree during their first year, the probability that they will keep studying for their bachelor's will be pretty high (0.8).
We can do the same after years 3, 4, 5 by only considering the event at years 2, 3 and 4. For this reason, we call it a chain.
Now, this is not particularly interesting, but we can ask ourselves: What is the probability that after 10 years the person will be employed? To get it, we sum the probabilities of all the possible paths (person with bachelor's, person with master's, person without a bachelor's, etc.) that end in the "employed" state at time t=10.
This will be the core of our simulation.
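To make the chain idea concrete, here is a small sketch with three states. Only the 0.8 probability of staying in the bachelor's state comes from the text above. The state list, the variable names and every other number in the transition table are made up for illustration.

```python
# States of a toy chain. Row i of P holds the probabilities of moving
# from state i to each state in one year (each row sums to 1).
STATES = ["Bachelor", "Employed", "Unemployed"]
P = [
    [0.80, 0.15, 0.05],  # from Bachelor (0.8 = keep studying, as in the text)
    [0.00, 0.95, 0.05],  # from Employed (hypothetical values)
    [0.00, 0.50, 0.50],  # from Unemployed (hypothetical values)
]


def step(dist, P):
    """One year forward: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_after(years, start, P):
    """Apply the one-year step repeatedly, using only the previous year each time."""
    dist = list(start)
    for _ in range(years):
        dist = step(dist, P)
    return dist


start = [1.0, 0.0, 0.0]  # at t=0 everyone is studying for a bachelor's
after_10 = distribution_after(10, start, P)
for state, prob in zip(STATES, after_10):
    print(f"P({state} at t=10) = {prob:.3f}")
```

The probability of being employed at t=10 is the second entry of `after_10`. It collects the contribution of every path that ends in "Employed", which is the sum described above.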
1. Our Simulation Assumptions
For this case study, I created a list of possible career choices, considering the starting point of a person who just graduated from High School.
This person can:
- Not go to college at all
- Go to college
If they go to college they can:
- Go to an Ivy League university
- Go to a state university
- Go to a community college
For each one of them, they can choose a major:
- STEM
- Business
- Humanities
After that, they can either get an internship or not get an internship. Either way, after that, they can be employed or unemployed. This is the path and this is how it looks:

Now, each one of the black circles (and the final red square) is equipped with a probability, that is, the probability of getting to that state given the previous state. For example, Ivy League might be equipped with probability = 0.2, meaning that, given that we went to college (previous node), there is a 0.2 probability that we go to an Ivy League University. In the image, I omitted the study major (Humanities, STEM, or Business) for brevity.
Each black circle is also equipped with a duration. The duration of "Don't go to college" is 0, because it is an instantaneous decision to look for a job/internship, while the duration of "Ivy League" is 4. The red square has a "salary", that, as it is in real life, changes based on the previous node. For example, a person without any degree would make less money (on average) than an Ivy League student with an internship. Of course, the salary of "Unemployed" is 0.
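As a quick worked example of how these ingredients combine along one path: the conditional probabilities multiply, while the durations add up. The numbers below are illustrative placeholders (0.2 is the Ivy League example from the text, the rest are assumptions), and the major is left out, as in the image.

```python
# Each step: (state name, probability given the previous state, duration in years).
# Values are illustrative assumptions, not results.
path = [
    ("GoToCollege", 0.7, 0),
    ("IvyLeague", 0.2, 4),
    ("WithInternship", 0.7, 0),
    ("Employed", 0.99, 0),
]

path_probability = 1.0
total_years = 0
for name, probability, duration in path:
    path_probability *= probability  # conditional probabilities multiply along the chain
    total_years += duration          # durations add up along the chain

print(f"P(path) = {path_probability:.4f}, years spent = {total_years}")
```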
IMPORTANT TO NOTICE!!! These are my simulation assumptions. Feel absolutely free to change them based on what you want to assume is more relevant to your specific case.
Now, what can we do with this stuff? A lot of things. But in a few words, we can run the simulation many times in Python and analyze the results. This is known as the Monte Carlo approach.
2. Monte Carlo simulations
Everything looks nice and all, but we need to have data digitally, not just words. The first thing is to find a format for the simulation data. There are many ways to do that: define a binary tree, define a graph, define a table, …
What I did, for simplicity, was to store this data in a .json file. I did this because a json file is easy to navigate: once loaded in Python, it becomes a dictionary.
This is the .json file:
{
  "HighSchoolGraduate": {
    "GoToCollege": {
      "probability": 0.7,
      "duration": 0,
      "IvyLeague": {
        "probability": 0.1,
        "duration": 4,
        "STEM": {
          "probability": 0.4,
          "WithInternship": {
            "probability": 0.7,
            "duration": 0,
            "Employed": {
              "probability": 0.99,
              "AvgSalary": 120000
            },
            "Unemployed": {
              "probability": 0.01
            }
          },
          "WithoutInternship": {
            "probability": 0.3,
            "duration": 0,
            "Employed": {
              "probability": 0.9,
              "AvgSalary": 110000
            },
            "Unemployed": {
              "probability": 0.1
            }
          }
        },
        "Humanities": {
          "probability": 0.3,
          "WithInternship": {
            "probability": 0.7,
            "duration": 0,
            "Employed": {
              "probability": 0.97,
              "AvgSalary": 90000
            },
            "Unemployed": {
              "probability": 0.03
            }
          },
          "WithoutInternship": {
            "probability": 0.3,
            "duration": 0,
            "Employed": {
              "probability": 0.85,
              "AvgSalary": 80000
            },
            "Unemployed": {
              "probability": 0.15
            }
          }
        },
        "Business": {
          "probability": 0.3,
          "WithInternship": {
            "probability": 0.7,
            "duration": 0,
            "Employed": {
              "probability": 0.98,
              "AvgSalary": 100000
            },
            "Unemployed": {
              "probability": 0.02
            }
          },
          "WithoutInternship": {
            "probability": 0.3,
            "duration": 0,
            "Employed": {
              "probability": 0.88,
              "AvgSalary": 90000
            },
            "Unemployed": {
              "probability": 0.12
            }
          }
        }
      },
      "StateUniversity": {
        "probability": 0.5,
        "duration": 4,
        "STEM": {
          "probability": 0.4,
          "WithInternship": {
            "probability": 0.7,
            "duration": 0,
            "Employed": {
              "probability": 0.97,
              "AvgSalary": 80000
            },
            "Unemployed": {
              "probability": 0.03
            }
          },
          "WithoutInternship": {
            "probability": 0.3,
            "duration": 0,
            "Employed": {
              "probability": 0.9,
              "AvgSalary": 70000
            },
            "Unemployed": {
              "probability": 0.1
            }
          }
        },
        "Humanities": {
          "probability": 0.3,
          "WithInternship": {
            "probability": 0.7,
            "duration": 0,
            "Employed": {
              "probability": 0.95,
              "AvgSalary": 60000
            },
            "Unemployed": {
              "probability": 0.05
            }
          },
          "WithoutInternship": {
            "probability": 0.3,
            "duration": 0,
            "Employed": {
              "probability": 0.85,
              "AvgSalary": 50000
            },
            "Unemployed": {
              "probability": 0.15
            }
          }
        },
        "Business": {
          "probability": 0.3,
          "WithInternship": {
            "probability": 0.7,
            "duration": 0,
            "Employed": {
              "probability": 0.96,
              "AvgSalary": 75000
            },
            "Unemployed": {
              "probability": 0.04
            }
          },
          "WithoutInternship": {
            "probability": 0.3,
            "duration": 0,
            "Employed": {
              "probability": 0.87,
              "AvgSalary": 65000
            },
            "Unemployed": {
              "probability": 0.13
            }
          }
        }
      },
      "CommunityCollege": {
        "probability": 0.4,
        "duration": 2,
        "Employed": {
          "probability": 0.92,
          "AvgSalary": 50000
        },
        "Unemployed": {
          "probability": 0.08
        }
      }
    },
    "DoNotGoToCollege": {
      "probability": 0.3,
      "duration": 0,
      "Employed": {
        "probability": 0.85,
        "AvgSalary": 35000
      },
      "Unemployed": {
        "probability": 0.15
      }
    }
  }
}
Now you might ask… did I get these numbers from a real database? Absolutely not. These numbers are completely arbitrary, and you can do your own research/experiment. In my assumptions, I made sure that the probabilities sum to 1 at each level of the path (this is mandatory) and that a longer education corresponds to a higher average salary, meaning that if you study longer you will make more money, on average, when employed.
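Since the "sums to 1" rule is easy to break when you edit the numbers by hand, a small check can help. The sketch below is one possible way to do it. The function name `check_probabilities` is hypothetical, and the example tree is a trimmed copy of the file above (the children of "GoToCollege" are left out), used only to keep the example short.

```python
import math


def check_probabilities(node, name="root", tol=1e-9):
    """Return (node name, total) for every node whose children's probabilities don't sum to 1."""
    # Children are the nested dictionaries; "probability", "duration" and "AvgSalary" are plain numbers.
    children = {k: v for k, v in node.items() if isinstance(v, dict)}
    problems = []
    # Levels whose children carry no "probability" (the root) are skipped.
    if children and all("probability" in child for child in children.values()):
        total = sum(child["probability"] for child in children.values())
        if not math.isclose(total, 1.0, abs_tol=tol):
            problems.append((name, total))
    for child_name, child in children.items():
        problems.extend(check_probabilities(child, child_name, tol))
    return problems


# Trimmed copy of the tree above, for illustration only.
tree = {
    "HighSchoolGraduate": {
        "GoToCollege": {"probability": 0.7, "duration": 0},
        "DoNotGoToCollege": {
            "probability": 0.3,
            "duration": 0,
            "Employed": {"probability": 0.85, "AvgSalary": 35000},
            "Unemployed": {"probability": 0.15},
        },
    }
}

print(check_probabilities(tree))  # an empty list means every level sums to 1

# Break one branch on purpose to see the check complain.
tree["HighSchoolGraduate"]["DoNotGoToCollege"]["Unemployed"]["probability"] = 0.25
broken = check_probabilities(tree)
print(broken)
```

In practice you would pass the dictionary obtained by loading the whole .json file instead of the trimmed copy.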
Save this file with a name you like. With a lot of creativity, I chose the name:
_career_decision_tree.json_
But you can choose the name you like. Just don't forget it.
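Once the file is saved, a single simulated career is just a random walk down the tree. Below is a minimal sketch of how that could look, not necessarily the implementation used later. The helper names `load_tree` and `sample_path` are hypothetical, and the demo runs on a trimmed copy of the tree so that it works without the file on disk.

```python
import json
import random


def load_tree(path):
    """Read the .json file into a Python dictionary."""
    with open(path) as f:
        return json.load(f)


def sample_path(tree, rng):
    """Walk from the root to a leaf, picking each child according to its probability."""
    node, path, years, salary = tree, [], 0, 0
    while True:
        children = {k: v for k, v in node.items() if isinstance(v, dict)}
        if not children:
            return path, years, salary  # leaf reached: Employed or Unemployed
        names = list(children)
        # The root level has no "probability" key, so it defaults to 1.0.
        weights = [children[n].get("probability", 1.0) for n in names]
        name = rng.choices(names, weights=weights, k=1)[0]
        node = children[name]
        path.append(name)
        years += node.get("duration", 0)
        salary = node.get("AvgSalary", 0)  # stays 0 unless the final node has a salary


# Trimmed copy of the tree, for illustration only. With a single child left
# under "GoToCollege", that child is always picked.
FRAGMENT = {
    "HighSchoolGraduate": {
        "GoToCollege": {
            "probability": 0.7,
            "duration": 0,
            "CommunityCollege": {
                "probability": 0.4,
                "duration": 2,
                "Employed": {"probability": 0.92, "AvgSalary": 50000},
                "Unemployed": {"probability": 0.08},
            },
        },
        "DoNotGoToCollege": {
            "probability": 0.3,
            "duration": 0,
            "Employed": {"probability": 0.85, "AvgSalary": 35000},
            "Unemployed": {"probability": 0.15},
        },
    }
}

# With the real file: tree = load_tree("career_decision_tree.json")
rng = random.Random(42)
runs = [sample_path(FRAGMENT, rng) for _ in range(1000)]
employed_share = sum(1 for path, _, _ in runs if path[-1] == "Employed") / len(runs)
print(f"Share of simulated careers ending employed: {employed_share:.2f}")
```

Repeating `sample_path` many times and aggregating the results (share employed, average salary, years of study) is the Monte Carlo part.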