Agility Meets AI: Hands-On with Monte Carlo Simulation
Let's check out an exploration of the Monte Carlo Simulation, one of the amazing concepts in Artificial Intelligence.
Join the DZone community and get the full member experience.Join For Free
After my article, “Role of Project Manager in Data Science”, a couple of program managers suggested me to elaborate the use case on meeting release commitments. We are going to explore simulation, one of the amazing concepts in Artificial Intelligence. Quantitative analytic techniques, such as the Monte Carlo simulation, helps program managers in decision making through probabilistic distributions of potential outcomes.
Monte Carlo relies heavily on the randomness of key variables in solving the problem. Along with key parameters, we also need to understand the relationship between them and sufficient data to analyze further. The five steps listed in “Forecasting the future: Let’s rewind to the basics” are essential to building an accurate model.
For a deep dive into the Monte Carlo simulation in program management, this link may be helpful.
Let's do a hands-on experiment with a case study most of us are familiar with, using MS Excel.
Example Case Study
Shubham is the CEO of company XYZ. After release planning, his team committed to delivering key functionalities to the customer. Mohit, his program manager, has identified challenges in meeting the target delivery date based on risks and burndown summary that he has been tracking:
Mohit: Boss, we may not be able to deliver all the committed items as per our initial plan.
Shubham: That’s not a great news Mohit. Can you elaborate more on this?
Mohit: Since the team is working on prioritized requirements, we are on track to deliver the high priority items, but we have challenges in completing our remaining work based on the velocity we have burnt so far and the changes to our planned scope.
Shubham: Okay, what do you suggest?
Mohit: Let us split the deliverables into two phases. All the high priority items required for the customer’s trade show will be delivered as per plan. We need to replan for the remaining work.
Shubham: Sounds like a plan. I, along with the release management team, will discuss this with our customers. Though the remaining items are relatively low in priority, we still need to provide a date that we can stick to. Can you provide your recommendation with a solid plan? Let us ensure this exercise doesn’t impact our team.
Mohit: Yes boss!
The solid plan means a higher probability of meeting the commitment for the remaining work. Let us help Mohit here with step-by-step guidelines on Monte Carlo simulation. Mohit’s objective is to find optimistic and pessimistic options based on the historical velocity with associated randomness.
This spreadsheet has the summary of the below steps.
Step 1: Identifying the Seeding Function
In our scenario, since we know our lowest and highest velocity, and we can come up with something like MIN (Actual Velocity in last 3 sprints)+RAND()*(MAX(Actual Velocity in last 3 sprints)-MIN (Actual Velocity in last 3 sprints))
We can choose any function (such as adding risk or scope parameters), but for simplicity, I chose this as velocity typically considers effort, complexity, and uncertainty involved during sizing.
Step 2: Setting up the Trials
The industry standard suggests at least 10000 runs for Monte Carlo simulation. Since we are doing it in excel anyway, we can target 15000 runs (or more). Setup a column for Trials numbering from 1 to 15000.
Step 3: Randomizing the Runs
Setup another column for Velocity with the first run as the seeding function (as described in Step 1). We now have two columns with 15000 rows where the first column is filled with runs and second column is filled with just the value for the first run. To randomize all runs, we need to follow the below:
- Select the cells that hold the value “1” (under “Trial” column) and the seeding value (under “Velocity” column)
- Press Ctrl + Shift + Down Arrow Key (we now have 2 rows x 15000 columns selected)
- Select Data -> What-If Analysis -> Data Table
- Click on Column Input Cell entry and select any cell from our spreadsheet that doesn’t hold any value
- Bingo… the 15000 cells under column “Velocity” hold the randomized runs
- Click on File -> Options -> Formulas -> Workbook calculation -> “Automatic” if you want to refresh the data frequently.
Step 4: Plotting Histogram
- To plot the frequency distribution, we first need to bin (bucketizing)
- Let’s bin velocity starting from 400 till 625 with increment of 25
- Click on Data -> Analysis -> Data Analysis
- If you don’t find this, you need to add this through File -> Options -> Add-ins -> Manage -> Excel Add-ins -> ‘Go’ -> Enable ‘Analysis ToolPak’
- Click Histogram -> Input Range (Col 1 to Col 15000 that holds random value of "Velocity"), Bin Range (Col 1 to Coln that holds the bin value of 400, 425, …625). You can check other checkboxes. You will get bin summary and histogram as below:
Now, Mohit can provide various options to his CEO – optimistic (maybe 500 story points velocity at 33%) or pessimistic (maybe 575 story points velocity at 83%). Then based on remaining effort, Mohit can figure out how many additional sprints we need.
While this problem sounds straightforward, to calculate the number of sprints required based on historical velocity, Monte Carlo builds provide a better range of possible outcomes with associated probabilities, especially when the seeding function is set right as discussed above.
To summarize, the Monte Carlo simulation is effective when combined with domain knowledge, identifying parameters that influence the decisions, collecting associated data, and setting up a proper experiment with a good amount of trials. Like any other model, this needs a constant feedback mechanism to improve prediction accuracy.
Opinions expressed by DZone contributors are their own.