The way we express our estimates colors both the way we think about the thing being estimated and the way our estimates are heard.
Quite often, when asked for an estimate, people will give a single value. “How long will this take?” “Three weeks.” If the level of trust is sufficient, and the points of reference are sufficiently aligned, then a single value may be fine. Expressing an estimated value as a single value leaves a lot unsaid, though. How confident are we about that value? If we’re wrong, what might the actual be?
We often find ourselves in trouble when one person gives an estimated value with one set of assumptions, and another person hears it with another set of assumptions. I might stress that my estimate of 2 days for a task is an optimistic one, assuming that no unforeseen problems crop up. You might hear it as an expression of the most likely value, with a small range of error around it. When an estimate travels further than from one person to the next, the likelihood of misinterpreting its context of assumptions goes way up. If you now communicate my estimate to your boss, they may interpret it as a commitment to that duration. The caveats and warnings tend to fade, but the number is remembered.
Over/Under a value
When I learned orienteering in Boy Scouts, one of the lessons was, when finding your way out of the woods by compass, make sure you err to one side. That way, when you reached a road, you’d know which way to turn. If you tried to go directly to your intended destination, you could be on either side when you hit the road.
You can use a similar philosophy when using estimates. Estimating “not more than” or “not less than” can often give enough information to know which way to turn, while still using single-point estimation.
Single value with error bars
For some situations, estimating a minimum or maximum value may not be as helpful as a “most likely” value. We can indicate the confidence we have in this value with error bars. Those error bars might be specific minimum and maximum values, though that means we’ve got three separate values to estimate. I’m more likely to use percentages. I’ve often estimated the time required for vaguely described tasks with error bars of -50% and +100%. This seems unbalanced to some people, but looks more reasonable on a logarithmic scale. It also fits with the epidemic over-optimism of software development estimation.
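The log-scale symmetry of those -50%/+100% error bars is easy to verify. Here’s a minimal sketch (the 4-day estimate is a hypothetical value, not from the text): halving and doubling the estimate are equidistant from it in log space.

```python
import math

# A hypothetical 4-day estimate with -50%/+100% error bars.
estimate = 4.0
low = estimate * 0.5    # -50%  -> 2 days
high = estimate * 2.0   # +100% -> 8 days

# On a logarithmic scale the bars are symmetric: the most likely
# value sits exactly halfway between the two endpoints.
print(math.log(estimate) - math.log(low))   # distance down
print(math.log(high) - math.log(estimate))  # distance up (equal)
```

Both distances come out to ln 2, which is why the bars that look lopsided on a linear scale look balanced on a logarithmic one.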
If you’re going to estimate the minimum and maximum possibilities, you might find that the estimated range is enough and you don’t need a most likely value. Beware how you communicate a range estimate, however. It’s all too frequent that the recipient hears just the endpoint they’d prefer. I’ve seen managers complain when a task estimated at 2 to 4 weeks comes in at 3 weeks, “but you said it could be done in 2 weeks.”
With a confidence level
You can express uncertainty of either a point estimate or a range estimate with a confidence level. “I’m 95% confident that this task can be completed within 2 weeks.” “It’s 95% sure that this work will take from 1 to 3 weeks.”
Sometimes a high confidence level is not what you’d prefer. When estimating user stories to determine how much work fits into a Scrum sprint, for example, using high-confidence estimates will result in taking on less work than you otherwise might attempt. Then, in accordance with Parkinson’s Law, the work expands to fill the allotted time. In such a situation, I recommend estimating at the 50% confidence level. I tell people that they should expect, in the long haul, to run out of time and run out of work in roughly equal amounts.
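A quick simulation can show what a 50%-confidence estimate implies. Assuming (hypothetically) that actual task durations follow some right-skewed distribution, estimating at the median means overruns and underruns occur in roughly equal numbers over the long haul:

```python
import random

random.seed(1)

# Hypothetical: actual durations follow a right-skewed lognormal
# distribution; the exact parameters are illustrative only.
actuals = [random.lognormvariate(1.1, 0.5) for _ in range(10_000)]

# A 50%-confidence estimate is the median of that distribution.
estimate = sorted(actuals)[len(actuals) // 2]

# Roughly half of the tasks run over the estimate, half run under.
overruns = sum(a > estimate for a in actuals)
print(round(overruns / len(actuals), 2))  # ~0.5
```

The shortfalls and surpluses balance out in aggregate, which is what makes the 50% level workable for sprint planning even though any individual estimate is a coin flip.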
I don’t know how to estimate confidence levels with a precision that I would trust. I do know that estimating with mathematical models can give you a calculated confidence level. You can then use that confidence level to make decisions with surgical precision based on the odds of the estimate coming true. All of this is still, however, based on the fitness of the mathematical model and the accuracy of the input data.
Confidence levels are easily misunderstood by most people. I’ve often heard people complain that the weather report is inaccurate because it forecast a 30% chance of rain, and it didn’t rain.
Going further than a confidence level, specifying a probability distribution for our estimate gives us precision at any desired confidence level. If it’s a Gaussian distribution, for instance, we can know there is a 50% chance of the actual being the median value or below. There’s a 97.5% chance of the actual being at or below two standard deviations above the mean.
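Those Gaussian percentages can be checked directly with the standard library’s normal distribution. The 10-day mean and 2-day standard deviation here are hypothetical, chosen just for illustration:

```python
from statistics import NormalDist

# A hypothetical estimate modeled as Gaussian: mean 10 days, sd 2 days.
dist = NormalDist(mu=10, sigma=2)

# 50% chance the actual is at or below the median (= the mean here).
print(dist.cdf(10))           # 0.5

# ~97.5% chance it is at or below mean + 2 standard deviations.
print(round(dist.cdf(10 + 2 * 2), 3))  # 0.977
```

Any other confidence level falls out the same way, e.g. `dist.inv_cdf(0.9)` gives the duration you have a 90% chance of beating.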
Of course, the same caveats about the fitness of the mathematical models and the accuracy of the input data apply to estimates expressed as probability functions. Empirical data shows that the probability distribution of completion time for a software development project is unlikely to be symmetrical, and therefore the Gaussian distribution is a poor model. This leads people to use a Weibull distribution instead, to model the skew: development time is more likely to run longer than shorter, relative to the median. The Weibull shape parameter, β, has a dramatic effect on the shape of the distribution, and therefore the fitness.
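The skew, and β’s effect on it, can be seen from the closed-form mean and median of the Weibull distribution. This sketch uses a hypothetical 10-day scale parameter; the gap between mean and median (the right skew) is largest for small β and nearly vanishes as β grows:

```python
import math

def weibull_mean(scale, beta):
    """Mean of a Weibull distribution: scale * gamma(1 + 1/beta)."""
    return scale * math.gamma(1 + 1 / beta)

def weibull_median(scale, beta):
    """Median of a Weibull distribution: scale * ln(2)^(1/beta)."""
    return scale * math.log(2) ** (1 / beta)

# Hypothetical 10-day scale; beta controls the shape, hence the skew.
for beta in (1.0, 1.5, 3.0):
    mean = weibull_mean(10, beta)
    median = weibull_median(10, beta)
    print(beta, round(mean, 2), round(median, 2))
```

With β = 1.0 the mean (10 days) sits well above the median (about 6.9 days), so the long tail dominates; by β = 3.0 the two are nearly equal and the distribution is close to symmetric.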
Unitless measures relative to other estimates
In reaction to the all-too-frequent problems that arise from giving estimates in units of time, sometimes people switch to unitless measures for their estimates. This allows relative estimation between comparable tasks. “This task and that one are about the same size.” Unitless estimation allows the estimate to “float” in absolute terms and be calibrated by past experience. “If we got about 42 units of work done last week, we expect to accomplish about 42 units of work this week.” If the speed of getting work done changes, then only the expectation needs to change; the work does not need to be re-estimated.
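The calibration arithmetic is simple enough to sketch. All the numbers here are hypothetical; the point is that when velocity changes, only the expectation is recomputed, while the point estimates on the work itself stay put:

```python
# Hypothetical: unitless points completed in each recent week.
recent_velocities = [42, 38, 45]
expected_velocity = sum(recent_velocities) / len(recent_velocities)

# Hypothetical backlog size in the same unitless points.
remaining_work = 250

# Forecast from past experience; if our pace changes, only
# expected_velocity changes -- nothing gets re-estimated.
weeks_remaining = remaining_work / expected_velocity
print(round(weeks_remaining, 1))  # 6.0
```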
The most common unitless measure of the day seems to be story points. These are so-called because they represent the “size” of a functional slice of work called a user story. They’re often restricted to a “modified Fibonacci series” to represent the reduced accuracy of estimation as the size of the story increases.
The human mind is a subtle and wondrous thing. It seems to become better at estimating the size of one story against another when freed from the tyranny of measurements of time. I’ve noticed that teams starting with a completely arbitrary size reference (“Let’s call this fairly small story a ‘2’ and estimate the others relative to it”) seem to achieve consistent sizing more rapidly than those who start with a time-related reference (“Let’s call a story a ‘2’ if it will take about 2 uninterrupted ‘ideal’ hours”).
Sadly, it’s often too tempting for someone, particularly someone with sufficient power in the organization, to calibrate story points back to units of time. This generally negates the advantage of using a unitless measure, as people start thinking in terms of hours instead of points. It’s even worse when the calibration is made to “ideal” hours, as someone will surely complain “Why do the programmers only accomplish 5 hours of story points a day?” <sigh>
Story points are still expressed as numbers, and that tempts people to do more mathematical calculations than their accuracy and precision will reasonably support. The next level of estimate abstraction is to eliminate the numbers. People use many things to stand in for the numbers. One of the most creative is animals. Perhaps the most common is T-shirt sizes, XS, S, M, L, XL. While you can assume that a 10-point story is twice the size of a 5-point, and five times that of a 2-point, you can’t assume such a relationship between a Small, Medium, and Large story. This is another means to free the mind from what it thinks it should know, and let it operate on the subconscious level that often gives better results.
What do you think?
Have I omitted some useful ways to express estimates? Or ways, useful or not, that people tend to use? Have I mischaracterized any of these? Please leave a comment with your thoughts.