Friday, April 22, 2016

Probability, Predictions and Peyton

Teams, managers, and product owners are often asked to predict the future. What is more surprising is that we actually attempt to do it. We say things like "We will have 40 story points done in 4 weeks" or "We will have 15 issues resolved by the end of this month". We look at our past numbers, pick the math that we believe will be the most accurate predictor, and respond with a single numerical value that we "know" to be an accurate forecast. A single-number forecast is the equivalent of claiming to know exactly what the Dow Jones index will be at the end of the month. The basic flaw is our assumption that a single value can accurately represent the future in our context.

Years of training and education have made us deterministic thinkers. Most developers and engineers have the basis of their education in mathematics. The code we write and the tests we perform on a daily basis are, in essence, binary: they either work or they do not. It is when we take a broader look at the world around us that we realize that everything is probabilistic. There are numerous questions that need to be answered with degrees of certainty. We can say that a fair coin will come up heads with 50% certainty. A six-sided die will roll a 3 with 16.67% certainty. The only thing we know for sure is that it will roll something between 1 and 6; beyond that it is a game of probability. Let us say that if you hit all the green lights and there is no traffic, it takes you 15 minutes to get to work. Let us also say that, having timed you over the past 20 days, the average is 20 minutes. How long will it take you to get to work tomorrow? What if we rephrase that question: with 90% confidence, how long will it take you to get to work tomorrow? Adding the probability not only makes the question easier to answer (more on that later), but also acknowledges the reality that the answer cannot be deterministic.
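To make the commute question concrete, here is a minimal sketch in Python. The 20 commute times are made up for illustration (chosen so that the fastest trip is 15 minutes and the average is 20), and the helper names `percentile_nearest_rank` and `share_at_or_below` are my own, not from any library. It uses a simple nearest-rank percentile, which is only one of several ways to read a percentile off a small sample.

```python
# Hypothetical commute times in minutes for the past 20 days (made-up data,
# chosen so the fastest trip is 15 minutes and the average is 20).
COMMUTES = [15, 16, 16, 17, 17, 18, 18, 18, 19, 19,
            20, 20, 21, 21, 22, 22, 23, 24, 26, 28]


def percentile_nearest_rank(samples, percent):
    """Smallest observed value that at least `percent`% of samples do not exceed."""
    ordered = sorted(samples)
    # Integer ceiling of percent * n / 100, which avoids float rounding.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def share_at_or_below(samples, limit):
    """Fraction of past observations that were at or below `limit`."""
    return sum(1 for s in samples if s <= limit) / len(samples)


average = sum(COMMUTES) / len(COMMUTES)
commute_90 = percentile_nearest_rank(COMMUTES, 90)

print("average commute:", average)
print("share of days at or under the average:", share_at_or_below(COMMUTES, average))
print("90% of days took at most:", commute_90, "minutes")
```

With this made-up data, the 20-minute average covers only 12 of the 20 days, while an answer of 24 minutes covers 18 of them. "20 minutes" and "24 minutes" are both reasonable answers; they just carry different levels of confidence.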



One of my favourite examples that demonstrate the need for probabilistic predictions is the following question: how many yards is Peyton Manning going to throw for in the next game? The question would make more sense if he were not retired, but Manning gives us a great dataset to work with. If Manning's coach asked him for the exact number of yards he would throw for, he would probably have a hard time coming up with a number. What if the question was phrased as: at least how many yards are you going to throw for, with 80% confidence? Or, what is the least number of yards you will throw for, with 50% confidence? Those might be easier questions to answer if Manning knows his past performances and has a decent idea of the next opponent he is going to face. In fact, knowing only his past performances and his team, he might be able to answer that question without knowing his opponent, just with a different number that reflects his overall degree of confidence.

What is interesting about the example is that Peyton Manning is (was) one of the most consistent and reliable quarterbacks in the business. He might even, at the time, have been playing in one of the most efficient systems in the business, with all the right strategies and formations to ensure success. Even in that case, he cannot give you the single-number answer. He could give you a range answer, for example, more than 200 yards, but if you dig in further and ask him his level of confidence in it, it would probably be something like 80%. There are too many variables in the game to allow him to answer the question with a single number, or even with a range at 100% confidence. The variables, like the defense he is facing, the injuries to his own team, the current form of players, the weather, and others, all affect the outcome. A few of these are determinable and fixed before the game starts, but many change through the course of the game. The same is true for any process that we run. Development processes, whether waterfall, scrum, kanban, or of any other type, have too many variables to be able to result in a deterministic prediction. Just as with most things in the world around us, randomness is inherent in our processes. Burndown charts, throughput numbers, velocity, etc. are representations of averages that we use to predict the future, and an average can at best be relied on with about 50% confidence.


The problem with predictions based on averages is exactly that: they are likely to hold only about half the time. Once we have accepted that every prediction requires a probability component, which represents our confidence level, we can look at better ways to answer the questions about what the future holds. We can use our past patterns to determine how likely we are to accomplish a certain task. Based on stats (http://espn.go.com/nfl/player/gamelog/_/id/1428/peyton-manning) from the last 3 years (2013, 2014, 2015), without knowing the opponent, Peyton Manning should have a 95% confidence of throwing for 150 yards or more. In the 42 regular season games that he played over that period, he missed that mark only twice. The likelihood of him missing that mark again is about 5%.
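The arithmetic behind that 95% figure is just a hit rate over past games. A small Python sketch, using only the two counts quoted above (42 games, 2 misses); the function name `empirical_confidence` is my own label for it:

```python
def empirical_confidence(total_games, misses):
    """Share of past games in which the mark was reached."""
    if total_games <= 0:
        raise ValueError("need at least one game")
    return (total_games - misses) / total_games


# Counts quoted in the text: 42 regular season games, 150 yards missed twice.
games, misses = 42, 2
hit_rate = empirical_confidence(games, misses)

print(f"reached 150+ yards in {hit_rate:.1%} of games")
print(f"missed the mark in {1 - hit_rate:.1%} of games")
```

This is a raw frequency from a fairly small sample, so it is an estimate of the confidence level and not a guarantee. Picking a different yardage mark would give a different hit rate, which is exactly the trade-off between the number we promise and the confidence we attach to it.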

Software development processes have as many variables as the ones that the quarterback faces. These can be brought under control to a good extent, but randomness and variability are inherent in the process. Last-minute PTO, unknown complexity, issues from the field, undiscovered requirements, people quitting, etc. all show up at seemingly random times. We cannot rely on making plans and predicting the future without taking the random nature of our world into account. Our confidence might shift based on which of the factors we have been able to control and how much of the randomness we have been able to minimize, but there will always be variables that stop us from making deterministic predictions.

In order to be successful at making predictions about the future, we have to stop thinking in binary pass/fail systems. Adopting probabilistic thinking, and making predictions at varying degrees of confidence, is what will lead us to weigh our confidence against the level of risk we are willing to take. More on forecasting techniques and risk profiles in another post. For now, what we can learn from the stats of one of the most consistent and successful quarterbacks in recent history is that predictions cannot be absolute and have to take into account the probability of success. Which of those probabilities we are comfortable relying on for our plans is a separate question.