Friday, November 11, 2016

The Polls Were Wrong, But Not As Wrong As You Think

Were the polls wrong? To some extent they were. What was worse, though, was our understanding of the polls. We took them at face value. Whether it was the media, the pollsters or the public in general, we talked about the polls in a deterministic way. Nate Silver's website, on the other hand, presented its results in a probabilistic manner. If we look at those results, things start to make a lot more sense. Yes, the underlying polls were inaccurate, but Silver adjusts for this to some extent.

As we have discussed before, the quality and accuracy of projections coming out of a Monte Carlo simulation depend squarely on the model used as input. Silver tries to improve the model by adjusting the poll results. I am not sure what the Nate Silver secret sauce is, but my guess would be that it involves looking at the past performance of each particular poll. Also, Silver looks at a multitude of polls and combines their results. This produces a model that is not greatly influenced by one incorrect poll. It is most likely a weighted combination, where historically better-performing polls have more of an impact than the more inaccurate ones. A great description of how the polls are adjusted is available on Nate Silver's website - http://fivethirtyeight.com/features/a-users-guide-to-fivethirtyeights-2016-general-election-forecast
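
To make the guess about a weighted combination concrete, here is a minimal Python sketch. It only illustrates the idea and is not FiveThirtyEight's actual method: the function name, the poll margins and the reliability weights are all invented for this example.

```python
def weighted_poll_average(polls):
    """Combine poll margins, weighting each poll by a reliability score.

    `polls` is a list of (margin_in_points, weight) pairs. Both values are
    hypothetical inputs; a real forecast would derive weights from each
    pollster's track record.
    """
    total_weight = sum(weight for _, weight in polls)
    if total_weight <= 0:
        raise ValueError("the weights must add up to a positive number")
    return sum(margin * weight for margin, weight in polls) / total_weight


# Invented numbers: the historically better poll gets the largest weight.
example_polls = [(4.0, 3.0), (2.0, 2.0), (-1.0, 1.0)]
combined = weighted_poll_average(example_polls)
simple_mean = sum(margin for margin, _ in example_polls) / len(example_polls)

print(f"Weighted margin: {combined:.2f} points")
print(f"Unweighted margin: {simple_mean:.2f} points")
```

With these made-up inputs, the outlier poll with the smallest weight pulls the weighted result down less than it pulls the plain average, which is the effect described above.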

Silver then runs the simulations based on these adjusted polls. After running 20,000 simulations using a thoroughly well-adjusted model, Silver's team can project, probabilistically, each candidate's chances of winning the election. This was what the forecast looked like on the eve of election day.


What the above result means is that more than 1 out of 4 simulations resulted in Trump winning. Hillary had about a 70% chance of winning the election. Yes, that is a greater probability than what Trump had, but it is around the same probability as that of rolling a 3 or more on a fair, six-sided die. It is by no means a slam dunk.
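
The die comparison is easy to check. The short Python sketch below counts the faces that show 3 or more and compares the result with the roughly 70% figure quoted above; the variable names are just for illustration.

```python
from fractions import Fraction

# Faces of a fair six-sided die.
faces = range(1, 7)

# Probability of rolling a 3 or more: 4 favourable faces out of 6.
favourable = sum(1 for face in faces if face >= 3)
p_three_or_more = Fraction(favourable, 6)

# The approximate win probability quoted in the text.
clinton_chance = 0.70

print(f"P(roll >= 3) = {p_three_or_more} = {float(p_three_or_more):.1%}")
print(f"Difference from 70%: {abs(clinton_chance - float(p_three_or_more)):.1%}")
```

The two numbers are within a few percentage points of each other, and nobody would call a roll of 3 or more a sure thing.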

This is where our understanding of the polls was wrong. Yes, the polls themselves were not very accurate, but our understanding of them was even more flawed. Let us think about it in terms of a slam dunk. According to http://www.basketball-reference.com/, the probability of an attempted dunk being successful in the NBA is about 93%. That is more than 20 percentage points higher than the chances Hillary Clinton had of winning the election. In the same vein, Hillary's chances were also lower than the NBA's overall free throw percentage - 76.5%.

Silver's models also predicted which states were likely to swing the election. Notice the "Blue Wall" states being high on the list. Pennsylvania, Michigan, and Wisconsin are 2nd, 3rd, and 7th on the list.

Both the most consequential and the least consequential events in our lives are not deterministic. We live in a probabilistic world and we need to stop thinking deterministically about how things are going to turn out. Yes, the polls were inaccurate, especially in Michigan and Wisconsin, but they still provided us with enough of a trend to say that this was not a sure thing. There was a better than 1 out of 4 chance of Trump moving into the White House.

Thursday, November 3, 2016

What Do NFL Quarterbacks Have To Do With Software Teams?



In the last post, we attempted to predict the number of passing yards Aaron Rodgers will accumulate in the 2016 season. We reached some interesting conclusions based on the results of the Monte Carlo simulations that we ran. What does any of this have to do with software development teams and projects? More than you would think. Coaches, agents, teams and the players themselves are interested in two things above all - Productivity and Predictability. Sound familiar? Assuming we are maintaining quality, software team managers and directors are usually trying to increase both the productivity and predictability of their teams.

The Monte Carlo simulations for 16 games (1 season) for Aaron Rodgers can give us clues to both his productivity and predictability.  Let us take a look at what Rodgers' numbers would look like if we were to predict 16 games today.

Certainty    Passing yards
15%          4667 yds
30%          4541 yds
50%          4396 yds
70%          4236 yds
80%          4155 yds
85%          4101 yds

The magnitude of these numbers, i.e. their general range, gives us an idea of the productivity of the quarterback. Let us take the middle number, the 50% certainty number (the median), as an indication of this. We can compare the median prediction for Rodgers (4396 yards) to that of other QBs to get an idea of the level of production we can expect from them in a season. The spread of these numbers, i.e. the difference between the 15% certainty and 85% certainty numbers (566 yards for Aaron Rodgers), gives us an idea of the predictability or consistency of the quarterback. The lower the spread, the more consistent the QB. QBs with higher spreads are less predictable, because the answer to how many yards they will throw for changes greatly at different levels of certainty.

Now if you replace QBs with software teams and yards with stories, the same interpretations hold true. The general magnitude of these numbers, represented by the median, would be an indication of the productivity of the team. The spread of these numbers, represented by the difference between the 15% certainty and 85% certainty numbers, would represent the predictability of the team. Just as in the case of quarterbacks, software teams would want the median to be high, to represent higher productivity. Teams would also want the spread to be low, to represent greater predictability. Running Monte Carlo simulations on story throughput, as described here, can get us these numbers for teams.
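
As a rough illustration of how such numbers could be produced for a team, here is a minimal Python sketch. It assumes the simulation resamples past weekly story counts with replacement, which is one common way to run a Monte Carlo forecast and not necessarily the exact approach used for the figures in this post. The weekly counts, function names and number of runs are all hypothetical.

```python
import random


def simulate_totals(history, periods, runs=10_000, seed=0):
    """Simulate `runs` futures, each the sum of `periods` random past results."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return sorted(
        sum(rng.choice(history) for _ in range(periods)) for _ in range(runs)
    )


def value_at_certainty(sorted_totals, certainty):
    """Return the total that was reached or exceeded in `certainty` of the runs."""
    index = int((1 - certainty) * (len(sorted_totals) - 1))
    return sorted_totals[index]


# Hypothetical stories finished per week by one team (invented data).
weekly_stories = [3, 5, 4, 6, 2, 5, 4, 7, 3, 5]

totals = simulate_totals(weekly_stories, periods=16)
levels = {c: value_at_certainty(totals, c) for c in (0.15, 0.50, 0.85)}

median = levels[0.50]                  # productivity
spread = levels[0.15] - levels[0.85]   # predictability (lower is better)

for certainty, stories in levels.items():
    print(f"{certainty:.0%} certainty: {stories} stories")
print(f"Median: {median}, spread: {spread}")
```

A higher certainty level always maps to a lower (safer) number, just as in the Rodgers table: the 85% certainty figure is the one we would reach in 85 out of 100 simulated futures.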

Let us take a look at what these numbers look like for 10 modern-era quarterbacks. The dataset includes 9 currently active quarterbacks and Peyton Manning. We have run Monte Carlo simulations on the data from previous games (going as far back as 2011) for these quarterbacks. The graphs below show the median and spread for the results.





Once we have these numbers, we might be able to come up with a simple metric that helps us identify the kind of quarterback we need or the team that would be best suited for a project. Very frequently, one of these numbers comes at the cost of the other. Usually, the higher the median gets, the wider the spread of the distribution. The ideal team will have a very high median and a very small spread. That gives us some hints about how we should construct this metric. The median should probably be the numerator, so that the metric increases as the median increases, and the spread should be the denominator, so that it has the opposite effect. This gives us a very simple metric - Median/Spread. That might not be enough, as some teams might value higher predictability and others might value higher productivity. We can use exponents to put greater emphasis on one part of the metric over the other. Furthermore, we can scale the metric so that the highest-rated quarterback in the set being compared is always 100 and the others fall in behind.
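
The metric described above can be sketched in Python as follows. This is one possible reading of the description rather than the exact formula behind the tables in this post: treating the weights as exponents on the median and the spread is an assumption, and the function names and team figures are hypothetical.

```python
def rating(median, spread, productivity_weight=1.0, predictability_weight=1.0):
    """Median/Spread, with exponents to emphasise one side over the other."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    return median ** productivity_weight / spread ** predictability_weight


def scaled_ratings(candidates, **weights):
    """Scale raw ratings so that the best candidate in the set scores 100."""
    raw = {
        name: rating(median, spread, **weights)
        for name, (median, spread) in candidates.items()
    }
    best = max(raw.values())
    return {name: 100 * value / best for name, value in raw.items()}


# Hypothetical (median, spread) pairs in stories, invented for illustration.
teams = {"Team A": (70, 20), "Team B": (80, 40), "Team C": (60, 15)}

equal = scaled_ratings(teams)
productivity_heavy = scaled_ratings(teams, productivity_weight=1.25)

for name in sorted(equal, key=equal.get, reverse=True):
    print(f"{name}: {equal[name]:.2f}")
```

With equal weights, Team C comes out on top in this invented data set even though it has the lowest median, because its spread is the smallest. Raising `productivity_weight` shifts the balance back toward the teams with higher medians.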

Let us start with the simplest case, where the two properties, productivity and predictability, are weighted equally. Let us see what our formula for quarterback rating tells us about our chosen quarterbacks.

Quarterback           Rating
Drew Brees            100.00
Peyton Manning         91.86
Aaron Rodgers          81.73
Russell Wilson         77.75
Tom Brady              77.58
Jay Cutler             75.46
Andrew Luck            72.42
Ben Roethlisberger     70.68
Ryan Tannehill         67.33
Eli Manning            66.40

What the table above tells us is that if we value productivity and predictability in equal amounts, Drew Brees would be our top pick. Most of the table gives us the results we would expect. There is one exception: Tom Brady seems to be lagging behind Rodgers and Wilson. That runs counter to our understanding of the football world. Upon closer inspection, we see that while Brady is more productive than both Wilson and Rodgers, both of the higher-rated quarterbacks (Wilson is barely ahead) are more consistent and predictable.

Now, what if we gave productivity more weight and paid a little less attention to consistency? Let us give productivity 25% more weight and see what it does to our ratings.

Quarterback           Rating
Drew Brees            100.00
Peyton Manning         90.30
Aaron Rodgers          78.31
Tom Brady              75.48
Russell Wilson         70.56
Jay Cutler             69.98
Andrew Luck            69.28
Ben Roethlisberger     68.50
Eli Manning            62.81
Ryan Tannehill         62.49

Drew Brees and Peyton Manning separate themselves from the pack once again. There seems to be a little more order to the world with Brady jumping ahead of Wilson. The "also-rans" do not change their order as much except for the very bottom of the table.

Running multiple combinations of these numbers, the top two remain almost constant. Regardless of how much weight we put on the two components (productivity and predictability), Drew Brees seems to beat out the competition in every case. Peyton Manning seems to always come in right behind Brees. Russell Wilson, with the lowest "spread" in his predictions, moves further up the chart the more we rely on consistency. Tom Brady moves further up the more importance we give to productivity. Neither of them catches up to Peyton Manning unless we say that predictability is more than twice as important as productivity.

We need to answer the same questions in a software development context. What matters more to us - Productivity or Predictability? Usually, the answer is both. Unless we are careful, though, one of them can hurt the other. We have to use them as balancing metrics. I do not have enough data to say whether that balancing metric can be used to compare teams. We can definitely use the balancing metric to see if the same team (or QB) is improving in the direction we expect them to go. We can look at the predictions for our teams regularly, work out these numbers, and quickly determine whether we are becoming more predictable, more productive, or both.

Another note on this. Brees tops the table every time, Brady floats all over the place, and Eli Manning is almost always in the bottom three. Meanwhile, Brady has 4 Super Bowl rings, Eli has 2 and Brees has 1. Productivity and predictability are not the only tools for success. They are keys to making successful plans, but there are other variables in the equation. A great defense and a good running game are also needed to win championships. Similarly, for our software teams, regardless of how predictable and productive they are, working on the right things and producing quality products are imperative in order to achieve success.