Friday, November 11, 2016

The Polls Were Wrong, But Not As Wrong As You Think

Were the polls wrong? To some extent they were. What was worse though was our understanding of the polls. We took them at face value. Whether it was the media, the pollsters or the public in general, we talked about the polls in a deterministic way. Nate Silver's website, on the other hand, presented its results in a probabilistic manner. If we look at those results, things start to make a lot more sense. Yes, the underlying polls were inaccurate, but Silver adjusts for this to some extent.

As we have discussed before, the quality and accuracy of projections coming out of Monte Carlo depend squarely on the model being used as input for the projections. Silver tries to make the model a better one by adjusting the poll results. I am not sure what the Nate Silver secret sauce is, but my guess would be it involves looking at past performance of the particular poll. Also, Silver looks at a multitude of polls and combines their results. This results in a model that is not influenced greatly by one incorrect poll. It is most likely a weighted combination where historically better performing polls have more of an impact than the more inaccurate polls. A great description of how the adjustments are done to the polls is available on Nate Silver's website - http://fivethirtyeight.com/features/a-users-guide-to-fivethirtyeights-2016-general-election-forecast

Silver then runs the simulations based on these adjusted polls. After running 20,000 simulations using a thoroughly well-adjusted model, Silver's team is able to project, probabilistically, the chances of each candidate winning the election. This was what the forecast looked like on the eve of election day.


What the above result means is that more than 1 out of 4 simulations resulted in Trump winning. Hillary had about a 70% chance of winning the election. Yes, that is a greater probability than what Trump had, but it is around the same probability as that of rolling a 3 or more on an unloaded, 6-sided die (4 in 6, or about 67%). It is by no means a slam dunk.
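The die comparison above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `simulate_die` and the seed are assumptions made for the example, and the 20,000 trials simply mirror the number of simulations mentioned earlier. It has nothing to do with how the election forecast itself is built.

```python
import random
from fractions import Fraction

# Exact probability of rolling a 3 or more on a fair six-sided die.
favorable_faces = [face for face in range(1, 7) if face >= 3]
exact_probability = Fraction(len(favorable_faces), 6)

def simulate_die(trials, seed=0):
    """Estimate the same probability by simulating die rolls."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) >= 3)
    return hits / trials

print(f"Exact: {float(exact_probability):.3f}")
print(f"Simulated over 20,000 rolls: {simulate_die(20_000):.3f}")
```

Both numbers land close to two in three, which is the point of the comparison: an outcome that fails one time in three is not a sure thing.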

This is where our understanding of the polls was wrong. Yes, the polls themselves were not very accurate, but our understanding of them was even more flawed. Let us think about it in terms of a slam dunk. According to http://www.basketball-reference.com/ the probability of an attempted dunk being successful in the NBA is about 93%. That is more than 20 percentage points greater than the chances Hillary Clinton had of winning the election. In the same vein, Hillary's chances were also lower than the NBA's overall free throw percentage of 76.5%.

Silver's models also predicted which states are likely to swing the elections. Notice the "Blue Wall" states being high on the list. Pennsylvania, Michigan, and Wisconsin are 2nd, 3rd, and 7th on the list.

Both the most consequential and the least consequential events in our life are not deterministic. We live in a probabilistic world and we need to stop thinking deterministically about how things are going to turn out. Yes, the polls were inaccurate, especially in Michigan and Wisconsin, but they still provided us with enough of a trend to say that this was not a sure thing. There was a better than 1 in 4 chance of Trump moving into the White House.

Thursday, November 3, 2016

What Do NFL Quarterbacks Have To Do With Software Teams?



In the last post, we attempted to predict the number of passing yards Aaron Rodgers will accumulate in the 2016 season. We reached some interesting conclusions based on the results of the Monte Carlo simulations that we ran. What does any of this have to do with software development teams and projects? More than you would think. Coaches, agents, teams and the players themselves are interested in two things above all - Productivity and Predictability. Sound familiar? Assuming we are maintaining quality, software team managers and directors are usually trying to increase both the productivity and predictability of their teams.

The Monte Carlo simulations for 16 games (1 season) for Aaron Rodgers can give us clues to both his productivity and predictability.  Let us take a look at what Rodgers' numbers would look like if we were to predict 16 games today.

15% Certainty    4667 yds
30% Certainty    4541 yds
50% Certainty    4396 yds
70% Certainty    4236 yds
80% Certainty    4155 yds
85% Certainty    4101 yds
The magnitude of these numbers, i.e. the general range of them, gives us an idea of the productivity of the quarterback. Let us take the middle number, the 50% Certainty number (the median), as an indication of this. We can compare the median prediction for Rodgers (4396 yards) to that of other QBs to get an idea of the level of production we can expect from them in a season. The spread of these numbers, i.e. the difference between the 15% certainty and 85% certainty numbers (4667 - 4101 = 566 yards for Aaron Rodgers), gives us an idea of the predictability or consistency of the quarterback. The lower the spread, the more consistent the QB. QBs with higher spreads are less predictable, as the answer for how many yards they would throw for changes greatly at different levels of certainty.

Now if you replace QBs with software teams and yards with stories, the same interpretations would hold true. The general magnitude of these numbers, represented by the median, would be an indication of the productivity of the team. The spread of these numbers, represented by the difference between the 15% certainty and 85% certainty numbers, would be a representation of the predictability of the team. Just as in the case of Quarterbacks, the software teams would want the median to be high, to represent higher productivity. Teams would also want the spread to be low, in order to represent greater predictability. Running Monte Carlo simulations on Story throughput, as described here, can get us these numbers for teams.

Let us take a look at what these numbers look like for 10 modern era quarterbacks. We have 9 currently active quarterbacks and Peyton Manning included in this dataset. We have run Monte Carlo simulations on the data from previous games (going as far back as 2011) for these quarterbacks. The graphs below show the median and spread for the results.





Once we have these numbers, we might be able to come up with a simple metric that helps us identify the kind of Quarterback we need or the team that would be best suited for a project. Very frequently one of these numbers comes at the cost of the other. Usually, the higher the median gets, the wider the spread of the distribution. The ideal team will have a very high median and a very small spread. That gives us some hints towards how we should construct this metric. The median should probably be the numerator so that the metric increases as the median increases, and the spread should be the denominator so that it has the opposite effect. This gives us a very simple metric - Median/Spread. That might not be enough, as some teams might value higher predictability and others might value higher productivity. We can use exponents to get a greater emphasis on one part of the metric over the other. Furthermore, we can scale the metric so that the highest-rated Quarterback in the set being compared is always a 100 and the others fall in behind.
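One way to read the Median/Spread idea above is sketched below. The exact way the weights enter the formula is an assumption (each weight is used here as an exponent on its term), as are the function and parameter names. Only the Rodgers numbers (median 4396, spread 4667 - 4101 = 566) come from the forecast earlier in this post. The other two entries are made-up placeholders so the scaling has something to compare against.

```python
def rating_table(forecasts, productivity_weight=1.0, predictability_weight=1.0):
    """Score each entry as median**a / spread**b, scaled so the best is 100.

    forecasts maps a name to a (median, spread) pair. Using the weights as
    exponents is an assumed reading of the metric, not a confirmed formula.
    """
    raw = {
        name: median ** productivity_weight / spread ** predictability_weight
        for name, (median, spread) in forecasts.items()
    }
    best = max(raw.values())
    ranked = sorted(raw.items(), key=lambda item: item[1], reverse=True)
    return {name: round(100 * score / best, 2) for name, score in ranked}

forecasts = {
    "Aaron Rodgers": (4396, 4667 - 4101),   # from the 16-game forecast above
    "Hypothetical QB A": (4800, 700),        # placeholder, not real data
    "Hypothetical QB B": (4000, 450),        # placeholder, not real data
}

# Equal weighting, then a version that leans 25% harder on productivity.
for weight in (1.0, 1.25):
    print(f"productivity weight = {weight}")
    for name, rating in rating_table(forecasts, productivity_weight=weight).items():
        print(f"  {name:<20}{rating:>7.2f}")
```

Because everything is scaled against the best raw score, the ratings only mean something relative to the other entries in the same set.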

Let us start with the simplest case where the two properties - productivity and predictability are weighted equally. Let us see what our formula for Quarterback rating tells us about our chosen quarterbacks.

Drew Brees            100.00
Peyton Manning         91.86
Aaron Rodgers          81.73
Russell Wilson         77.75
Tom Brady              77.58
Jay Cutler             75.46
Andrew Luck            72.42
Ben Roethlisberger     70.68
Ryan Tannehill         67.33
Eli Manning            66.40
What the table above tells us is that if we value productivity and predictability in equal amounts, Drew Brees would be our top pick. Most of the table looks like it is giving us expected results. There is one exception: Tom Brady seems to be lagging behind Rodgers and Wilson. That runs counter to our understanding of the football world. Upon closer inspection, we see that while Brady is more productive than both Wilson and Rodgers, both of the higher-rated quarterbacks (Wilson is barely ahead) are more consistent and predictable.

Now, what if we gave productivity more weight and paid a little less attention to consistency? Let us give productivity 25% more weight and see what it does to our ratings.

Drew Brees            100.00
Peyton Manning         90.30
Aaron Rodgers          78.31
Tom Brady              75.48
Russell Wilson         70.56
Jay Cutler             69.98
Andrew Luck            69.28
Ben Roethlisberger     68.50
Eli Manning            62.81
Ryan Tannehill         62.49
Drew Brees and Peyton Manning separate themselves from the pack once again. There seems to be a little more order to the world with Brady jumping ahead of Wilson. The "also-rans" do not change their order as much except for the very bottom of the table.

Running multiple combinations of these numbers, the top two remain almost constant. Regardless of how much weight we put on the two components (productivity and predictability), Drew Brees seems to beat out the competition in every case. Peyton Manning seems to always come in right behind Brees. Russell Wilson, with the lowest "spread" in his predictions, moves further up the chart the more we rely on consistency. Tom Brady moves further up the more importance we give to productivity. Neither of them catches up to Peyton unless we say that predictability is more than twice as important as productivity.

We need to answer the same questions in a software development context. What matters more to us - Productivity or Predictability? Usually, the answer is both. Unless we are careful though, one of them can hurt the other. We have to use them as balancing metrics. I do not have enough data to say if that balancing metric can be used to compare teams. We can definitely use the balancing metric to see if the same team (or QB) is improving in the direction we expect them to go. We can look at the predictions for our teams regularly, figure out these numbers, and make a quick determination of whether we are becoming more predictable, more productive, or both.

Another note on this. Brees tops the table every time, Brady floats all over the place, and Eli Manning is almost always in the bottom three. Meanwhile, Brady has 4 Super Bowl rings, Eli has 2 and Brees has 1. Productivity and predictability are not the only tools for success. They are keys to making successful plans, but there are other variables in the equation. A great defense and a good running game are also needed to win championships. Similarly, for our software teams, regardless of how predictable and productive they are, working on the right things and producing quality products are imperative in order to achieve success.

Saturday, October 29, 2016

Passing Yards For Aaron Rodgers in The 2016 Season (Aka Monte Carlo Models for Aaron Rodgers)

In a post earlier this year, we tried to figure out how many passing yards Peyton Manning would put up if he were to return to football for one game. We answered this question using the techniques for probabilistic forecasting of single items. What if we were to try and use probabilistic forecasting techniques to figure out an entire season's worth of performances? We have already discussed in earlier posts that forecasting multiple items cannot be done by simply forecasting single items multiple times (well, kind of yes and no...). This means we have to reach into our toolbox to retrieve our preferred technique for predicting multiple items - Monte Carlo simulations. We are also going to change our subject to a more relevant one. As we are almost midway through the current season, let us pick an active player and see what predictions we can make concerning his performances for the remainder of this season. We are going to pick Green Bay Packers Quarterback Aaron Rodgers as our subject for these predictions. We will compare how well he has done so far in the season against our predictions, and then predict how well he will do in the remainder of the season. Answering how many yards a quarterback, especially one as prolific as Aaron Rodgers, will throw for in a season can be a daunting task, as daunting as predicting the end date for a software project.




The quality and accuracy of projections coming out of Monte Carlo depend squarely on the model being used as input for the projections. The first decision we have to make is which past data points to use as inputs to our Monte Carlo simulations. Aaron Rodgers was drafted in 2005 and took over as starting quarterback for the Packers in 2008. That means we can safely ignore all games he participated in before 2008. Also, clearly the team and the system under which Rodgers plays have changed quite a bit since 2008. For this reason, we can limit ourselves to the last 5 seasons. We are also going to exclude any games where Rodgers left the field injured and could not complete the game. That leaves us with 77 games, including the ongoing season. We will give each of the performances in these games equal weight for our simulations. This means that, for each of the games we are trying to predict, Rodgers' performance is equally likely to be similar to any of the past 77 games.
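The equal-weight model described above amounts to drawing past games at random, with replacement, and adding them up many times over. The sketch below shows that idea under stated assumptions: the function names, the trial count, and the seed are made up for the example, and the `past_games` list holds placeholder yardages rather than the real 77-game log.

```python
import random

def simulate_totals(past_games, games_to_predict, trials=10_000, seed=42):
    """Return one simulated yardage total per trial.

    Every future game is drawn uniformly at random from past_games, so each
    past performance carries equal weight.
    """
    rng = random.Random(seed)
    return [
        sum(rng.choice(past_games) for _ in range(games_to_predict))
        for _ in range(trials)
    ]

def certainty_table(totals, levels=(15, 30, 50, 70, 80, 85)):
    """Map each certainty level to the yardage reached in at least that share of trials."""
    ordered = sorted(totals, reverse=True)
    table = {}
    for level in levels:
        count = max(1, len(ordered) * level // 100)
        table[level] = ordered[count - 1]
    return table

# Placeholder yardages for illustration only, NOT the actual game log.
past_games = [212, 246, 259, 273, 281, 294, 302, 333, 358, 422]

totals = simulate_totals(past_games, games_to_predict=6)
for level, yards in certainty_table(totals).items():
    print(f"{level}% Certainty    {yards} yds")
```

Reading the output the same way as the tables in this post, the 85% row is the total that at least 85% of the simulated six-game stretches reached or exceeded.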

Now that we have narrowed down our input data set to what we believe is a representative range for the upcoming games, let us see how the results that we get from Monte Carlo compare to those we would have gotten from straight averages. For the seasons from 2011-2015, for the games that we are considering as input, the average for Rodgers was about 276.75 yards per game. If we were to make predictions based on this average, we would say that Rodgers will throw for 4428 yards over the entire 16-game season. Also, for the first six games of the season (the games completed at the time of writing), Rodgers, based on the average, would have accomplished 1660 yards.

Testing Predictions Against Past Games

Let us run the Monte Carlo simulations for the first 6 games of the season, assuming that these games have not yet taken place. In other words, let us pretend it is the beginning of the season and we are trying to predict how many yards Rodgers is going to throw for in the first 6 games.
We get the following results -

Predictions For The First 6 games of 2016 Season
15% Certainty    1835 yds
30% Certainty    1750 yds
50% Certainty    1661 yds
70% Certainty    1570 yds
80% Certainty    1524 yds
85% Certainty    1484 yds

These results can be interpreted as confidence ranges for Rodgers' performance. What Monte Carlo is telling us is that we have 85% confidence that Aaron Rodgers can throw for at least 1484 yards, 50% certainty that he can throw for at least 1661 yards, 15% certainty for 1835 yards, and so on. As we see, the higher the confidence level, the lower the number of yards we can predict. So far, Rodgers has thrown for 1496 yards, which means the average-based prediction of 1660 yards is off by 164 yards (more than his total yards in a game vs Arizona last year). The 85 percent certainty number from Monte Carlo, on the other hand, is off by 12 yards or, for a prolific quarterback like Aaron Rodgers, the yards gained from one pass. At the beginning of the season, Rodgers (and his agent) can use this information to set expectations for the season. Coaches can use this information to plan for the season and decide how much importance they put on the run game and on their defense based on the level of confidence/risk they want to assume.
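The comparison above is simple arithmetic, and it can be laid out explicitly. The numbers below are the ones quoted in this post. The variable names are just labels chosen for the sketch.

```python
AVERAGE_YARDS_PER_GAME = 276.75   # 2011-2015 average quoted above
GAMES_PLAYED = 6
ACTUAL_YARDS = 1496               # Rodgers' total through six games

# Monte Carlo forecast for the first six games, by certainty level.
certainty_forecast = {15: 1835, 30: 1750, 50: 1661, 70: 1570, 80: 1524, 85: 1484}

# Forecast and miss for the straight-average approach.
average_forecast = AVERAGE_YARDS_PER_GAME * GAMES_PLAYED
average_error = average_forecast - ACTUAL_YARDS

# Signed miss for each certainty level (negative means the actual beat it).
errors = {level: yards - ACTUAL_YARDS for level, yards in certainty_forecast.items()}
closest_level = min(errors, key=lambda level: abs(errors[level]))

print(f"Average-based forecast: {average_forecast:.1f} yds, off by {average_error:.1f}")
print(f"Closest certainty level: {closest_level}% (off by {abs(errors[closest_level])} yds)")
```

The unrounded average-based figure is 1660.5 yards, which the text rounds down to 1660 and a 164-yard miss.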

These numbers also provide us some validation for our present model and give us another bit of information. The fact that we are getting such a close prediction at the 85% certainty mark tells us that Aaron Rodgers is performing at a lower level than what he is capable of. 85% certainty can be equated to saying that Rodgers is performing at 15% of his maximum potential and about 30% of his median potential.

Predicting The Rest Of The Season

Using the same methods we used to predict the first six games, we can attempt to predict the remainder of the season. We will make the 6 games that have already happened this season a part of our model. Running the model through Monte Carlo for the remaining ten games of the season gives us the following results -

Predictions For The Last 10 games of 2016 Season
15% Certainty    2948 yds
30% Certainty    2849 yds
50% Certainty    2733 yds
70% Certainty    2615 yds
80% Certainty    2549 yds
85% Certainty    2516 yds

Since Rodgers has already thrown for 1496 yards this season, we can try to figure out the number of yards the Packers Quarterback will rack up for the season -

Predictions For The Entire 2016 Season


15% Certainty    4444 yds
30% Certainty    4345 yds
50% Certainty    4229 yds
70% Certainty    4111 yds
80% Certainty    4045 yds
85% Certainty    4012 yds
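The full-season table is the ten-game forecast with the 1496 yards already thrown added to every row. A short sketch of that addition, using only numbers from this post (the variable names are illustrative), also shows where the average-based projection falls among the certainty levels:

```python
ACTUAL_YARDS_SO_FAR = 1496
AVERAGE_YARDS_PER_GAME = 276.75
SEASON_GAMES = 16

# Monte Carlo forecast for the remaining ten games, by certainty level.
remaining_forecast = {15: 2948, 30: 2849, 50: 2733, 70: 2615, 80: 2549, 85: 2516}

# Season forecast = yards already on the board + forecast for the rest.
season_forecast = {
    level: ACTUAL_YARDS_SO_FAR + yards for level, yards in remaining_forecast.items()
}

# The straight-average projection for comparison.
average_based = AVERAGE_YARDS_PER_GAME * SEASON_GAMES
levels_reaching_average = [
    level for level, yards in season_forecast.items() if yards >= average_based
]

for level, yards in season_forecast.items():
    print(f"{level}% Certainty    {yards} yds")
print(f"Average-based projection: {average_based:.0f} yds")
print(f"Certainty levels that reach it: {levels_reaching_average}")
```

Only the 15% row reaches the average-based figure of 4428 yards, so the certainty of hitting that mark sits somewhere between 15% and 30%.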

If we had taken the average yards per game from the previous 5 seasons (276.75 yds/game) and used that as a projection for the 16 games in this season, we would have predicted that Rodgers would pass for 4428 yards this season. Based on the simulations we have run so far, it seems that Rodgers can hit that mark with only about 20% certainty. For a quarterback that is operating below par and maybe inspiring a lower level of reliability, we should use a number at the other end of the scale if we were forced to pick a number. Using the 85% certainty number, which is 4012 yards, is probably a much safer bet to make and plan for, whether you are Rodgers, his coaches, his agent or someone placing bets in Vegas.

A Smarter Model

Our model so far has been pretty straightforward. Assume that Aaron Rodgers will perform in future games in a manner similar to one of the past 77 games. The beauty of this model is the simplicity of it. It requires almost zero football knowledge to understand it. All it needs us to understand is that yards are a unit of measurement of productivity for a player in a football game. We do not have to understand any rules, strategies or other measures and metrics regarding football. What if we could come up with a smarter (maybe better) model that still maintains this simplicity?

"Everything should be as simple as possible, but no simpler" - Albert Einstein

Let us run the same simulations using a model where we consider the opponent that the Green Bay Packers are up against. What that means for us is that as we try to figure out future performances, we will not randomly select from the past 77 games. We will instead sample from full games that Aaron Rodgers has played against the particular opposition the Packers are up against. In essence, all games against the Chicago Bears will be sampled only from prior games against the Chicago Bears. Using this model we get the following results for the first 6 games of 2016 -

Predictions For The First 6 games of 2016 Season


15% Certainty    1717 yds
30% Certainty    1641 yds
50% Certainty    1554 yds
70% Certainty    1481 yds
80% Certainty    1432 yds
85% Certainty    1402 yds

These predictions are all lower than the predictions of the simpler "random" model. In fact, in this case, the actual yardage of 1496 has a 66% certainty, which is like saying that Rodgers is performing at 34% of his maximum potential (as opposed to 15% from the "random" model). Why is this model giving us more pessimistic results? Why are the same 1496 actual yards interpreted as different levels of performance for Aaron Rodgers? Taking a closer look at the data answers the question for us. Since 2011, Rodgers' only 400+ yard games have come against Denver, Washington, and New Orleans, teams that are not on the schedule for 2016. This means that when we simulate the games for 2016 based on the opposing team, these games do not get considered at all. This lowers the projections for the group of games we are simulating.
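The opponent-aware model described above changes only where each simulated game is drawn from. The sketch below illustrates that under loud assumptions: the function name, trial count, and seed are invented for the example, every yardage figure is a placeholder, and "Opponent X" is a made-up team. It also assumes every scheduled opponent has at least one prior game on record. The post does not say how a new opponent would be handled.

```python
import random

def simulate_by_opponent(history, schedule, trials=10_000, seed=7):
    """Simulate totals where each game is drawn only from past games
    against that same opponent.

    history maps an opponent to a list of past yardages. A scheduled
    opponent with no history raises KeyError in this sketch.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = sum(rng.choice(history[opponent]) for opponent in schedule)
        totals.append(total)
    return totals

# Placeholder yardages for illustration only, NOT real game data.
history = {
    "Chicago": [220, 285, 302],
    "Opponent X": [240, 273, 333],   # hypothetical team
    "Denver": [448],                 # big game, but not on the schedule
}
schedule = ["Chicago", "Opponent X", "Chicago"]

totals = simulate_by_opponent(history, schedule)
print(f"Highest simulated total: {max(totals)} yds")
print(f"Lowest simulated total:  {min(totals)} yds")
```

The 448-yard game can never be drawn because its opponent is not on the schedule, which is the same mechanism that pulls the opponent-aware projections below those of the "random" model.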

The projections for the remainder of the season and the overall projections are as follows - 

Predictions For The Last 10 games of 2016 Season


15% Certainty    3046 yds
30% Certainty    2943 yds
50% Certainty    2854 yds
70% Certainty    2761 yds
80% Certainty    2704 yds
85% Certainty    2675 yds

Predictions For The Entire 2016 Season


15% Certainty    4542 yds
30% Certainty    4439 yds
50% Certainty    4350 yds
70% Certainty    4257 yds
80% Certainty    4200 yds
85% Certainty    4147 yds

This model for Monte Carlo suggests that Rodgers can be expected to do better than the predictions from the "simple" model for the rest of the season. The Packers' Quarterback has historically performed better against the teams the Packers are going to play in the remainder of the season than against those they have played in the first 6 games. This is also borne out when we look at averages - against the first 6 opponents, Rodgers averaged 263 yds/game as opposed to 270 yds/game against the next 10 opponents.

In Summation

In summation, we can conclude that Monte Carlo predictions (probabilistic forecasts) give us a much better chance of answering the question regarding multiple game performance than averages do. Based on our level of confidence/risk tolerance, we can choose the certainty level and plan accordingly. We also see that different models give us different results. We have to figure out the best models that fit the reality of our situation, but at the same time not make them too complex or specialized. As Albert Einstein said - "Everything should be as simple as possible, but no simpler".

Monday, October 10, 2016

Donald Trump : The Lean-Agile Candidate

This article is not an endorsement of Donald J Trump. Instead, this is an analysis of how well the Trump campaign has utilized many of the techniques and strategies of Lean and Agile to run a very successful campaign under some very trying conditions. The campaign, since its beginning, has had a shoestring budget, a mercurial candidate with no prior public office experience, very little core establishment support and no broad, informed policy platform. Despite all these hindrances, Donald Trump has not only beaten out a crowded Republican field but also remained competitive till late in the election cycle against an established political Titan, Hillary Clinton.

Lean Startup

Donald Trump's candidacy was considered a joke for quite some time. This gave Trump the ability to take risks to make a mark for himself. In the beginning of his campaign, he made many outlandish statements, many of which would easily have rendered him unelectable if he was being taken seriously. His immediate strategic focus though was to make some noise and gain notoriety. He is a startup in the field of established players. Doing what everyone else does is not going to help him separate himself in a field of 17 contenders for the Republican nomination. The same applies to new products. Yes, the table stakes (in this case, having a pulse and enough backers to launch a bid) are necessary, but not enough to be successful. You have to stand out, even if it is in an unorthodox manner, to gain market share. Breaking the mold and having a distinctive appeal is critical for any startup.


At this point, it is not just the primary voters that take note of Donald Trump, but his outlandish statements start bringing in a lot of media attention. The amount of free air time that Donald Trump's comments and Trump as a guest himself gets from the various networks greatly exceeds the paid and free airtime for any of his competitors. This goes a long way in cementing the Trump political brand. Trump knows that the voters in the Republican primary are not fans of the media. Hence, while he feeds soundbites to the media, he also chastises them for unfair coverage. This becomes a consistent theme for the remainder of the Trump campaign. As a new product, it is important to establish a brand with your customers. Use all avenues available to remind your customers of your brand and how it stands out.

This beginning and most of the rest of the campaign seems to have been run using Lean Startup principles almost by the book. Trump repeatedly employs the Build-Measure-Learn loop to not just figure out the right things to say, but also to create and adjust policy positions. The campaign guides its steps by observing the customer reaction and understanding them first hand, rather than through pollsters and policy experts.

Limiting WIP

Trump has been, until recently, very focused on the immediate strategic direction. During the primaries, the Trump campaign had multiple hurdles ahead of them. Instead of tackling all 16 of his opponents at once, Trump goes after them one at a time. While others are not taking him seriously, he starts by discrediting the most lucrative target, the establishment heavyweight Jeb Bush. Trump repeatedly calls him weak and makes sure that Bush is known as the establishment candidate. This is a great strategic direction to take as it is potentially the easiest and most lucrative. Notably, he does not go after the other candidates at this time. He limits his WIP to one strategic direction at a time. Trump does not have the resources to spread his attacks out. This forces him to be lean. Repeating the same message about one candidate over and over helps him get the best results against that candidate with his customers.

The Trump campaign, in time, shifts focus to Marco Rubio, John Kasich, and eventually Ted Cruz in order to eliminate the competition one at a time. He picks on them one by one and brands them in ways that would hurt them with Republican primary voters. This entire time Trump's focus was on one of his Republican opponents and not Hillary Clinton. He made wildly unpopular statements, but these were unpopular overall, not amongst the voters that would show up to vote in the Republican primaries. Trump proved that limiting your WIP works at all levels, especially at the strategy level in a lean organization.

Feedback Loops

Most political campaigns thrive on feedback loops. They adjust as they get more information through polls and media feedback. The Trump campaign has taken this to the next level. Trump seems to be deliberately creating these feedback loops using rallies and social media. There are elements of Lean UX, Dev-Ops and Continuous Delivery present in the implementation of these feedback loops.

Lean UX

As opposed to other candidates that spend long hours coming up with sound bites and attack lines, Trump's campaign spends very little time doing such analysis. Trump tries out every new sound bite with audiences, both live and on Twitter, till he finds the one that sticks. The campaign is consistently pursuing the "linguistic killshot" as Scott Adams (of Dilbert fame) puts it. Lying Ted, Little Marco, Weak and Low Energy Jeb Bush and even Nice Ben Carson were all effective brandings of Trump's opponents that hurt their campaigns. These "linguistic killshots" were not a result of hours and days of research. The Trump campaign took multiple ideas straight to the rallies and carried on with the ones that gained traction. The users help shape the experience rather than simply being subjected to it. Instead of spending days analyzing and doing research, Trump was getting ideas out first and getting feedback. When you are trying out new user experiences, the easiest way to validate which ones work is to actually put them in front of your customers.

Linguistic killshot: An engineered set of words that changes an argument or ends it so decisively, I call it a kill shot. One of the ones Donald Trump used was referring to Bush as a "low energy guy" or Carly Fiorina as a "robot" or Ben Carson as "nice." - Scott Adams

Dev-Ops and CD

Much is made of Trump's late night/early morning tweets. These are often the more "fiery" and controversial tweets that Trump puts out. The Trump campaign has realized the power of social media and utilized it a lot more effectively than the Clinton campaign has. Trump has been running his campaign with a lot less operating cash than his opponents. In May, the Trump staff consisted of 69 staffers as opposed to 685 staffers for Hillary Clinton. This means that Trump has to find new and better strategies for getting the word out. Trump delivers his messages directly to his customers, early and often. Early morning tweets are not just seen by his followers when they wake up, but due to their controversial nature, re-tweeted and replied to by thousands of folks. What is even more important is that the early morning news and talk shows pick these up and talk about them for hours. Trump is not just the engineer of the messages but also deploys them out in the field. He delivers early and often. By doing this, he is able to shape the conversation for the day without making the rounds of the news and talk shows.

According to New York Times, in March, Trump had received almost 2 billion dollars of free media coverage due to his continuous delivery of unique messages.


A common charge leveled against Trump is that he "takes the bait" and cannot resist responding to every charge leveled against him. This might be a valid charge and Trump might have little control over his instincts. The tendency to do this, though, still provides all the same benefits. By using multiple rapid fire responses, the campaign is able to identify which ones resonate with the voters and implement those in detail. Trump being the engineer and the deployer of these cuts down the amount of time it takes to run these responses by the "end-users". The campaign also does not waste time trying to craft the perfect response and allows the "customers" to choose which response the campaign spends time on, in order to perfect it.

AntiFragile

The Trump campaign (until very recently) has been the definition of anti-fragility.
Antifragility is a property of systems that increase in capability, resilience, or robustness as a result of stressors, shocks, volatility, noise, mistakes, faults, attacks, or failures.
Software developers anticipate volatility and shocks to the system in order to make the system perform better under these forms of stress. A great example of this is Netflix's Chaos Monkey that turns off services randomly to make sure the system is built to handle the stress. Trump is his own campaign's Chaos Monkey. He is unbridled in his speeches, tweets and personal interactions. His team has known this from day one. Every campaign manager and media relations person on his team has become an expert in the art of the spin. This means two things. First, it does not matter what Trump actually says. The media and the political elite might sneer at his remarks, but Trump's team has a lot of practice in handling these situations. They make every remark, that would otherwise sink a campaign, into a positive for the country. Second, it allows Trump to keep running his experiments with words. The media and the establishment up in arms against him is an anticipated stressor. It makes his image as the anti-establishment, outsider more robust with every attack. The Trump campaign is the mythological Hydra, which becomes stronger when it is attacked.


Small Batches

This political cycle has made it hard to concentrate on actual policy matters. While most candidates went into the race with filled out platforms and explicit positions on all the major issues, Donald Trump did no such thing. He went about making his policy explicit in small batches. He first released  his position on immigration in August 2016. Even in this case, most of the plan was kept flexible to account for the feedback from the voters. The Trump campaign has shifted positions based on what seems to work with the public at their rallies, rather than what the experts think. This is the exact opposite of a well thought out, heavily analyzed political platform. Rather than having a set in stone "product roadmap", the Trump Campaign releases information on policy initiatives in small batches so that they can be easily consumed and future initiatives adjusted as needed. 

Trump is executing Build-Measure-Learn, while everyone else is doing large upfront analysis. Trump's campaign is agile while most others are living in the traditional Waterfall world. Trump is able to make repeated policy shifts without people taking much notice because he makes these shifts in small batches, changing little details, as opposed to being stuck on a pre-defined, committed platform.

The Lean-Agile Candidate

Donald Trump is far from an ideal presidential candidate. He has great flaws that he seems to escape just as they catch up to him. Many times, he comes out stronger than before because these flaws help him show his anti-fragility. The Trump campaign has done a great job of tapping into the voters directly and making an otherwise improbable candidate into a strong presidential contender. The strategies and tactics used by the campaign, whether knowingly or not, bear great resemblance to the Lean and Agile principles that we encourage teams and organizations to adopt. Of late, Trump's old words have come back to haunt him. Such deep-rooted flaws are probably beyond the anti-fragile ability of his campaign. However the race ends, this lean and agile campaign has probably changed the world of political campaigns for years to come. 

Monday, September 12, 2016

Types of Variability and Roger Federer's Serves

What makes top ranked tennis players so extraordinary? Apart from the pure athleticism and power that they are naturally gifted with, there is the amazing consistency that they have developed in their shot making as well. The following graph from BBC shows the landing spots for Roger Federer's serves during Wimbledon 2012.


What stands out here is the placement, which is amazingly consistent. The serves are either close to the middle of the court or the sidelines. The graph is only showing successful serves, and hence there are some missing spots, but they would all be close to the existing clusters. You can see why he aims for certain areas as well, by the concentration of the "Unreturned" dots. There is some variability in the landing spots of the serves, even the best in the business cannot land it on a dime every time, but he is amazingly accurate. There might have been some serves that were affected by external factors, like gusts of wind, that strayed either off the court or away from the intended areas represented by the clusters. In effect, Federer might not be able to hit a dime with his serve every time due to the natural variations or due to external factors.

Understanding variation is the key to success in quality and business. - W Edwards Deming

Most systems have two types of variability present in them. First are the "common cause" variations that exist in all systems, as it is almost impossible to get a process that produces work at the exact same pace for every item. The second type is the "special cause" variations that happen once in a while due to external factors or special circumstances that are usually out of the team's control. These are usually the major outliers in terms of cycle time. A scatter plot as shown in the single items forecasting post (and copied here) is an easy way to visualize these variabilities.


We have talked in earlier posts about understanding uncertainty, but it is not enough to just understand that uncertainty and variability exist in every process. We need to understand the types and sources of these variabilities so that we can react to them appropriately. Let us talk about these in the context of a software development team.
  • Common Cause Variability - This is the inherent variability present in knowledge work. This can be caused by some stories being easier to accomplish than others, internal queues within the team, process policies of the team, holidays, etc. The main attribute of common cause variations is that they are caused by things within the team's control. These are usually natural variations. They are similar to the clusters of dots on Roger Federer's service map. On the scatter plot of stable teams that have consistent policies, these are easy to spot. For example, in the figure shown above, 95% of the stories are getting done in 14 days or less. The tight grouping and random distribution of the dots under the 2-week line represent the natural variability in the team's process.
  • Special Cause Variability - This is variability that is usually caused by external factors. This can be a result of work getting stuck in external queues, half the team getting sick, emergency production issues, a machine with un-checked in code dying, etc. These are usually things that come out of left field and cannot be predictably accounted for. These are like the outliers on Roger Federer's service map (some of which are not shown here), which could be caused by wind gusts or a racquet string breaking. Once again, these show up on scatter plots for stable teams (like the one above) as the outliers.
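To make the two categories a little more concrete, here is a minimal sketch of how the percentile line on a scatter plot could be computed and how the dots above it could be listed. The cycle times are made-up numbers, the function names are hypothetical, and the nearest-rank percentile is just one reasonable choice. A dot above the line is only a candidate for special cause review; someone still has to look at what actually happened to that story.

```python
def percentile_line(cycle_times, pct=95):
    """Smallest cycle time that at least `pct` percent of items finished within."""
    ordered = sorted(cycle_times)
    # Nearest-rank percentile, computed with integer math: ceil(pct * n / 100).
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def split_by_line(cycle_times, pct=95):
    """Return (line, within, outliers) for a list of cycle times in days."""
    line = percentile_line(cycle_times, pct)
    within = [days for days in cycle_times if days <= line]
    outliers = [days for days in cycle_times if days > line]
    return line, within, outliers


# Hypothetical cycle times (in days) for 20 finished stories, not real team data.
cycle_times = [3, 5, 2, 8, 6, 4, 7, 9, 3, 5, 11, 6, 4, 10, 14, 2, 7, 5, 12, 31]

line, within, outliers = split_by_line(cycle_times)
print(f"95% of stories finished in {line} days or less")
print(f"Candidates for special cause review: {outliers}")
```

With this sample data the line lands at 14 days and the single 31-day story is the one worth a conversation.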
As we are fully aware, the big question we are always asked is - "When will it be done?". In answering this question with high confidence, having low variability is the first important step. With a wide variation in story cycle times, the answers are only as good as gut feel. In order to get better at answering the question, we have to adjust our policies to ensure a lower range of variability. Just like professional tennis players, we have to work hard at bringing the variability down to the level of natural variations. We know that variability will still exist, but we can definitely work at controlling the amount of variability through consistent and smart practices.


For someone taking up tennis for the first time, it is more important to get the mechanics and placement of the serve right than to hit the serve as fast as possible. Once the basics are in place, we can dial up the speed while trying to keep the natural variability in control. The same applies to teams finishing stories. Very often, teams do the opposite and make the mistake of going for speed before achieving predictability. Our first objective should be to tweak the team's policies in order to finish the stories in short cycle times and limit the variability that the team is introducing into the process. Once we have limited this type of variability, we can start applying other adjustments to make our stories flow through faster. Going for speed before achieving predictability often backfires by increasing the amount of variability in the system.

There are multiple tactics at the team's disposal in order to reduce the common cause variability in their process. Some of these are outlined below -
  • Optimize WIP - The easiest way to reduce the variability in the system is to control the number of things that the team is working on. The more things we work on, the longer things take (Little's Law). The longer things take, the greater the range of the number of days your stories take to complete. Limiting WIP also reduces task switching, removing that source of variability as well.
  • Right Sizing Stories - Your stories don't all need to be the same size, but they need to have an upper limit. If a story looks like it will take longer than the team's SLA (of course, you will need to have one for this), it is a candidate for splitting. The SLA then becomes your upper bound for common cause variations.
  • Sizing Stories In Progress - We very often realize that stories are larger than we initially thought as we start working on them. The inertia of the story often stops us from splitting it when in flight. Getting past the inertia can help us get a more predictable flow and also more options for story prioritization.
  • Swarming - Often there will be a story that cannot be split and is large enough to allow for multiple team members to work on it together. This can help the work item get done in a predictable timeframe.
  • Eliminating (or reducing) Queuing Time - Very often work just spends time sitting waiting for handoffs to happen. This time between Analyst-Developer or Developer-Tester handoffs is where a story can spend more time than even the time it was actually worked on. Reducing this as much as possible can bring a lot of stability to the system.
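The Little's Law reference in the Optimize WIP item can be sketched in a few lines. Little's Law relates long-run averages in a reasonably stable system: average cycle time equals average WIP divided by average throughput. The function name and the numbers below are hypothetical, and the sketch assumes throughput stays the same as WIP goes up, which is a simplification.

```python
def average_cycle_time(avg_wip, avg_throughput_per_day):
    """Little's Law: average cycle time = average WIP / average throughput."""
    if avg_throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / avg_throughput_per_day


# Hypothetical team that finishes one story per day on average.
for wip in (4, 8, 12):
    days = average_cycle_time(wip, 1.0)
    print(f"WIP of {wip:>2} stories -> average cycle time of {days:.0f} days")
```

Under these assumptions, tripling the work in progress triples the average time a story spends in the system, which is why limiting WIP is usually the first lever to try.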
These and other similar strategies can help create a system where the variability due to common causes is within a very small range. Any variability that is due to special causes is then easily visible as outliers on a cycle time scatterplot, as shown below.



This separation between the two can be used to determine what strategies to use for special cause vs common cause variations. The outliers, as they represent the special causes, would in this case be things out of the team's control. These would often be external dependencies that the team needs assistance with. If a definite pattern starts appearing with the outliers, this might suggest that these are common causes masquerading as special causes. Variabilities that might be special cause for teams might be common cause for an organization. These require approaches similar to the ones mentioned above for controlling common cause variability, applied at the organization level. They need to be reviewed by the leads of teams in a common forum in order to detect these patterns and apply solutions that reduce them to common cause levels. Of course, that is not applicable to every special cause issue, but it can go a long way toward both becoming predictable and moving faster.

Taking steps to control common cause variability at the team level exposes special cause variations. Special cause variation at the team level can often be common cause variation at the organization level. This would often mean simple solutions, similar to the ones that helped at the team level, can be developed and applied at the organization level to reduce the time it takes for work items to finish. The delays at the organization level might be more costly than the delays at the team level. The easiest way to expose them is to get control of common cause variability at the team level. This can help the common cause variability at the organization level to become evident.

Reference: The scatterplots in this post are from the Analytics tool developed by Dan Vacanti (https://www.actionableagile.com).

Monday, August 22, 2016

Probabilistic Forecasting - Effects Of Uncertainty On Predictions(aka Controlling Variability)

The variability in Monte Carlo predictions, or the range of predictions, is a direct result of the variability in the team's daily throughput. A team with a very consistent throughput will produce a "tighter" Monte Carlo result set. The results of simulations for teams that have a regular daily throughput will not change by much at different confidence levels. The results of simulations for teams that have fluctuating daily throughput will have more pronounced changes as we change confidence levels. Let's take the following two hypothetical teams as examples.
Both teams in this case have closed 30 stories over the course of 30 days.
Team A finishes one story almost every day, with a few days where they finish 2 stories. Their historical throughput graph looks like this - 

Here the horizontal axis is a timeline and the vertical axis is the number of stories done on that day. When we run Monte Carlo simulations (10,000 simulations) for this team the following results appear -

The percentage lines on the above graph are levels of confidence that help us interpret the graph. Based on the above results, Team A has a 95% chance of getting at least 24 stories done, 70% chance of 28 or more, and a 50% chance of getting 30 or more stories done over the next 30 days.

Now let us consider Team B, which has a more variable daily closure rate. The team tends to have some days when they close a bunch of stories and other days when they do not close any at all. Their throughput graph looks like this - 

Just like Team A, Team B also completed 30 items over the same time period.

Examining the results from Monte Carlo, we can see that there are many more possibilities and the numbers on the conservative end of the spectrum are much lower. Team B has a 95% chance of getting at least 16 stories done, 70% chance of 21 or more, and a 50% chance of getting 30 or more stories done over the next 30 days.
We can see that in both data sets the middle value is 30 stories, but the story counts we can commit to at higher confidence levels are much lower for the team with higher variability in throughput. We can conclude from this that in order to make predictions with high confidence that still help us deliver the best results from our teams, we need to control the variability in our throughput. The question though is - How can teams create systems that have lower variability so that we can make predictions with higher confidence?
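The comparison between the two teams can be reproduced in spirit with a short sketch. The daily histories below are invented to match the descriptions of Team A and Team B (30 stories over 30 days each). They are not the data behind the graphs in this post, so the numbers that come out will differ from the ones quoted above. The function names, the seed and the run count are all assumptions.

```python
import random


def simulate_totals(daily_history, days=30, runs=10_000, seed=0):
    """Resample past days to build `runs` possible totals, sorted ascending."""
    rng = random.Random(seed)
    return sorted(sum(rng.choices(daily_history, k=days)) for _ in range(runs))


def at_least(sorted_totals, confidence):
    """Story count that is met or exceeded in `confidence` percent of runs."""
    # Skip the lowest (100 - confidence) percent of the sorted results.
    index = len(sorted_totals) * (100 - confidence) // 100
    return sorted_totals[index]


# Hypothetical histories: both teams closed 30 stories in 30 days.
team_a = [1] * 24 + [2] * 3 + [0] * 3   # steady, about one story a day
team_b = [0] * 22 + [3] * 5 + [5] * 3   # long quiet stretches, then bursts

for name, history in (("Team A", team_a), ("Team B", team_b)):
    totals = simulate_totals(history)
    forecast = {c: at_least(totals, c) for c in (95, 70, 50)}
    print(name, forecast)
```

The gap between the 95% and 50% numbers is the thing to watch. For the steady team it should be a handful of stories, while for the bursty team it should be far wider.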

Probabilistic Forecasting - Forecasting In The Face Of Uncertainty - Multiple Items(Monte Carlo)

Multiple Item Forecasts

The rate at which stories get done can help us figure out what the capacity of a team is for a given period of time. We need to make sure that we do not fall into the Flaw of Averages trap though. We have to model the inherent uncertainty in our processes in order to make sensible predictions for a team. Apart from how long it takes for things to get done, uncertainty also presents itself in the form of how many things get done on a given day. Using the historical trends of how many things are getting done on a daily basis, we can model the future, assuming that the team will behave the way it has behaved in the past. This, in essence, is the Monte Carlo method. Monte Carlo uses data from the past to give us probabilistic capacity for a team. We assume that if, say, in the past 30 days there have been 3 days when the team closed 2 stories, then there is a 10% (3 divided by 30) likelihood that on any random day in the future the team will close 2 stories.
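The 3-out-of-30 arithmetic generalizes to every throughput value the team has seen. A small sketch, using a made-up 30-day history and a hypothetical helper name:

```python
from collections import Counter


def daily_likelihoods(daily_history):
    """Map each observed daily throughput to the share of days it occurred."""
    counts = Counter(daily_history)
    total_days = len(daily_history)
    return {stories: days / total_days for stories, days in sorted(counts.items())}


# Hypothetical last 30 days: 24 days with 1 story, 3 days with 2, 3 days with 0.
history = [1] * 24 + [2] * 3 + [0] * 3

for stories, likelihood in daily_likelihoods(history).items():
    print(f"{stories} stories closed: {likelihood:.0%} of days")
```

The average here is one story a day, but the average alone hides the days with zero or two stories, and those are what the simulation needs to know about.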
Monte Carlo Method
The Monte Carlo method (one variation of it), as we use it, runs through the following steps -
  1. We determine a past range to use and a future range to predict.
    1. The range we select is usually in the order of a few weeks.
    2. We use the latest few weeks as we believe that the latest data is the best representation of future performance.
  2. For the first day in the future range that we are trying to predict, we randomly select a day from the past range.
    1. The throughput from the randomly selected day in the past is assigned to the day in the future range we are trying to predict.
  3. Step 2 is repeated for all days in the future that we are trying to predict.
  4. When all the days have been predicted using the past range, the total of all the throughputs assigned to the future days gives us one answer for how many stories the team can get done.
    1. We record this throughput as one possible result, which can answer how much capacity the team has for a given time period.
  5. We repeat steps 2 through 4 a few thousand times and gather the results of each of those simulations.
The results of these simulations can be over a wide range, depending on the variability in the team's processes (i.e., fluctuations in the number of stories closed on a daily basis) and the length of time we are trying to predict over. The numerous predictions, though, all represent the possibilities available to us to choose from. Based on the distribution of these possibilities, we can start saying what the probability is that we can get at least x stories done. For example, if 85 percent of our simulations told us that over the next 60 days a team can do 30 or more stories, we can say that we have 85% confidence that the team can do 30 or more stories. The same set of simulations might have 50 percent of our results be 40 stories or more. This would give us 50% confidence that the team can do 40 stories or more over the next 60 days. We can now plan according to the amount of risk we are willing to take. If we plan for more stories, at least we are aware of the amount of risk we are taking.
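The steps above can be sketched in a few lines of Python. This is one possible reading of the method, not the code behind any particular tool. The four weeks of past throughput are invented, and the function names, seed and run count are assumptions.

```python
import random


def run_simulation(past_throughput, future_days, rng):
    """Steps 2 to 4: build one possible future by resampling past days."""
    total = 0
    for _ in range(future_days):
        # Step 2: pick a random past day and reuse its throughput.
        total += rng.choice(past_throughput)
    # Step 4: the sum is one possible answer for the future range.
    return total


def monte_carlo(past_throughput, future_days, runs=10_000, seed=42):
    """Step 5: repeat the simulation many times and collect every result."""
    rng = random.Random(seed)
    return [run_simulation(past_throughput, future_days, rng) for _ in range(runs)]


def confidence_of(results, stories):
    """Share of simulations in which at least `stories` stories got done."""
    return sum(1 for total in results if total >= stories) / len(results)


# Step 1: hypothetical past range (4 weeks of daily counts), 60 days to predict.
past_throughput = [0, 1, 1, 0, 2, 0, 0] * 4
results = monte_carlo(past_throughput, future_days=60)

for target in (30, 40):
    print(f"{target} or more stories: {confidence_of(results, target):.0%} confidence")
```

Reading the output is the same exercise as in the paragraph above: pick the number of stories you want to plan for and see how much confidence the simulations give you, or pick the confidence you need and work backwards to a story count.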

Next : Uncertainty And Predictions