Monday, September 12, 2016

Types of Variability and Roger Federer's Serves

What makes top-ranked tennis players so extraordinary? Apart from the pure athleticism and power they are naturally gifted with, there is the amazing consistency they have developed in their shot making. The following graph from the BBC shows the landing spots for Roger Federer's serves during Wimbledon 2012.

What stands out here is the placement, which is amazingly consistent. The serves land either close to the middle of the court or close to the sidelines. The graph shows only successful serves, so some landing spots are missing, but they would all be close to the existing clusters. You can also see why he aims for certain areas from the concentration of the "Unreturned" dots. There is some variability in the landing spots; even the best in the business cannot land a serve on a dime every time, but he is amazingly accurate. Some serves may have been affected by external factors, like gusts of wind, and strayed either off the court or away from the intended areas represented by the clusters. In effect, Federer cannot hit a dime with every serve, either because of natural variation or because of external factors.

Understanding variation is the key to success in quality and business. - W Edwards Deming

Most systems have two types of variability present in them. The first is "common cause" variation, which exists in all systems because it is almost impossible to have a process that produces work at the exact same pace for every item. The second is "special cause" variation, which happens once in a while due to external factors or special circumstances that are usually out of the team's control. These are usually the major outliers in terms of cycle time. A scatter plot, as shown in the single items forecasting post (and copied here), is an easy way to visualize both types.

We have talked in earlier posts about understanding uncertainty, but it is not enough to just understand that uncertainty and variability exist in every process. We need to understand the types and sources of these variabilities so that we can react to them appropriately. Let us talk about these in the context of a software development team.
  • Common Cause Variability - This is the inherent variability present in knowledge work. It can be caused by some stories being easier to accomplish than others, internal queues within the team, process policies of the team, holidays, etc. The main attribute of common cause variations is that they are caused by things within the team's control. These are usually natural variations, similar to the clusters of dots on Roger Federer's service map. On the scatter plot of a stable team with consistent policies, they are easy to spot. For example, in the figure shown above, 95% of the stories are getting done in 14 days or less. The tight grouping and random distribution of the dots under the 2-week line represent the natural variability in the team's process.
  • Special Cause Variability - This is variability that is usually caused by external factors. It can be a result of work getting stuck in external queues, half the team getting sick, emergency production issues, a machine holding code that was never checked in dying, etc. These are usually things that come out of left field and cannot be predictably accounted for. They correspond to the outliers on Roger Federer's service map (some of which are not shown here), which could be caused by wind gusts or a racquet string breaking. Once again, on scatter plots for stable teams (like the one above), these show up as the outliers.
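The percentile line described in the list above can be worked out directly from a list of cycle times. Here is a minimal sketch in Python, assuming hypothetical cycle times in days and a simple nearest-rank percentile; the function names are made up for illustration and are not taken from any particular tool. Note that a percentile line always leaves a few points above it by construction, so those points are only candidates for a special cause conversation, not proof of one.

```python
def percentile_line(cycle_times, pct=95):
    """Nearest-rank percentile: the smallest cycle time such that
    at least pct% of the items finished in that many days or fewer."""
    if not cycle_times:
        raise ValueError("need at least one cycle time")
    ordered = sorted(cycle_times)
    # Integer ceiling of pct * n / 100, avoiding floating point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def split_by_line(cycle_times, pct=95):
    """Return the percentile line, the items at or under it,
    and the items above it (candidate special cause outliers)."""
    line = percentile_line(cycle_times, pct)
    within = [days for days in cycle_times if days <= line]
    outliers = [days for days in cycle_times if days > line]
    return line, within, outliers


# Hypothetical cycle times (in days) for 20 finished stories.
cycle_times = [3, 5, 2, 8, 6, 4, 7, 9, 5, 3, 6, 4, 10, 12, 5, 7, 14, 6, 8, 41]

line, within, outliers = split_by_line(cycle_times)
print(f"95% of stories finished in {line} days or less")
print(f"Stories above the line: {outliers}")
```

With this made-up data, the 41-day story is the one worth asking about: was it stuck in an external queue, or is it a sign of something in the team's own process?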
As we are fully aware, the big question we are always asked is "When will it be done?". Having low variability is the first important step toward answering this question with high confidence. With a wide variation in story cycle times, the answers are only as good as gut feel. To get better at answering the question, we have to adjust our policies to ensure a narrower range of variability. Just like professional tennis players, we have to work hard at bringing the variability down to the level of natural variation. We know that variability will still exist, but we can definitely work at controlling the amount of it through consistent and smart practices.

For someone taking up tennis for the first time, it is more important to get the mechanics and placement of a serve right than to hit the serve as fast as possible. Once the basics are in place, we can dial up the speed while trying to keep the natural variability in control. The same applies to teams finishing stories. Very often, teams do the opposite and make the mistake of going for speed before achieving predictability. Our first objective should be to tweak the team's policies in order to finish stories in short cycle times and limit the variability that the team is introducing into the process. Once we have limited this type of variability, we can start applying other adjustments to make our stories flow through faster. Going for speed before achieving predictability can often backfire and increase the amount of variability in the system.

There are multiple tactics at the team's disposal for reducing the common cause variability in their process. Some of these are outlined below:
  • Optimize WIP - The easiest way to reduce the variability in the system is to control the number of things that the team is working on. The more things we work on, the longer things take (Little's Law). The longer things take, the greater the range of the number of days your stories take to complete. Limiting WIP also reduces task switching, removing that source of variability as well.
  • Right Sizing Stories - Your stories don't all need to be the same size, but they need to have an upper limit. If a story looks like it will take longer than the team's SLA (of course, you will need to have one for this), it should be split into smaller stories. The SLA then becomes your upper bound for common cause variations.
  • Sizing Stories In Progress - We very often realize that stories are larger than we initially thought as we start working on them. The inertia of the story often stops us from splitting it when in flight. Getting past the inertia can help us get a more predictable flow and also more options for story prioritization.
  • Swarming - Often there will be a story that cannot be split and is large enough to allow for multiple team members to work on it together. This can help the work item get done in a predictable timeframe.
  • Eliminating (or Reducing) Queuing Time - Very often work just sits waiting for handoffs to happen. A story can spend more time waiting between Analyst-Developer or Developer-Tester handoffs than it spends actually being worked on. Reducing this waiting time as much as possible can bring a lot of stability to the system.
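The WIP point in the list above leans on Little's Law, which relates long-run averages in a stable system: average cycle time = average WIP / average throughput. A rough sketch of the arithmetic in Python follows. The numbers are hypothetical, and it assumes throughput stays the same as WIP goes up, which is a simplification.

```python
def average_cycle_time(avg_wip, avg_throughput_per_day):
    """Little's Law for a stable system:
    average cycle time (days) = average WIP / average throughput (items per day)."""
    if avg_throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / avg_throughput_per_day


# Hypothetical team finishing one story every two days on average.
throughput = 0.5

for wip in (4, 8, 16):
    days = average_cycle_time(wip, throughput)
    print(f"WIP of {wip:>2} stories -> average cycle time of {days:.0f} days")
```

Doubling the number of stories in progress doubles the average time each one takes, without a single extra story getting finished. Little's Law only speaks to averages, so it does not prove the spread widens, but longer average cycle times leave more room for the range to grow.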
These and other similar strategies can help create a system where the variability due to common causes stays within a very small range. Any variability that is due to special causes is then easily visible as outliers on a cycle time scatterplot, as shown below.

This separation between the two can be used to determine which strategies to use for special cause versus common cause variations. The outliers, as they represent the special causes, would in this case be things out of the team's control. These would often be external dependencies that the team needs assistance with. If a definite pattern starts appearing in the outliers, this might suggest that these are common causes masquerading as special causes. Variability that is special cause for a team might be common cause for the organization. Such cases call for approaches similar to those mentioned above for controlling common cause variability, applied at the organization level. They need to be reviewed by the leads of the teams in a common forum in order to detect these patterns and apply solutions that bring the delays down to common cause levels. Of course, that is not applicable to every special cause issue, but it can go a long way toward both becoming predictable and moving faster.
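One simple way a group of team leads might look for such patterns is to tag each outlier with the reason it was delayed and see which reasons show up for more than one team. The sketch below is in Python; the team names, reasons and the `recurring_causes` helper are all hypothetical and only there to illustrate the idea.

```python
from collections import defaultdict


def recurring_causes(outliers, min_teams=2):
    """Group outlier stories by delay reason and keep the reasons
    that were reported by at least `min_teams` different teams."""
    teams_by_cause = defaultdict(set)
    for team, cause in outliers:
        teams_by_cause[cause].add(team)
    return {
        cause: sorted(teams)
        for cause, teams in teams_by_cause.items()
        if len(teams) >= min_teams
    }


# Hypothetical outliers collected from three teams: (team, reason for delay).
outliers = [
    ("Team A", "waiting on security review"),
    ("Team B", "waiting on security review"),
    ("Team A", "production incident"),
    ("Team C", "waiting on security review"),
    ("Team B", "shared test environment down"),
    ("Team C", "shared test environment down"),
]

patterns = recurring_causes(outliers)
for cause, teams in patterns.items():
    print(f"{cause}: {', '.join(teams)}")
```

In this made-up data, a reason that looks like a one-off to each team turns out to be hitting several of them, which makes it a candidate for an organization-level fix.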

Taking steps to control common cause variability at the team level exposes special cause variations. Special cause variation at the team level can often be common cause variation at the organization level. This would often mean simple solutions, similar to the ones that helped at the team level, can be developed and applied at the organization level to reduce the time it takes for work items to finish. The delays at the organization level might be more costly than the delays at the team level. The easiest way to expose them is to get control of common cause variability at the team level. This can help the common cause variability at the organization level to become evident.

Reference: The scatterplots in this post are from the Analytics tool developed by Dan Vacanti.