In a post earlier this year, we tried to figure out how many passing yards Peyton Manning would put up if he were to return to football for one game. We answered that question using the techniques for probabilistic forecasting of single items. What if we were to use probabilistic forecasting techniques to figure out an entire season's worth of performances? We have already discussed in earlier posts that forecasting multiple items cannot be done by simply forecasting single items multiple times (well, kind of yes and no...). This means we have to reach into our toolbox for our preferred technique for predicting multiple items: Monte Carlo simulation.
We are also going to change our subject to a more relevant one. As we are almost midway through the current season, let us pick an active player and see what predictions we can make about his performances for the remainder of this season. Our subject is Green Bay Packers quarterback Aaron Rodgers. We will compare how he has done so far this season against our predictions, and then project how he will do in the remainder of the season. Answering how many yards a quarterback will throw for in a season, especially one as prolific as Aaron Rodgers, can be a daunting task. As daunting as predicting the end date for a software project.
The quality and accuracy of the projections coming out of Monte Carlo depend squarely on the model used as input. The first decision we have to make is which past data points to use as inputs to our Monte Carlo simulations. Aaron Rodgers was drafted in 2005 and took over as starting quarterback for the Packers in 2008, so we can safely ignore all games he participated in before 2008. The team and the system under which Rodgers plays have also changed quite a bit since 2008. For this reason, we limit ourselves to the last 5 seasons. We also exclude any games where Rodgers left the field injured and could not complete the game. That leaves us with 77 games, including the ongoing season. We give the performance in each of these games equal weight in our simulations, which means that for each game we are trying to predict, Rodgers' performance is equally likely to resemble any of the past 77 games.
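The equal-weight model described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used to produce the numbers in this post: the function name `simulate_totals`, the trial count, the seed and the short list of yardage values are all our own assumptions, and the yardage values are placeholders rather than Rodgers' real game log.

```python
import random

# Placeholder per-game passing yards. These are made-up values for
# illustration, NOT the 77 real games used in the article.
past_games = [312, 255, 198, 401, 276, 233, 289, 340, 267, 221]


def simulate_totals(history, n_games, n_trials=10_000, seed=0):
    """Return one simulated yardage total per trial.

    Each future game is drawn uniformly at random, with replacement,
    from `history`, so every past game carries equal weight.
    """
    rng = random.Random(seed)
    return [sum(rng.choices(history, k=n_games)) for _ in range(n_trials)]


# Simulate the total for a 6-game stretch.
totals = simulate_totals(past_games, n_games=6)
print(min(totals), max(totals))
```

Each trial is one imagined version of the 6-game stretch. The spread of the resulting totals is what the certainty levels are later read from.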
Now that we have narrowed down our input data set to what we believe is a representative range for the upcoming games, let us see how the results we get from Monte Carlo compare to those we would have gotten from straight averages. For the seasons from 2011 to 2015, across the games we are considering as input, Rodgers averaged about 276.75 yards per game. If we were to make predictions based on this average, we would say that Rodgers will throw for 4428 yards over the 16-game season. Also, for the first six games of the season (the games completed at the time of writing), the average would have put Rodgers at about 1660 yards.
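The arithmetic behind the average-based projection is short enough to write out. The function name below is our own, chosen purely for illustration. The only inputs are the per-game average and the game counts quoted above.

```python
avg_yards_per_game = 276.75  # per-game average quoted in the article


def average_projection(avg_per_game, n_games):
    """Project a total by multiplying the per-game average by the game count."""
    return avg_per_game * n_games


print(average_projection(avg_yards_per_game, 16))  # 4428.0 (full season)
print(average_projection(avg_yards_per_game, 6))   # 1660.5 (first six games)
```

Note that this approach produces a single number with no indication of how likely it is, which is the gap the Monte Carlo results are meant to fill.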
Testing Predictions Against Past Games
Let us run the Monte Carlo simulations for the first 6 games of the season, assuming that these games have not yet taken place. In other words, let us pretend it is the beginning of the season and we are trying to predict how many yards Rodgers is going to throw for in the first 6 games.
We get the following results -
Predictions For The First 6 games of 2016 Season
15% Certainty | 1835 yds |
30% Certainty | 1750 yds |
50% Certainty | 1661 yds |
70% Certainty | 1570 yds |
80% Certainty | 1524 yds |
85% Certainty | 1484 yds |
These results can be interpreted as confidence levels for Rodgers' performance. What Monte Carlo is telling us is that we have 85% certainty that Aaron Rodgers will throw for at least 1484 yards, 50% certainty that he will throw for at least 1661 yards, 15% certainty that he will throw for at least 1835 yards, and so on. As we can see, the higher the confidence level, the lower the number of yards we can predict. So far, Rodgers has thrown for 1496 yards, which is 164 yards short of the average-based projection of 1660 (a miss larger than his total yards in a game against Arizona last year). The 85% certainty number from Monte Carlo, on the other hand, is off by only 12 yards, which for a prolific quarterback like Aaron Rodgers is the yardage gained from one pass. At the beginning of the season, Rodgers (and his agent) can use this information to set expectations for the season. Coaches can use this information to plan for the season and decide how much importance to put on the run game and on their defense, based on the level of confidence/risk they want to assume.
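To make the "at least" reading concrete, here is one way a certainty table could be read off a set of simulated totals. It is a sketch under our own assumptions: `certainty_table` is a hypothetical helper, and the simulated totals come from made-up per-game yardage rather than the real data behind the table above.

```python
import random


def certainty_table(totals, levels=(15, 30, 50, 70, 80, 85)):
    """Map each certainty level to the yardage that at least that
    share of simulated trials met or exceeded."""
    ordered = sorted(totals, reverse=True)  # best outcomes first
    table = {}
    for level in levels:
        # Position of the last trial inside the top `level` percent.
        idx = round(len(ordered) * level / 100) - 1
        idx = max(0, min(len(ordered) - 1, idx))
        table[level] = ordered[idx]
    return table


# Stand-in simulation: made-up per-game yards, NOT the real game log.
rng = random.Random(0)
hypothetical_games = [312, 255, 198, 401, 276, 233, 289, 340, 267, 221]
totals = [sum(rng.choices(hypothetical_games, k=6)) for _ in range(10_000)]

for level, yards in certainty_table(totals).items():
    print(f"{level}% Certainty | {yards} yds")
```

Sorting from best to worst and walking down the list is why the yardage falls as the certainty rises: a higher certainty level has to cover more of the weaker trials.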
These numbers also provide some validation for our present model and give us another bit of information. The fact that the actual total lands so close to the prediction at the 85% certainty mark tells us that Aaron Rodgers is performing at a lower level than he is capable of. Put another way, his actual output sits at roughly the 15th percentile of the simulated outcomes, which we can loosely read as Rodgers performing at 15% of his maximum potential and about 30% of his median potential.
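We can roughly locate the actual total of 1496 yards within the published table by interpolating between its rows. The sketch below assumes the certainty level changes linearly between neighbouring rows, which is our simplification and not something the simulation guarantees. The helper name `interpolate_certainty` is also our own.

```python
# Rows from the first-6-games table above: (certainty %, yards).
table = [(15, 1835), (30, 1750), (50, 1661), (70, 1570), (80, 1524), (85, 1484)]


def interpolate_certainty(actual_yards, rows):
    """Estimate the certainty level for `actual_yards` by linear
    interpolation between neighbouring table rows.

    Returns None when `actual_yards` falls outside the table's range.
    """
    rows = sorted(rows)  # ascending certainty, so descending yards
    for (c_lo, y_hi), (c_hi, y_lo) in zip(rows, rows[1:]):
        if y_lo <= actual_yards <= y_hi:
            fraction = (y_hi - actual_yards) / (y_hi - y_lo)
            return c_lo + fraction * (c_hi - c_lo)
    return None


print(round(interpolate_certainty(1496, table), 1))  # 83.5
```

Under that assumption the actual total sits at roughly 83.5% certainty, just above the 85% row, which is consistent with the reading above.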
Predicting The Rest Of The Season
Using the same methods we used to predict the first six games, we can attempt to predict the remainder of the season. We will include the 6 games that have already been played this season in our model. Running the model through Monte Carlo for the remaining ten games of the season gives us the following results -
Predictions For The Last 10 games of 2016 Season
15% Certainty | 2948 yds |
30% Certainty | 2849 yds |
50% Certainty | 2733 yds |
70% Certainty | 2615 yds |
80% Certainty | 2549 yds |
85% Certainty | 2516 yds |
Since Rodgers has already thrown for 1496 yards this season, we can add that total to the projections above to estimate the number of yards the Packers quarterback will rack up for the entire season -
Predictions For The Entire 2016 Season
15% Certainty | 4444 yds |
30% Certainty | 4345 yds |
50% Certainty | 4229 yds |
70% Certainty | 4111 yds |
80% Certainty | 4045 yds |
85% Certainty | 4012 yds |
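The full-season table is simply the yards already on the board added to each projection for the remaining games. A small sketch using only the numbers from the tables above (the variable names are ours):

```python
yards_so_far = 1496  # actual yards through the first 6 games

# Last-10-games projections from the table above: certainty % -> yards.
remaining = {15: 2948, 30: 2849, 50: 2733, 70: 2615, 80: 2549, 85: 2516}

# Shift every remaining-games projection by the yards already thrown.
full_season = {level: yards_so_far + yards for level, yards in remaining.items()}

for level, yards in full_season.items():
    print(f"{level}% Certainty | {yards} yds")
```

Because the first six games are now known, they contribute no uncertainty. Only the remaining ten games are simulated.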
If we had taken the average yards per game from the previous 5 seasons (276.75 yds/game) and used it as a projection for the 16 games of this season, we would have predicted that Rodgers would pass for 4428 yards. Based on the simulations we have run so far, Rodgers can be expected to hit that mark with only about 20% certainty. For a quarterback who is operating below par, and so maybe inspiring a lower level of reliability, we should pick a number from the other end of the scale if we were forced to pick one. The 85% certainty number, 4012 yards, is probably a much safer bet to make and plan for, whether you are Rodgers, his coaches, his agent or someone placing bets in Vegas.
A Smarter Model
Our model so far has been pretty straightforward: assume that Aaron Rodgers will perform in each future game in a manner similar to one of the past 77 games. The beauty of this model is its simplicity. It requires almost zero football knowledge to understand. All we need to know is that yards are a unit of measurement of a player's productivity in a football game. We do not have to understand any rules, strategies or other measures and metrics regarding football. What if we could come up with a smarter (maybe better) model that still maintains this simplicity?
"Everything should be as simple as possible, but no simpler" - Albert Einstein
Let us run the same simulations using a model that considers the opponent the Green Bay Packers are up against. What that means for us is that as we try to figure out future performances, we will not randomly select from all of the past 77 games. We will instead sample from full games that Aaron Rodgers has played against the particular opposition the Packers are up against. In essence, all games against the Chicago Bears will be sampled only from prior games against the Chicago Bears. Using this model we get the following results for the first 6 games of 2016 -
Predictions For The First 6 games of 2016 Season
15% Certainty | 1717 yds |
30% Certainty | 1641 yds |
50% Certainty | 1554 yds |
70% Certainty | 1481 yds |
80% Certainty | 1432 yds |
85% Certainty | 1402 yds |
These predictions are all lower than the predictions of the simpler "random" model. In fact, in this case, the actual yardage of 1496 corresponds to about 66% certainty, which is like saying that Rodgers is performing at 34% of his maximum potential (as opposed to 15% from the "random" model). Why is this model giving us more pessimistic results? Why are the same 1496 actual yards interpreted as different levels of performance for Aaron Rodgers? Taking a closer look at the data answers the question for us. Since 2011, Rodgers' only 400+ yard games have come against Denver, Washington, and New Orleans, teams that are not on the schedule for 2016. This means that when we simulate the games for 2016 based on the opposing team, these games do not get considered at all. This lowers the projections for the group of games we are simulating.
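The opponent-aware sampling can be sketched as follows. Everything in the block is an assumption made for illustration: the function name `simulate_schedule`, the four-game schedule, and above all the yardage values, which are made up and are not Rodgers' real numbers against these teams.

```python
import random

# Made-up yardage values keyed by opponent, for illustration only.
history_by_opponent = {
    "CHI": [302, 256, 315],
    "DET": [248, 273],
    "MIN": [212, 335, 286],
    "DEN": [410],  # never sampled: DEN is not on the schedule below
}


def simulate_schedule(schedule, history, n_trials=10_000, seed=0):
    """Simulate total yards over `schedule`, drawing each game only
    from past games against that same opponent."""
    rng = random.Random(seed)
    return [
        sum(rng.choice(history[opponent]) for opponent in schedule)
        for _ in range(n_trials)
    ]


schedule = ["CHI", "DET", "MIN", "CHI"]  # hypothetical 4-game stretch
totals = simulate_schedule(schedule, history_by_opponent)
print(min(totals), max(totals))
```

Because the big game against the absent opponent is never drawn, no simulated total can benefit from it, which mirrors the effect described above. The sketch also assumes every opponent on the schedule has at least one prior game. A real model would need a fallback for opponents with no history.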
The projections for the remainder of the season and the overall projections are as follows -
Predictions For The Last 10 games of 2016 Season
15% Certainty | 3046 yds |
30% Certainty | 2943 yds |
50% Certainty | 2854 yds |
70% Certainty | 2761 yds |
80% Certainty | 2704 yds |
85% Certainty | 2675 yds |
Predictions For The Entire 2016 Season
15% Certainty | 4542 yds |
30% Certainty | 4439 yds |
50% Certainty | 4350 yds |
70% Certainty | 4257 yds |
80% Certainty | 4200 yds |
85% Certainty | 4147 yds |
This model for Monte Carlo suggests that, for the rest of the season, Rodgers can be expected to do better than the "simple" model predicts. The Packers' quarterback has historically performed better against the teams the Packers are going to play in the remainder of the season than against those they played in the first 6 games. This is also borne out when we look at averages: against the first 6 opponents, Rodgers averaged 263 yds/game, as opposed to 270 yds/game against the next 10 opponents.
In Summation
We can conclude that Monte Carlo predictions (probabilistic forecasts) give us a much better chance of answering questions about performance across multiple games than averages do. Based on our level of confidence/risk tolerance, we can choose the certainty level and plan accordingly. We also see that different models give us different results. We have to figure out the models that best fit the reality of our situation, while not making them too complex or specialized. As Albert Einstein said - "Everything should be as simple as possible, but no simpler".