With just a few regular season weeks left, many region races are a complicated mess, especially those that are only a few games into region play.
Here I’d like to look at the math behind the region races and the playoff seeding and how the Monte Carlo season simulation helps us answer some of those questions. I’ll only focus on the top six classifications since the Class A playoffs are seeded differently. I’m excluding teams playing non-region schedules (Clarkston and Cross Keys in 5-AAAAA, Forest Park in 4-AAAAAA, and Osborne in 6-AAAAAA) and for simplicity’s sake I’m ignoring the AAAAAAA at-large bid.
First, let's consider the sheer number of possible region standings. Since the order of the four playoff teams matters, we'll be working with permutations (as opposed to combinations). Calculating permutations is fairly simple, but requires a math function referred to as a factorial, denoted by a "!".
If each region sends four teams to the playoffs, then the possible permutations are:
(# of Members)! / (# of Members – 4)!
So for a region with six members, there are:
(6)! / (6 - 4)!
= (6 x 5 x 4 x 3 x 2 x 1) / (2 x 1)
= 720 / 2
= 360 possible permutations of four teams.
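The same count can be checked in a few lines of Python. This is only an illustrative sketch: `playoff_permutations` is a name made up here, and `math.perm` requires Python 3.8 or later.

```python
import math

def playoff_permutations(members: int, playoff_spots: int = 4) -> int:
    """Ordered ways to fill the playoff spots: members! / (members - spots)!"""
    return math.perm(members, playoff_spots)

print(playoff_permutations(6))  # 360
print(playoff_permutations(8))  # 1680
print(playoff_permutations(5))  # 120
```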
Here’s how Class AAAA breaks down:
We can see that Region 1-AAAA has 1,680 permutations while even Region 7-AAAA has 120 with just five teams.
Using this information, we can then determine the surprisingly large number of ways to seed the Class AAAA bracket:
1680 x 360 x 360 x 840 x 840 x 840 x 120 x 360 = 5,574,884,681,318,400,000,000
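For anyone who wants to verify the product, here is a short Python sketch. The region sizes are an assumption, inferred from the factors above (1,680 corresponds to eight members, 840 to seven, 360 to six, and 120 to five), and the variable names are made up for the example.

```python
import math

# Members per region, inferred from the factors in the product above
# (an assumption: 1680 -> 8 teams, 840 -> 7, 360 -> 6, 120 -> 5).
region_sizes = [8, 6, 6, 7, 7, 7, 5, 6]

# Ordered top-four finishes for each region, then the product across regions.
per_region = [math.perm(n, 4) for n in region_sizes]
total_brackets = math.prod(per_region)

print(per_region)
print(f"{total_brackets:,}")    # 5,574,884,681,318,400,000,000
print(f"{total_brackets:.2e}")  # 5.57e+21
```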
Here’s the number of possible brackets for the top six classifications:
For reference, Class AA has 3.64 x 10^23 possibilities while there are "only" about 5.6 x 10^21 grains of sand on the entire Earth.
In short, there are a lot of possible brackets.
However, the outcomes of the games are probabilistic: using the ratings, we can assign a probability that a particular team will win a particular contest. By extension the region standings are probabilistic, and by extension again so are the playoff brackets.
But because there are too many possible brackets to simply calculate the odds associated with them all, the Monte Carlo method becomes a practical way to estimate them. The Monte Carlo method is a well-accepted method for solving complex combinatorial problems through simulation. The program essentially simulates the season many times over: it simulates the remaining regular season games, constructs the region standings, seeds the playoff brackets, and finally simulates the playoffs. As the program executes the simulations, it counts, among many other things, how often each team wins the championship. The program conducts 1,000,000 simulations in about five hours, or roughly 55 seasons a second. As the season progresses the run time drops to about three hours, because the program uses the results of games already played instead of simulating their outcomes.
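To make the loop concrete, here is a heavily simplified Python sketch of the regular season part only. It is not the actual program: the logistic `win_probability` model, the random tiebreaker, the team labels, ratings, and schedule are all assumptions invented for illustration, and the playoff simulation is left out.

```python
import random
from collections import Counter

def win_probability(rating_a: float, rating_b: float, scale: float = 10.0) -> float:
    """Hypothetical logistic model: chance that team A beats team B."""
    return 1.0 / (1.0 + 10 ** (-(rating_a - rating_b) / scale))

def simulate_region(ratings, wins_so_far, remaining_games, n_sims, seed=0):
    """Count how often each ordered top-four finish occurs across n_sims seasons."""
    rng = random.Random(seed)
    seeding_counts = Counter()
    for _ in range(n_sims):
        wins = dict(wins_so_far)  # results of games already played are kept as-is
        for a, b in remaining_games:
            p_a = win_probability(ratings[a], ratings[b])
            winner = a if rng.random() < p_a else b
            wins[winner] += 1
        # Simplified standings: most wins first, ties broken at random
        # (real region tiebreakers are more involved).
        order = sorted(wins, key=lambda t: (wins[t], rng.random()), reverse=True)
        seeding_counts[tuple(order[:4])] += 1
    return seeding_counts

# Made-up five-team region, purely for illustration.
ratings = {"A": 70.0, "B": 66.0, "C": 61.0, "D": 55.0, "E": 48.0}
wins_so_far = {"A": 2, "B": 1, "C": 1, "D": 1, "E": 0}
remaining_games = [("A", "B"), ("C", "D"), ("A", "E"), ("B", "C"), ("D", "E")]

counts = simulate_region(ratings, wins_so_far, remaining_games, n_sims=10_000)
for seeding, n in counts.most_common(3):
    print(seeding, n / 10_000)
```

Because games already played go in as fixed results, fewer random draws are needed per simulated season as the year goes on, which is consistent with the shorter run times late in the season.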
Among the items the program counts are the number of times each permutation of playoff teams occurs for each region. For example, Region 1-AAAAAA has five teams and 120 possible permutations of playoff teams, so the program counts how often each one of those 120 permutations occurs after simulating the remaining regular season games and constructing the region standings. The output then looks something like:
Here we see that out of the 1,000,000 simulations, Valdosta, Northside (Warner Robins), Lee County, and Houston County finished in that exact order 83,078 times, or roughly 8.3% of the time. Since it is the most commonly occurring of the 120 permutations, it is the modal playoff seeding for Region 1-AAAAAA.
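Picking out the modal seeding from a tally like this is a one-liner with Python's `collections.Counter`. The sketch below uses placeholder tallies with made-up team labels rather than real simulation output, and `modal_seeding` is a hypothetical helper name.

```python
from collections import Counter

def modal_seeding(seeding_counts: Counter):
    """Return the most frequent ordered seeding and its share of all simulations."""
    total = sum(seeding_counts.values())
    seeding, count = seeding_counts.most_common(1)[0]
    return seeding, count / total

# Placeholder tallies with made-up team labels (not real simulation output).
example_counts = Counter({
    ("A", "B", "C", "D"): 450,
    ("B", "A", "C", "D"): 300,
    ("A", "C", "B", "D"): 250,
})
print(modal_seeding(example_counts))  # (('A', 'B', 'C', 'D'), 0.45)

# The share quoted in the text: 83,078 occurrences in 1,000,000 simulations.
print(f"{83_078 / 1_000_000:.1%}")  # 8.3%
```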
The modal bracket is then constructed by simply starting with the modal playoff seeding from each region and assuming the favored team wins each game, which is the most likely outcome of all the possible outcomes.
However, the modal bracket, while useful, is still extremely limited. After all, while it represents the most likely outcome of the season, it is still highly unlikely to occur exactly. For example, while there is an 8.3% chance the playoff seeding for Region 1-AAAAAA will be the modal playoff seeding, there is a 91.7% chance some other scenario will unfold.
The Monte Carlo simulation takes into account the modal playoff seeding plus all the other possible outcomes as well. That's why occasionally there will appear to be some discrepancies between the modal bracket and the table of odds.
For example, in 7-AAAAAA, the modal playoff seeding is Johns Creek, Centennial, Alpharetta, and Cambridge, with a 23.0% chance of occurring. However, because of their remaining region games, Centennial actually fares slightly better in the full simulation. Overall, Centennial has a 45.5% chance of winning the region compared to Johns Creek's 42.3%, and a 99.5% chance of making the playoffs compared to Johns Creek's 98.2%. Ultimately, Centennial's odds to win the title are 4,544.45 to one while Johns Creek's are 5,616.98 to one.
So again, while the modal bracket is useful and entertaining, there is much more to consider than the one "most-likely-but-still-highly-unlikely" outcome it represents.
Below is every region playoff permutation with greater than a 3% chance of occurring:
Using more detailed output such as that shown above, it's possible to consider some other measures of just how open some region races actually are.
The first measure could be simply how often the modal playoff seeding is realized in the simulation. For example, the modal playoff seeding for 7-AAAAAAA has a 49.2% chance of occurring. Another measure is how many of the possible permutations are ever realized in the simulation. For example, Region 2-AA has eight teams and so 1,680 permutations. Of those, only 104 were realized during the 1,000,000 simulations. We might conclude the Region 2-AA race is fairly close to settled, with perhaps only a small handful of remaining games left to decide the playoff seeding. We can also see that Regions 1-AAA, 1-AAAAA, 1-AAAAAA, and 1-AAAAAAA all had 100% of their permutations realized in 1,000,000 simulations, so it appears those regions are still very much up for grabs.
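The second measure is just a ratio of the distinct seedings observed to the total number possible. A small Python sketch, with `realized_fraction` as a made-up helper name:

```python
import math

def realized_fraction(n_distinct_seedings: int, members: int, playoff_spots: int = 4) -> float:
    """Share of all possible ordered playoff seedings that showed up at least once."""
    return n_distinct_seedings / math.perm(members, playoff_spots)

# Region 2-AA figures from the text: 104 of 1,680 permutations realized.
print(f"{realized_fraction(104, 8):.1%}")  # 6.2%
```

With a tally of seedings from the simulation, the first argument would simply be the number of distinct seedings in that tally.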
But a final measure, which I think nicely captures what we're looking for, is what I will refer to as a "Parity Index": the probability of the exact same playoff seeding being realized twice in a row in the simulations. Here we'll see that Region 6-AA might be the one to get the popcorn out for:
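One way to read that definition, assuming each simulation is independent of the last, is as the sum of the squared shares of every seeding: the chance that two independent simulations land on the same ordered top four. The Python sketch below follows that reading. It is an interpretation rather than the program's actual calculation, and `parity_index` is a hypothetical function name.

```python
from collections import Counter

def parity_index(seeding_counts: Counter) -> float:
    """Probability that two independent simulations produce the same seeding."""
    total = sum(seeding_counts.values())
    return sum((count / total) ** 2 for count in seeding_counts.values())

# A region that is already decided: one seeding every time.
print(parity_index(Counter({"only seeding": 1000})))  # 1.0

# Four equally likely seedings (placeholder labels).
print(parity_index(Counter({"w": 250, "x": 250, "y": 250, "z": 250})))  # 0.25
```

Under this reading, the smaller the value, the more evenly the simulations are spread across different seedings, and so the more open the race.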