 # Against All Odds — Upsets

It’s a great time to be a Dayton fan! It’s the first time the school has reached the Sweet Sixteen since before all the Dayton fans I know were born…and they did it as an 11 seed! Their game against Ohio State was the first game of the tournament to tip. A little over two hours later, everyone’s brackets were busted. After watching the tournament this weekend, I felt like there were a lot of upsets this year. (Or at the very least my bracket was getting busted up pretty quickly.) But are there really more upsets this year than normal?

First, I’m going to define an upset as any lower seed beating a higher seed. Personally, I don’t think 8/9, 4/5, and 1/2 match-ups should count as upsets, but for this analysis I’m going to count them as possible upsets. So how many upsets were there this year? Through two rounds, there have been 13. That’s one fewer than last year at this time, and right at the average (if you round). So this is a rather average year. Three of four 1-seeds are still alive, which is not too different from what you might expect.

Historically, 1999 had the most upsets, with 19 in the first weekend of play. Nothing really stuck out the way Florida Gulf Coast did in reaching the Sweet Sixteen as a 15-seed last year. 1991 had the fewest upsets in the first weekend, with just nine. The upset counts throughout the years appear to be random noise fluctuating around an average of 12.8 upsets (out of 48 games played) in the first two rounds per year. The conclusion you can draw is that the number of upsets is rather consistent over the years, with no systematic trend from year to year.

Thirteen upsets is a lot; it’s more than a quarter (27%) of the 48 games played this weekend. Last week, I posted the probability that a seed would win in the first round of the tournament. This was a linear relationship starting with an almost certain probability for the 1-seeds and then going to a 50/50 split for an 8/9 game. On the surface it doesn’t seem like more than a quarter of the games should be upsets, but if you look at all the possibilities it makes more sense.

Let’s look at Dayton’s 11-seed. An 11-seed has a historical 34% chance of upsetting a 6-seed in the first round, but because there are four separate 6-11 match-ups each year, there’s only a 19% chance that all four 6-seeds win their first-round games. In fact, the most likely scenario (39%) is that exactly one 11-seed upsets a 6-seed. This year there were two 6-11 upsets, which is the second most likely scenario at 30% (still more likely than not getting any upsets).

The following table depicts the probability of different scenarios for each first round seeding combination.  All the green area on the table is why everyone’s brackets bust every year.  Keep reading if you are interested in the math, otherwise you might want to bounce, because it’s gonna get boring.

Still here? Ok. The basis for determining the probability of each upset scenario is the binomial distribution. A binomial distribution requires two things: a binary outcome (hence the bi- prefix) and a fixed probability of that outcome on every trial. The simplest example of a binomial distribution is the probability of getting a given number of heads in a series of coin flips. The probability function is given as $P(X) = \binom{n}{k} p^k q^{n-k}$

The $\binom{n}{k}$ term is the number of combinations of $n$ items taken $k$ at a time. $p$ is the probability of the event happening (the win probability), and $q = 1 - p$ is the complement of the event, so in this case the probability of losing. $n$ will be 4, since there are four games for each seed match-up. $k$ will be 0 to 4, depending on how many upsets we are looking for.

Looking at the probability that two (and only two) 11-seeds upset 6-seeds, that will be $P(X) = \binom{4}{2} (0.34)^2 (1 - 0.34)^2 = 6 \times 0.34^2 \times 0.66^2 = 0.302 \approx 30\%$
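
If you’d rather let a computer do the arithmetic, here is a minimal Python sketch of the same calculation for every possible number of 6-11 upsets. It takes the exponent on $q$ to be $n - k$, uses the 34% upset rate quoted above, and assumes the four games are independent. The function name `upset_distribution` is made up for illustration.

```python
from math import comb

def upset_distribution(p_upset, n_games=4):
    """Return [P(0 upsets), ..., P(n_games upsets)] from the binomial formula."""
    q = 1 - p_upset  # probability the favorite wins a single game
    return [
        comb(n_games, k) * p_upset**k * q**(n_games - k)
        for k in range(n_games + 1)
    ]

# 0.34 is the historical 11-over-6 upset rate quoted in the post
dist = upset_distribution(0.34)
for k, prob in enumerate(dist):
    print(f"{k} upsets: {prob:.1%}")
```

Run as written, this should print roughly 19.0%, 39.1%, 30.2%, 10.4%, and 1.3% for zero through four upsets, which lines up with the 19% and 30% figures quoted earlier.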

You can derive this equation by writing out probability trees (if you remember those from high school math). The problem with that method is that for each outcome (# of upsets = [0, 1, 2, 3, 4]) you have to write out all the different combinations of games that produce it. This can get unwieldy quickly. Binomial distributions can be used for many different applications, including the aforementioned coin-flip, likelihood of combinations of boy/girl babies, the probability that the ‘better’ team loses a 7-game playoff series, the likely number of winners for the lottery…so this will rear its head again for NHL, NBA, or MLB playoffs.
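
To see the probability-tree approach in action, and to check that it agrees with the formula, here is a rough Python sketch that walks every branch for the four 6-11 games. It assumes the same 34% upset chance in each game and that the games are independent; `tree_distribution` is a hypothetical helper name, not anything from a library.

```python
from itertools import product
from math import comb

P_UPSET = 0.34  # 11-over-6 upset rate quoted in the post
N_GAMES = 4     # four 6-11 games per tournament

def tree_distribution(p, n):
    """Walk every branch of the tree and total the probability per upset count."""
    totals = [0.0] * (n + 1)
    branches = [0] * (n + 1)
    for outcome in product([True, False], repeat=n):  # True = upset in that game
        upsets = sum(outcome)
        totals[upsets] += p**upsets * (1 - p)**(n - upsets)
        branches[upsets] += 1
    return totals, branches

totals, branches = tree_distribution(P_UPSET, N_GAMES)
for k in range(N_GAMES + 1):
    formula = comb(N_GAMES, k) * P_UPSET**k * (1 - P_UPSET)**(N_GAMES - k)
    print(f"{k} upsets: {branches[k]} branches, tree {totals[k]:.3f}, formula {formula:.3f}")
```

The branch counts come out to 1, 4, 6, 4, 1, which are exactly the $\binom{4}{k}$ terms in the formula. Four games only need 16 branches, but a tree for seven games already has 128, which is why the formula is the easier route.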