At Pirates games, we like to guess what the pitch count will be for the Pirates’ starting pitcher. In honor of Opening Day today, I present a cheat sheet!

The graphs might be a little bit of overkill, but it’s cool how many different ways you can visualize this simple data. The pitch counts are roughly bell-shaped but skewed left: the long tail sits on the low end because sometimes a pitcher has a bad day and gets pulled really early. To account for this, I excluded any outing of 50 or fewer pitches. We will treat these as rare events that we shouldn’t try to use in our prediction. The idea of the game is to hit the exact pitch count, and we can’t expect to land on a rare early exit, so leaving those outings in would only pull our pick away from a typical start. For the same reason, I used the median number of pitches instead of the average. We want to consistently pick numbers that are the most likely to get hit, not try to predict every game.
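Here is a minimal sketch of the filter-then-median approach in Python. The pitch counts are made-up numbers for illustration (not real Pirates data), and the names CUTOFF and cheat_sheet_pick are hypothetical. It drops outings of 50 or fewer pitches, then takes the median of what's left.

```python
from statistics import mean, median

# Hypothetical pitch counts for one starter (illustrative numbers, not real data).
pitch_counts = [98, 102, 95, 41, 105, 99, 88, 101, 37, 97]

# Assumed cutoff: outings of 50 or fewer pitches count as rare early exits.
CUTOFF = 50


def cheat_sheet_pick(counts, cutoff=CUTOFF):
    """Median pitch count after dropping outings at or below the cutoff."""
    kept = [c for c in counts if c > cutoff]
    if not kept:
        raise ValueError("no outings above the cutoff")
    return median(kept)


print("mean of all outings:", mean(pitch_counts))
print("median, short outings excluded:", cheat_sheet_pick(pitch_counts))
```

With these made-up numbers, the two early exits drag the plain average of all ten outings well below the median of the longer outings, which is the effect the cutoff and the median are meant to guard against.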

Using the median over the mean matters whenever the data are skewed. Income is a classic example. Incomes across the entire US population have a huge skew, because a small number of people make outrageous amounts of money. Those few very large values pull the mean of incomes well above the median, so the median is much more representative of the central tendency of the data.
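To see the income effect concretely, here is a small Python sketch with invented incomes (in thousands of dollars; not real survey data): nine ordinary earners plus one very high earner. The single outlier drags the mean far above what almost everyone in the group actually earns, while the median stays put.

```python
from statistics import mean, median

# Made-up annual incomes in thousands of dollars: nine typical earners
# plus one very high earner, which gives the data a long right tail.
incomes = [35, 42, 48, 51, 55, 60, 64, 70, 85, 2500]

avg = mean(incomes)
mid = median(incomes)

# Count how many people earn less than the mean.
below_mean = sum(1 for x in incomes if x < avg)

print("mean:", avg)
print("median:", mid)
print("people earning less than the mean:", below_mean, "of", len(incomes))
```

Here nine of the ten people earn less than the mean, so the mean describes almost nobody, while the median sits in the middle of the ordinary earners.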

Applying this to the pitch count: the short outings are rare, but they drag the mean down, so the mean doesn’t represent a typical start for the day. The median stays close to where most outings actually land.

Happy Opening Day!