
Pirates 2014 — Take Your Finger Off The Panic Button

Pirates Panic Button


The Pirates did really, really well last year. They won 94 games and the NLWCG, and took the Cardinals to 5 games in the NLDS. Expectations, for right or wrong reasons, have been raised for this year. With April coming to a close, the Pirates are looking at a 9-15 record. I’ve seen a lot of criticism about the offense, Jason Grilli blowing saves, and Gregory Polanco not being called up. I’m not an expert on talent development, so I can’t fully address the pros and cons of calling up Polanco. I can say that, just as with all the criticism at trade deadlines, one player can’t save a team. Let’s say Polanco helped them win 2 games in April beyond what Tabata/Snider could do. The Pirates would be sitting at an 11-13 record. Now what? They would still be sitting in 3rd place in the NL Central. Now, here’s the other side: what if he doesn’t help much? You mess with player development and long-term plans because you panicked over a month of baseball. I’m fine with Polanco getting 2+ months in AAA before being called up. McCutchen got a year and two months.

Now about those Pirates. How bad are they? They have pretty much the same team coming back this year as the one that won 94 games last year. They lost Burnett, who was stellar last season, and Marlon Byrd, who made a big impact down the stretch in September. Other than those losses, the Pirates have the exact same team.

My analysis boils down to this: the Pirates weren’t really as good as you thought they were last year, and they aren’t nearly as bad as you think they are now. Why do I think this? Numbers! I compared the current 2014 numbers to 2013 overall and to the first two months of the 2013 season. The June, July, August, and September numbers are really good for the Pirates. I’m going to point out that bad months can happen.

Pirates Year to Year 2014

Compared to last year, this offense right now is not as good no matter what sample of 2013 you look at. But here’s the takeaway: the Pirates had a very mediocre offense all year in 2013. They have a below-average offense right now. Based on past performance, the Pirates will regress upward toward where they were in 2013. So the bigger problem for the organization as a whole is that they have a mediocre offense. This is a long-term problem, not a short-term aberration happening right now. Long-term problems require better solutions than knee-jerk reactions.

My assessment of the Pirates is that their poor April performance is part luck, part poor hitting, and mostly regressed pitching. The pitchers pitched out of their minds last year. Jeff Locke went to the All-Star game. Things were crazy. Grilli isn’t going to be Mariano Rivera every year. (Especially because he didn’t touch Rivera-type numbers till his 12th year in MLB.) Even AJ Burnett isn’t pitching as well for the Phillies as he did for the Pirates in 2012/2013. The Pirates’ FIP has been below average instead of stellar like it was for most of 2013.
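For readers who haven’t run into FIP (Fielding Independent Pitching) before, here is a minimal sketch of the standard formula, which only uses home runs, walks, hit batters, strikeouts, and innings pitched. The stat line below is made up for illustration and is not a Pirates number. The `constant` default is also an assumption: the real constant is recalculated for each season so that league FIP matches league ERA, and it usually lands a little above 3.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    ip is innings pitched as a true decimal (60 1/3 innings is 60.333, not 60.1).
    constant is the league/season FIP constant; 3.10 is only a placeholder.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Made-up stat line, not a real pitcher or staff.
example_fip = fip(hr=10, bb=40, hbp=5, k=150, ip=180.0)
print(round(example_fip, 2))
```

Lower is better, just like ERA, so a staff that strikes out fewer batters or gives up more walks and home runs will see this number rise even if the defense behind it hasn’t changed.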

Everyone keep their fingers off the panic buttons, things are going to be alright.





Data is pulled from

Text By The Hour

Text Message Analytics — Numbers

People communicate a lot through text messages, and lucky for me iPhones keep track of the text messages I’ve sent. iPhones store your text messages in a SQLite database, and this database is readily accessible in your iPhone backup on your computer. [This is why encrypting your backup might be a good idea if you have sensitive data.] I want to eventually perform some advanced text analytics to try to interpret the content of the text messages. This post is only going to look at the ‘numbers’ aspect of my text messages. All the numbers on the following pages include both sent and received texts, and exclude texts that either I deleted or were deleted by the system. [I know I’ve deleted threads. I don’t think iOS deletes old messages, but it’s a possibility till I know otherwise.]

The simplest stat from text messages is how many I have sent/received per day or per week. The chart below has both. The notable trend is that there have been more text messages sent/received the longer I’ve had my iPhones. I’d suggest this is a little biased, since I would be more likely to have deleted text threads that are much older, but I still think there would be a slight upward trend regardless.


Text Message Trends
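The per-day and per-week counts behind a chart like this can be sketched in a few lines. This is only an illustration: it assumes the message timestamps have already been pulled out of the backup database and converted to Python `datetime` objects, and the sample values are made up.

```python
from collections import Counter
from datetime import datetime


def count_by_day_and_week(timestamps):
    """Return (messages per calendar day, messages per ISO week)."""
    per_day = Counter()
    per_week = Counter()
    for ts in timestamps:
        per_day[ts.date()] += 1
        iso = ts.isocalendar()            # (ISO year, ISO week, ISO weekday)
        per_week[(iso[0], iso[1])] += 1
    return per_day, per_week


# Made-up timestamps standing in for sent/received messages.
sample = [
    datetime(2014, 4, 7, 8, 15),    # Monday
    datetime(2014, 4, 7, 12, 5),
    datetime(2014, 4, 9, 21, 30),   # Wednesday, same week
    datetime(2014, 4, 14, 9, 0),    # the following Monday
]
per_day, per_week = count_by_day_and_week(sample)
```

Keying weeks by ISO year and week number keeps weeks that straddle New Year’s from being split in two.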


I wanted to look at area codes just out of curiosity. I thought that I would have the most texts between me and a 412 or 724 number. I’m a little surprised how many 412 numbers there are, given how many people I know living in the Pittsburgh suburbs and that I’m a 724. I think 412 is Allegheny County, while 724 is anything outside of that. I’m also a little surprised how close the number of traditional SMS text messages, which go through your carrier network as opposed to over the Internet, is to the number sent over the Internet, since most of my friends have iPhones.



Text Area Code
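Pulling the area code out of each number is mostly a cleanup job. Here is a rough sketch, assuming the numbers are US-style (10 digits, optionally prefixed with a 1) and that anything else, such as an email address used for Internet messages or a short code, should be skipped. The sample numbers are fictional.

```python
import re
from collections import Counter


def area_code(number):
    """Return the 3-digit area code of a US-style number, or None."""
    digits = re.sub(r"\D", "", number)          # drop +, spaces, dashes, parens
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                     # strip the country code
    if len(digits) != 10:
        return None                             # email, short code, etc.
    return digits[:3]


# Fictional contacts in a few of the formats that tend to show up.
numbers = [
    "+14125550101",
    "(724) 555-0102",
    "724-555-0103",
    "someone@example.com",
    "55512",
]
codes = [area_code(n) for n in numbers]
area_counts = Counter(code for code in codes if code is not None)
```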


The last chart is my favorite: a breakdown of how often I text for each hour of the day. I think this tells you something about my behavior, albeit nothing common sense won’t tell you. I generally text early in the morning (7am to 10am) more often during the work week than on the weekend. There’s a spike at 12PM (lunch time) and 9PM (making plans/socializing) on any day of the week. There’s virtually no texting between 4am and 6am. There are some texts that occur after the ‘Ted-Mosby-Hour’ of 2am, after which nothing good happens, but not a spike like there might have been in college.


Text By The Hour
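The hour-of-day breakdown, split into work week and weekend, can be sketched like this. It assumes the timestamps are already in local time (if they were stored in UTC, the hours would need to be shifted first), and the sample values are made up.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Return (weekday counts by hour, weekend counts by hour)."""
    weekday = Counter()
    weekend = Counter()
    for ts in timestamps:
        # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        bucket = weekend if ts.weekday() >= 5 else weekday
        bucket[ts.hour] += 1
    return weekday, weekend


# Made-up local timestamps.
sample_texts = [
    datetime(2014, 4, 7, 8, 15),     # Monday morning
    datetime(2014, 4, 8, 8, 40),     # Tuesday morning
    datetime(2014, 4, 11, 21, 5),    # Friday night
    datetime(2014, 4, 12, 12, 30),   # Saturday lunch
    datetime(2014, 4, 13, 21, 45),   # Sunday night
]
weekday_by_hour, weekend_by_hour = hourly_counts(sample_texts)
```

Since there are five weekdays and only two weekend days, dividing each bucket by its number of days gives a fairer comparison than raw counts.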




‘Kids, if it’s after 2am, don’t text, just go home….and watch How I Met Your Mother.’

Morton PitchFX April 2014

Charlie Morton — PitchFX

I’m in a predictive modeling class for my grad program at NU, and we are learning a statistical programming language called SAS. One of the things we are trying early on is cluster analysis to determine if variables are related. I decided to play around with data that’s a little more interesting than housing prices. Charlie Morton has been one of my favorite pitchers to watch. His curveball is just sexy. Cluster analysis can help us separate Morton’s pitches into different pitch types using PitchFX data I’ve been scraping.

I’ve plotted two charts: one is the vertical movement vs. the release speed, and the second is the vertical movement vs. the horizontal movement. [The movement parameters are calculated from the deviation of the ball from the straight path it would take with no spin. The horizontal movement is from the perspective of the catcher/batter, so imagine that Morton is throwing toward you.] Fastballs with backspin will have positive vertical movement; curveballs with topspin will have negative vertical movement. I used SAS to look at the speed, vertical movement, and horizontal movement and cluster similar pitches together. Without much tweaking, I was able to identify Morton’s fastballs and curveballs. He also has a third group which is a splitter according to

Morton PitchFX April 2014

Morton PitchFX April 2014
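To give a feel for what the clustering step does, here is a small k-means sketch in plain Python. This is not the SAS procedure used for the charts above, and the pitch values are invented to look like three pitch types; they are not Morton’s PitchFX data. Each pitch is a (speed in mph, horizontal movement in inches, vertical movement in inches) tuple. The features are standardized first so that speed doesn’t dominate just because its numbers are bigger.

```python
import math


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds))
            for row in rows]


def farthest_point_init(points, k):
    """Deterministic start: each new center is the point farthest from the rest."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=50):
    """Lloyd's algorithm: assign to nearest center, recompute centers, repeat."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(v) / len(members)
                                         for v in zip(*members)))
            else:
                new_centers.append(centers[j])   # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Invented pitches: (speed mph, horizontal movement in, vertical movement in).
pitches = [
    (93.1, -8.0, 6.5), (92.4, -9.1, 5.8), (93.8, -7.5, 7.0),    # fastball-like
    (78.5, 8.2, -9.0), (79.3, 7.6, -8.4), (77.9, 9.0, -9.6),    # curveball-like
    (84.0, -6.0, 1.0), (84.8, -5.2, 0.4), (83.5, -6.5, 1.5),    # splitter-like
]
labels, centers = kmeans(standardize(pitches), k=3)
```

The cluster numbers themselves are arbitrary; it takes a look at each group’s average speed and movement to decide which one is the fastball, the curveball, and so on.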

Morton is famous for his sinker, which is a two-seam fastball that ‘sinks’ relative to a four-seam fastball thrown at the same angle. I’ve annotated the sinker on the vertical movement to release speed chart below. Morton’s sinker is hard to differentiate because it’s almost as fast as his four-seamer (low 90s). It doesn’t stay as high because its spin differs from the four-seam fastball’s. The advantage here is that a batter will swing as if to hit the four-seam fastball, but the sinker will be an inch or two lower than what the batter adjusted for. Since the bat is round, the ball will come off the bat at a low angle, and bam! Ground ball.

Morton PitchFX April 2014 Annotated

has updated and historical PitchFX data presented very nicely. I suggest checking them out if you want to see visualizations like this for other games or pitchers. Their visualization tools are easy to use and updated right after games end.

#SeanTrek GeoTracks

#SeanTrek GeoTracks 2012

#SeanTrek GeoTracks

You might remember #SeanTrek, the 46-day, 12,000-mile, 34-state excursion I took back at the very end of 2012. I didn’t know how I was going to use this at the time, but I geotagged just about everything I did on the trip. I checked in to every place on Foursquare and obtained over 700 points in Portland and San Francisco, which is insane, because I checked in at just about everything I did or place I went. On top of Foursquare, I geotagged every tweet I sent and picture I took. As a result, I now have thousands of data points with both timestamps and location data.

The above map is what happens when you put all of them together. It outlines my entire trip! The denser the marks, the longer I stayed in one place exploring it. Sparse points mean I was driving a lot. You’ll find a lot of marks around Pittsburgh, Portland, SF, LA, Austin, and New Orleans, because I spent the most time there and didn’t drive much in most of those cities. I have a rather nice record of a long trip that didn’t require me to painstakingly record exactly what I did.

This map only has geotag data and the type of media. I’m hoping to use the geotag data and the timestamps to get an average speed between each pair of consecutive points. I also want to geocode some tweets or photos that were not geocoded in 2012 by interpolating between the nearest geotagged points using their timestamps.
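Both ideas can be sketched in a few lines. This is only a rough illustration: it assumes each point is a `(timestamp, latitude, longitude)` tuple, it uses the great-circle (haversine) distance, which is shorter than the road actually driven and so understates the speed, and it interpolates latitude and longitude linearly in time, which is only reasonable over short gaps. The function names are placeholders.

```python
import math
from datetime import datetime

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def average_speed_mph(p1, p2):
    """Straight-line average speed between two (time, lat, lon) points."""
    (t1, lat1, lon1), (t2, lat2, lon2) = p1, p2
    hours = (t2 - t1).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("second point must be later than the first")
    return haversine_miles(lat1, lon1, lat2, lon2) / hours


def interpolate_position(p1, p2, when):
    """Estimate (lat, lon) at time `when`, which must fall between p1 and p2."""
    (t1, lat1, lon1), (t2, lat2, lon2) = p1, p2
    span = (t2 - t1).total_seconds()
    if span <= 0 or not (t1 <= when <= t2):
        raise ValueError("`when` must lie between two distinct timestamps")
    frac = (when - t1).total_seconds() / span
    return (lat1 + frac * (lat2 - lat1), lon1 + frac * (lon2 - lon1))


# Made-up check-ins: downtown Pittsburgh, then Columbus, OH three hours later.
start = (datetime(2012, 11, 20, 9, 0), 40.4406, -79.9959)
end = (datetime(2012, 11, 20, 12, 0), 39.9612, -82.9988)
speed = average_speed_mph(start, end)
midpoint = interpolate_position(start, end, datetime(2012, 11, 20, 10, 30))
```

A tweet sent at 10:30 with no geotag would be placed halfway between the two check-ins, which is a guess, but a better one than no location at all.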

Once I properly extract the data from the tweets, I can have hashtags or mentions searchable by frequency and location. I used #SeanTrek a lot more than any other hashtag on the trip. Curiously enough, though, the first tweet mentioning #SeanTrek is not geotagged (technical glitch). Hopefully, I’ll get some more things mapped out in the future.

Pirates — Run Probability

Presented without much commentary or analysis. This is how the Pirates fared last year given a certain number of outs and with runners on specific bases. So, for example, with no outs and nobody on base, the Pirates had a 26% chance of scoring a run from that point in the inning until the end. That means they would score about once in every 4 innings that reach that state. The stat I always reference is bases loaded and no outs. It should be the highest; for the Pirates, it’s not. Runners on 2nd and 3rd with no outs is the highest.

For a point of comparison the black reference lines on the bar graph are the MLB average for that specific base-out state.

Pirates Run Probability
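For anyone curious how a chart like this can be built, here is a minimal sketch. It assumes play-by-play data has already been grouped into half-innings, where each play is a tuple of `(outs before the play, bases occupied before the play, runs scored on the play)`. That layout and the sample innings are made up for illustration and are not the data behind the chart. For every base-out state, it counts how often at least one run scored from that point through the end of the half-inning.

```python
from collections import defaultdict


def run_probability(half_innings):
    """Map (outs, bases) -> share of occurrences followed by at least one run."""
    tallies = defaultdict(lambda: [0, 0])      # state -> [times scored, times seen]
    for plays in half_innings:
        # Runs scored on the current play or any later play in this half-inning.
        runs_remaining = sum(runs for _, _, runs in plays)
        for outs, bases, runs in plays:
            tally = tallies[(outs, bases)]
            tally[1] += 1
            if runs_remaining > 0:
                tally[0] += 1
            runs_remaining -= runs
    return {state: scored / seen for state, (scored, seen) in tallies.items()}


# Made-up half-innings. Bases are written as "1--", "-23", "123", and so on.
sample_innings = [
    # Three up, three down.
    [(0, "---", 0), (1, "---", 0), (2, "---", 0)],
    # Single, two-run homer, then three straight outs.
    [(0, "---", 0), (0, "1--", 2), (0, "---", 0), (1, "---", 0), (2, "---", 0)],
]
probabilities = run_probability(sample_innings)
```

With a full season of plays, the same dictionary gives one bar per base-out state, and running it on league-wide data gives the reference lines.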