Some Thoughts On Data And Data Sources For TJS Analysis

By: Jay Burke

I’m planning a series of posts about where I get data and what I’ve done with it to reach various conclusions. You may find some of these more interesting than others, but I believe it is important to share thought processes and sources so that others can dig in as deeply as their interest takes them.

In this post I discuss where I found data about Tommy John Surgery (TJS) at the MLB level and what I did with it to draw some conclusions about pre-surgery and post-surgery pitching performance. For this study I needed to decide on the best measure of a pitcher’s performance in order to compare pre- and post-surgery results objectively. I chose the baseball statistic called “Wins Above Replacement” (WAR).

WAR may not be a very familiar statistic. It is more abstract and less tangible than wins, strikeouts, velocity, and other “real world” statistics. A simple explanation is that WAR represents how valuable a player was to his team for a season. It is the same measure for a hitter or a pitcher. It considers all of the components of success, and does so in context, because it is a value over a “replacement level player”: essentially, how well a player did compared to the next guy on the depth chart. WAR is an accumulation that (usually) grows as the year continues, and players typically end up with a score between 0 and 10. On very rare occasions a player receives a score higher than 10; Aaron Judge received a 10.6 WAR in 2022. The words “for a season” are important. When an ace pitcher has 200 strikeouts in a season, how good is that compared to the guy behind him on the depth chart? If that replacement would have struck out five (5) batters, then the ace pitcher is more valuable than if the replacement would have struck out 50 batters. An effect of this type of measure is that it adjusts for the quality of the competition.

For a starting point I needed a list of players who have had TJS. I’d like to thank @MLBPlayerAnalys (their Twitter handle) for meticulously curating such a list, which they keep on Google Drive: https://docs.google.com/spreadsheets/d/1gQujXQQGOVNaiuwSN680Hq-FDVsCwvN-3AazykOBON0/edit#gid=0

This list includes players’ official MLB ID numbers and FanGraphs IDs, which makes merging data from multiple sources much easier. Without shared IDs, websites typically differ in spellings, full names versus nicknames, use of hyphens for foreign players, use of “Jr.”, etc. Merging data between different sources by name can be a painfully slow, manual process.

I wanted to merge this list of players who have had TJS with data from Baseball Reference. Unfortunately, the Baseball Reference site uses its own player IDs. Luckily, I discovered a file on Baseball Reference that shows the yearly WAR for every pitcher (all time), updated daily for active pitchers during the season: https://www.baseball-reference.com/data/war_daily_pitch.txt

As of the end of the 2022 season, this file provides 50,000+ pitcher/year examples of bWAR, which is Baseball Reference’s version of WAR. Using this file I was able to link Baseball Reference’s ID to MLB’s ID. This link is the Rosetta Stone that allowed me to merge the TJS data with the WAR data on Baseball Reference.
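The linking step described above can be sketched roughly as follows. This is a minimal illustration with pandas and not necessarily how the merge was actually done. The tiny tables are made-up stand-ins for the TJS list and the WAR file, and every column name (mlb_id, surgery_year, mlb_ID, player_ID, year_ID, WAR) is an assumption that should be checked against the real files before use.

```python
import pandas as pd

# Made-up stand-in for the TJS list (column names are assumptions).
tjs = pd.DataFrame({
    "player": ["Pitcher A", "Pitcher B"],
    "mlb_id": [1001, 1002],
    "surgery_year": [2015, 2018],
})

# Made-up stand-in for the WAR file: one row per pitcher per season,
# carrying both the MLB ID and the Baseball Reference ID (assumed columns).
war = pd.DataFrame({
    "mlb_ID": [1001, 1001, 1001, 1002, 1003],
    "player_ID": ["pitcha01", "pitcha01", "pitcha01", "pitchb01", "pitchc01"],
    "year_ID": [2014, 2016, 2017, 2017, 2017],
    "WAR": [3.1, 1.2, 2.8, 0.5, 4.0],
})


def link_tjs_to_war(tjs: pd.DataFrame, war: pd.DataFrame) -> pd.DataFrame:
    """Attach every WAR season to each TJS pitcher via an ID crosswalk."""
    # The crosswalk is the "Rosetta Stone": one row per MLB ID / BR ID pair.
    crosswalk = war[["mlb_ID", "player_ID"]].drop_duplicates()
    # Translate each TJS pitcher's MLB ID into a Baseball Reference ID.
    linked = tjs.merge(crosswalk, left_on="mlb_id", right_on="mlb_ID", how="left")
    linked = linked.drop(columns="mlb_ID")
    # Pull in the season-by-season WAR rows for those pitchers.
    return linked.merge(war[["player_ID", "year_ID", "WAR"]], on="player_ID", how="left")


merged = link_tjs_to_war(tjs, war)
print(merged)
```

Merging on IDs rather than names sidesteps the spelling, nickname, and “Jr.” mismatches mentioned earlier. A left join keeps any TJS pitcher who has no match in the WAR data, so gaps show up as missing values and are not silently dropped.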

I now had the pre/post TJS data to look at an MLB pitcher’s performance before and after surgery, and to compare the performance data to “comparable” players who were uninjured as a control.

So we are looking at the stats of players before they had TJS and comparing them to their statistics after they return to MLB. It is important to control for how the game changes from year to year. Suppose a pitcher has 200 strikeouts in a year before his surgery, when the average replacement level pitcher would have had 50 strikeouts. He cannot be said to have had the same performance two years later if he returns and again has 200 strikeouts, but the average replacement player would now have five (5). A reason I chose WAR for this study is that it provides a level of “indexing”: when a player posts a six (6) WAR before the surgery and a six (6) WAR after he returns, it better represents that he is the same pitcher after the surgery. This is because we have controlled for the changing environment of MLB.
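To make the before-and-after comparison concrete, here is a small sketch in pandas. It is an illustration under stated assumptions and not the exact analysis behind these posts: the numbers are invented, the column names are hypothetical, the season of the surgery itself is excluded, and the comparison against uninjured control pitchers is left out.

```python
import pandas as pd

# Invented pitcher-seasons already joined to a surgery year (hypothetical columns).
seasons = pd.DataFrame({
    "player_ID": ["pitcha01"] * 4 + ["pitchb01"] * 3,
    "year_ID": [2013, 2014, 2016, 2017, 2016, 2017, 2019],
    "WAR": [2.0, 4.0, 1.0, 3.0, 1.5, 0.5, 1.0],
    "surgery_year": [2015] * 4 + [2018] * 3,
})


def pre_post_mean_war(df: pd.DataFrame) -> pd.DataFrame:
    """Average WAR per pitcher before and after the surgery year."""
    # Assumption: drop the surgery season itself, since it is usually partial.
    df = df[df["year_ID"] != df["surgery_year"]].copy()
    # Label each remaining season as pre- or post-surgery.
    is_post = df["year_ID"] > df["surgery_year"]
    df["period"] = is_post.map({False: "pre", True: "post"})
    # One row per pitcher, with a column for each period's mean WAR.
    summary = df.groupby(["player_ID", "period"])["WAR"].mean().unstack("period")
    summary["change"] = summary["post"] - summary["pre"]
    return summary


summary = pre_post_mean_war(seasons)
print(summary)
```

Because WAR is already measured against replacement level for each season, the pre and post averages can be compared directly without a separate adjustment for how the league changed in between.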

The merged data of 50,000+ pitcher-seasons, controlled for comparable players and for the changing environment of MLB from a player-personnel standpoint of replacements, allowed me to start drawing some conclusions about TJS and its effect upon performance and player value for the best players in the world. These conclusions will be shared in my other related blog posts. Hope you enjoy.

Jay Burke

Jay Burke has worked in the computer industry for 20+ years as a data modeler and analyst of the highest caliber. He obtained his Master of Science degree in computer science from the Illinois Institute of Technology (IIT) with a concentration in artificial intelligence expert systems. His work includes large-scale simulations at McDonald’s plus similar efforts involving military transportation for the United States Army (MTMCTEA) at Argonne National Laboratory. Jay is also a huge baseball and sabermetrics fan who has held season tickets with the Chicago White Sox for his entire adult life.