Can A Player’s Arsenal Predict Their xFIP?
A few weeks ago, I was looking into a minor league player and was curious how he would perform if he were put in the majors today. Since he had thrown a limited number of innings last year, projecting his performance from his stats wasn’t plausible, so I had to look into other routes. That’s how I came up with the idea of projecting a player’s performance based on his stuff. Stuff metrics stabilize in much smaller samples than results do, so just a few games of data would be enough to make a projection.
Having this tool would be very helpful at the MLB level, but it would also have value at the minor league or college level, where we may have stuff data but limited stats and be curious how that would translate to actual performance.
Originally I used a player’s average velocity, spin, break, etc. for each pitch type to project his SwStr% on each pitch. To improve the predictions, though, I switched to pitch-by-pitch data from 2015-2019 scraped from Baseball Savant. For the various pitch types I used a lasso regression, a type of regression that shrinks unimportant variables toward zero, to predict whether or not a pitch was a swinging strike based on its movement, location, velocity, extension and release point. Location, movement and velocity metrics let us account not just for how good the pitch is but also for how well the pitcher commands it.
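To make the modeling step concrete, here is a minimal sketch of a lasso-style pitch-level classifier, implemented as an L1-penalized logistic regression (a natural fit since the target is binary). The feature names follow Statcast column conventions, but the exact feature set, penalty strength, and column names are my assumptions, not the article’s actual code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Statcast-style column names; the feature set here is illustrative.
FEATURES = [
    "release_speed",                                   # velocity
    "pfx_x", "pfx_z",                                  # movement
    "plate_x", "plate_z",                              # location
    "release_extension", "release_pos_x", "release_pos_z",  # extension / release
]

def fit_swstr_model(pitches: pd.DataFrame):
    """Fit an L1-penalized (lasso-style) logistic regression predicting
    whether a pitch is a swinging strike. The L1 penalty shrinks the
    coefficients of unhelpful features toward zero, performing the
    variable selection the article attributes to the lasso."""
    X = pitches[FEATURES]
    y = (pitches["description"] == "swinging_strike").astype(int)
    model = make_pipeline(
        StandardScaler(),  # put features on one scale before penalizing
        LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
    )
    model.fit(X, y)
    return model
```

A fitted model’s `predict_proba` then gives each pitch a swinging-strike probability, which can be averaged by pitcher and pitch type to get the predicted SwStr% used downstream.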
I then took the projected SwStr% for each pitch type and found a player’s best fastball (from a four-seamer, two-seamer or cutter), best breaking ball (from a slider or curveball) and changeup. “Best” was defined as the pitch with the highest predicted swinging strike rate, assuming the player threw it often enough.
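The best-pitch selection above is a simple grouped argmax. Here is a sketch in pandas; the column names, pitch-type codes, and the 5% usage cutoff are assumptions for illustration (the article only says the pitch must be thrown "often enough").

```python
import pandas as pd

# Map Statcast-style pitch codes into the article's three arsenal groups.
PITCH_GROUPS = {
    "FF": "fastball", "FT": "fastball", "FC": "fastball",
    "SL": "breaking", "CU": "breaking",
    "CH": "changeup",
}

def best_pitches(df: pd.DataFrame, min_usage: float = 0.05) -> pd.DataFrame:
    """df has one row per pitcher/pitch type with columns:
    pitcher, pitch_type, pred_swstr, usage (share of total pitches thrown).
    Returns each pitcher's best pitch in each group by predicted SwStr%."""
    df = df.assign(group=df["pitch_type"].map(PITCH_GROUPS))
    df = df[df["usage"] >= min_usage]  # "thrown often enough" filter (assumed cutoff)
    idx = df.groupby(["pitcher", "group"])["pred_swstr"].idxmax()
    return df.loc[idx, ["pitcher", "group", "pitch_type", "pred_swstr", "usage"]]
```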
I then took players who had at least 40 innings pitched plus a fastball, breaking ball and changeup, and modeled their xFIP (FIP with home runs replaced by an expected total based on the league-average home-run-per-fly-ball rate) based on the average predictions for those pitches. This came out to a total of 1,053 players from 2015-2019. I also modeled xFIP based on how often they threw those pitches, and included BB% in some models to improve the projections.
The best model ended up being the one that accounted for how often a player threw his fastball, breaking ball and changeup, the predicted SwStr% of those pitches, and his walk rate. That model had a 0.64 RMSE, 0.51 MAE and a 0.28 R^2. The model considering only usage and each pitch type’s predicted SwStr% had a 0.60 RMSE, 0.47 MAE and 0.18 R^2. This is in no way perfect, but it is a helpful tool when other information isn’t available, or when you want a measure based more on a pitcher’s “raw talent”, meaning his stuff and command.
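The arsenal-level step can be sketched as an ordinary regression on those per-pitcher features, evaluated with the same RMSE/MAE/R^2 metrics quoted above. The choice of a plain linear model is my assumption; the article doesn’t name the learner it used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def fit_xfip_model(X, y):
    """X: per-pitcher features (usage and predicted SwStr% of the fastball,
    breaking ball and changeup, plus BB%); y: actual xFIP. Returns the fitted
    model and held-out RMSE / MAE / R^2, mirroring the metrics in the text."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=42
    )
    model = LinearRegression().fit(X_tr, y_tr)
    pred = model.predict(X_te)
    metrics = {
        "rmse": mean_squared_error(y_te, pred) ** 0.5,
        "mae": mean_absolute_error(y_te, pred),
        "r2": r2_score(y_te, pred),
    }
    return model, metrics
```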
Once I had a projected xFIP for each player, I converted the projections into scouting grades: a grade for each pitch based on its projected SwStr%, and a grade based on the projected xFIP, which I called the player’s “present value” or the value of his arsenal depending on the model. This is done by simply using the mean and standard deviation of a pitch type’s projections to scale the grade. For example, if a projected SwStr% or xFIP is one standard deviation better than the mean, it is a 60 on the traditional 20-80 scouting scale. Scaling the projections onto this universal rating system gives people a better idea of how good a pitch really is. It is a lot easier to say that a pitch is a 50 than to say it has an 11.2% mean predicted SwStr%.
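The grade scaling described above is just a z-score mapped onto the 20-80 scale at 10 points per standard deviation, centered at 50. A minimal sketch; clipping to [20, 80] and the sign flip for lower-is-better stats like xFIP are my assumptions.

```python
import numpy as np

def scouting_grade(values, higher_is_better=True):
    """Map projections onto the 20-80 scouting scale: 50 is average and
    each standard deviation is worth 10 points, so one SD above the mean
    becomes a 60, as in the article's example."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    if not higher_is_better:   # e.g. xFIP, where lower is better
        z = -z
    return np.clip(50 + 10 * z, 20, 80)
```

For instance, a pitch whose predicted SwStr% sits exactly at the mean grades out as a 50, while a projected xFIP one standard deviation below (better than) the mean grades out as a 60.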
Going forward, the model could be improved by accounting for batter handedness relative to pitcher handedness. It also doesn’t consider many factors that determine how well a pitch performs, such as sequencing, how well a pitcher’s offerings play off each other, and more. I could also build separate models for starters and relievers to see how the projections change and whether the importance of the predictors changes by role. There is a bit of selection bias here as well, since many pitchers don’t throw 40 innings with all three pitch types. Individual pitch types may also still need tinkering to produce better individual grades. On the whole, though, I was happy to see that the arsenal grades were able to project xFIP, and I think this could be a useful way of looking at a player when faced with limited performance data.
After training my models with a train/test split, I went back, refit on the full data set, and predicted on that same set so that every pitcher receives an arsenal grade. Those grades for pitchers from 2015-2019 can be viewed here: https://docs.google.com/spreadsheets/d/18r3zJKb5sviA11YZc_iydI6mFSP8LnejB2UoKhyK_JU/edit#gid=0
The next article in this series will look at how well these arsenal scores predict the following year’s xFIP, along with predicted 2020 xFIPs. After that, we’ll look at some of the biggest gaps between predicted and actual xFIP. Again, this will all be for pitchers who threw three pitches: a fastball, breaking ball and changeup. Finally, we’ll look at which predictors the models found relevant for each pitch type and for the overall arsenal scores, and whether we can draw any conclusions from those results.
I also owe a big thank you to Jacob Buffa for the amazing help and guidance he’s provided me; he always gives great, detailed answers to any questions I have. Thank you as well to his company, EBA, for publishing this series of articles.
Author: Kaivon Steinle