What Factors Best Predict Swinging Strike %

Today we’ll look at which variables help predict SwStr% for each pitch type. One way to identify impactful predictors is to look at the coefficients on the model inputs, so for each pitch type we can see what our regressions found to matter. Every model went pitch by pitch to predict the probability of getting a swinging strike. For each model, release extension (release_extension), release height (release_pos_z), velocity (release_speed), vertical break (pfx_z), horizontal break (abs_pfx_x), vertical location (plate_z) and horizontal location (plate_x) were considered. Horizontal break is defined as the absolute value of the movement so that righties and lefties can be considered together.
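To make the setup concrete, here is a minimal sketch of fitting one such model and ranking its coefficients. It is not the code behind the results in this article. It assumes a logistic regression on standardized inputs, fits it with plain gradient descent in NumPy, and runs on synthetic data whose "true" coefficients are made-up values chosen only for illustration. The helper names (standardize, fit_logistic) are hypothetical.

```python
import numpy as np

# Model inputs named in the article. abs_pfx_x is |pfx_x|, so lefties and
# righties share one horizontal-break scale.
FEATURES = ["release_extension", "release_pos_z", "release_speed",
            "pfx_z", "abs_pfx_x", "plate_z", "plate_x"]


def standardize(X):
    """Scale each column to mean 0 and std 1 so coefficients are comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Fit logistic regression by gradient descent; return (intercept, coefs)."""
    n, k = X.shape
    w, b = np.zeros(k), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted whiff probability
        w -= lr * (X.T @ (p - y)) / n
        b -= lr * np.mean(p - y)
    return b, w


# Synthetic stand-in for one pitch type's data (NOT real Statcast pitches).
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, len(FEATURES)))
j = FEATURES.index("abs_pfx_x")
X[:, j] = np.abs(X[:, j])  # mimic taking the absolute value of pfx_x
made_up_w = np.array([0.0, -0.3, 0.1, 0.6, -0.2, 0.5, 0.0])  # illustrative only
p_true = 1.0 / (1.0 + np.exp(-(-2.0 + X @ made_up_w)))
y = (rng.random(n) < p_true).astype(float)

intercept, coefs = fit_logistic(standardize(X), y)

# Rank predictors by the size of their standardized coefficient.
ranked = sorted(zip(FEATURES, coefs), key=lambda t: abs(t[1]), reverse=True)
for name, c in ranked:
    print(f"{name:18s} {c:+.3f}")
```

Standardizing first matters here: without it, a coefficient on velocity in mph and one on break in feet would not be on the same scale, and comparing their sizes would say little about importance.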

Four Seam Fastballs

This picture shows the importance of each predictor on the probability of a four seam fastball being a swinging strike. We see that vertical break (pfx_z) and vertical location (plate_z) are the most important predictors. I don’t think this would be anything new to anyone. Also interesting is that vertical release point (release_pos_z) has a noteworthy negative coefficient, meaning it is better to release the ball from a lower height. I don’t believe this is exactly approach angle, but it is good to see that this matters as the public’s view that approach angle is important continues to grow. Horizontal break (abs_pfx_x) also seems to have a negative impact on the probability that a fastball is a swinging strike.

Changeups

Changeups don’t have coefficients as well defined as fastballs do. Overall, it is best to locate the ball down, as evidenced by the negative coefficient on plate_z. The next most important factors seem to be having little vertical break, as shown by a negative coefficient on pfx_z, and having some side-to-side movement, as shown by a positive coefficient on abs_pfx_x.

Sliders

Sliders seem to be even trickier to evaluate. We know we want a downward component to them, as shown by a negative coefficient on pfx_z, and we want to throw them low in the zone. Since I didn’t split this data up into lefties and righties, plate_x is pretty neutral. I would assume, though, that if I’d considered horizontal location based on handedness, a location far away from the batter would be desired. Surprisingly, velocity and horizontal movement didn’t have as much of an impact here as I would have expected.
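A handedness-aware version of horizontal location could be sketched like this. It assumes the data has a batter-side column (Statcast calls it stand, with values "R" and "L") and that plate_x is measured from the catcher's view, with positive values toward the catcher's right. Under that assumed sign convention, positive plate_x is away from a right-handed batter, so flipping the sign for lefties puts everyone on one "away" scale. The function name away_from_batter is hypothetical.

```python
def away_from_batter(plate_x, stand):
    """Return horizontal location with positive values away from the batter.

    Assumes plate_x is from the catcher's view (positive toward the catcher's
    right) and stand is "R" or "L" for the batter's side of the plate.
    """
    if stand not in ("R", "L"):
        raise ValueError(f"unexpected batter side: {stand!r}")
    return plate_x if stand == "R" else -plate_x


# Hypothetical pitches, not real data.
pitches = [
    {"plate_x": 0.8, "stand": "R"},   # away from a righty
    {"plate_x": 0.8, "stand": "L"},   # in on a lefty
    {"plate_x": -0.8, "stand": "L"},  # away from a lefty
]
adjusted = [away_from_batter(p["plate_x"], p["stand"]) for p in pitches]
print(adjusted)
```

With a feature like this in place of raw plate_x, the slider model would at least have the chance to show whether pitches away from the batter draw more swinging strikes.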

Curveballs

For curveballs, throwing the ball down (plate_z) seems to be the most important thing that can be done. Some horizontal break seems to help too. I’m really surprised that vertical break (pfx_z) and release speed (release_speed) don’t play a bigger role here. I’ll get into why that and other surprises may exist in these pitch types in a second.

Going back to the data I used, I took all MLB data from 2015-2019 and, for each pitch type, modeled the probability of a pitch being a swinging strike. There are a few problems that could arise with this and affect variable importance. First off, the models don’t consider previous pitches or the situation. Second, it is possible some pitches aren’t labeled properly or that tracking data has changed slightly over the years, so I may be better off making models for each season.
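As a rough illustration of the labeling and the per-season split, the sketch below builds a swinging-strike flag and computes SwStr% by season. It assumes Statcast-style fields (description and game_year) and counts only the swinging_strike and swinging_strike_blocked descriptions as swinging strikes. Whether foul tips belong in that set is a modeling choice this sketch leaves out, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Assumed label: description values counted as swinging strikes.
WHIFF_DESCRIPTIONS = {"swinging_strike", "swinging_strike_blocked"}


def is_swinging_strike(description):
    """Return 1 if the pitch outcome is a swinging strike, else 0."""
    return int(description in WHIFF_DESCRIPTIONS)


def swstr_by_season(pitches):
    """Map each season to swinging strikes divided by total pitches."""
    counts = defaultdict(lambda: [0, 0])  # season -> [whiffs, pitches]
    for pitch in pitches:
        tally = counts[pitch["game_year"]]
        tally[0] += is_swinging_strike(pitch["description"])
        tally[1] += 1
    return {season: whiffs / total for season, (whiffs, total) in counts.items()}


# Hypothetical rows standing in for real pitch-by-pitch data.
sample = [
    {"game_year": 2015, "description": "swinging_strike"},
    {"game_year": 2015, "description": "ball"},
    {"game_year": 2015, "description": "foul"},
    {"game_year": 2015, "description": "swinging_strike_blocked"},
    {"game_year": 2019, "description": "called_strike"},
    {"game_year": 2019, "description": "swinging_strike"},
    {"game_year": 2019, "description": "hit_into_play"},
    {"game_year": 2019, "description": "ball"},
    {"game_year": 2019, "description": "foul"},
]
rates = swstr_by_season(sample)
print(rates)
```

Grouping by season this way is also the natural first step toward fitting one model per season and checking whether the coefficients drift from year to year.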

I also think it’s possible that using just MLB data introduces some clear bias. Everyone knows velocity is the most important aspect of a fastball, but when looking at MLB data, pretty much every fastball has a velocity people strive for. Certainly, many fastballs beat others velocity-wise, but it is at least possible that the difference among pitchers at the MLB level isn’t the ability to throw the ball hard but is more the ability to locate a good pitch. Those who don’t throw the ball hard won’t be called up anyway. The same could be said of movement numbers. Those who don’t have good movement or velocity on a breaking ball don’t throw it at this level, so the differences may not be as pronounced as they would be in a broader population. So just because velocity on a fastball may not be the biggest differentiator, it doesn’t mean you can get away with poor velocity in the majors and be a complete command artist. It may just mean that those guys don’t make it to the majors in the first place.
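This selection effect (often called range restriction) is easy to see in a small simulation. Everything below is made up for illustration: the velocity distribution, the rule linking velocity to whiffs, and the 93 mph "call-up" cutoff are assumptions, not estimates from real data. The point is only that the same underlying relationship looks weaker once you keep just the hard throwers.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Made-up population: fastball velocity (mph) across all levels of baseball.
velo = rng.normal(90.0, 4.0, size=n)

# Made-up rule: whiff probability rises smoothly with velocity.
p_whiff = 1.0 / (1.0 + np.exp(-(-2.2 + 0.15 * (velo - 90.0))))
whiff = (rng.random(n) < p_whiff).astype(float)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# "Called up" here means clearing an arbitrary velocity cutoff.
in_majors = velo > 93.0

corr_all = corr(velo, whiff)
corr_majors = corr(velo[in_majors], whiff[in_majors])

print(f"velocity spread, everyone: {velo.std():.2f} mph")
print(f"velocity spread, majors:   {velo[in_majors].std():.2f} mph")
print(f"velo-whiff correlation, everyone: {corr_all:.3f}")
print(f"velo-whiff correlation, majors:   {corr_majors:.3f}")
```

Velocity matters exactly as much for every simulated pitch, yet the correlation shrinks in the selected group because there is less velocity spread left to explain anything.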

Another possibility is that velocity is strongly correlated with movement, so the model assigns less weight to velocity even though it is still important when training or designing a pitch.
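One common way to check for this kind of overlap between predictors is the variance inflation factor (VIF): regress each input on the others and compute 1 / (1 - R²). The sketch below does that with NumPy on synthetic data in which vertical break is deliberately tied to velocity. The strength of that link is an assumption made for the demo, not a measured value, and the function name vif is hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (rows are pitches)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid.var() / target.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


# Synthetic inputs (NOT real pitches): pfx_z is built to move with velocity,
# while plate_z is independent of both.
rng = np.random.default_rng(2)
n = 5000
release_speed = rng.normal(93.0, 2.5, size=n)
pfx_z = 1.3 + 0.08 * (release_speed - 93.0) + rng.normal(0.0, 0.15, size=n)
plate_z = rng.normal(2.5, 0.8, size=n)

names = ["release_speed", "pfx_z", "plate_z"]
vifs = vif(np.column_stack([release_speed, pfx_z, plate_z]))
for name, value in zip(names, vifs):
    print(f"{name:14s} VIF = {value:.2f}")
```

A VIF near 1 means a predictor shares little with the others. Larger values mean its coefficient is harder to pin down, so a small coefficient on velocity in that situation would not be strong evidence that velocity is unimportant.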

One other way to improve pitch type grades could be to take a certain number or percentage of a pitch’s top predicted outcomes and define that as the skill of the pitcher. I could also start to account for pitch count if I don’t split the data by starters and relievers. This could allow starters to be penalized less for pitches thrown later in games, and it could be a way of including stamina in an arsenal grade.
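The "top predicted outcomes" idea could look something like the sketch below: for each pitcher, keep the best share of predicted swinging-strike probabilities and average them. The 50% share, the pitcher labels, and the probabilities are all hypothetical, and top_share_grade is an illustrative name rather than an existing function.

```python
from collections import defaultdict


def top_share_grade(predictions, share=0.25):
    """Average each pitcher's top `share` of predicted whiff probabilities.

    predictions: iterable of (pitcher, predicted_probability) pairs.
    At least one pitch is always kept per pitcher.
    """
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    by_pitcher = defaultdict(list)
    for pitcher, prob in predictions:
        by_pitcher[pitcher].append(prob)

    grades = {}
    for pitcher, probs in by_pitcher.items():
        best_first = sorted(probs, reverse=True)
        keep = max(1, round(len(best_first) * share))
        grades[pitcher] = sum(best_first[:keep]) / keep
    return grades


# Hypothetical model outputs for two pitchers.
predictions = [
    ("Pitcher A", 0.10), ("Pitcher A", 0.20),
    ("Pitcher A", 0.30), ("Pitcher A", 0.40),
    ("Pitcher B", 0.05), ("Pitcher B", 0.25),
]
grades = top_share_grade(predictions, share=0.5)
print(grades)
```

Grading on the top share rewards what a pitcher can do at his best rather than his average pitch, which is one way to avoid penalizing starters for pitches thrown while tired.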

In the next article we’ll look at what parts of a pitcher’s arsenal were most impactful when predicting their xFIP.
