Sarah Bailey

Forecasting Batting Averages in MLB

We consider new baseball data from Statcast, which includes launch angle, launch velocity, and hit distance for batted balls in Major League Baseball during the 2015 and 2016 seasons. Using logistic regression, we train two models on 2015 data to get the probability that a player will get a hit on each of their 2015 at-bats. For each player, we sum these predictions and divide by their total at-bats to predict their 2016 batting average. We then use linear regression to express actual 2016 batting averages as a linear combination of the 2016 Statcast predictions and the 2016 PECOTA predictions. When using this procedure to obtain 2017 predictions, we find that the combined prediction performs better than PECOTA alone. This information may be used to make better predictions of batting averages for future seasons.
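
The two-stage procedure described above can be illustrated with a minimal sketch. It is not the models used in this work: the data are synthetic, the three features stand in for standardized launch angle, launch velocity, and hit distance, only one logistic model is fitted rather than two, and the function names, the PECOTA value, and the combination coefficients b0, b1, b2 are hypothetical placeholders. In practice the coefficients would be estimated by linear regression on actual batting averages.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(w, x):
    # w[0] is the intercept; w[1:] pair with the features in x.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            err = sigmoid(linear_score(w, x)) - label
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predicted_average(w, batted_balls, at_bats):
    """Sum the per-ball hit probabilities and divide by total at-bats."""
    return sum(sigmoid(linear_score(w, x)) for x in batted_balls) / at_bats

def combine(statcast_pred, pecota_pred, b0, b1, b2):
    """Linear combination of the two forecasts (coefficients assumed given)."""
    return b0 + b1 * statcast_pred + b2 * pecota_pred

# Synthetic batted balls: three standardized features per ball (assumption).
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
# Synthetic hit/no-hit labels driven by the first feature only.
y = [1 if random.random() < sigmoid(-0.7 + 1.0 * x[0]) else 0 for x in X]

weights = fit_logistic(X, y)

# One hypothetical player: 40 batted balls out of 50 at-bats
# (at-bats without a batted ball, such as strikeouts, add nothing to the sum).
statcast_avg = predicted_average(weights, X[:40], at_bats=50)

# Placeholder PECOTA forecast and placeholder coefficients.
combined = combine(statcast_avg, 0.270, b0=0.0, b1=0.5, b2=0.5)
print(round(statcast_avg, 3), round(combined, 3))
```

Dividing by total at-bats rather than by the number of batted balls keeps the prediction on the same scale as a batting average.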

Keywords: Batting Average, MLB, Logistic Regression, Big Data, Forecasting