Application of the Pythagorean Expected Wins Percentage and Cross-Validation Methods in Estimating Team Quality

Authors

  • Christopher Boudreaux
  • Justin Ehrlich
  • Shankar Ghimire, Western Illinois University
  • Shane Sanders

Keywords:

Expected wins estimation, expected output, quality estimation, labor contest, sports analytics, cross-validation

Abstract

The Pythagorean Expected Wins Percentage Model was developed by Bill James (1980) to estimate a baseball team’s expected wins percentage, as distinct from its actual wins percentage, over the course of a season. As such, the model can be used to assess how lucky or unlucky a team was over a season (actual wins minus expected wins). From a sports analytics perspective, this information is valuable because it indicates how reproducible a given result may be in the next time period. In contest-theoretic (game-theoretic) parlance, James’ original model represents a (restricted) Tullock contest success function (CSF). We transform, estimate, and compare James’ original model and two alternative models from contest theory, the serial and difference-form CSFs, using MLB team win data (2003-2015), and we perform a cross-validation exercise to test the accuracy of the alternative models. The serial CSF estimator dramatically improves wins estimation (reduces root mean squared error) compared to James’ original model, an optimized version of James’ model, or an optimized difference-form model. We conclude that the serial CSF model of wins estimation substantially improves estimates of team quality, on average. The work provides a real-world test of alternative contest forms.
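
As an illustration of the quantities named in the abstract, the following is a minimal Python sketch of the Pythagorean expected wins percentage in its widely cited form, runs scored squared over the sum of runs scored squared and runs allowed squared, together with the luck measure (actual minus expected wins), root mean squared error, and a hold-out check. The function names, the grid search over the exponent, and all numbers are assumptions made for illustration. They are not the authors’ estimation procedure or data, and the serial and difference-form CSFs are omitted because the abstract does not state their functional forms.

```python
import math


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected win fraction; exponent=2 is the commonly cited original form."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def luck(actual_wins, games, runs_scored, runs_allowed, exponent=2.0):
    """Actual wins minus expected wins over a season of `games` games."""
    expected_wins = games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)
    return actual_wins - expected_wins


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def fit_exponent(seasons, grid):
    """Pick the exponent in `grid` with the lowest RMSE on `seasons`.

    `seasons` is a list of (runs_scored, runs_allowed, actual_win_pct).
    A plain grid search stands in for whatever estimator a study might use.
    """
    def score(exponent):
        predicted = [pythagorean_win_pct(rs, ra, exponent) for rs, ra, _ in seasons]
        observed = [pct for _, _, pct in seasons]
        return rmse(predicted, observed)

    return min(grid, key=score)


def holdout_rmse(train, test, grid):
    """Fit the exponent on `train`, then report RMSE on the held-out `test`."""
    exponent = fit_exponent(train, grid)
    predicted = [pythagorean_win_pct(rs, ra, exponent) for rs, ra, _ in test]
    observed = [pct for _, _, pct in test]
    return exponent, rmse(predicted, observed)


# Hypothetical team-seasons (made-up numbers, not MLB data).
train = [(800, 700, 0.570), (650, 720, 0.450), (710, 705, 0.500)]
test = [(760, 640, 0.590), (600, 690, 0.430)]
grid = [i / 100 for i in range(150, 251)]  # candidate exponents 1.50 to 2.50

best_exponent, test_error = holdout_rmse(train, test, grid)
print(best_exponent, test_error)
```

Repeating the hold-out step over several train/test splits and averaging the test errors gives a basic cross-validation comparison. A competing model would be scored the same way by swapping out `pythagorean_win_pct`.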

Published

2021-03-31