An Analysis of an Alternative Pythagorean Expected Win Percentage Model

Applications Using Major League Baseball Team Quality Simulations

Authors

  • Justin Ehrlich, Syracuse University
  • Christopher Boudreaux, Florida Atlantic University
  • James Boudreau, Kennesaw State University
  • Shane Sanders, Syracuse University

Keywords:

Simulation-based statistical inference, Labor Productivity, Expected Wins, Major League Baseball, Pythagorean Model, Contest Success Function

Abstract

Background. Contests are games in which players compete for a prize and exert effort to increase their probability of winning. For sport contests, analysts often use the Pythagorean expected win percentage model to estimate teams’ expected wins (quality). We ask whether alternative contest models can outperform the Pythagorean model by reducing prediction error and the information loss caused by misspecification.
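For reference, the Pythagorean model predicts a team's win percentage from its runs scored (RS) and runs allowed (RA) as W% = RS^x / (RS^x + RA^x). The minimal sketch below assumes Bill James's original exponent x = 2; the exponent used in the article is not stated in this abstract, and the function name is illustrative.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected win percentage under the Pythagorean model.

    W% = RS^x / (RS^x + RA^x). The default x = 2 is an assumption
    (James's original choice), not a value taken from the article.
    """
    if runs_scored < 0 or runs_allowed < 0 or runs_scored + runs_allowed == 0:
        raise ValueError("run totals must be non-negative and not both zero")
    rs_x = runs_scored ** exponent
    ra_x = runs_allowed ** exponent
    return rs_x / (rs_x + ra_x)


# Hypothetical team: 700 runs scored, 650 runs allowed
print(round(pythagorean_win_pct(700, 650), 4))  # → 0.537
```

Because the prediction depends on the ratio RS/RA, the Pythagorean model is a ratio-form contest success function with runs playing the role of effort.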

Aim. This article uses simulated data to select the best-performing expected win percentage model from a set of relevant alternatives, including the traditional Pythagorean model and the difference-form contest success function (CSF).
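A difference-form CSF makes the probability of winning depend on the difference between the two sides' efforts rather than their ratio. The sketch below assumes the logistic version common in contest theory, with the run differential RS - RA standing in for the effort difference: W% = 1 / (1 + exp(-k(RS - RA))). The exact specification estimated in the article is not given in this abstract, and the scale parameter k here is purely illustrative.

```python
import math


def difference_form_win_pct(runs_scored: float, runs_allowed: float,
                            k: float) -> float:
    """Expected win percentage under an assumed logistic difference-form CSF.

    W% = 1 / (1 + exp(-k * (RS - RA))). Written in the numerically stable
    form so that very large |k * (RS - RA)| cannot overflow math.exp.
    """
    z = k * (runs_scored - runs_allowed)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so this lies in (0, 1) and cannot overflow
    return ez / (1.0 + ez)


# Hypothetical scale k = 0.002 for a team with a +50 run differential
print(round(difference_form_win_pct(700, 650, k=0.002), 4))  # → 0.525
```

Since only RS - RA enters, a 700-650 team and a 350-300 team receive the same prediction under this form, whereas the Pythagorean ratio treats them differently.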

Method. We simulate the 2014 MLB season 1,000 times in order to estimate and compare alternative models of expected win percentage (team quality). We use the open-source Strategic Baseball Simulator (SBS) and develop an AutoHotkey script that programmatically runs the SBS application, selects the settings for the 2014 season, enters a unique ID for each simulation's data file, and repeats these steps 1,000 times. We estimate expected win percentage using the traditional Pythagorean model and the difference-form CSF model used in game theory and public choice economics. Each model is estimated with fixed (team) effects.
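One way to estimate both models on such a panel (1,000 simulated seasons of 30 teams, i.e. 30,000 team-season observations) is to linearize them in log-odds. The Pythagorean model implies ln(W/L) = x ln(RS/RA), and the logistic difference form implies ln(W/L) = k(RS - RA). The sketch below assumes this linearization with additive team effects and uses the within (demeaning) estimator; the article's actual estimator is not described in this abstract, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def within_ols(team_ids, y, x):
    """Fixed-effects (within) OLS for y_it = a_i + b * x_it + e_it.

    Subtracting each team's mean from y and x removes the team effect a_i;
    b is then the OLS slope on the demeaned data. Returns (b, rss).
    """
    count = defaultdict(int)
    sum_y = defaultdict(float)
    sum_x = defaultdict(float)
    for t, yi, xi in zip(team_ids, y, x):
        count[t] += 1
        sum_y[t] += yi
        sum_x[t] += xi
    y_dm = [yi - sum_y[t] / count[t] for t, yi in zip(team_ids, y)]
    x_dm = [xi - sum_x[t] / count[t] for t, xi in zip(team_ids, x)]
    sxx = sum(v * v for v in x_dm)
    if sxx == 0:
        raise ValueError("regressor has no within-team variation")
    b = sum(u * v for u, v in zip(y_dm, x_dm)) / sxx
    # Within residuals equal those of the regression with team dummies.
    rss = sum((u - b * v) ** 2 for u, v in zip(y_dm, x_dm))
    return b, rss


def fit_models(team_ids, wins, losses, runs_scored, runs_allowed):
    """Fit both models in log-odds form, ln(W/L), with team fixed effects.

    Pythagorean:      ln(W/L) = a_i + x * ln(RS/RA)
    Difference form:  ln(W/L) = a_i + k * (RS - RA)
    Requires W, L, RS, RA > 0 for every team-season.
    """
    y = [math.log(w / l) for w, l in zip(wins, losses)]
    x_pyth = [math.log(s / a) for s, a in zip(runs_scored, runs_allowed)]
    x_diff = [s - a for s, a in zip(runs_scored, runs_allowed)]
    return {
        "pythagorean": within_ols(team_ids, y, x_pyth),
        "difference": within_ols(team_ids, y, x_diff),
    }
```

Both models share the same dependent variable, ln(W/L), so their residual sums of squares can be compared directly.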

Result. We find that the difference-form CSF model outperforms the traditional Pythagorean model both in explanatory power and in information loss from misspecification, as measured by the Akaike Information Criterion (AIC). Through parametric estimation, we further confirm that the simulator yields realistic statistical outcomes.
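The AIC trades goodness of fit against the number of estimated parameters, and lower values indicate less estimated information loss. For least-squares fits with normally distributed errors it reduces, up to a constant shared by all models fitted to the same data, to n ln(RSS/n) + 2p. The minimal sketch below assumes that setting; the helper names are hypothetical, and how the article computes its AIC values is not stated in this abstract.

```python
import math


def gaussian_aic(rss: float, n_obs: int, n_params: int) -> float:
    """AIC for a least-squares fit with i.i.d. normal errors.

    Up to an additive constant common to all models fitted to the same
    data, AIC = n * ln(RSS / n) + 2 * p. Only differences are meaningful.
    """
    if rss <= 0 or n_obs <= 0:
        raise ValueError("rss and n_obs must be positive")
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def akaike_weights(aics: dict) -> dict:
    """Relative likelihood exp(-delta/2) of each model, normalised to sum to 1."""
    best = min(aics.values())
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical RSS values for illustration only (not the article's results);
# p = 31 assumes one slope plus 30 team fixed effects in each model.
aics = {
    "pythagorean": gaussian_aic(rss=812.0, n_obs=30_000, n_params=31),
    "difference": gaussian_aic(rss=798.0, n_obs=30_000, n_params=31),
}
print(min(aics, key=aics.get), akaike_weights(aics))
```

When the two models have the same number of parameters, as assumed here, the AIC ranking coincides with the RSS ranking; the penalty term matters only when the specifications differ in size.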

Conclusion. The simulation methodology offers the advantage of a greatly increased sample size. Because the season is held constant across simulations, simulation-based statistical inference also permits estimation and model comparison without the time-series problem of non-stationarity. The results suggest that win (productivity) estimation can be improved through alternative CSF specifications.

Published

2021-01-29