Expected Points Generated

I have yet to find a good statistic to represent a soccer player’s value to their team. Soccer has no parallel to baseball’s Wins Above Replacement (WAR) stat, which makes sense: a soccer game is made up of so many small actions that it is hard to distill a player’s ability and production into a single number built from a few discrete metrics. Expected Goals is a good predictor of success, but it only captures a player’s expected scoring, not how their play contributes to their team’s results.

With this in mind, I propose a different statistic: Expected Points Generated (EPG). EPG isn’t an entirely new concept; other outlets have created Expected Points. I have neither the extensive data models nor the statistics knowledge needed to back this idea with historical data and proven statistical distributions, so I decided to combine a few different soccer analytics resources to produce my own representation of player value. This formula isn’t perfect, but I think it gives a good representation of how much of an effect a player has on a team’s performance.

Resources

Methodology

First, calculate OCxG, each player’s offensive contribution to their team every game, and map this vector to the range 0 to 1:

$$ \mathsf{PxG} = \text{xGoals \& xAssists} $$
$$ \mathsf{TxG} = \text{Team xGoals For} $$
$$ \mathsf{PSPxG} = \text{Player Successful Passes \%} $$
$$ \mathsf{PPxG} = \text{Player Passes per Game} $$
$$ \mathsf{TSPxG} = \text{Team Successful Passes per Game} $$
$$ \mathsf{PTPxG} = \text{Player Touch Percent per Game} $$
$$ \mathsf{GA} = \text{Actual Goals + Assists} $$
$$ \mathsf{GF} = \text{Actual Goals For} $$
$$ \mathsf{OF} = \text{Offensive Coefficient} $$
$$ \mathsf{OF} = \frac{\mathsf{GA}}{\mathsf{GF}} $$
$$ \mathsf{OCxG} = \frac{\mathsf{PxG}}{\mathsf{TxG}} + \frac{\frac{\mathsf{PSPxG}}{100} \times \mathsf{PPxG}}{\mathsf{TSPxG}} + \frac{\mathsf{PTPxG}}{100} + \mathsf{OF} $$
$$ \mathsf{mapOCxG} = \frac{\mathsf{OCxG} - \mathsf{minOCxG}}{\mathsf{maxOCxG} - \mathsf{minOCxG}} $$
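To make the arithmetic concrete, here is a minimal Python sketch of the OCxG step. It is not the R script mentioned later in this post; the function names `ocxg` and `min_max`, the three stat lines, and the choice to map a squad with no spread to 0 are all my own assumptions for illustration.

```python
def ocxg(pxg, txg, pspxg, ppxg, tspxg, ptpxg, ga, gf):
    """Raw offensive contribution, term by term as in the OCxG formula."""
    of = ga / gf  # Offensive Coefficient: share of the team's actual goals
    pass_share = (pspxg / 100) * ppxg / tspxg  # successful passes vs. team
    return pxg / txg + pass_share + ptpxg / 100 + of


def min_max(values):
    """Rescale raw scores to the range 0 to 1 (the mapOCxG step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: with no spread, map everyone to 0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical stat lines for three players on one team (not real data).
# Argument order: PxG, TxG, PSPxG, PPxG, TSPxG, PTPxG, GA, GF
squad = {
    "striker": ocxg(0.50, 2.0, 80, 40, 400, 10, 10, 40),
    "midfielder": ocxg(0.20, 2.0, 90, 60, 400, 15, 4, 40),
    "defender": ocxg(0.04, 2.0, 85, 50, 400, 12, 0, 40),
}
mapped = dict(zip(squad, min_max(list(squad.values()))))

for name in squad:
    print(f"{name}: OCxG={squad[name]:.3f}, mapOCxG={mapped[name]:.3f}")
```

Note that the sketch does not guard against a team with zero xG or zero goals; real data would need a rule for those cases.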

Next, calculate DCxG, each player’s defensive contribution to their team every game, and map this vector to the range 0 to 1:

$$ \mathsf{Tkl} = \text{Tackles} $$
$$ \mathsf{TTkl} = \text{Team Total Tackles} $$
$$ \mathsf{Int} = \text{Interceptions} $$
$$ \mathsf{TInt} = \text{Team Total Interceptions} $$
$$ \mathsf{Off} = \text{Offsides} $$
$$ \mathsf{TOff} = \text{Team Total Offsides} $$
$$ \mathsf{Clr} = \text{Clearances} $$
$$ \mathsf{TClr} = \text{Team Total Clearances} $$
$$ \mathsf{Blk} = \text{Blocks} $$
$$ \mathsf{TBlk} = \text{Team Total Blocks} $$
$$ \mathsf{PlyrDefAvg} = \frac{\mathsf{Tkl} + \mathsf{Int} + \mathsf{Off} + \mathsf{Clr} + \mathsf{Blk}}{5} $$
$$ \mathsf{TeamDefAvg} = \frac{\mathsf{TTkl} + \mathsf{TInt} + \mathsf{TOff} + \mathsf{TClr} + \mathsf{TBlk}}{5} $$
$$ \mathsf{DCxG} = \frac{\mathsf{PlyrDefAvg} - \mathsf{TeamDefAvg}}{\mathsf{TeamDefAvg}} $$
$$ \mathsf{mapDCxG} = \frac{\mathsf{DCxG} - \mathsf{minDCxG}}{\mathsf{maxDCxG} - \mathsf{minDCxG}} $$
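The defensive step can be sketched the same way. This is a rough Python illustration, not the R script; the function name `dcxg`, the team totals, and the three players are hypothetical.

```python
def dcxg(tkl, interceptions, off, clr, blk, ttkl, tint, toff, tclr, tblk):
    """Raw defensive contribution: player average relative to team average."""
    plyr_def_avg = (tkl + interceptions + off + clr + blk) / 5
    team_def_avg = (ttkl + tint + toff + tclr + tblk) / 5
    return (plyr_def_avg - team_def_avg) / team_def_avg


# Hypothetical team totals: tackles, interceptions, offsides, clearances, blocks.
TEAM = (200, 100, 50, 100, 50)

# Hypothetical player totals in the same order (not real data).
raw = {
    "full_back": dcxg(30, 20, 5, 15, 5, *TEAM),
    "centre_back": dcxg(20, 10, 5, 10, 5, *TEAM),
    "winger": dcxg(10, 5, 0, 4, 1, *TEAM),
}

# Min-max mapping to the range 0 to 1 (the mapDCxG step).
lo, hi = min(raw.values()), max(raw.values())
mapped = {name: (value - lo) / (hi - lo) for name, value in raw.items()}

for name in raw:
    print(f"{name}: DCxG={raw[name]:.3f}, mapDCxG={mapped[name]:.3f}")
```

When the team totals include the player's own actions, the raw DCxG falls between -1 and 0, so it is the mapping step that turns it into a positive score. The division by 5 also cancels out of the ratio, which means DCxG equals the player's share of the team's defensive actions minus 1.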

Finally, calculate AppWeight, the proportion of possible minutes the player has played. Like ASA, I use an estimated game length of 96 minutes.

$$ \mathsf{App} = \text{Appearances (including as Sub)} $$
$$ \mathsf{Min} = \text{Played Minutes} $$
$$ \mathsf{AppWeight} = \frac{\mathsf{Min}}{\mathsf{App} \times 96} $$
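As a quick check of how AppWeight behaves, here is a small Python sketch. The function name `app_weight`, the sample minutes, and the decision to return 0 for a player with no appearances are assumptions of mine, not part of the formula.

```python
GAME_MINUTES = 96  # estimated game length used in the AppWeight formula


def app_weight(minutes, appearances, game_minutes=GAME_MINUTES):
    """Proportion of possible minutes played across a player's appearances."""
    if appearances == 0:
        # Assumption: a player with no appearances gets zero weight.
        return 0.0
    return minutes / (appearances * game_minutes)


# Hypothetical players: a regular starter and a late substitute.
starter = app_weight(minutes=1440, appearances=20)
substitute = app_weight(minutes=240, appearances=10)

print(f"starter: {starter:.2f}, substitute: {substitute:.2f}")
```

Because the denominator counts only games in which the player appeared, the weight rewards minutes per appearance: the hypothetical substitute is discounted for short cameos, while a player who missed games entirely is not penalized for them.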

Now that you have the player’s OCxG, DCxG, and AppWeight, you can use the MLS match outcome probabilities generated by ASA…

$$ \mathsf{ProbWin} = 0.483 $$
$$ \mathsf{ProbDraw} = 0.281 $$

…to create the formula below:

$$ \mathsf{EPG} = [(3 \times \mathsf{ProbWin}) + (1 \times \mathsf{ProbDraw})] \times (\mathsf{mapOCxG} + \mathsf{mapDCxG}) \times \mathsf{AppWeight} $$
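Putting the pieces together, the bracketed term is the expected points from an average match, 3 × 0.483 + 1 × 0.281 = 1.73, and the rest scales it by the player's mapped contributions and playing time. A minimal Python sketch, with function names and sample inputs that are purely illustrative:

```python
PROB_WIN = 0.483   # MLS win probability from ASA, as quoted in this post
PROB_DRAW = 0.281  # MLS draw probability from ASA, as quoted in this post


def expected_points_per_match(prob_win=PROB_WIN, prob_draw=PROB_DRAW):
    """Three points for a win, one for a draw, weighted by their probabilities."""
    return 3 * prob_win + 1 * prob_draw


def epg(map_ocxg, map_dcxg, app_weight):
    """Expected Points Generated from the mapped contributions and AppWeight."""
    return expected_points_per_match() * (map_ocxg + map_dcxg) * app_weight


# Hypothetical player: strong offensive score, modest defensive score,
# on the field for 75% of possible minutes.
example = epg(map_ocxg=0.8, map_dcxg=0.3, app_weight=0.75)
ceiling = epg(map_ocxg=1.0, map_dcxg=1.0, app_weight=1.0)

print(f"points per match: {expected_points_per_match():.3f}")
print(f"example EPG: {example:.3f}, ceiling: {ceiling:.3f}")
```

Since both mapped scores top out at 1, EPG is capped at 2 × 1.73 = 3.46 for a player whose AppWeight is at most 1, which is worth keeping in mind when comparing values across players.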

Examples

With version 1.6, I’ve written an R script that generates EPG for each team. Try it out!

Conclusion

As I said earlier, EPG isn’t perfect. In some cases, a player’s contributions to their team are likely underrepresented in their EPG rating. In the future, I hope to back this formula with hard data and make adjustments that reveal how non-scoring contributions affect a team’s performance, turning EPG into the soccer equivalent of WAR and a good estimator of a player’s value to their team.

Version 1.6 - Updated 5/30/2017.

Data provided by WhoScored and American Soccer Analysis. Thanks to them for being very transparent in how their stats are constructed and for providing advanced stats to the public for free.