Today, xG (expected goals) is regarded as one of the most 'advanced' metrics in football analysis. Everyone seems to have their own version of xG nowadays, which raises the question: how good are these xG metrics? Many football analysts have evaluated a range of models to find out, and the results are often unexpected, hinting at flaws in the accuracy of most xG models used today.

The Models for Calculating xG

People have used several methods to assess the accuracy of xG models. These include classic machine-learning evaluation metrics such as RMSEP (root mean squared error of prediction), which measures the difference between predictions and actual outcomes (goals), among others.

Generally, these models try to predict a shot's outcome based on its properties, including its location, the build-up play, and the body part used.
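As a rough illustration of how such a model works, here is a minimal sketch that fits a logistic-regression xG model to simulated shots. The features (distance to goal, header, fast break) and the data-generating process are invented for this example, not taken from any real provider:

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical shot features: [bias, distance to goal (m), header?, fast break?].
# The "true" scoring process below is invented purely to create toy labels.
def simulate_shot():
    d = random.uniform(5, 30)
    header = 1.0 if random.random() < 0.2 else 0.0
    fast = 1.0 if random.random() < 0.1 else 0.0
    goal = 1 if random.random() < sigmoid(1.0 - 0.2 * d - 1.0 * header + 0.8 * fast) else 0
    return [1.0, d, header, fast], goal

shots = [simulate_shot() for _ in range(1000)]

# Fit logistic regression by full-batch gradient descent: xG = sigmoid(w . x).
w = [0.0, 0.0, 0.0, 0.0]
for _ in range(400):
    grad = [0.0, 0.0, 0.0, 0.0]
    for x, y in shots:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for i in range(4):
            grad[i] += err * x[i]
    w = [wi - 0.01 * g / len(shots) for wi, g in zip(w, grad)]

def xg(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# A close-range footed shot should get a higher xG than a long-range header.
print(xg([1.0, 6.0, 0.0, 0.0]) > xg([1.0, 25.0, 1.0, 0.0]))
```

Real models use richer features and better training procedures, but the shape is the same: each shot's properties go in, a probability between 0 and 1 comes out.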

Today's xG models can report up to 95% accuracy, which may sound impressive at first, but that figure depends heavily on the context of the model's data.

The data for shots and goals is highly class-imbalanced: there is a large disparity between the number of positive labels (goals) and negative labels (no goal). This makes a high accuracy score misleading, because a model can score well simply by predicting the majority class.
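A quick sketch shows why. Assuming a typical conversion rate of roughly one goal per ten shots, a "model" that always predicts no goal is already about 90% accurate:

```python
# Toy dataset: roughly one in ten shots is a goal (assumed conversion rate).
labels = [1] * 100 + [0] * 900  # 1 = goal, 0 = no goal

# A "model" that always predicts no goal, ignoring the shot entirely.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print(accuracy)  # 0.9: 90% accuracy without learning anything about shots
```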

Precision and Recall

Accuracy alone doesn't give a good enough sense of how good a model is. Two metrics worth examining are precision (what proportion of predicted goals were actually goals?) and recall (what proportion of actual goals were correctly identified?).

One evaluation reported that, of 1,023 actual goals in the test data, a model correctly identified only 127 (about 12% recall), and only 58% of the shots it predicted as goals actually went in (58% precision).
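Working backwards from those figures, and treating the 58% as precision, the implied confusion matrix can be reconstructed:

```python
# Numbers from the evaluation cited above (confusion-matrix layout is assumed).
true_positives = 127          # goals the model correctly predicted
false_negatives = 1023 - 127  # goals the model missed
# A precision of 58% implies tp / (tp + fp) = 0.58, so:
false_positives = round(true_positives * (1 - 0.58) / 0.58)

recall = true_positives / (true_positives + false_negatives)
precision = true_positives / (true_positives + false_positives)
print(round(recall, 2), round(precision, 2))  # 0.12 0.58
```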

Skill vs Luck

It's clear that xG models still struggle to predict whether an individual shot will result in a goal.

There is still a variety of missing information that may cause inaccuracy in a model. We can categorize this missing information into three parts:

a. Information that is easy to proxy with basic event data:

  • Shot location
  • Body part (head or foot)
  • Build up (penalty, cross, fast break)

b. Information that requires more sophisticated (tracking or richer event) data:

  • Goalkeeper's position
  • Defender's position
  • Footedness of shooter
  • Defender's velocity and direction

c. Information that is currently not possible to observe and collect:

  • Precise angle and location of the body part that strikes the ball
  • Goalkeeper's reaction time
  • Wind pressure
  • Air pressure of the ball
  • And many others (you get the idea)

An xG model that uses only the first category may be considered naïve, while one that uses both the first and second categories may be considered sophisticated. A model that manages to use all three categories is the goal for all xG enthusiasts, as it would likely predict accurately whether a shot turns into a goal or not.

An alternative way to look at this is to classify categories A and B as repeatable properties of shots (skill), and category C as something unrepeatable (luck). One purpose of analytics and data gathering is to expand category B at the expense of category C: knowing which skills are repeatable lets you improve your xG model and make better decisions.

The Big Chance Variable

This variable is recorded by OPTA: a shot is coded as 1 if it was a big chance and 0 otherwise. The coders decide whether each shot was a 'Big Chance,' and the decisions are double-checked thoroughly.

Most xG models use this variable as a proxy for defensive pressure. Since solid tracking data isn't available for most leagues, 'Big Chance' helps fill the gap of missing information; for example, it can hint at how many players were between the ball and the goal.

These sound like essential ingredients for calculating xG. However, it's likely that the OPTA coders fall for 'outcome bias': an error in which the outcome of a shot influences the evaluation of the chance's quality. When a player converts a chance, coders are likely to tag it as a 'Big Chance.' When a player misses, they may judge it a rather tricky chance and withhold the label.

This shows that the 'Big Chance' variable includes post-shot information, roughly comparable to building a model that uses only 'shots on target.' Such models are guaranteed to perform better than an 'all shots' model, because they contain information that isn't available before the shot is taken. Some models lean on the 'Big Chance' variable more heavily than others.
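A small simulation illustrates the leakage. The tagging rates below are invented for illustration: if coders tag converted chances as 'Big Chance' far more often than missed ones, a model that simply echoes the tag looks precise without using any pre-shot information at all:

```python
import random

random.seed(0)

n = 10_000
# Toy shot outcomes with an assumed 10% conversion rate.
goals = [1 if random.random() < 0.10 else 0 for _ in range(n)]

# Outcome-biased tagging (rates invented for illustration): coders mark most
# converted chances as a Big Chance but rarely mark the missed ones.
big_chance = [
    1 if random.random() < (0.80 if g else 0.05) else 0
    for g in goals
]

# A "model" that simply predicts a goal whenever the Big Chance flag is set.
tp = sum(1 for g, b in zip(goals, big_chance) if g and b)
fp = sum(1 for g, b in zip(goals, big_chance) if b and not g)
precision = tp / (tp + fp)
print(round(precision, 2))  # well above the 10% base rate, purely from leakage
```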

Conclusion on the Accuracy of xG Stats

We have gathered enough information to conclude that current xG models aren't yet suitable for predicting the outcome of individual shots (or small groups of shots), given how many of the relevant features are missing from basic xG models.

xG estimates for large groups of shots are still useful on average. By improving the models, we can reduce the variance of xG estimates so that they converge more quickly toward a hypothetical true xG value.
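This is the usual large-numbers argument: individual shot outcomes are pure noise (0 or 1), but the observed conversion rate over many shots converges on the underlying probability. A quick simulation with an assumed true xG of 0.10:

```python
import random

random.seed(1)

TRUE_XG = 0.10  # assumed true scoring probability for a class of shots

def observed_rate(n_shots):
    goals = sum(1 for _ in range(n_shots) if random.random() < TRUE_XG)
    return goals / n_shots

# Small samples swing wildly; large samples settle near the true value.
for n in (10, 1_000, 100_000):
    print(n, round(observed_rate(n), 3))
```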