Machine Learning on the football field: Predicting match performance through GPS data


Date

2025-10-09

Abstract

This master's thesis evaluates machine learning models for predicting football match performance from GPS-based practice data. The growing use of GPS-based wearable tracking in professional football underscores the need for approaches that transform large datasets into practical insights for teams and that also contribute new methods and strategies to the scientific literature. Random Forests (RF), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN) are tested on targets representing aerobic capacity (endurance in sustained activity), anaerobic capacity (the ability to perform high-intensity efforts without relying primarily on oxygen), and explosiveness (short bursts of speed and power), using both non-overlapping rolling windows and adaptive feature windows. Three prediction strategies are compared: row-to-row, where each input window is paired with its corresponding target window; all-input-to-row, where the entire input sequence is used to predict each target row independently; and all-input-plus-previous-row-to-row, which extends the second strategy by adding the previous target row as an extra input. The results show that all models outperform a linear regression baseline under the last two strategies, with RF performing best for the aerobic and anaerobic metrics and CNN and RNN performing best for explosiveness. RF also provides feature importance scores, which indicate that linear acceleration from the practice day immediately preceding the match is the strongest predictor. Angular velocity and angular jerk from the fourth and third practice days before the match also emerge as key factors, suggesting that strenuous training loads in the days leading up to competition may play a decisive role in match performance. CNN and RNN, in contrast, function as black-box models and do not directly reveal the relative importance of the input features.
Regarding windowing techniques, adaptive windowing reduces the Root Mean Squared Error (RMSE), which points to a possible gain from adopting this approach in sports analysis. These findings offer practical insights for sports training and show how machine learning can turn wearable sensor data into useful performance metrics.
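The three prediction strategies differ mainly in how the training pairs are assembled. The sketch below illustrates this under stated assumptions; it is not the thesis's implementation. It assumes (hypothetically) that each observation is one player-match with an input array of shape (windows, features) and one target value per window. It also assumes that windows are formed by averaging raw samples over non-overlapping blocks and that row 0 is simply skipped in the previous-row strategy. All function names are illustrative, and the linear baseline is plain least squares via NumPy. RF, CNN, RNN, and adaptive windowing are not shown.

```python
import numpy as np


def non_overlapping_windows(samples: np.ndarray, window: int) -> np.ndarray:
    """Average a (time, features) array over consecutive, non-overlapping windows.

    Assumption: trailing samples that do not fill a whole window are dropped.
    """
    n_windows = samples.shape[0] // window
    trimmed = samples[: n_windows * window]
    return trimmed.reshape(n_windows, window, samples.shape[1]).mean(axis=1)


def row_to_row(X: np.ndarray, Y: np.ndarray):
    """Pair input window i with target row i. X: (n_obs, m, f), Y: (n_obs, m)."""
    n_obs, m, f = X.shape
    return X.reshape(n_obs * m, f), Y.reshape(n_obs * m)


def all_input_to_row(X: np.ndarray, Y: np.ndarray, i: int):
    """Use the whole flattened input sequence to predict target row i."""
    return X.reshape(X.shape[0], -1), Y[:, i]


def all_input_plus_previous_row(X: np.ndarray, Y: np.ndarray, i: int):
    """Like all_input_to_row, plus the observed target of row i - 1 as an extra feature."""
    if i < 1:
        raise ValueError("row 0 has no previous target row")  # assumption: skip row 0
    features = np.hstack([X.reshape(X.shape[0], -1), Y[:, i - 1 : i]])
    return features, Y[:, i]


def fit_linear(features: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept column (a linear regression baseline)."""
    design = np.hstack([np.ones((features.shape[0], 1)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_linear(coef: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Apply the fitted intercept and slopes to new feature rows."""
    return coef[0] + features @ coef[1:]


def rmse(y_true, y_pred) -> float:
    """Root Mean Squared Error between observed and predicted targets."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 4, 3))  # 30 hypothetical player-matches, 4 windows, 3 GPS features
    Y = rng.normal(size=(30, 4))     # one hypothetical target value per window
    feats, target = all_input_plus_previous_row(X, Y, i=2)
    coef = fit_linear(feats[:20], target[:20])                  # fit on the first 20 observations
    print(rmse(target[20:], predict_linear(coef, feats[20:])))  # RMSE on the held-out 10
```

In the all-input strategies one model would be fitted per target row `i`, which matches "predict each target row independently". The RMSE is computed on held-out observations, since in-sample error would overstate performance.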

Keywords

Data science, football, performance, machine learning, thesis
