Background
Football is among the most popular games in the world, with more than 80 leagues played across the world. The English Premier League is one of the most ferociously contested leagues in the world, and of late, has also made the fans of the game competitive too. Fans can now, on a multitude of platforms, select their favorite players, who they think will perform well during a certain game week. Our aim is to predict the performance of each player in a game given the past performance over several matches. Using a dataset[1] of English Premier League matches over four seasons that have player statistics for each game, we intend to train a sequence model to predict their performance in the next match (a single numeric rating) given the performance of every player in the team over the last n matches. In addition, we want to use an unsupervised learning based clustering approach to group players by position and skill.
Methods
Supervised Learning
Given sets of N previous games for a particular player, predict the performance in the (N+1)th game.
- Using Sequential Neural Models like LSTM/GRU (PyTorch)[2],[3],[4]
- Use a Hidden Markov Model
Unsupervised Learning
-
Clustering the players according to their features to the different player groups (Goalkeeper, Centre Back (CB), Left Midfielder (LM), Forward (F) etc.)
- Dimensionality reduction using PCA and feature engineering for combining certain features at various scales
- K-means/ Bayesian / Fuzzy c-mean/ Hierarchical Clustering
-
Replacing existing players in a team from a pool of other players (basically finding the best fit) - basically a ranking problem (Stretch Goal)
Results
- We would have models that given a players data across the past $N$ matches, would be able to predict his/her performance on the $(N+1)^{th}$ match with regards to the players played with and the teams.
- We would have models that given a set of players and their previous matches, would be able to group them according to their position type. The model should be able to learn and prioritize certain features over others to cluster players based on their best possible position on the field.
Discussions
The best possible outcome would be a model that is able to not only analyze the impact of the player’s form on his performance on a given day, but also understands the role other players’ (from the same team or the opposition team) form and compatibility play on how well the said player performs in a given match.
References
- [1] Shubham Pawar. English premier league in-game match data https://www.kaggle.com/shubhmamp/english-premier-league-match-data, Mar 2019.
- [2] Geetanjali Tewari and Krishna Kartik Darsipudi. Predicting football match winner using the lstm model of recurrent neural networks, June 2018.
- [3] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- [4] Daniel Pettersson and Robert Nyquist. Football match prediction using deep learning, 2017.