Date of Award

Fall 2023

Document Type


Degree Name

Master of Science (MS)



First Advisor

Matthew Hawkey


This study investigated the application of machine learning methods, specifically k-means clustering, k-Nearest Neighbors (kNN), and Support Vector Machines (SVM), to the analysis of player tracking data in soccer. The primary hypothesis was that such data, on their own, can yield an in-depth understanding of soccer matches. The study found that k-means clustering and spatial analysis are promising for analyzing player positions, whereas kNN and SVM show limitations without additional variables. The spatial analysis examined each team's convex hull and the correlation between team length, width, and surface area. Team length and surface area showed a strong positive correlation, with a coefficient of 0.8954. This suggests that teams with greater length play a more direct style, with players more spread out, which produces larger surface areas. k-means clustering was performed with k values derived from different approaches: the silhouette method recommended k = 2, and the elbow method recommended k = 4. Because each team fields 11 players, an additional analysis was run with k = 11. The k-means results suggested natural partitions in the data, highlighting distinct player roles and field positions. kNN was used to identify similar players; the best model, with k = 19, reached an accuracy of only 8.61%. The SVM model produced 55 classes, indicating a highly granular categorization of player roles. The kNN and SVM results indicate that further contextual data is needed for more effective analysis, and they underline the need for balanced datasets and careful model evaluation to avoid bias and ensure practical application in real-world scenarios. In conclusion, each algorithm offers a distinct perspective on player positioning and team formations. When combined with expert knowledge and additional contextual data, these algorithms can significantly enrich the scope of soccer analysis.
Future work should consider incorporating event data and additional variables to deepen the analytical insights and enable a more comprehensive understanding of how formations evolve in response to different in-game situations.
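Choosing k with the elbow and silhouette methods can be sketched as follows: run k-means for several values of k, record the inertia (within-cluster sum of squared distances) for the elbow plot, and keep the k with the highest mean silhouette score. This is a minimal pure-Python illustration, not the study's implementation; it assumes Lloyd's algorithm with a deterministic farthest-first initialization (rather than k-means++), and the toy points are hypothetical.

```python
import math
from statistics import fmean

Point = tuple[float, ...]


def farthest_first_init(points: list[Point], k: int) -> list[Point]:
    """Start at the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: list[Point], k: int, max_iter: int = 100) -> tuple[list[int], float]:
    """Lloyd's algorithm; returns cluster labels and inertia."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to its members' mean (keep it if the cluster is empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(tuple(fmean(axis) for axis in zip(*members)) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points: list[Point], labels: list[int]) -> float:
    """Mean silhouette score (needs at least two clusters); a point alone in its cluster scores 0."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = fmean(same)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            fmean(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return fmean(scores)


if __name__ == "__main__":
    # Hypothetical positions forming two well-separated groups.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    for k in range(2, 5):
        labels, inertia = kmeans(pts, k)
        print(f"k={k} inertia={inertia:.3f} silhouette={silhouette(pts, labels):.3f}")
```

The elbow is read by eye from where inertia stops falling steeply, while the silhouette gives a single maximizing k; as the abstract notes, the two criteria need not agree, and domain context (such as 11 players per side) can motivate a further choice.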
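Finding similar players with kNN, and scoring each model by accuracy as reported above, can be sketched as a plain majority-vote classifier with a sweep over k. The feature vectors (for example, a player's mean position) and role labels below are hypothetical; the study's actual features, distance metric, and train/test split are not stated here.

```python
import math
from collections import Counter

Point = tuple[float, ...]


def knn_predict(train_X: list[Point], train_y: list[str], x: Point, k: int) -> str:
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # Equal counts keep first-seen order, so ties go to the label of the nearer neighbour.
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X: list[Point], train_y: list[str],
                 test_X: list[Point], test_y: list[str], k: int) -> float:
    """Fraction of held-out samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


if __name__ == "__main__":
    # Hypothetical mean (x, y) positions and role labels.
    train_X = [(20, 10), (22, 50), (50, 30), (52, 28), (80, 12), (82, 48)]
    train_y = ["fullback", "fullback", "midfield", "midfield", "winger", "winger"]
    test_X = [(21, 45), (49, 31), (79, 15)]
    test_y = ["fullback", "midfield", "winger"]
    scores = {k: knn_accuracy(train_X, train_y, test_X, test_y, k) for k in range(1, 6)}
    best_k = max(scores, key=scores.get)
    print("best k:", best_k, "accuracy:", scores[best_k])
```

With many fine-grained classes and only positional features, even the best k can score poorly, which is consistent with the abstract's call for additional contextual variables.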