Resources


This is a semi-structured collection of sports analytics, statistics, data science, and programming resources that I maintain. Its primary purpose is to help students find resources for their sports analytics projects (appearing on this list does not constitute endorsement).

Sports analytics topics and problems

Compilations and reviews

Communication

  • How to watch basketball: https://cleaningtheglass.com/how-to-watch-basketball/

Reinforcement learning/AI in sport

Strategy proposal and simulation

Working in sports analytics (inc. academic collaborations)

Forecasting crowds

Playing ‘style’ and player ‘similarity’ (teams and players)

Forecasting player performance

Drafting

Recruiter perspectives

Randomness and skill vs. luck in sport

Meta-analytics & metric evaluation

In-game win probability & momentum

NFL:

AFL:

Rugby:

Soccer:

Basketball:

Match prediction & Team rating models

Tipping models

  • AFL Lab - SOLDIER Model: https://theafllab.wordpress.com/2019/03/02/the-soldier-model/
  • AFL Gains: https://ricporteous.netlify.com/post/machine-learning-in-afl/#creating-a-machine-learning-model-to-predict-afl-matches
  • AFLalytics - A Brownian Motion Inspired ELO Model: https://www.aflalytics.com/blog/2019/1/brownian-motion-inspired-elo-model/
  • Build an AFL Elo with FitzRoy: https://analysisofafl.netlify.com/models/2018-07-23-build-a-quick-elo/
  • AFL teams Elo ratings and footy-tipping: http://freerangestats.info/blog/2019/03/23/afl-elo

Causal Inference in sport

Decision making

Player evaluation/rating

Multi-trial/test events

Shooting/Kicking/Passing/Hitting ratings + xGoals

EPV, VAEP, xThreat, Equity

Basketball

AFL

Soccer

NFL

  • Yurko, R., Matano, F., Richardson, L. F., Granered, N., Pospisil, T., Pelechrinis, K., & Ventura, S.L. (2020). Going deep: models for continuous-time within-play valuation of game outcomes in American football with tracking data. Journal of Quantitative Analysis in Sports 1(ahead-of-print).

Rugby

Action valuation

Event stream analysis/Game state

Regression to the mean

Defensive valuation

Player tracking data

Role assignment

NMF

Pitch control

Pass models/Completion models

Running/rushing models

Trajectory prediction (‘ghosting’)

Racing sports

Subjective ratings

Coaching/scouting

E-Sports

Sport science

Technology validation

Fitness-Fatigue models

Training load and monitoring

Training plan generation and optimisation

Non-invasive monitoring

Injuries

Running and wearables

Match demands

Opinions and commentaries


Datasets and competitions

CV in Sport Open data

Open data for science

Competitions


Deep Learning

Reinforcement learning

Graph neural networks


Computer vision

Computer vision in sport

Player detection

Player/Team ID

Player tracking

Ball tracking

Action detection/recognition

Sport camera calibration

Telestration

CV in biomech

Vision transformers

Image feature matching/Homography

Multi-object tracking


Sensors and Signal processing

Tutorial papers

Smartball examples

GPS

Kalman filter

Sensor fusion

Signal dimensionality reduction

Statistical parametric mapping

Functional data analysis

Multi-modal time series


Statistics, data science, and modelling

Modelling

Reporting and interpreting

Uncertainty estimation and visualisation

Probability distributions & Data generating processes

‘Significance’ and testing

Meaningful change & effect sizes

MBI

Statistics in sport science

Power and sample size

  • The tyranny of power: is there a better way to calculate sample size? - https://www.bmj.com/content/339/bmj.b3985
  • Sample Size Planning for Statistical Power and Accuracy in Parameter Estimation
    • This review examines recent advances in sample size planning, not only from the perspective of an individual researcher, but also with regard to the goal of developing cumulative knowledge. Psychologists have traditionally thought of sample size planning in terms of power analysis. Although we review recent advances in power analysis, our main focus is the desirability of achieving accurate parameter estimates, either instead of or in addition to obtaining sufficient power. Accuracy in parameter estimation (AIPE) has taken on increasing importance in light of recent emphasis on effect size estimation and formation of confidence intervals. The review provides an overview of the logic behind sample size planning for AIPE and summarizes recent advances in implementing this approach in designs commonly used in psychological research.
  • Power, precision, and sample size estimation in sport and exercise science research - https://www.tandfonline.com/doi/pdf/10.1080/02640414.2020.1776002
  • TWO SAMPLE-SIZE PRACTICES THAT I DON’T RECOMMEND - http://homepage.divms.uiowa.edu/~rlenth/Power/2badHabits.pdf
  • Exploratory analyses: How to meaningfully interpret and report them - https://onlinelibrary.wiley.com/doi/full/10.1002/pmrj.12980

Sample size calculations for clinical prediction models

Decision making

Courses

Causal inference

Matching

Statistical tests vs. Linear models

  • Common statistical tests are linear models (or: how to teach stats) - https://lindeloev.github.io/tests-as-linear/#1_the_simplicity_underlying_common_tests

Spline models and GAMs

Interpretable machine learning

https://christophm.github.io/interpretable-ml-book/

Bayesian

Bayesian and sport

Mixed effect models

Covariance structures & temporal models

Sport specific

Time series

Ordinal models

Zero inflated data

Stein’s Paradox

Gaussian processes

Clinical prediction models

Developing and reporting models

  • https://www.ncbi.nlm.nih.gov/pubmed/25560730
  • https://www.ncbi.nlm.nih.gov/pubmed/22397945
  • https://www.ncbi.nlm.nih.gov/pubmed/22397946
  • https://www.ncbi.nlm.nih.gov/pubmed/29741602
  • https://www.ncbi.nlm.nih.gov/pubmed/27362778
  • https://www.ncbi.nlm.nih.gov/pubmed/23393430
  • https://www.ncbi.nlm.nih.gov/pubmed/20010215
  • https://www.ncbi.nlm.nih.gov/pubmed/24898551

Evaluation

Data viz

Dimensionality reduction & Variable selection

Clustering

Synthetic data generation


Conferences and presentations

Sloan Sports Analytics Research Papers

KDD-Sports Analytics

http://large-scale-sports-analytics.org

Euro-KDD Sports analytics

https://dtai.cs.kuleuven.be/events/MLSA19/links.php

CVPR-sports

http://www.vap.aau.dk/cvsports/

Videos

https://www.youtube.com/watch?v=WjFdD7PDGw0&t=9s&index=2&list=WL Imitation Learning Tutorial ICML 2018 Tutorial session at the International Conference on Machine Learning (ICML 2018) - Yisong Yue (Caltech) & Hoang M. Le (Caltech). This is a high level talk about the machine learning techniques that people are using to train AI sports players like the ‘Ghosting’ video we watched in class.

https://www.youtube.com/watch?v=VkhPT2cPGLA&index=4&list=PLRPywWPWMCkoTF6yQQsI5Mes95ystQbXU&t=2248s Lecture: Machine Learning in Sports by Sam Robertson Good overview lecture on machine learning applications in sports.

https://www.youtube.com/watch?v=YBY9viGTdU0&index=2&list=PLRPywWPWMCkoTF6yQQsI5Mes95ystQbXU&t=388s 2015 NESSIS - Talk by Sam Robertson (Western Bulldogs) “A method to assess the influence of individual player performance distribution on match outcome in team sports” presented by Sam Robertson at the 2015 New England Symposium on Statistics in Sports, held on Sept 26, 2015, at the Harvard University

https://www.youtube.com/watch?v=O0rKs6P0rnY&index=5&list=PLRPywWPWMCkoTF6yQQsI5Mes95ystQbXU&t=62s Statistical Models for Sport in R – Stephanie Kovalchik (Tennis Australia) A hands-on tutorial and walkthrough on doing sports analytics in R.

https://www.youtube.com/watch?v=djD-yL3vWNQ 2017 NESSIS - Talk by Ronald Yurko “NFLWAR: A reproducible method for offensive player evaluation in football” presented by Ronald Yurko at the 2017 New England Symposium on Statistics in Sports, held on Sept 23, 2017, at the Harvard University Science Center.

https://www.youtube.com/watch?v=RN2FLKoKC50 2017 NESSIS - Talk by Nathan Sandholtz “Replaying the NBA: Using Markov Decision Processes to test decision-making from the 2015-2016 regular season” presented by Nathan Sandholtz at the 2017 New England Symposium on Statistics in Sports, held on Sept 23, 2017, at the Harvard

https://www.youtube.com/user/42analytics/videos Sloan sports analytics conference presentations Library of many past sports analytics presentations.

https://www.anziam.org.au/MathSport+Proceedings MathSport Proceedings ANZIAM Mathsport has placed conference proceedings online to make the papers available to researchers everywhere.


Books

Applied Predictive Modeling - by Max Kuhn and Kjell Johnson

http://appliedpredictivemodeling.com/

This text is intended for a broad audience as both an introduction to predictive models as well as a guide to applying them. Non-mathematical readers will appreciate the intuitive explanations of the techniques while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics.

An Introduction to Statistical Learning with Applications in R - Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

https://www-bcf.usc.edu/~gareth/ISL/

This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.

Computer Age Statistical Inference

https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf

The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and in influence. ‘Big data’, ‘data science’, and ‘machine learning’ have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.


Programming

R

Python