Resources


This is a semi-structured collection of sports analytics, statistics, data science and programming resources that I’ve found useful and educational.

Sports analytics topics and problems

Compilations

Communication

  • How to watch basketball: https://cleaningtheglass.com/how-to-watch-basketball/

Reinforcement learning/AI in sport

Strategy proposal and simulation

Working in sports analytics

Playing ‘style’ and player ‘similarity’ (teams and players)

Forecasting player performance

Drafting

Recruiter perspectives

Randomness and skill vs. luck in sport

In-game win probability

NFL:

AFL:

Soccer:

Basketball:

  • A Data Snapshot Approach for Making Real-Time Predictions in Basketball

Match prediction & Team rating models

Tipping models

  • AFL Lab - SOLDIER Model: https://theafllab.wordpress.com/2019/03/02/the-soldier-model/
  • AFL Gains: https://ricporteous.netlify.com/post/machine-learning-in-afl/#creating-a-machine-learning-model-to-predict-afl-matches
  • AFLalytics - A Brownian Motion Inspired Elo Model: https://www.aflalytics.com/blog/2019/1/brownian-motion-inspired-elo-model/
  • Build an AFL Elo with FitzRoy: https://analysisofafl.netlify.com/models/2018-07-23-build-a-quick-elo/
  • AFL teams Elo ratings and footy-tipping: http://freerangestats.info/blog/2019/03/23/afl-elo
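Several of the tipping models above are built on Elo ratings. Below is a minimal sketch of the basic Elo update. It assumes the chess-style logistic curve with a 400-point scale and a fixed K-factor of 20. Both are illustrative defaults, not the values used by any of the linked models. AFL-specific versions often add home-ground advantage or margin-based adjustments, and this sketch leaves those out. `expected_score` and `update_elo` are hypothetical helper names.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected result (0 to 1) for team A against team B on the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a: float, rating_b: float, result_a: float,
               k: float = 20.0, scale: float = 400.0) -> Tuple[float, float]:
    """Return updated (rating_a, rating_b) after one match.

    result_a is 1.0 for an A win, 0.5 for a draw and 0.0 for an A loss.
    The update is zero-sum: whatever A gains, B loses.
    Assumption: k=20 and scale=400 are illustrative, not tuned for any sport.
    """
    delta = k * (result_a - expected_score(rating_a, rating_b, scale))
    return rating_a + delta, rating_b - delta


# Usage: replay a short, made-up sequence of results for two teams starting at 1500.
ratings = {"Team A": 1500.0, "Team B": 1500.0}
for winner, loser in [("Team A", "Team B"), ("Team A", "Team B"), ("Team B", "Team A")]:
    ratings[winner], ratings[loser] = update_elo(ratings[winner], ratings[loser], 1.0)
print(ratings)
```

Because each update is zero-sum, the league's total rating stays constant. An upset win against a higher-rated team moves the ratings further than an expected win, because the gap between the actual and expected results is larger.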

Causal Inference in sport

Player evaluation/rating

Shooting/Kicking/Passing ratings + xGoals

EPV, VAEP, xThreat, Equity

Basketball

AFL

Soccer

NFL

  • Yurko, R., Matano, F., Richardson, L. F., Granered, N., Pospisil, T., Pelechrinis, K., & Ventura, S. L. (2020). Going deep: models for continuous-time within-play valuation of game outcomes in American football with tracking data. Journal of Quantitative Analysis in Sports 1(ahead-of-print).

Rugby

Action valuation

Regression to the mean

Defensive valuation

Player tracking data

NMF

Pitch control

Pass models/Completion models

Trajectory prediction (‘ghosting’)

Training plan generation and optimisation

Event stream analysis

Subjective ratings

Coaching/scouting

E-Sports

Sport science

Technology validation

Fitness-Fatigue models

Non-invasive monitoring

Injuries

Running and wearables

Movement variability

Datasets and competitions

Competitions


Deep Learning

Reinforcement learning

Graph neural networks


Computer vision

Computer vision in sport

Player detection

Player/Team ID

Player tracking

Action detection/recognition

Sport camera calibration

Telestration

Vision transformers

Image feature matching/homography

Multi-object tracking


Sensors and Signal processing

GPS

Kalman filter

Sensor fusion

Signal dimensionality reduction


Statistics, data science, and modelling

Probability distributions & Data generating processes

‘Significance’ and testing

Meaningful change & effect sizes

MBI

Statistics in sport science

  • Current Research and Statistical Practices in Sport Science and a Need for Change

Power and sample size

  • The tyranny of power: is there a better way to calculate sample size? - https://www.bmj.com/content/339/bmj.b3985
  • Sample Size Planning for Statistical Power and Accuracy in Parameter Estimation
    • This review examines recent advances in sample size planning, not only from the perspective of an individual researcher, but also with regard to the goal of developing cumulative knowledge. Psychologists have traditionally thought of sample size planning in terms of power analysis. Although we review recent advances in power analysis, our main focus is the desirability of achieving accurate parameter estimates, either instead of or in addition to obtaining sufficient power. Accuracy in parameter estimation (AIPE) has taken on increasing importance in light of recent emphasis on effect size estimation and formation of confidence intervals. The review provides an overview of the logic behind sample size planning for AIPE and summarizes recent advances in implementing this approach in designs commonly used in psychological research.
  • Power, precision, and sample size estimation in sport and exercise science research - https://www.tandfonline.com/doi/pdf/10.1080/02640414.2020.1776002
  • Two Sample-Size Practices That I Don’t Recommend - http://homepage.divms.uiowa.edu/~rlenth/Power/2badHabits.pdf
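The accuracy in parameter estimation (AIPE) idea in the abstract above plans n for a target confidence-interval width instead of for power. The sketch below covers only the simplest case: a single mean with an assumed known standard deviation, using the normal approximation, where the CI half-width is z·σ/√n. Fuller AIPE procedures are more involved and typically use the t distribution and sometimes an assurance level. `n_for_half_width` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_for_half_width(sigma: float, half_width: float, confidence: float = 0.95) -> int:
    """Smallest n whose CI for a single mean has half-width <= half_width.

    Assumes a known population SD (sigma) and the normal approximation,
    so the half-width is z * sigma / sqrt(n). Solving for n gives
    n = (z * sigma / half_width) ** 2, rounded up.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * sigma / half_width) ** 2)


# Usage: a hypothetical outcome with SD 10, aiming for a +/- 2 unit 95% CI,
# then for a +/- 1 unit CI (halving the width roughly quadruples n).
print(n_for_half_width(sigma=10.0, half_width=2.0))
print(n_for_half_width(sigma=10.0, half_width=1.0))
```

This makes the precision trade-off explicit. Because n scales with the inverse square of the target width, tightening the interval quickly becomes expensive in participants.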

Sample size calculations for clinical prediction models

Decision making

Courses

Causal inference

Statistical tests vs. Linear models

Common statistical tests are linear models (or: how to teach stats) - https://lindeloev.github.io/tests-as-linear/#1_the_simplicity_underlying_common_tests
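The central claim of the linked post can be checked numerically. It says that a Student's (pooled-variance) two-sample t-test is a linear model regressing the outcome on an intercept plus a 0/1 group dummy. The NumPy sketch below computes the t statistic both ways on made-up data. `pooled_t` and `ols_group_t` are hypothetical helper names.

```python
import numpy as np


def pooled_t(a, b):
    """Classic equal-variance two-sample t statistic for mean(b) - mean(a)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (b.mean() - a.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))


def ols_group_t(a, b):
    """Same test as a linear model: y = beta0 + beta1 * group, t = beta1 / SE(beta1)."""
    y = np.concatenate([a, b]).astype(float)
    group = np.concatenate([np.zeros(len(a)), np.ones(len(b))])  # 0 = a, 1 = b
    X = np.column_stack([np.ones_like(group), group])            # intercept + dummy
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])               # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                        # OLS covariance of beta
    return beta[1] / np.sqrt(cov[1, 1])


# Usage with made-up data: both routes give the same t statistic.
a = [12.1, 11.4, 13.0, 12.7, 11.9]
b = [13.2, 12.8, 14.1, 13.5, 13.9, 12.6]
print(pooled_t(a, b), ols_group_t(a, b))
```

The two match because the dummy's slope equals the difference in group means. The model's residual variance, with n - 2 degrees of freedom, is exactly the pooled variance.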

GAMs

Interpretable machine learning

https://christophm.github.io/interpretable-ml-book/

Bayesian

Mixed effect models

Covariance structures & temporal models

Sport specific

Time series

Ordinal models

Zero inflated data

Stein’s Paradox

Gaussian processes

Clinical prediction models

Developing and reporting models

  • https://www.ncbi.nlm.nih.gov/pubmed/25560730
  • https://www.ncbi.nlm.nih.gov/pubmed/22397945
  • https://www.ncbi.nlm.nih.gov/pubmed/22397946
  • https://www.ncbi.nlm.nih.gov/pubmed/29741602
  • https://www.ncbi.nlm.nih.gov/pubmed/27362778
  • https://www.ncbi.nlm.nih.gov/pubmed/23393430
  • https://www.ncbi.nlm.nih.gov/pubmed/20010215
  • https://www.ncbi.nlm.nih.gov/pubmed/24898551

Evaluation

Data viz

Dimensionality reduction

Variable selection

Synthetic data generation


Conferences and presentations

Sloan Sports Analytics Research Papers

KDD-Sports Analytics

http://large-scale-sports-analytics.org

Euro-KDD Sports analytics

https://dtai.cs.kuleuven.be/events/MLSA19/links.php

CVPR-sports

http://www.vap.aau.dk/cvsports/

Videos

https://www.youtube.com/watch?v=WjFdD7PDGw0&t=9s&index=2&list=WL - Imitation Learning Tutorial, a tutorial session at the International Conference on Machine Learning (ICML 2018) by Yisong Yue (Caltech) & Hoang M. Le (Caltech). A high-level talk about the machine learning techniques used to train AI sports players, like the ‘Ghosting’ video we watched in class.

https://www.youtube.com/watch?v=VkhPT2cPGLA&index=4&list=PLRPywWPWMCkoTF6yQQsI5Mes95ystQbXU&t=2248s - Lecture: Machine Learning in Sports by Sam Robertson. A good overview lecture on machine learning applications in sports.

https://www.youtube.com/watch?v=YBY9viGTdU0&index=2&list=PLRPywWPWMCkoTF6yQQsI5Mes95ystQbXU&t=388s - 2015 NESSIS talk by Sam Robertson (Western Bulldogs): “A method to assess the influence of individual player performance distribution on match outcome in team sports”, presented at the 2015 New England Symposium on Statistics in Sports, held on Sept 26, 2015, at Harvard University.

https://www.youtube.com/watch?v=O0rKs6P0rnY&index=5&list=PLRPywWPWMCkoTF6yQQsI5Mes95ystQbXU&t=62s - Statistical Models for Sport in R by Stephanie Kovalchik (Tennis Australia). A hands-on tutorial and walkthrough on doing sports analytics in R.

https://www.youtube.com/watch?v=djD-yL3vWNQ - 2017 NESSIS talk by Ronald Yurko: “NFLWAR: A reproducible method for offensive player evaluation in football”, presented at the 2017 New England Symposium on Statistics in Sports, held on Sept 23, 2017, at the Harvard University Science Center.

https://www.youtube.com/watch?v=RN2FLKoKC50 - 2017 NESSIS talk by Nathan Sandholtz: “Replaying the NBA: Using Markov Decision Processes to test decision-making from the 2015-2016 regular season”, presented at the 2017 New England Symposium on Statistics in Sports, held on Sept 23, 2017, at the Harvard University Science Center.

https://www.youtube.com/user/42analytics/videos - Sloan Sports Analytics Conference presentations. A library of many past sports analytics presentations.

https://www.anziam.org.au/MathSport+Proceedings - MathSport Proceedings. ANZIAM MathSport has placed its conference proceedings online to make the papers available to researchers everywhere.


Books

Applied Predictive Modeling - by Max Kuhn and Kjell Johnson

http://appliedpredictivemodeling.com/

This text is intended for a broad audience as both an introduction to predictive models as well as a guide to applying them. Non-mathematical readers will appreciate the intuitive explanations of the techniques while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics.

An Introduction to Statistical Learning with Applications in R - Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

https://www-bcf.usc.edu/~gareth/ISL/

This book provides an introduction to statistical learning methods. It is aimed at upper-level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.

Computer Age Statistical Inference

https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf

The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and in influence. ‘Big data’, ‘data science’, and ‘machine learning’ have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.


Programming

R

Python