Resources


This is a semi-structured collection of sports analytics, statistics, data science, and programming resources that I maintain. Its primary purpose is to help students find resources for their sports analytics projects. Appearing on this list does not constitute endorsement.

Sports analytics topics and problems

Compilations and reviews

Communication

  • How to watch basketball: https://cleaningtheglass.com/how-to-watch-basketball/

Reinforcement learning/AI in sport

Strategy proposal and simulation

Working in sports analytics (inc. academic collaborations)

Forecasting crowds

Playing ‘style’ and player ‘similarity’ (teams and players)

Forecasting player performance

Extreme values

Drafting

Recruiter perspectives

Randomness and skill vs. luck in sport

Meta-analytics & metric evaluation

In-game win probability & momentum

NFL:

AFL:

Rugby:

Soccer:

Basketball:

Match prediction & Team rating models

Tipping models

  • AFL Lab - SOLDIER Model: https://theafllab.wordpress.com/2019/03/02/the-soldier-model/
  • AFL Gains: https://ricporteous.netlify.com/post/machine-learning-in-afl/#creating-a-machine-learning-model-to-predict-afl-matches
  • AFLalytics - A Brownian Motion Inspired ELO Model: https://www.aflalytics.com/blog/2019/1/brownian-motion-inspired-elo-model/
  • Build an AFL Elo with FitzRoy: https://analysisofafl.netlify.com/models/2018-07-23-build-a-quick-elo/
  • AFL teams Elo ratings and footy-tipping: http://freerangestats.info/blog/2019/03/23/afl-elo

Causal Inference in sport

Decision making

Player evaluation/rating

Multi-trial/test events

Shooting/Kicking/Passing/Hitting ratings + xGoals

EPV, VAEP, xThreat, Equity

Basketball

AFL

Soccer

NFL

  • Yurko, R., Matano, F., Richardson, L. F., Granered, N., Pospisil, T., Pelechrinis, K., & Ventura, S. L. (2020). Going deep: models for continuous-time within-play valuation of game outcomes in American football with tracking data. Journal of Quantitative Analysis in Sports 1(ahead-of-print).

Rugby

Action valuation

Event stream analysis/Game state

Regression to the mean

Defensive valuation

Defensive strategy

Ball tracking data

Player tracking data

Role assignment

NMF

Pitch control

Pass models/Completion models

Running/rushing models

Trajectory prediction (‘ghosting’)

Game state with tracking data

Applications of pose data

Racing sports

Subjective ratings

Coaching/scouting

E-Sports

Sport science

Technology validation

Fitness-Fatigue models

Training load and monitoring

Non-invasive monitoring

Training plan generation and optimisation

Injuries

Running and wearables

Match demands

Opinions and commentaries


Datasets and competitions

Data formatting

CV in Sport Open data

Open data for science

Competitions


Deep Learning

Reinforcement learning

Graph neural networks


Computer vision

Computer vision in sport

Player detection

Player/Team ID

Player tracking

Ball tracking

Action detection/recognition

Sport camera calibration

Telestration

Markerless mocap

CV in biomech

Vision transformers

Image feature matching/Homography

Object detection

Multi-object tracking


Sensors and Signal processing

Tutorial papers

Smartball examples

GPS

Kalman filter

Sensor fusion

Signal dimensionality reduction

Statistical parametric mapping

Functional data analysis

Multi-modal time series


Statistics, data science, and modelling

Modelling

Reporting and interpreting

Uncertainty estimation and visualisation

Probability distributions & Data generating processes

‘Significance’ and testing

Meaningful change & effect sizes

MBI

Statistics in sport science

Exploratory research

Power and sample size

Packages and tools

Sample size calculations for clinical prediction models

Decision making

Courses

Causal inference

Matching

Creating DAGs

Statistical tests vs. Linear models

Common statistical tests are linear models (or: how to teach stats) - https://lindeloev.github.io/tests-as-linear/#1_the_simplicity_underlying_common_tests

Spline models and GAMs

Interpretable machine learning

https://christophm.github.io/interpretable-ml-book/

Bayesian

Bayesian and sport

Mixed effect models

Location and scale models

Covariance structures & temporal models

Sport specific

Time series

Ordinal models

Zero inflated data

Proportions

Stein’s Paradox

Gaussian processes

Clinical prediction models

Developing and reporting models

  • https://www.ncbi.nlm.nih.gov/pubmed/25560730
  • https://www.ncbi.nlm.nih.gov/pubmed/22397945
  • https://www.ncbi.nlm.nih.gov/pubmed/22397946
  • https://www.ncbi.nlm.nih.gov/pubmed/29741602
  • https://www.ncbi.nlm.nih.gov/pubmed/27362778
  • https://www.ncbi.nlm.nih.gov/pubmed/23393430
  • https://www.ncbi.nlm.nih.gov/pubmed/20010215
  • https://www.ncbi.nlm.nih.gov/pubmed/24898551

Evaluation

Data viz

Dimensionality reduction & Variable selection

Item reduction

Clustering

Synthetic data generation

Synthetic consumers


Conferences and presentations

Sloan Sports Analytics Research Papers

Videos


Books

Applied Predictive Modeling - by Max Kuhn and Kjell Johnson

http://appliedpredictivemodeling.com/

This text is intended for a broad audience as both an introduction to predictive models as well as a guide to applying them. Non-mathematical readers will appreciate the intuitive explanations of the techniques while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics.

An Introduction to Statistical Learning with Applications in R - Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

https://www-bcf.usc.edu/~gareth/ISL/

This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.

Computer Age Statistical Inference - Bradley Efron and Trevor Hastie

https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf

The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and in influence. ‘Big data’, ‘data science’, and ‘machine learning’ have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.

Sport Analytics

Textbooks