
NBA Player Stat Forecasting

Python · Bayesian Modeling · Analytics

Problem

Directly predicting player box-score statistics suffers from high variance driven by fluctuations in playing time. A player who scores 8 points in 12 minutes is producing at a very different rate from one who scores 8 points in 36 minutes, but naive models that predict raw totals treat the two the same.

This project addresses that issue by modeling availability and production separately.
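A quick arithmetic illustration of why separating availability from production matters, using the numbers above. Dividing by minutes turns raw totals into comparable rates. The helper name and the 30-minute projection are arbitrary choices for the example.

```python
def per_minute_rate(stat_total: float, minutes: float) -> float:
    """Convert a box-score total into a per-minute rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return stat_total / minutes


# Same 8 points, very different underlying production
low_minutes_rate = per_minute_rate(8, 12)   # ~0.667 points per minute
high_minutes_rate = per_minute_rate(8, 36)  # ~0.222 points per minute

# Projecting both players to the same 30-minute workload
print(round(low_minutes_rate * 30, 1))   # 20.0
print(round(high_minutes_rate * 30, 1))  # 6.7
```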

Modeling Approach

The system uses a two-stage pipeline:

  1. Minutes Model: predicts expected minutes played based on recent usage, opponent context, and game conditions.

  2. Production Model: predicts per-minute rates (points, rebounds, assists), conditioned on the predicted minutes.
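A deliberately simplified sketch of the two stages, assuming an exponentially weighted average of recent minutes as a stand-in for the minutes model and a Gamma-Poisson (conjugate) estimate of points per minute for the production model, with minutes acting as the exposure. The function names, the half-life, the prior values, and the game log are hypothetical, and the opponent and game-condition adjustments described above are omitted.

```python
from typing import Sequence


def predict_minutes(recent_minutes: Sequence[float], halflife: float = 5.0) -> float:
    """Stage 1 (hypothetical): exponentially weighted mean of recent minutes.

    `recent_minutes` is ordered oldest to newest; the newest game gets weight 1,
    and weights halve every `halflife` games going back in time.
    """
    if not recent_minutes:
        raise ValueError("need at least one game of minutes history")
    n = len(recent_minutes)
    weights = [0.5 ** ((n - 1 - i) / halflife) for i in range(n)]
    return sum(w * m for w, m in zip(weights, recent_minutes)) / sum(weights)


def posterior_rate(points: Sequence[int], minutes: Sequence[float],
                   prior_shape: float = 2.0, prior_rate: float = 5.0) -> float:
    """Stage 2 (hypothetical): posterior mean points per minute.

    Assumes points_g ~ Poisson(rate * minutes_g) with a Gamma(prior_shape,
    prior_rate) prior on `rate`. The default prior has mean 0.4 points per
    minute and carries the weight of only 5 minutes of data.
    """
    return (prior_shape + sum(points)) / (prior_rate + sum(minutes))


# Hypothetical last five games for one player (oldest first)
minutes_hist = [28, 31, 30, 34, 33]
points_hist = [12, 15, 11, 18, 16]

exp_minutes = predict_minutes(minutes_hist)
exp_rate = posterior_rate(points_hist, minutes_hist)
point_forecast = exp_minutes * exp_rate  # stage 1 times stage 2
print(f"{exp_minutes:.1f} min x {exp_rate:.3f} pts/min = {point_forecast:.1f} pts")
```

Multiplying the two stage outputs gives only a point forecast; the uncertainty in each stage still has to be carried through to get a full predictive distribution.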

The final forecast is produced by combining both stages into a single probabilistic output.
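One common way to produce a single probabilistic output from two stages is Monte Carlo simulation: draw minutes, draw a per-minute rate, then draw the realized stat given both. The sketch below assumes a normal distribution for minutes (clipped to 0 to 48, ignoring overtime), a Gamma distribution for the per-minute rate, and Poisson noise for the final count. These distributional choices and all parameter values are assumptions for illustration, and only the Python standard library is used.

```python
import math
import random
from statistics import mean, quantiles
from typing import List


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the small means of box-score stats."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_stat(minutes_mu: float, minutes_sd: float, rate_shape: float,
                  rate_scale: float, n_draws: int = 10_000, seed: int = 0) -> List[int]:
    """Draw from the combined predictive distribution of one counting stat."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Stage 1 uncertainty: minutes, clipped to 0-48 (overtime ignored)
        minutes = min(max(rng.gauss(minutes_mu, minutes_sd), 0.0), 48.0)
        # Stage 2 uncertainty: per-minute rate (gammavariate takes shape, scale)
        rate = rng.gammavariate(rate_shape, rate_scale)
        # Game-level noise in the realized count
        draws.append(sample_poisson(minutes * rate, rng))
    return draws


# Hypothetical inputs: ~31.5 minutes (sd 4) and ~0.46 points per minute
draws = simulate_stat(minutes_mu=31.5, minutes_sd=4.0, rate_shape=74, rate_scale=1 / 161)
deciles = quantiles(draws, n=10)
print(f"mean {mean(draws):.1f}, 80% interval {deciles[0]:.0f} to {deciles[8]:.0f}")
```

Because every draw carries uncertainty from both stages, intervals read off the simulated draws are wider, and more honest, than intervals from the production model alone.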

Statistical Methods

  • Bayesian hierarchical regression with partial pooling
  • Player-level and opponent-level effects
  • Weakly informative priors to stabilize estimates under sparse data
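For a counting stat such as points, one plausible way to write the production stage as a hierarchical regression with player and opponent effects is sketched below. The Poisson likelihood, log link, and prior scales are illustrative assumptions rather than the project's documented specification. Here y_g is the stat in game g, m_g the minutes played (supplied by the minutes model), λ_g the per-minute rate, and p[g] and o[g] the player and opponent indices.

```latex
\begin{aligned}
y_g &\sim \operatorname{Poisson}(m_g \, \lambda_g) \\
\log \lambda_g &= \mu + \alpha_{p[g]} + \beta_{o[g]} \\
\alpha_p &\sim \mathcal{N}(0, \sigma_\alpha^2), \qquad \beta_o \sim \mathcal{N}(0, \sigma_\beta^2) \\
\mu &\sim \mathcal{N}(-1, 1) \\
\sigma_\alpha, \sigma_\beta &\sim \text{Half-Normal}(0.5)
\end{aligned}
```

Because σ_α and σ_β are estimated from the data rather than fixed, the model decides how strongly player and opponent effects are shrunk toward zero. The prior on μ centers the baseline rate near e^(-1) ≈ 0.37 points per minute while still allowing a wide range.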

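Partial pooling lets the model “borrow” information across similar contexts while still learning player-specific behavior: estimates for players with only a few games are pulled toward the group average, while players with plenty of data keep estimates driven mostly by their own history.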
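The "borrowing" is easiest to see in the simplest case, a normal-normal model with known variances, where each player's estimate is a precision-weighted average of that player's own mean and the group mean. This is a simplified illustration of partial pooling, not the project's full model; the function name and all numeric values are hypothetical.

```python
def partially_pooled_mean(player_mean: float, n_games: int, group_mean: float,
                          within_sd: float, between_sd: float) -> float:
    """Posterior mean of one player's true rate under a normal-normal model.

    player_mean: the player's observed average (e.g. points per minute)
    n_games:     number of games behind that average
    group_mean:  population-level mean the player is shrunk toward
    within_sd:   game-to-game noise in a single observation
    between_sd:  spread of true rates across players
    """
    data_precision = n_games / within_sd ** 2
    prior_precision = 1.0 / between_sd ** 2
    weight = data_precision / (data_precision + prior_precision)
    return weight * player_mean + (1.0 - weight) * group_mean


# Same observed rate, different sample sizes
print(partially_pooled_mean(0.70, 3, 0.40, 0.20, 0.10))   # ~0.53, pulled toward 0.40
print(partially_pooled_mean(0.70, 60, 0.40, 0.20, 0.10))  # ~0.68, stays near 0.70
```

With no games the estimate falls back entirely to the group mean, and as games accumulate the weight on the player's own data approaches 1.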

Evaluation

To avoid data leakage, all models were evaluated with a walk-forward backtesting framework: each model is trained only on games played before a given cutoff date and evaluated only on games after it, so it is never tested on a timeframe it was trained on.
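A minimal sketch of an expanding-window walk-forward split, assuming games are identified by date and the model is refit before each evaluation window. The function name, the weekly step, and the example dates are hypothetical; how the project actually chose its windows is not specified.

```python
from datetime import date, timedelta
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(game_dates: Sequence[date], first_cutoff: date,
                        step_days: int = 7) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for expanding-window walk-forward evaluation.

    Each test window covers [cutoff, cutoff + step_days). Training uses only
    games strictly before the cutoff, so no test game precedes a training game.
    """
    last = max(game_dates)
    cutoff = first_cutoff
    while cutoff <= last:
        window_end = cutoff + timedelta(days=step_days)
        train = [i for i, d in enumerate(game_dates) if d < cutoff]
        test = [i for i, d in enumerate(game_dates) if cutoff <= d < window_end]
        if train and test:
            yield train, test
        cutoff = window_end


# Hypothetical schedule: one game per day for 30 days
dates = [date(2024, 1, 1) + timedelta(days=d) for d in range(30)]
for train_idx, test_idx in walk_forward_splits(dates, first_cutoff=date(2024, 1, 15)):
    # Fit on train_idx, then score predictions on test_idx
    print(len(train_idx), len(test_idx))  # prints 14 7, then 21 7, then 28 2
```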

Results & Insights

I am still working on this project, so there will be more to report here in the future.