A machine learning project that predicts NFL game outcomes using historical data, team performance metrics, and advanced feature engineering.
This project uses historical NFL game data to train a Random Forest classifier that predicts whether the home team will win. The model achieves ~59% accuracy on test data, which is better than random guessing (50%) and competitive for sports prediction.
- Historical Data: Uses data from 2021-2025 NFL seasons
- Advanced Features: 29+ engineered features including win rates, point differentials, rest days, strength of schedule, home/away splits, and injury counts
- Multiple Models: Support for Random Forest, Logistic Regression, Gradient Boosting, XGBoost, and Ensemble models
- Easy Predictions: Simple command-line interface for predicting any matchup
- Python 3.9 or higher (the pinned pandas 2.1.4 and NumPy 1.26.2 releases do not support Python 3.8)
- conda (recommended) or pip
- Clone or download this repository
- Create a conda environment (recommended):
conda create -n football python=3.9
conda activate football
- Install dependencies:
pip install -r requirements.txt
The pinned dependencies in requirements.txt are:
pandas==2.1.4
numpy==1.26.2
scikit-learn==1.3.2
nfl_data_py==0.3.0
matplotlib==3.8.2
seaborn==0.13.0
tqdm==4.66.1
joblib==1.3.2
Run the complete pipeline:
./src/run.sh
This will:
- Clean old models
- Download NFL data
- Train the model
- Enter interactive prediction mode
Train the model manually:
# Navigate to src directory
cd src
# Train with default settings (Random Forest)
python train_model.py
# Train with different model types
python train_model.py --model_type logistic_regression
python train_model.py --model_type gradient_boosting
python train_model.py --model_type xgboost
python train_model.py --model_type ensemble
# Customize lookback window
python train_model.py --lookback_games 10
# Enable cross-validation
python train_model.py --use_cross_validation
After training, make predictions (the commands below are run from the repository root, not from src):
Interactive Mode:
python src/predict.py
Single Prediction:
python src/predict.py "Kansas City Chiefs" "Buffalo Bills"
# Or use team abbreviations
python src/predict.py KC BUF
With Specific Date:
python src/predict.py KC BUF --game_date 2025-01-15
Use Different Model:
python src/predict.py KC BUF --model_type xgboost
Use these standard NFL team abbreviations for predictions:
AFC East: BUF, MIA, NE, NYJ
AFC North: BAL, CIN, CLE, PIT
AFC South: HOU, IND, JAX, TEN
AFC West: DEN, KC, LV, LAC
NFC East: DAL, NYG, PHI, WAS
NFC North: CHI, DET, GB, MIN
NFC South: ATL, CAR, NO, TB
NFC West: ARI, LA (or LAR), SF, SEA
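Abbreviation handling along these lines can be sketched as follows. This is a minimal illustration, not the code in predict.py: `VALID_TEAMS`, `ALIASES`, and `normalize_team` are hypothetical names, and the only alias assumed is the LA/LAR pair listed above.

```python
# Hypothetical helper: these names are illustrative and not taken from predict.py.
VALID_TEAMS = {
    "BUF", "MIA", "NE", "NYJ", "BAL", "CIN", "CLE", "PIT",
    "HOU", "IND", "JAX", "TEN", "DEN", "KC", "LV", "LAC",
    "DAL", "NYG", "PHI", "WAS", "CHI", "DET", "GB", "MIN",
    "ATL", "CAR", "NO", "TB", "ARI", "LA", "SF", "SEA",
}

# Alternate spellings mapped to the canonical abbreviation (assumed mapping).
ALIASES = {"LAR": "LA"}


def normalize_team(code: str) -> str:
    """Return the canonical abbreviation, or raise ValueError if unknown."""
    cleaned = code.strip().upper()
    cleaned = ALIASES.get(cleaned, cleaned)
    if cleaned not in VALID_TEAMS:
        raise ValueError(f"Unknown team abbreviation: {code!r}")
    return cleaned


print(normalize_team("lar"))
```

Rejecting unknown codes early gives a clearer error than a later "not enough historical data" failure caused by a typo such as "DT".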
- Accuracy: ~58-60%
- Baseline: 50% (random guessing)
- Good: 60-65%
- Great: 65-70%
- Point Differential
- Average Points Scored
- Strength of Schedule
- Win Rate
- Home/Away Win Rate
- Weighted Recent Form
- Scoring Trend
- Rest Days
- Injury Count
- Head-to-Head Record
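Two of the features named above, weighted recent form and scoring trend, could be computed roughly as in the sketch below. The exact definitions used by this project are not stated here, so both are assumptions: form is taken as a win rate with exponentially decaying weights (the `decay` value is hypothetical), and trend as the least-squares slope of points scored per game.

```python
from typing import Sequence


def weighted_recent_form(results: Sequence[int], decay: float = 0.8) -> float:
    """Weighted win rate; results are 1 (win) or 0 (loss), oldest first.

    The newest game gets weight 1, the one before it `decay`, then decay**2, ...
    """
    if not results:
        return 0.0
    n = len(results)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * r for w, r in zip(weights, results)) / sum(weights)


def scoring_trend(points: Sequence[float]) -> float:
    """Least-squares slope of points scored against game index (oldest first)."""
    n = len(points)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(points) / n
    numerator = sum((i - mean_x) * (p - mean_y) for i, p in enumerate(points))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# A recent win counts for more than an old one.
print(weighted_recent_form([0, 0, 1]), weighted_recent_form([1, 0, 0]))
print(scoring_trend([10, 20, 30]))
```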
Football Predictor/
├── src/
│   ├── data_loader.py          # Downloads NFL data via nfl_data_py
│   ├── features.py             # Feature engineering (basic)
│   ├── train_model.py          # Model training pipeline
│   ├── predict.py              # Prediction interface
│   ├── run.sh                  # Automated pipeline script
│   ├── test_data.py            # Data exploration
│   ├── explore_nfl_data.py     # API exploration tool
│   └── WIP/                    # Advanced implementations
│       ├── features.py         # Advanced feature engineering
│       ├── train_model.py      # Enhanced training with more models
│       ├── predict.py          # Enhanced predictions
│       └── check_data_quality.py
├── models/                     # Saved trained models
│   ├── nfl_random_forest_model.pkl
│   ├── nfl_feature_engineer.pkl
│   └── nfl_feature_columns.pkl
├── data/                       # Cached data (auto-generated)
├── notebooks/                  # Jupyter notebooks for analysis
├── requirements.txt            # Python dependencies
└── README.md                   # This file
- Downloads historical NFL game data (2021-2025) using nfl_data_py
- Includes schedules, scores, team stats, and injury reports
- Data is cached locally for faster subsequent runs
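The download-then-cache behaviour described above can be sketched as below. This is an assumption-laden illustration rather than the code in data_loader.py: `load_with_cache` is a hypothetical helper, the cache is stored as JSON for simplicity, and `fetch` stands in for whatever function performs the actual download.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def load_with_cache(cache_file: Path, fetch: Callable[[], List[Dict]]) -> List[Dict]:
    """Return cached rows if the file exists; otherwise fetch and cache them."""
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    rows = fetch()
    # Create the cache directory on first use.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(rows))
    return rows
```

With this shape, deleting the cache file (or the whole data/ folder) is enough to force a fresh download on the next run.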
For each game, the model creates 29+ features describing both teams:
Performance Metrics:
- Win rate (last 5 games)
- Average points scored
- Average points allowed
- Point differential
- Weighted recent form
Situational Factors:
- Rest days since last game
- Bye week indicator
- Short rest indicator (<6 days)
- Home/away splits
Advanced Metrics:
- Strength of schedule
- Scoring trends
- Head-to-head history
- Injury count
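A few of the features listed above can be derived from a team's game history as in this sketch. It is not the implementation in features.py: `GameResult` and `team_features` are hypothetical names, and only games played strictly before the target date are used so that the game being predicted never leaks into its own features.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class GameResult:
    game_date: date
    points_for: int
    points_against: int


def team_features(history: List[GameResult], game_date: date,
                  lookback: int = 5) -> Dict[str, float]:
    """Build features from the last `lookback` games before `game_date`."""
    past = sorted((g for g in history if g.game_date < game_date),
                  key=lambda g: g.game_date)
    recent = past[-lookback:]
    if not recent:
        raise ValueError("No prior games available for this team")
    n = len(recent)
    avg_for = sum(g.points_for for g in recent) / n
    avg_against = sum(g.points_against for g in recent) / n
    rest_days = (game_date - recent[-1].game_date).days
    return {
        "win_rate": sum(g.points_for > g.points_against for g in recent) / n,
        "avg_points_scored": avg_for,
        "avg_points_allowed": avg_against,
        "point_differential": avg_for - avg_against,
        "rest_days": float(rest_days),
        "short_rest": float(rest_days < 6),  # matches the "<6 days" rule above
    }
```

The same function would be called once for the home team and once for the away team, and the two dictionaries combined into a single feature row.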
- Uses scikit-learn's Random Forest Classifier
- 80/20 train-test split
- Prevents overfitting with max_depth and min_samples_leaf
- Evaluates with accuracy, precision, recall, F1-score
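The training steps above can be illustrated with a minimal scikit-learn sketch. The data here is synthetic and the hyperparameter values (`n_estimators`, `max_depth`, `min_samples_leaf`) are placeholders chosen for the example, not the settings used in train_model.py.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features (1 = home team wins).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 6))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=1.0, size=400) > 0).astype(int)

# 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# max_depth and min_samples_leaf limit tree complexity to curb overfitting.
model = RandomForestClassifier(
    n_estimators=200, max_depth=6, min_samples_leaf=5, random_state=42
)
model.fit(X_train, y_train)

pred = model.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
}
print(metrics)
```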
- Takes any matchup (home team vs away team)
- Generates features using recent historical data
- Outputs win probability and confidence level
- Shows key factors influencing the prediction
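Turning a home-win probability into a predicted winner and a confidence label might look like the sketch below. The function name and the HIGH/MEDIUM/LOW cutoffs are assumptions made for illustration; the thresholds actually used by predict.py are not documented here.

```python
from typing import Dict


def summarize_prediction(home: str, away: str, home_win_prob: float) -> Dict[str, object]:
    """Map a home-win probability to a winner, confidence, and label."""
    if not 0.0 <= home_win_prob <= 1.0:
        raise ValueError("home_win_prob must be between 0 and 1")
    winner = home if home_win_prob >= 0.5 else away
    confidence = max(home_win_prob, 1.0 - home_win_prob)
    # Hypothetical cutoffs, not taken from the project.
    if confidence >= 0.60:
        level = "HIGH"
    elif confidence >= 0.55:
        level = "MEDIUM"
    else:
        level = "LOW"
    return {"winner": winner, "confidence": confidence, "level": level}


print(summarize_prediction("KC", "BUF", 0.642))
```

With a trained scikit-learn classifier, the home-win probability would typically come from `model.predict_proba(features)[0, 1]`, assuming class 1 encodes a home win.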
1. "No trained model found"
# Solution: Train the model first
python src/train_model.py
2. "Not enough historical data for team"
- Check team abbreviation spelling (e.g., "DET" not "DT")
- Use standard 2-3 letter abbreviations
- Team must have played at least 3 games in the dataset
3. "Module not found" errors
# Make sure you're in the right environment
conda activate football
pip install -r requirements.txt
4. Run script permission denied
# Make the script executable
chmod +x src/run.sh
5. Injury data not loading (404 error)
- This is expected - injury data endpoint has limited availability
- Model works without injury data (injury_count will be 0)
- All other features are computed normally and are unaffected by the missing injury data
If you see "WARNING: Using stale data", it means:
- The most recent game in the dataset is >30 days old
- Predictions will be based on outdated team performance
- Re-run data_loader.py to fetch fresh data
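The staleness rule above amounts to a simple date comparison. The sketch below assumes the check uses calendar days and a strict "more than 30" comparison; `is_stale` and `STALE_AFTER_DAYS` are hypothetical names.

```python
from datetime import date
from typing import Optional

STALE_AFTER_DAYS = 30  # threshold taken from the warning described above


def is_stale(latest_game: date, today: Optional[date] = None) -> bool:
    """True if the most recent game in the dataset is more than 30 days old."""
    today = today or date.today()
    return (today - latest_game).days > STALE_AFTER_DAYS


if is_stale(date(2025, 1, 1), today=date(2025, 2, 15)):
    print("WARNING: Using stale data")
```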
============================================================
PREDICTING: BUF @ KC
============================================================
[Step 1] Generating features for both teams...
✓ Features created successfully
[Step 2] Making prediction...
============================================================
PREDICTION RESULTS
============================================================
PREDICTED WINNER: KC
CONFIDENCE: 64.2% (HIGH)
Breakdown:
• KC (Home): 64.2%
• BUF (Away): 35.8%
============================================================
KEY FACTORS INFLUENCING PREDICTION
============================================================
Recent Performance:
KC......................... Win Rate: 80.0%
BUF........................ Win Rate: 60.0%
Offensive Power (Avg Points Scored):
KC......................... 28.4 PPG
BUF........................ 25.1 PPG
Defensive Strength (Avg Points Allowed):
KC......................... 18.2 PPG
BUF........................ 22.3 PPG
Point Differential:
KC......................... +10.2
BUF........................ +2.8
python src/explore_nfl_data.py
This interactive tool lets you explore:
- Game schedules and scores
- Team statistics
- Player statistics
- Rosters and depth charts
- Injury reports
- Draft picks
Edit src/features.py to add new features:
- Turnover margins
- Time zone differences
- Weather conditions
- Coaching experience
- QB injury status
Train multiple models and compare:
python src/train_model.py --model_type random_forest
python src/train_model.py --model_type xgboost
python src/train_model.py --model_type ensemble
- Accuracy Expectations: 58-60% accuracy is competitive for NFL prediction
- Home Field Advantage: Model accounts for home/away performance splits
- Injuries: Limited injury data due to API restrictions
- Updates: Data is cached; delete the data/ folder to force a refresh
- Ethics: This is for educational purposes only, not gambling advice
Potential improvements:
- Add weather data integration
- Include player-specific stats (QB rating, RB yards, etc.)
- Implement betting line predictions
- Add real-time game predictions
- Create web interface
- Add playoff probability calculations
- Include coaching matchup analysis
This project is for educational purposes only.
- Data provided by nfl_data_py
- Built with scikit-learn, pandas, and NumPy
- NFL data courtesy of the NFL and nflverse project
Have fun predicting games!
For questions or issues, please check the troubleshooting section or open an issue.