Model Validation

Statistical evidence that our edge is real, not luck: bootstrap confidence intervals, calibration analysis, and risk metrics across 3 sports.

Generated 2026-03-24 | Model v4

KEY RESULTS

At a Glance

Backtested Trades: 1000
Avg c/Trade: +12.9c
Win Rate: 73.0%
Statistically Significant: 2/3
Profitable Sports: 3/3

All results reflect the deployed strategy configuration including pregame filters. Backtested on real Polymarket bid/ask prices.

NBA

Performance Summary

Trades: 169
Win Rate: 70.4%
Avg c/Trade: +12.6c
Total P&L: $21.22
Backtest Type: poly_realistic

Statistical Significance

Bootstrap Mean: +12.6c
95% CI: [+5.4c, +19.6c]
99% CI: [+3.1c, +21.5c]
p-value: 0.0007
Interpretation: Highly significant

Risk Metrics

Sharpe Ratio: 4.21
Max Drawdown: -624.0c
Profit Factor: 1.75
Best Streak: 33
Worst Streak: 12

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2148

Performance by Edge Size

Edge | Trades | Win Rate | Avg c/Trade
5-10c | 43 | 74.4% | +6.7c
10-15c | 42 | 69.0% | +5.0c
15-20c | 25 | 72.0% | +12.9c
20+c | 59 | 67.8% | +22.1c

Pregame Filter Impact (all metrics above use the filtered set)

Without Filter: 299 trades | 64.2% WR | +6.4c
With Pregame ≥ 55c Filter: 169 trades | 70.4% WR | +12.6c

Example Trades

Matchup | Side | Score | Market | Model Fair | Edge | Result | P&L
CHI @ HOU | home | 57-61 | 73.0c | 87.1c | +14.1c | WIN | +27.0c
PHX @ DET | home | 62-66 | 54.0c | 68.1c | +14.1c | WIN | +46.0c
IND @ PHI | home | 50-55 | 58.0c | 72.1c | +14.1c | WIN | +42.0c
WSH vs LAC | away | 72-67 | 53.0c | 67.2c | +14.2c | WIN | +47.0c
CHI @ DET | home | 57-61 | 57.0c | 72.1c | +15.1c | WIN | +43.0c

Performance by Month

Month | Trades | Win Rate | Avg c/Trade | Total P&L
2026-01 | 161 | 72.7% | +14.7c | +2368c

Model Architecture

Architecture: Split-Phase XGBoost (early-game + clutch-time models)
Features: 14 engineered features
Calibration: Isotonic regression
Training Data: 5,285 games

Features: score_diff, time_fraction, home_elo, away_elo, elo_diff, home_court, pregame_wp, score_diff_x_tf, score_diff_sq, total_score, score_diff_x_elo, pace_diff, ortg_diff, drtg_diff
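Several of the listed feature names read like simple transforms of the raw game state. The sketch below shows one plausible way to derive them; the formulas (home-minus-away sign convention, plain products for the `_x_` interaction terms, an unsigned square) are guesses from the names, not the model's confirmed definitions, and `derived_features` is a hypothetical helper.

```python
def derived_features(home_score, away_score, time_fraction, home_elo, away_elo):
    """Build the score/Elo-derived features from raw game state.

    Assumption: differences are home minus away, and interaction terms
    are plain products. The real pipeline may differ.
    """
    score_diff = home_score - away_score
    elo_diff = home_elo - away_elo
    return {
        "score_diff": score_diff,
        "time_fraction": time_fraction,
        "elo_diff": elo_diff,
        "score_diff_x_tf": score_diff * time_fraction,
        "score_diff_sq": score_diff ** 2,
        "total_score": home_score + away_score,
        "score_diff_x_elo": score_diff * elo_diff,
    }


# Made-up halftime state: home leads 61-57, half the game elapsed.
features = derived_features(61, 57, 0.5, 1550, 1500)
```

Interaction terms such as `score_diff_x_tf` let a tree model treat the same lead differently early and late in a game without having to learn the split structure from scratch.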

NCAAMB

Performance Summary

Trades: 781
Win Rate: 73.5%
Avg c/Trade: +13.5c
Total P&L: $105.45
Backtest Type: poly_realistic

Statistical Significance

Bootstrap Mean: +13.5c
95% CI: [+10.3c, +16.6c]
99% CI: [+9.3c, +17.6c]
p-value: < 0.0001 (no bootstrap mean at or below zero)
Interpretation: Highly significant

Risk Metrics

Sharpe Ratio: 4.74
Max Drawdown: -1122.0c
Profit Factor: 1.88
Best Streak: 66
Worst Streak: 9

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.1903

Performance by Edge Size

Edge | Trades | Win Rate | Avg c/Trade
5-10c | 204 | 72.5% | +3.9c
10-15c | 240 | 71.7% | +4.3c
15-20c | 97 | 71.1% | +9.7c
20+c | 240 | 77.1% | +32.4c

Example Trades

Matchup | Side | Score | Market | Model Fair | Edge | Result | P&L
DAY @ LAS | home | 51-41 | 61.0c | 72.3c | +11.3c | WIN | +37.0c
UGA @ TEX | home | 53-49 | 68.0c | 79.3c | +11.3c | WIN | +30.0c
GMU @ URI | home | 53-48 | 61.0c | 72.3c | +11.3c | WIN | +37.0c
WEB @ MTST | home | 45-42 | 68.0c | 79.3c | +11.3c | WIN | +30.0c
HC @ COLG | home | 37-40 | 61.0c | 72.3c | +11.3c | WIN | +37.0c

Performance by Month

Month | Trades | Win Rate | Avg c/Trade | Total P&L
2026-01 | 781 | 73.5% | +13.5c | +10545c

Model Architecture

Architecture: Split-Phase XGBoost (early-game + clutch-time models)
Features: 14 engineered features
Calibration: Isotonic regression
Training Data: 12,285 games

Features: score_diff, time_fraction, home_elo, away_elo, elo_diff, home_court, pregame_wp, score_diff_x_tf, score_diff_sq, total_score, score_diff_x_elo, pace_diff, ortg_diff, drtg_diff

NHL

Performance Summary

Trades: 50
Win Rate: 74.0%
Avg c/Trade: +5.2c
Total P&L: $2.60
Backtest Type: poly_realistic

Statistical Significance

Bootstrap Mean: +5.2c
95% CI: [-7.5c, +17.0c]
99% CI: [-11.8c, +20.9c]
p-value: 0.2000
Interpretation: Not significant

Risk Metrics

Sharpe Ratio: 1.87
Max Drawdown: -320.0c
Profit Factor: 1.30
Best Streak: 12
Worst Streak: 2

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2053

Performance by Edge Size

Edge | Trades | Win Rate | Avg c/Trade
5-10c | 11 | 81.8% | +8.5c
10-15c | 22 | 72.7% | +1.0c
15-20c | 11 | 81.8% | +16.9c
20+c | 6 | 50.0% | -7.0c

Pregame Filter Impact (all metrics above use the filtered set)

Without Filter: 147 trades | 64.6% WR | -1.2c
With Pregame ≥ 55c Filter: 50 trades | 74.0% WR | +5.2c

Example Trades

Matchup | Side | Score | Market | Model Fair | Edge | Result | P&L
NYR @ WSH | home | 2-1 | 65.0c | 77.7c | +12.7c | WIN | +35.0c
ANA vs TB | away | 0-1 | 75.0c | 88.0c | +13.0c | WIN | +25.0c
ANA @ EDM | home | 3-2 | 78.0c | 91.0c | +13.0c | WIN | +22.0c
CBJ @ VGK | home | 3-2 | 76.0c | 89.1c | +13.1c | WIN | +24.0c
NYR @ LA | home | 3-2 | 65.0c | 79.3c | +14.3c | WIN | +35.0c

Performance by Month

Month | Trades | Win Rate | Avg c/Trade | Total P&L
2026-01 | 42 | 73.8% | +5.3c | +221c

Model Architecture

Architecture: XGBoost + Isotonic calibration
Features: 12 engineered features
Calibration: Isotonic regression
Training Data: 4,225 games

Features: score_diff, time_fraction, home_elo, away_elo, elo_diff, home_ice, pregame_wp, score_diff_x_tf, score_diff_sq, pace_diff, ortg_diff, drtg_diff

Benchmark Comparisons

How our model performs vs naive strategies. A model that can't beat simple baselines isn't worth using.

NBA

Strategy | Win Rate | c/Trade
Our Model | 70.4% | +12.6c
Random (50/50) | 50.0% | -2.0c
Market-Efficient | 57.9% | +0.0c

NCAAMB

Strategy | Win Rate | c/Trade
Our Model | 73.5% | +13.5c
Random (50/50) | 50.0% | -2.0c
Market-Efficient | 58.5% | -0.0c

NHL

Strategy | Win Rate | c/Trade
Our Model | 74.0% | +5.2c
Random (50/50) | 50.0% | -2.0c
Market-Efficient | 68.8% | +0.0c

Academic Foundation

Our approach is grounded in peer-reviewed research on sports prediction markets and probabilistic forecasting.

Kaunitz, Zhong, Kreiner (2017). Beating the bookies with their own numbers - and how the online sports betting market is rigged.
CLV validation: demonstrates that a positive closing line value strategy yields positive long-term returns.

Brier, Glenn W. (1950). Verification of forecasts expressed in terms of probability.
Foundation for calibration analysis: the Brier score measures probabilistic prediction accuracy.

Lock, Dennis; Nettleton, Dan (2014). Using random forests to estimate win probability before each play of an NFL game.
In-game WP modeling methodology: random forests on game state features for real-time prediction.

Levitt, Steven D. (2004). Why are gambling markets organised so differently from financial markets?
Market efficiency analysis: sports markets exhibit inefficiencies exploitable by informed bettors.

Shin, Hyun Song (1991). Optimal betting odds against insider traders.
Theoretical foundation for bookmaker pricing models and adverse selection in betting markets.

Stern, Hal (1994). A Brownian motion model for the progress of sports scores.
Score-diff as Brownian motion: theoretical underpinning for WP models based on score differential and time.

Methodology & Anti-Overfitting Safeguards

Training / test split — Models are trained on historical ESPN game-state snapshots (multiple seasons), then tested on held-out recent-season data using real Polymarket prices the model never saw during training. No future data leaks into features.
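A minimal sketch of a chronological hold-out of this kind, assuming each snapshot carries a game date. The field names, the cutoff date, and `chronological_split` are hypothetical and only illustrate the idea that nothing on or after the cutoff reaches training.

```python
from datetime import date


def chronological_split(snapshots, cutoff):
    """Split snapshots by date: before `cutoff` trains, on/after tests."""
    train = [s for s in snapshots if s["game_date"] < cutoff]
    test = [s for s in snapshots if s["game_date"] >= cutoff]
    return train, test


# Made-up snapshots for illustration only.
snapshots = [
    {"game_id": "a", "game_date": date(2024, 11, 2)},
    {"game_id": "b", "game_date": date(2025, 3, 15)},
    {"game_id": "c", "game_date": date(2026, 1, 9)},
]
train, test = chronological_split(snapshots, cutoff=date(2025, 10, 1))
```

Splitting by date rather than at random keeps snapshots from the same game, and from later games, out of the training set.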

Realistic backtesting — Poly-price backtests use actual Polymarket bid/ask prices from enriched market snapshots, including a 2c taker fee per trade. Entry prices reflect real market conditions, not simulated fills.

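Bootstrap confidence intervals — 10,000 resamples with replacement. The p-value is the fraction of bootstrap means ≤ 0, a one-sided test of H0: "the model has no edge." A p < 0.05 means fewer than 5% of the resampled means fell at or below zero, which we treat as evidence against H0; it is not the probability that the edge is real.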
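The resampling procedure can be sketched as follows, using only the standard library. This is an illustration under assumptions, not the report's actual code: `bootstrap_mean` is a hypothetical name, the intervals are simple percentile intervals, and the P&L list is made up.

```python
import random
import statistics


def bootstrap_mean(pnl, n_resamples=10_000, seed=0):
    """Percentile bootstrap of the mean per-trade P&L (in cents)."""
    rng = random.Random(seed)
    n = len(pnl)
    # Each resample draws n trades with replacement and records the mean.
    means = sorted(
        statistics.fmean(rng.choices(pnl, k=n)) for _ in range(n_resamples)
    )

    def percentile(q):
        idx = min(n_resamples - 1, max(0, int(q * n_resamples)))
        return means[idx]

    return {
        "mean": statistics.fmean(means),
        "ci95": (percentile(0.025), percentile(0.975)),
        "ci99": (percentile(0.005), percentile(0.995)),
        # One-sided p-value: share of resampled means at or below zero.
        "p_value": sum(m <= 0 for m in means) / n_resamples,
    }


# Made-up per-trade P&L in cents, for illustration only.
example_pnl = [27.0, 46.0, 42.0, -55.0, 43.0, -60.0, 30.0, 37.0]
result = bootstrap_mean(example_pnl)
```

With 10,000 resamples the p-value can only take multiples of 0.0001, so a value of zero means no resampled mean was at or below zero.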

Calibration — Predictions are bucketed into 5%-wide bins (min 5 trades each). A well-calibrated model's dots land on the diagonal; points below the line indicate overconfidence.
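The binning and the Brier score can be sketched like this. The function names are hypothetical and the sample predictions are made up; the bin width and minimum count follow the description above.

```python
from collections import defaultdict


def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_bins(probs, outcomes, width=0.05, min_trades=5):
    """Return (mean predicted, actual win rate, count) per populated bin."""
    n_bins = round(1 / width)
    buckets = defaultdict(list)
    for p, y in zip(probs, outcomes):
        # Small epsilon guards against float error at exact bin edges.
        idx = min(int(p * n_bins + 1e-9), n_bins - 1)
        buckets[idx].append((p, y))
    rows = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        if len(pairs) < min_trades:
            continue  # too few trades to plot a reliable point
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        win_rate = sum(y for _, y in pairs) / len(pairs)
        rows.append((mean_pred, win_rate, len(pairs)))
    return rows


# Made-up data: six trades predicted at 72% (four won), three at 88%.
probs = [0.72] * 6 + [0.88] * 3
outcomes = [1, 1, 1, 1, 0, 0] + [1, 1, 1]
rows = calibration_bins(probs, outcomes)
```

In this made-up sample the 88% bin is dropped for having fewer than 5 trades, and the remaining point sits below the diagonal (72% predicted, about 67% actual), which is the overconfident case.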

Sharpe ratio — Mean per-trade P&L divided by its standard deviation, annualized with sqrt(252) scaling, which treats each trade as one daily period. Values above 1.0 indicate strong risk-adjusted returns; above 3.0 is exceptional.
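A minimal sketch of that calculation on a list of per-trade P&L values. It assumes a zero risk-free rate and the sample standard deviation; whether the report uses sample or population deviation is not stated, and `annualized_sharpe` is a hypothetical name.

```python
import math
import statistics


def annualized_sharpe(pnl, periods_per_year=252):
    """Mean / sample std of per-trade P&L, scaled by sqrt(periods_per_year)."""
    mean = statistics.fmean(pnl)
    std = statistics.stdev(pnl)  # assumption: sample std (n - 1 denominator)
    if std == 0:
        raise ValueError("Sharpe ratio is undefined when P&L has no variance")
    return mean / std * math.sqrt(periods_per_year)


# Made-up P&L in cents: mean 5, sample std 10, so 0.5 * sqrt(252).
sharpe = annualized_sharpe([10.0, -10.0, 10.0, 10.0])
```

Because the scaling counts each trade as one period, the figure depends on trade frequency: a strategy that trades more or less often than once per trading day would annualize differently.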

Profit factor — Gross wins / gross losses. Any value above 1.0 means wins outweigh losses; the thresholds used here are: above 1.25 = profitable, above 1.5 = strong, above 2.0 = excellent.
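As a short illustration of the ratio, with a hypothetical `profit_factor` helper and made-up P&L values in cents:

```python
import math


def profit_factor(pnl):
    """Sum of winning P&L divided by the absolute sum of losing P&L."""
    gross_wins = sum(x for x in pnl if x > 0)
    gross_losses = -sum(x for x in pnl if x < 0)
    if gross_losses == 0:
        return math.inf  # no losing trades in the sample
    return gross_wins / gross_losses


# Two wins totalling 50c against one 40c loss.
pf = profit_factor([30.0, 20.0, -40.0])
```

Here the result is 50 / 40 = 1.25: the strategy earns 1.25c in wins for every 1c lost.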

Fee assumptions — All results are net of a 2c flat taker fee per contract. Live Polymarket fees may vary by sport (e.g., NCAAMB has a 2% taker fee effective Feb 2026).
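One way to express per-contract P&L net of a flat fee is sketched below. It assumes a binary contract that pays 100c on a win and that the 2c fee is subtracted from every trade regardless of outcome; the function names are hypothetical and the exact fee mechanics of the backtest are not confirmed here.

```python
def trade_pnl_cents(entry_price, won, fee=2.0):
    """Realized P&L per contract in cents, net of a flat taker fee."""
    payout = 100.0 if won else 0.0
    return payout - entry_price - fee


def expected_pnl_cents(fair_price, entry_price, fee=2.0):
    """Expected P&L per contract if the true win probability is fair_price/100."""
    return fair_price - entry_price - fee


# Buying at 73.0c when the model's fair value is 87.1c.
win_pnl = trade_pnl_cents(73.0, won=True)
loss_pnl = trade_pnl_cents(73.0, won=False)
edge_after_fee = expected_pnl_cents(87.1, 73.0)
```

Under these assumptions a flat fee shrinks every edge by the same 2c, so a 14.1c model edge becomes about 12.1c of expected value.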

Pregame filter — For NBA and NHL, the deployed strategy requires the pregame market price to agree with the model's bet side at ≥55c. This filters out trades where the model disagrees with market consensus, reducing adverse selection.
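The filter can be sketched as a single predicate. It assumes the pregame price is quoted for the home side and that the away price is its complement (100c minus home), which ignores any bid/ask spread; `passes_pregame_filter` and its parameters are hypothetical.

```python
def passes_pregame_filter(bet_side, pregame_home_price, threshold=55.0):
    """True if the pregame market priced the model's bet side at >= threshold."""
    if bet_side == "home":
        side_price = pregame_home_price
    else:
        # Assumption: away price is the complement of the home price.
        side_price = 100.0 - pregame_home_price
    return side_price >= threshold


# Home team was a 60c pregame favorite.
keep_home_bet = passes_pregame_filter("home", 60.0)
keep_away_bet = passes_pregame_filter("away", 60.0)
```

In this example a bet on the home side is kept and a bet on the away side (priced at 40c pregame) is dropped.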