Statistical Model Validation — Bootstrap CIs, Calibration

KEY RESULTS

At a Glance

Backtested Trades	1000
Avg c/Trade	+12.9c
Win Rate	73.0%
Statistically Significant	2/3
Profitable Sports	3/3

All results reflect the deployed strategy configuration including pregame filters. Backtested on real Polymarket bid/ask prices.

NBA

Performance Summary

Trades	169
Win Rate	70.4%
Avg c/Trade	+12.6c
Total P&L	$21.22
Backtest Type	poly_realistic

Statistical Significance

Bootstrap Mean	+12.6c
95% CI	[+5.4c, +19.6c]
99% CI	[+3.1c, +21.5c]
p-value	0.0007
Interpretation	Highly significant

Risk Metrics

Sharpe Ratio	4.21
Max Drawdown	-624.0c
Profit Factor	1.75
Best Streak	33
Worst Streak	12
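The drawdown and streak figures above can be computed from the ordered list of per-trade P&L values. The sketch below is one plausible reading, not the report's confirmed method: it assumes drawdown is measured from the running peak of cumulative P&L (starting at zero), and that "Best Streak" and "Worst Streak" mean the longest runs of consecutive winning and losing trades. The function names are hypothetical.

```python
def max_drawdown(pnl):
    """Largest peak-to-trough drop of cumulative P&L, returned as a value <= 0."""
    equity = 0.0
    peak = 0.0   # assumption: the curve starts at zero before the first trade
    worst = 0.0
    for x in pnl:
        equity += x
        peak = max(peak, equity)
        worst = min(worst, equity - peak)
    return worst


def longest_streaks(pnl):
    """Return (longest winning run, longest losing run) in trade order."""
    best = worst = run_w = run_l = 0
    for x in pnl:
        run_w = run_w + 1 if x > 0 else 0
        run_l = run_l + 1 if x < 0 else 0
        best = max(best, run_w)
        worst = max(worst, run_l)
    return best, worst


# Toy per-trade P&L in cents (illustrative values, not the backtest data).
toy = [27, -73, -54, 46, 42, 43, -58]
print(max_drawdown(toy))     # -127.0
print(longest_streaks(toy))  # (3, 2)
```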

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2148

Performance by Edge Size

Edge	Trades	Win Rate	Avg c/Trade
5-10c	43	74.4%	+6.7c
10-15c	42	69.0%	+5.0c
15-20c	25	72.0%	+12.9c
20+c	59	67.8%	+22.1c
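A breakdown like the one above can be produced by grouping trades on their entry edge. This is a minimal sketch with a hypothetical function name; it assumes each trade is an (edge in cents, P&L in cents) pair and that buckets are half-open, so an edge of exactly 10c lands in the 10-15c row. The report does not say how boundaries are handled.

```python
def summarize_by_edge(trades):
    """trades: list of (edge_cents, pnl_cents). Returns one row per non-empty bucket."""
    buckets = [
        ("5-10c", 5, 10),
        ("10-15c", 10, 15),
        ("15-20c", 15, 20),
        ("20+c", 20, float("inf")),
    ]
    rows = []
    for label, lo, hi in buckets:
        pnl = [p for e, p in trades if lo <= e < hi]  # assumed half-open [lo, hi)
        if not pnl:
            continue
        win_rate = 100 * sum(p > 0 for p in pnl) / len(pnl)
        rows.append((label, len(pnl), win_rate, sum(pnl) / len(pnl)))
    return rows


# (edge, P&L) pairs in cents, illustrative only.
sample = [(14.1, 27.0), (14.1, 46.0), (14.1, 42.0), (14.2, 47.0), (15.1, 43.0)]
for row in summarize_by_edge(sample):
    print(row)
```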

Pregame Filter Impact (all metrics above use the filtered set)

Without Filter

299 trades | 64.2% WR | +6.4c

With Pregame ≥ 55c Filter

169 trades | 70.4% WR | +12.6c

Example Trades

Matchup	Side	Score	Market	Model Fair	Edge	Result	P&L
CHI @ HOU	home	57-61	73.0c	87.1c	+14.1c	WIN	+27.0c
PHX @ DET	home	62-66	54.0c	68.1c	+14.1c	WIN	+46.0c
IND @ PHI	home	50-55	58.0c	72.1c	+14.1c	WIN	+42.0c
WSH vs LAC	away	72-67	53.0c	67.2c	+14.2c	WIN	+47.0c
CHI @ DET	home	57-61	57.0c	72.1c	+15.1c	WIN	+43.0c
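The rows above are consistent with two simple relations: edge equals model fair value minus market price, and a winning trade earns 100c minus the entry price. The sketch below reproduces the first row under those assumptions. The function names and the `fee` parameter are hypothetical; `fee` defaults to zero because the table's P&L figures match a fee-free payoff.

```python
def edge_cents(model_fair, market_price):
    """Model fair value minus market price, in cents, rounded to one decimal."""
    return round(model_fair - market_price, 1)


def trade_pnl(market_price, won, fee=0.0):
    """P&L in cents for buying one contract that settles at 100c (win) or 0c (loss)."""
    payout = 100.0 if won else 0.0
    return payout - market_price - fee


# First example row: market 73.0c, model fair 87.1c, result WIN.
print(edge_cents(87.1, 73.0))  # 14.1
print(trade_pnl(73.0, True))   # 27.0
```

Passing `fee=2.0` would apply a flat 2c charge per trade instead.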

Performance by Month

Month	Trades	Win Rate	Avg c/Trade	Total P&L
2026-01	161	72.7%	+14.7c	+2368c

Model Architecture

Architecture	Split-Phase XGBoost (early-game + clutch-time models)
Features	14 engineered features
Calibration	Isotonic regression
Training Data	5,285 games

Features: score_diff, time_fraction, home_elo, away_elo, elo_diff, home_court, pregame_wp, score_diff_x_tf, score_diff_sq, total_score, score_diff_x_elo, pace_diff, ortg_diff, drtg_diff
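The architecture table lists isotonic regression as the calibration step. As a rough illustration of what that step does, the sketch below implements the pool-adjacent-violators algorithm in plain Python. It fits a non-decreasing step function from raw model scores to observed outcomes. This is not the report's code: the function names are hypothetical, and production systems would more likely use a library implementation, which may interpolate between steps rather than return a step function.

```python
def isotonic_fit(scores, outcomes):
    """Pool-adjacent-violators fit. Returns [(block_upper_score, calibrated_value), ...]."""
    blocks = []  # each block is [sum_of_outcomes, count, max_score_in_block]
    for s, y in sorted(zip(scores, outcomes)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]


def isotonic_predict(steps, score):
    """Return the value of the first block whose upper edge covers the score."""
    for top, value in steps:
        if score <= top:
            return value
    return steps[-1][1]  # scores above the training range take the last value


steps = isotonic_fit([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
print(steps)                          # [(0.1, 0.0), (0.3, 0.5), (0.4, 1.0)]
print(isotonic_predict(steps, 0.25))  # 0.5
```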

NCAAMB

Performance Summary

Trades	781
Win Rate	73.5%
Avg c/Trade	+13.5c
Total P&L	$105.45
Backtest Type	poly_realistic

Statistical Significance

Bootstrap Mean	+13.5c
95% CI	[+10.3c, +16.6c]
99% CI	[+9.3c, +17.6c]
p-value	< 0.0001
Interpretation	Highly significant

Risk Metrics

Sharpe Ratio	4.74
Max Drawdown	-1122.0c
Profit Factor	1.88
Best Streak	66
Worst Streak	9

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.1903

Performance by Edge Size

Edge	Trades	Win Rate	Avg c/Trade
5-10c	204	72.5%	+3.9c
10-15c	240	71.7%	+4.3c
15-20c	97	71.1%	+9.7c
20+c	240	77.1%	+32.4c

Example Trades

Matchup	Side	Score	Market	Model Fair	Edge	Result	P&L
DAY @ LAS	home	51-41	61.0c	72.3c	+11.3c	WIN	+37.0c
UGA @ TEX	home	53-49	68.0c	79.3c	+11.3c	WIN	+30.0c
GMU @ URI	home	53-48	61.0c	72.3c	+11.3c	WIN	+37.0c
WEB @ MTST	home	45-42	68.0c	79.3c	+11.3c	WIN	+30.0c
HC @ COLG	home	37-40	61.0c	72.3c	+11.3c	WIN	+37.0c

Performance by Month

Month	Trades	Win Rate	Avg c/Trade	Total P&L
2026-01	781	73.5%	+13.5c	+10545c

Model Architecture

Architecture	Split-Phase XGBoost (early-game + clutch-time models)
Features	14 engineered features
Calibration	Isotonic regression
Training Data	12,285 games

Features: score_diff, time_fraction, home_elo, away_elo, elo_diff, home_court, pregame_wp, score_diff_x_tf, score_diff_sq, total_score, score_diff_x_elo, pace_diff, ortg_diff, drtg_diff
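Several names in the feature list look like derived terms. The sketch below shows how they might be built from raw game state. Every formula here is an assumption inferred from the feature names only: differences are taken from the home side's perspective, `_x_` denotes a product, and `_sq` a plain square. The report does not define them, and the function name is hypothetical.

```python
def derived_features(home_score, away_score, time_fraction, home_elo, away_elo):
    """Build the interaction-style features from raw inputs (all formulas assumed)."""
    score_diff = home_score - away_score  # assumed: home minus away
    elo_diff = home_elo - away_elo        # assumed: home minus away
    return {
        "score_diff": score_diff,
        "elo_diff": elo_diff,
        "total_score": home_score + away_score,
        "score_diff_x_tf": score_diff * time_fraction,  # assumed product
        "score_diff_sq": score_diff ** 2,               # assumed unsigned square
        "score_diff_x_elo": score_diff * elo_diff,      # assumed product with elo_diff
    }


# Illustrative inputs: home leads 53-49 at the halfway point.
print(derived_features(53, 49, 0.5, 1600, 1500))
```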

NHL

Performance Summary

Trades	50
Win Rate	74.0%
Avg c/Trade	+5.2c
Total P&L	$2.60
Backtest Type	poly_realistic

Statistical Significance

Bootstrap Mean	+5.2c
95% CI	[-7.5c, +17.0c]
99% CI	[-11.8c, +20.9c]
p-value	0.2000
Interpretation	Not significant

Risk Metrics

Sharpe Ratio	1.87
Max Drawdown	-320.0c
Profit Factor	1.30
Best Streak	12
Worst Streak	2

Equity Curve (Cumulative c)

Calibration (Predicted vs Actual) Brier: 0.2053

Performance by Edge Size

Edge	Trades	Win Rate	Avg c/Trade
5-10c	11	81.8%	+8.5c
10-15c	22	72.7%	+1.0c
15-20c	11	81.8%	+16.9c
20+c	6	50.0%	-7.0c

Pregame Filter Impact (all metrics above use the filtered set)

Without Filter

147 trades | 64.6% WR | -1.2c

With Pregame ≥ 55c Filter

50 trades | 74.0% WR | +5.2c

Example Trades

Matchup	Side	Score	Market	Model Fair	Edge	Result	P&L
NYR @ WSH	home	2-1	65.0c	77.7c	+12.7c	WIN	+35.0c
ANA vs TB	away	0-1	75.0c	88.0c	+13.0c	WIN	+25.0c
ANA @ EDM	home	3-2	78.0c	91.0c	+13.0c	WIN	+22.0c
CBJ @ VGK	home	3-2	76.0c	89.1c	+13.1c	WIN	+24.0c
NYR @ LA	home	3-2	65.0c	79.3c	+14.3c	WIN	+35.0c

Performance by Month

Month	Trades	Win Rate	Avg c/Trade	Total P&L
2026-01	42	73.8%	+5.3c	+221c

Model Architecture

Architecture	XGBoost + Isotonic calibration
Features	12 engineered features
Calibration	Isotonic regression
Training Data	4,225 games

Features: score_diff, time_fraction, home_elo, away_elo, elo_diff, home_ice, pregame_wp, score_diff_x_tf, score_diff_sq, pace_diff, ortg_diff, drtg_diff

Benchmark Comparisons

How our model performs vs naive strategies. A model that can't beat simple baselines isn't worth using.

NBA

Strategy	Win Rate	c/Trade
Our Model	70.4%	+12.6c
Random (50/50)	50.0%	-2.0c
Market-Efficient	57.9%	+0.0c

NCAAMB

Strategy	Win Rate	c/Trade
Our Model	73.5%	+13.5c
Random (50/50)	50.0%	-2.0c
Market-Efficient	58.5%	+0.0c

NHL

Strategy	Win Rate	c/Trade
Our Model	74.0%	+5.2c
Random (50/50)	50.0%	-2.0c
Market-Efficient	68.8%	+0.0c

Academic Foundation

Our approach is grounded in peer-reviewed research on sports prediction markets and probabilistic forecasting.

Paper	Authors (Year)	Relevance
Beating the bookies with their own numbers - and how the online sports betting market is rigged	Kaunitz, Zhong, Kreiner (2017)	CLV validation: demonstrates that a positive closing line value strategy yields positive long-term returns
Verification of forecasts expressed in terms of probability	Brier, Glenn W. (1950)	Foundation for calibration analysis: Brier score measures probabilistic prediction accuracy
Using random forests to estimate win probability before each play of an NFL game	Lock, Dennis; Nettleton, Dan (2014)	In-game WP modeling methodology: random forests on game state features for real-time prediction
Why are gambling markets organised so differently from financial markets?	Levitt, Steven D. (2004)	Market efficiency analysis: sports markets exhibit inefficiencies exploitable by informed bettors
Optimal betting odds against insider traders	Shin, Hyun Song (1991)	Theoretical foundation for bookmaker pricing models and adverse selection in betting markets
A Brownian motion model for the progress of sports scores	Stern, Hal (1994)	Score-diff as Brownian motion: theoretical underpinning for WP models based on score differential and time

Methodology & Anti-Overfitting Safeguards

Training / test split — Models are trained on historical ESPN game-state snapshots (multiple seasons), then tested on held-out recent-season data using real Polymarket prices the model never saw during training. No future data leaks into features.

Realistic backtesting — Poly-price backtests use actual Polymarket bid/ask prices from enriched market snapshots, including a 2c taker fee per trade. Entry prices reflect real market conditions, not simulated fills.

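Bootstrap confidence intervals — 10,000 resamples with replacement. The p-value is the fraction of bootstrap means ≤ 0, testing H0: "the model has no edge." A p-value below 0.05 means fewer than 5% of the resampled means were at or below zero, so H0 is rejected at the 5% level. It is not the probability that the edge is real.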
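The bootstrap procedure described above can be sketched in a few lines of standard-library Python. This is an illustration, not the report's implementation: the function name, the fixed seed, and the use of percentile intervals are assumptions, since the report states only the resample count and the p-value definition.

```python
import random


def bootstrap_mean_ci(pnl, n_resamples=10_000, seed=0):
    """Percentile bootstrap of the mean per-trade P&L (in cents)."""
    rng = random.Random(seed)
    n = len(pnl)
    # Draw n trades with replacement, n_resamples times, and keep each resample's mean.
    means = sorted(sum(rng.choices(pnl, k=n)) / n for _ in range(n_resamples))

    def percentile(q):
        # Nearest-rank lookup in the sorted bootstrap means.
        return means[round(q * (n_resamples - 1))]

    return {
        "bootstrap_mean": sum(means) / n_resamples,
        "ci95": (percentile(0.025), percentile(0.975)),
        "ci99": (percentile(0.005), percentile(0.995)),
        # One-sided p-value: share of bootstrap means at or below zero.
        "p_value": sum(m <= 0 for m in means) / n_resamples,
    }


# Toy per-trade P&L in cents (illustrative values, not the backtest data).
toy_pnl = [27.0, 46.0, -58.0, 42.0, -57.0, 43.0, 47.0, -54.0, 35.0, 30.0]
print(bootstrap_mean_ci(toy_pnl))
```

With 10,000 resamples the smallest nonzero p-value this procedure can return is 0.0001, so a result of zero is best read as p < 0.0001.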

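Calibration — Predictions are grouped into bins 5 percentage points wide, and a bin is plotted only if it holds at least 5 trades. Each dot compares a bin's predicted win probability with its actual win rate. A well-calibrated model's dots land on the diagonal; dots below the line mean the model predicted more wins than occurred (overconfidence).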
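The binning described above, together with the Brier score reported next to each calibration plot, can be sketched as follows. The function names are hypothetical, and the choice to summarize each bin by its mean predicted probability is an assumption. The Brier score itself is the standard mean squared error between predicted probabilities and 0/1 outcomes.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_bins(probs, outcomes, width=0.05, min_count=5):
    """Return (mean predicted, observed win rate, count) for each populated bin."""
    n_bins = round(1 / width)
    buckets = {}
    for p, y in zip(probs, outcomes):
        # Small epsilon guards against float error at bin edges; clamp p == 1.0 to the top bin.
        i = min(int(p * n_bins + 1e-9), n_bins - 1)
        buckets.setdefault(i, []).append((p, y))
    rows = []
    for i in sorted(buckets):
        pts = buckets[i]
        if len(pts) < min_count:
            continue  # skip bins with too few trades
        mean_pred = sum(p for p, _ in pts) / len(pts)
        win_rate = sum(y for _, y in pts) / len(pts)
        rows.append((mean_pred, win_rate, len(pts)))
    return rows


# Illustrative predictions: five in the 70-75% bin, four in the 60-65% bin.
probs = [0.72] * 5 + [0.61] * 4
wins = [1, 1, 1, 0, 1] + [1, 0, 1, 1]
print(calibration_bins(probs, wins))  # only the 70-75% bin has enough trades
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25
```

A Brier score of 0.25 is what a constant 50% forecast earns, which gives a reference point for the per-sport values in this report.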

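Sharpe ratio — Mean per-trade P&L divided by its standard deviation, scaled by sqrt(252). Values above 1.0 indicate strong risk-adjusted returns; above 3.0 is exceptional. The sqrt(252) factor is the convention for daily returns, so on per-trade P&L it matches a true annualized Sharpe only when trades occur about once per trading day.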
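The Sharpe calculation described above reduces to a single expression. The sketch below assumes the sample standard deviation (n - 1 denominator) and a zero risk-free rate. The report does not specify either choice, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def per_trade_sharpe(pnl, periods=252):
    """Mean per-trade P&L over its sample standard deviation, scaled by sqrt(periods)."""
    sd = stdev(pnl)  # assumption: sample standard deviation
    if sd == 0:
        raise ValueError("P&L has zero variance; Sharpe ratio is undefined")
    return mean(pnl) / sd * math.sqrt(periods)


# Toy series: mean 5, sample standard deviation 10, so the unscaled ratio is 0.5.
print(per_trade_sharpe([10, -10, 10, 10]))  # 0.5 * sqrt(252), about 7.94
```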

Profit factor — Gross wins / gross losses. Any value above 1.0 means gross wins exceed gross losses. This report reads above 1.25 as profitable, above 1.5 as strong, and above 2.0 as excellent.
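The profit factor definition above translates directly into code. This is a minimal sketch with a hypothetical function name. Returning infinity when there are no losing trades is a design choice made here, not something the report specifies.

```python
def profit_factor(pnl):
    """Sum of winning P&L divided by the absolute sum of losing P&L."""
    gross_wins = sum(x for x in pnl if x > 0)
    gross_losses = -sum(x for x in pnl if x < 0)
    if gross_losses == 0:
        return float("inf")  # no losing trades: the ratio is unbounded
    return gross_wins / gross_losses


# Three wins totalling 115c against one 58c loss.
print(profit_factor([27, 46, -58, 42]))  # 115 / 58, about 1.98
```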

Fee assumptions — All results are net of a 2c flat taker fee per contract. Live Polymarket fees may vary by sport (e.g., NCAAMB has a 2% taker fee effective Feb 2026).

Pregame filter — For NBA and NHL, the deployed strategy takes a trade only when the pregame market price of the side the model bets on was at least 55c. This removes trades where the model bets against the pregame market consensus, reducing adverse selection.
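The pregame filter described above amounts to a single threshold check per candidate trade. The sketch below is illustrative: the function name and argument layout are hypothetical, and it assumes the away side's pregame price can be approximated as 100c minus the home price, which ignores the bid/ask spread.

```python
def passes_pregame_filter(bet_side, pregame_home_price, threshold=55.0):
    """True if the pregame price of the side being bet is at or above the threshold.

    bet_side is "home" or "away"; pregame_home_price is in cents.
    """
    if bet_side == "home":
        side_price = pregame_home_price
    else:
        side_price = 100.0 - pregame_home_price  # assumption: complementary pricing
    return side_price >= threshold


print(passes_pregame_filter("home", 60.0))  # True
print(passes_pregame_filter("away", 60.0))  # False
```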