Our Methodology

How we build independent win probability models for prediction markets

Why independent probabilities matter

Most sports analytics services derive "fair value" by devigging sportsbook lines — averaging Pinnacle, FanDuel, and DraftKings odds. This gives you a consensus probability that tracks the market by construction. It's useful for sports betting (finding +EV against soft books), but it cannot find prediction market mispricings — because the output already agrees with the market.
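To make the devigging step concrete, here is a minimal sketch of the standard multiplicative devig (one of several common methods; the odds values are hypothetical, not from any book):

```python
def devig(decimal_odds):
    """Remove the vig from a set of decimal odds by normalizing
    implied probabilities (multiplicative devig)."""
    implied = [1.0 / o for o in decimal_odds]
    total = sum(implied)          # > 1.0 because of the bookmaker's margin
    return [p / total for p in implied]

# Hypothetical two-way line: home 1.87, away 2.05
fair = devig([1.87, 2.05])        # consensus probabilities summing to 1
```

Whatever the devig method, the output is by definition a transformation of the market's own prices, which is why it cannot disagree with the market.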

ZenHodl models are trained on game state only: score differential, seconds remaining, period, Elo ratings, and sport-specific features. No odds, no lines, no market prices are used as inputs. This makes our output genuinely independent from the market — when our fair probability diverges from the Polymarket ask price, that divergence is a real signal, not noise.

The tradeoff: our models have a worse Brier score than market-derived models (they're less accurate in an absolute sense). But they produce +10.9c/trade net profit across 1,552 backtested trades — because independence is what creates trading value.

| | Market-Derived | ZenHodl (Independent) |
|---|---|---|
| Inputs | Sportsbook odds/lines | Score, time, Elo only |
| Output | Tracks market by construction | Genuinely independent |
| Brier score | Better | Worse |
| Trading value | Zero (agrees with market) | +10.9c/trade net |

Data pipeline

We scrape ESPN's play-by-play API for every game across 7 sports. Each game produces ~400 snapshots — one per score change or significant event.

60,702 games · 25.6M rows · 7 sports · 2020–26 seasons

Sports: NBA, NCAAMB, NCAAWB, CFB, NFL, NHL, MLB

Data is stored as Apache Parquet files. One row = one game state (score, period, clock, ESPN WP, outcome label).

Module 1 of our course teaches you to build this exact scraper.

Feature engineering

Each game state is featurized with 13–16 variables. We deliberately keep the feature set small — overfitting to noise destroys trading value.

| Feature | Sports | Description |
|---|---|---|
| score_diff | All | home_score − away_score |
| seconds_remaining | All | Total game seconds left |
| period | All | Current period/half/inning |
| time_fraction | All | Fraction of game elapsed (0→1) |
| elo_diff | All | Home Elo − Away Elo |
| pregame_wp | All | ESPN pre-game win probability (fixed prior) |
| score_diff_x_tf | All | Lead × time elapsed (interaction) |
| score_diff_sq | All | Lead² (quadratic, captures blowouts) |
| is_home_batting | MLB | 1 if home team is batting |
| down, distance | CFB/NFL | Football situation |
| yard_line | CFB/NFL | Field position |
| possession_home | CFB/NFL | 1 if home has the ball |
| pace features | NBA | total_score, ortg_diff, drtg_diff |
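The shared features above can be computed in a few lines. This is a sketch under assumed inputs (the function name and argument list are illustrative):

```python
def featurize(home_score, away_score, seconds_remaining, total_seconds,
              home_elo, away_elo, pregame_wp):
    """Build the cross-sport feature vector for one game state (sketch)."""
    score_diff = home_score - away_score
    time_fraction = 1.0 - seconds_remaining / total_seconds  # 0 at tip, 1 at final
    return {
        "score_diff": score_diff,
        "seconds_remaining": seconds_remaining,
        "time_fraction": time_fraction,
        "elo_diff": home_elo - away_elo,
        "pregame_wp": pregame_wp,
        "score_diff_x_tf": score_diff * time_fraction,  # a lead matters more late
        "score_diff_sq": score_diff ** 2,               # blowout curvature
    }

feats = featurize(78, 74, 312, 2880, 1650, 1600, 0.62)
```

The interaction and quadratic terms let even a linear model capture the two most important nonlinearities: leads harden as the clock runs down, and blowouts are near-certain wins.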

Model architecture

We use sport-specific models — no one-size-fits-all approach.

Basketball & Football (NBA, NCAAMB, NCAAWB, CFB, NFL)

Logistic regression with isotonic spline calibration. We tested XGBoost, random forests, and neural nets. The key finding: simpler models produce better trading value because they don't overfit to noise in the training data. XGBoost achieves a better Brier score but worse c/trade. We select models by trading value, not accuracy.
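A minimal sketch of the base-model-plus-calibrator pattern, using scikit-learn's logistic regression and isotonic regression on synthetic data (the data, splits, and feature layout here are stand-ins, not ZenHodl's pipeline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
# Synthetic stand-in for featurized snapshots: [score_diff, time_fraction, elo_diff]
X = rng.normal(size=(5000, 3))
y = (X[:, 0] + 0.3 * X[:, 2] + rng.normal(scale=1.5, size=5000) > 0).astype(int)

# Fit the base classifier on one split and the calibrator on a held-out split,
# so the calibrator corrects genuine miscalibration rather than training noise.
lr = LogisticRegression().fit(X[:4000], y[:4000])
raw = lr.predict_proba(X[4000:])[:, 1]
iso = IsotonicRegression(out_of_bounds="clip").fit(raw, y[4000:])

calibrated = iso.predict(lr.predict_proba(X[:5])[:, 1])
```

Calibration matters more than raw discrimination here: a trade signal is "fair probability vs. ask price", so the probabilities themselves must be trustworthy.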

For NCAAMB specifically, XGBoost+Isotonic outperforms LR+Spline — the extra complexity is justified by the larger training set (854 trades vs 30 for NFL).

Soccer

Poisson model: predicts goal rates for home and away teams, adjusted by Elo. Generates P(home win), P(draw), P(away win) from the Poisson distribution. Calibrated on 211 matches across EPL, La Liga, Ligue 1.
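The win/draw/loss split falls out of summing a grid of independent Poisson scorelines. A self-contained sketch (the goal rates are hypothetical; in the real model they would come from the Elo adjustment):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k goals given expected rate lam."""
    return lam ** k * exp(-lam) / factorial(k)

def match_probs(home_rate, away_rate, max_goals=10):
    """P(home win), P(draw), P(away win) from independent Poisson goal counts."""
    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away

probs = match_probs(1.6, 1.1)  # hypothetical Elo-adjusted goal rates
```

Truncating at 10 goals per side loses only a negligible tail of probability mass at realistic rates.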

Esports (CS2, LoL)

Binomial model: Elo-based game win probability fed into a binomial series model (BO1/BO3/BO5). 258 CS2 teams (HLTV rankings), 255 LoL teams (lolesports API).
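The series step is a small closed-form calculation: given a per-game win probability, sum over the ways to clinch the series. A sketch assuming i.i.d. games (map-level effects are ignored here):

```python
from math import comb

def series_win_prob(p, best_of):
    """P(winning a best-of-N series) from per-game win probability p,
    treating games as independent Bernoulli(p) trials."""
    wins_needed = (best_of + 1) // 2
    # Win the final game, having won wins_needed-1 of the preceding ones;
    # sum over the number of losses before the clincher.
    return sum(
        comb(wins_needed - 1 + losses, losses)
        * p ** wins_needed * (1 - p) ** losses
        for losses in range(wins_needed)
    )

series_win_prob(0.6, 3)  # BO3: p^2 + 2*p^2*(1-p) = 0.648
```

Note how a series amplifies an edge: a 60% per-game favorite wins a BO3 about 65% of the time, and a BO5 more often still.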

Spread/Total

Regression + CDF: predicts remaining margin/total, then P(cover) = Φ((predicted − line) / σ). Time-bucketed residual standard deviation: wide σ early (conservative), narrow σ late (confident).
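The cover probability is one line once the residual σ is chosen. A sketch with made-up per-quarter sigmas (the bucket boundaries and values are illustrative, not the fitted ones):

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical time-bucketed residual sigmas: wide early, narrow late.
SIGMA_BY_QUARTER = {1: 12.0, 2: 10.0, 3: 7.5, 4: 4.0}

def p_cover(predicted_margin, line, quarter):
    """P(final home margin beats the line), assuming normal residuals."""
    sigma = SIGMA_BY_QUARTER[quarter]
    return normal_cdf((predicted_margin - line) / sigma)

p_cover(6.5, 4.5, 4)  # same edge is worth more late, when sigma is small
```

The same two-point edge over the line translates to a probability barely above 50% in the first quarter but a much stronger signal in the fourth, which is exactly the conservative-early, confident-late behavior described above.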

Backtesting methodology

Our backtests are designed to avoid the common mistakes that inflate results.

What we tried that doesn't work

We believe in showing failures alongside successes.

Build this system yourself

Our course walks you through every step — from data scraping to live trading.