
Fix PredictWin/PredictDraw formulas and documentation#9

Merged
intinig merged 6 commits into intinig:main from thesprockee:fix/bugs-and-parity
Apr 10, 2026

Conversation

@thesprockee
Contributor

@thesprockee thesprockee commented Apr 10, 2026

Summary

Fixes three bugs in PredictWin and PredictDraw prediction formulas, verified against the Weng & Lin (2011) paper and the Python openskill.py reference implementation.

Bug 1: Pairwise denominator (PredictWin + PredictDraw)

TeamSigmaSquared (already σ²) was squared again, producing σ⁴. Additionally, the beta coefficient used n (number of teams) instead of 2.

Per Weng & Lin (2011) eq. (49): c_iq = sqrt(σ²_i + σ²_q + 2β²) — always 2β², and sigma values appear directly, not squared.

```
Before: sqrt(n*β² + σ²_A*σ²_A + σ²_B*σ²_B)
After:  sqrt(2*β² + σ²_A + σ²_B)
```

Bug 2: PredictDraw iteration and draw probability base

  • Used 1/n (team count) instead of 1/total_player_count for draw probability
  • Iterated ordered permutations instead of unordered combinations
  • Inconsistent normalization between 2-team and n-team cases

Rewritten to match openskill.py's itertools.combinations approach.
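The unordered-pair iteration can be sketched in Go (`pairwiseIndices` is a hypothetical helper, the Go analogue of Python's `itertools.combinations(range(n), 2)`; it is not the library's actual code):

```go
package main

import "fmt"

// pairwiseIndices enumerates unordered team pairs (i, j) with i < j.
// For n teams this yields n*(n-1)/2 pairs, not the n*(n-1) ordered
// permutations the old code iterated, so each pair's draw probability
// is counted exactly once before averaging.
func pairwiseIndices(n int) [][2]int {
	pairs := make([][2]int, 0, n*(n-1)/2)
	for i := 0; i < n; i++ {
		for j := i + 1; j < n; j++ {
			pairs = append(pairs, [2]int{i, j})
		}
	}
	return pairs
}

func main() {
	fmt.Println(pairwiseIndices(4)) // 6 unordered pairs for 4 teams
}
```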

Bug 3: Incorrect documentation comments

Every default value comment in OpenSkillOptions was wrong:

  • Z: said 3.29, actual default is 3
  • Epsilon: said 0.000001, actual is 0.0001
  • Beta: said 0.5, actual is Sigma/2
  • Model: said Glicko2, actual is PlackettLuce
  • Tau: said 0.5, actual is nil
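A runnable sketch of the corrected defaults (`defaults` is a hypothetical helper for illustration; in the library the real values are set in NewPlackettLuce and NewWithOptions):

```go
package main

import "fmt"

// defaults returns the corrected default values described in this PR.
func defaults() (z, mu, sigma, beta, epsilon float64) {
	z = 3            // not 3.29
	mu = 25.0
	sigma = mu / z   // 25/3 ≈ 8.333, not 25/3.29 ≈ 7.59
	beta = sigma / 2 // derived from sigma, not a fixed 0.5
	epsilon = 0.0001 // not 0.000001
	return
}

func main() {
	z, mu, sigma, beta, epsilon := defaults()
	fmt.Println(z, mu, sigma, beta, epsilon)
}
```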

Impact

The inflated denominator (~75% too large) compressed all predictions toward 0.5, making skill differences nearly invisible to any system using PredictWin or PredictDraw.

Using real player ratings from a production EchoVR matchmaking server (mu ≈ 1–14, sigma ≈ 3–5):

| Matchup | Old P(win) | New P(win) | Error |
|---|---|---|---|
| μ=1.2 vs μ=9.3 | 0.38 vs 0.62 | 0.10 vs 0.90 | +0.28 |
| μ=1.2 vs μ=13.9 | 0.35 vs 0.65 | 0.04 vs 0.96 | +0.31 |
| μ=9.3 vs μ=13.9 | 0.43 vs 0.57 | 0.24 vs 0.76 | +0.19 |

4v4 team match example:

  • Old: Blue 54.5% win probability (looks balanced)
  • New: Blue 82.4% win probability (clearly lopsided — 11 mu advantage)

The old predictions were telling matchmakers the match was basically a coin flip when one team had an overwhelming skill advantage. Any system relying on PredictWin/PredictDraw for match quality assessment was operating nearly blind to skill gaps.

References

  • Weng, R. C., & Lin, C.-J. (2011). A Bayesian approximation method for online ranking. JMLR, 12, 267–300. Equations (5), (49).
  • Cross-validated against openskill.py v6.2.0

Test plan

  • All existing tests updated and passing
  • Test values cross-validated against Python openskill.py output
  • Formulas verified against Weng & Lin (2011) equations (5), (49)

🤖 Generated with Claude Code

thesprockee and others added 4 commits April 9, 2026 23:44
* upstream/main:
  Bump gonum.org/v1/gonum from 0.16.0 to 0.17.0
  Bump gonum.org/v1/gonum from 0.15.1 to 0.16.0
The pairwise comparison denominator had two errors:

1. TeamSigmaSquared was squared again (sigma^4 instead of sigma^2).
   TeamSigmaSquared is already the sum of player sigma^2 values, so
   multiplying it by itself produces sigma^4.

2. The beta coefficient used n (number of teams) instead of 2. Each
   pairwise comparison involves exactly two teams.

Per Weng & Lin (2011) "A Bayesian Approximation Method for Online
Ranking", JMLR 12, eq. (49): c_iq = sqrt(sigma_i^2 + sigma_q^2 +
2*beta^2). The denominator is always sqrt(sigma_i^2 + sigma_q^2 +
2*beta^2), regardless of how many teams are in the match.

Cross-validated against openskill.py v6.2.0 (PlackettLuce model).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PredictDraw had two structural issues:

1. The draw probability base used 1/n (team count) instead of
   1/total_player_count. With asymmetric teams this produces
   incorrect draw margins.

2. The iteration used ordered permutations (n*(n-1) pairs) instead
   of unordered combinations (n*(n-1)/2 pairs). The normalization
   denominator was also inconsistent between n<=2 and n>2 cases.

Rewritten to iterate unordered pairs and average the pairwise draw
probabilities, matching openskill.py's itertools.combinations
approach.

Cross-validated against openskill.py v6.2.0 (PlackettLuce model).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Every comment about default values was wrong:
- Z: said 3.29, actual default is 3
- Sigma: said 25.0/3.29=7.59, actual is Mu/Z (25/3 ≈ 8.333)
- Epsilon: said 0.000001, actual is 0.0001
- Beta: said "precision" with default 0.5, actual is Sigma/2
- Model: said Glicko2, actual is PlackettLuce
- Tau: said "precision" with default 0.5, actual is nil (disabled)

Corrected all comments to match the actual defaults set in
NewPlackettLuce and NewWithOptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@thesprockee thesprockee marked this pull request as ready for review April 10, 2026 05:26
Copilot AI review requested due to automatic review settings April 10, 2026 05:26

Copilot AI left a comment


Pull request overview

This PR corrects the win/draw prediction math in rating to align with the intended Weng & Lin (2011) pairwise formulation, and updates related option documentation and tests to reflect the corrected behavior.

Changes:

  • Fix PredictWin pairwise denominator to use 2*beta^2 + sigmaSqA + sigmaSqB (instead of effectively using n*beta^2 and squaring variances again).
  • Rewrite PredictDraw to average unordered pairwise draw probabilities and base draw margin on total player count.
  • Update prediction test expectations and refresh OpenSkillOptions field comments.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| types/types.go | Updates OpenSkillOptions documentation comments to match current default behaviors. |
| rating/predict.go | Fixes pairwise denominator in PredictWin and rewrites PredictDraw to use combinations + corrected constants. |
| rating/predict_test.go | Updates expected probabilities to reflect the corrected prediction formulas. |


@thesprockee thesprockee marked this pull request as draft April 10, 2026 06:29
thesprockee added a commit to EchoTools/nakama that referenced this pull request Apr 10, 2026
Point go-openskill to thesprockee/go-openskill which fixes:
- PredictWin/PredictDraw denominator: σ⁴ → σ² and nβ² → 2β²
- PredictDraw: wrong iteration (permutations → combinations) and
  draw probability base (1/n → 1/total_players)
- Documentation comments with incorrect default values

The old denominator was ~75% too large, compressing all predictions
toward 50/50. With the fix, a 1.2 vs 9.3 skill matchup correctly
predicts 10%-90% instead of 38%-62%.

See intinig/go-openskill#9 for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@thesprockee thesprockee force-pushed the fix/bugs-and-parity branch 2 times, most recently from 045285f to affc8c0 Compare April 10, 2026 11:30
thesprockee and others added 2 commits April 10, 2026 06:44
…lfVsSelf

The expected value is ~0.5, not 1.0, so the old name was misleading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Return zero-value results when teams have no players, preventing
divide-by-zero (NaN/Inf) from 1/totalPlayerCount and n*(n-1)/2
denominators. Callers can pass teams with pre-allocated slices
that never get populated, leaving zero-player teams.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
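The guard described in this commit could be sketched as follows (`predictWinSafe` and the `rating` struct are illustrative stand-ins, and the placeholder uniform result stands in for the real pairwise computation):

```go
package main

import "fmt"

type rating struct{ mu, sigmaSq float64 }

// predictWinSafe returns zero-value results when any team has no
// players, so the later 1/totalPlayerCount and n*(n-1)/2 denominators
// never divide by zero (which would produce NaN/Inf).
func predictWinSafe(teams [][]rating) []float64 {
	for _, t := range teams {
		if len(t) == 0 {
			return make([]float64, len(teams)) // zero-value, no NaN
		}
	}
	// ... the real pairwise computation would go here ...
	out := make([]float64, len(teams))
	for i := range out {
		out[i] = 1.0 / float64(len(teams)) // placeholder uniform result
	}
	return out
}

func main() {
	// A pre-allocated but never-populated team triggers the guard.
	fmt.Println(predictWinSafe([][]rating{{}, {{25, 69.4}}})) // [0 0]
}
```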
@thesprockee thesprockee force-pushed the fix/bugs-and-parity branch from affc8c0 to 12ff0f8 Compare April 10, 2026 11:44
@thesprockee thesprockee deleted the fix/bugs-and-parity branch April 10, 2026 11:45
@thesprockee thesprockee restored the fix/bugs-and-parity branch April 10, 2026 11:46
@thesprockee thesprockee reopened this Apr 10, 2026
@thesprockee thesprockee marked this pull request as ready for review April 10, 2026 11:47
@intinig intinig merged commit bdb85a3 into intinig:main Apr 10, 2026
1 check passed
