Fix PredictWin/PredictDraw formulas and documentation #9

Merged: intinig merged 6 commits into intinig:main on Apr 10, 2026
Conversation
* upstream/main:
  - Bump gonum.org/v1/gonum from 0.16.0 to 0.17.0
  - Bump gonum.org/v1/gonum from 0.15.1 to 0.16.0
The pairwise comparison denominator had two errors:

1. TeamSigmaSquared was squared again (sigma^4 instead of sigma^2). TeamSigmaSquared is already the sum of player sigma^2 values, so multiplying it by itself produces sigma^4.
2. The beta coefficient used n (number of teams) instead of 2. Each pairwise comparison involves exactly two teams.

Per Weng & Lin (2011) "A Bayesian Approximation Method for Online Ranking", JMLR 12, eq. (49): c_iq = sqrt(sigma_i^2 + sigma_q^2 + 2*beta^2). The denominator is always sqrt(sigma_i^2 + sigma_q^2 + 2*beta^2), regardless of how many teams are in the match.

Cross-validated against openskill.py v6.2.0 (PlackettLuce model).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PredictDraw had two structural issues:

1. The draw probability base used 1/n (team count) instead of 1/total_player_count. With asymmetric teams this produces incorrect draw margins.
2. The iteration used ordered permutations (n*(n-1) pairs) instead of unordered combinations (n*(n-1)/2 pairs). The normalization denominator was also inconsistent between n<=2 and n>2 cases.

Rewritten to iterate unordered pairs and average the pairwise draw probabilities, matching openskill.py's itertools.combinations approach.

Cross-validated against openskill.py v6.2.0 (PlackettLuce model).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
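The permutations-to-combinations change can be sketched as below. The `team` type and `pairwiseDraws` helper are illustrative stand-ins, not the library's API; the point is the `j := i + 1` loop bound, which mirrors Python's `itertools.combinations` and yields each unordered pair exactly once:

```go
package main

import "fmt"

// team is a minimal stand-in for the library's team type (hypothetical).
type team struct{ ratings []float64 }

// pairwiseDraws enumerates unordered team pairs: n*(n-1)/2 of them, each
// counted once, instead of the old n*(n-1) ordered permutations that
// counted every (A, B) and (B, A) separately.
func pairwiseDraws(teams []team) [][2]int {
	var pairs [][2]int
	for i := 0; i < len(teams); i++ {
		for j := i + 1; j < len(teams); j++ { // j > i: no (B, A) repeat
			pairs = append(pairs, [2]int{i, j})
		}
	}
	return pairs
}

func main() {
	// Three teams yield 3*(3-1)/2 = 3 unordered pairs.
	fmt.Println(len(pairwiseDraws(make([]team, 3)))) // 3
}
```

Averaging the pairwise draw probabilities over this set keeps the normalization consistent for any team count, removing the old n<=2 vs n>2 split.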
Every comment about default values was wrong:

- Z: said 3.29, actual default is 3
- Sigma: said 25.0/3.29 = 7.59, actual is Mu/Z (25/3 ≈ 8.333)
- Epsilon: said 0.000001, actual is 0.0001
- Beta: said "precision" with default 0.5, actual is Sigma/2
- Model: said Glicko2, actual is PlackettLuce
- Tau: said "precision" with default 0.5, actual is nil (disabled)

Corrected all comments to match the actual defaults set in NewPlackettLuce and NewWithOptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
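The derived defaults chain together, which is why a wrong Z comment cascades into wrong Sigma and Beta comments. A hedged sketch of that chain, with constant names reconstructed here rather than copied from the library:

```go
package main

import "fmt"

// Reconstructed default chain described in the commit message above; the
// identifiers are illustrative, not the library's actual option names.
const (
	defaultMu      = 25.0
	defaultZ       = 3.0   // not 3.29 as the old comment claimed
	defaultEpsilon = 0.0001 // not 0.000001
)

func main() {
	sigma := defaultMu / defaultZ // Mu/Z = 25/3 ≈ 8.333, not 25/3.29 = 7.59
	beta := sigma / 2             // Sigma/2, not a fixed 0.5
	fmt.Println(sigma, beta, defaultEpsilon)
	// Model defaults to PlackettLuce (not Glicko2); Tau defaults to nil (disabled).
}
```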
Pull request overview
This PR corrects the win/draw prediction math in rating to align with the intended Weng & Lin (2011) pairwise formulation, and updates related option documentation and tests to reflect the corrected behavior.
Changes:
- Fix PredictWin's pairwise denominator to use 2*beta^2 + sigmaSqA + sigmaSqB (instead of effectively using n*beta^2 and squaring the variances again).
- Rewrite PredictDraw to average unordered pairwise draw probabilities and base the draw margin on the total player count.
- Update prediction test expectations and refresh OpenSkillOptions field comments.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| types/types.go | Updates OpenSkillOptions documentation comments to match current default behaviors. |
| rating/predict.go | Fixes the pairwise denominator in PredictWin and rewrites PredictDraw to use combinations and corrected constants. |
| rating/predict_test.go | Updates expected probabilities to reflect the corrected prediction formulas. |
thesprockee added a commit to EchoTools/nakama that referenced this pull request on Apr 10, 2026
Point go-openskill to thesprockee/go-openskill which fixes:

- PredictWin/PredictDraw denominator: σ⁴ → σ² and nβ² → 2β²
- PredictDraw: wrong iteration (permutations → combinations) and draw probability base (1/n → 1/total_players)
- Documentation comments with incorrect default values

The old denominator was ~75% too large, compressing all predictions toward 50/50. With the fix, a 1.2 vs 9.3 skill matchup correctly predicts 10%-90% instead of 38%-62%.

See intinig/go-openskill#9 for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
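The compression effect can be seen by comparing the two denominators directly. This sketch uses illustrative numbers (a two-team match, team variance 64, beta = 25/6) and hypothetical helper names; the exact inflation depends on the sigma values in play:

```go
package main

import (
	"fmt"
	"math"
)

// oldC reproduces the buggy denominator: n*beta^2, plus the already-squared
// team variances squared again (sigma^4).
func oldC(n, sigmaSqA, sigmaSqB, beta float64) float64 {
	return math.Sqrt(n*beta*beta + sigmaSqA*sigmaSqA + sigmaSqB*sigmaSqB)
}

// newC is the corrected Weng & Lin (2011) eq. (49) form:
// c = sqrt(sigma_A^2 + sigma_B^2 + 2*beta^2).
func newC(sigmaSqA, sigmaSqB, beta float64) float64 {
	return math.Sqrt(2*beta*beta + sigmaSqA + sigmaSqB)
}

func main() {
	sigmaSq, beta := 64.0, 25.0/6
	old, fixed := oldC(2, sigmaSq, sigmaSq, beta), newC(sigmaSq, sigmaSq, beta)
	// The old value is strictly larger, shrinking (mu_A - mu_B)/c and
	// pushing the normal CDF toward 0.5 for every matchup.
	fmt.Printf("old=%.1f fixed=%.1f\n", old, fixed)
}
```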
Force-pushed from 045285f to affc8c0
…lfVsSelf

The expected value is ~0.5, not 1.0, so the old name was misleading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Return zero-value results when teams have no players, preventing divide-by-zero (NaN/Inf) from the 1/totalPlayerCount and n*(n-1)/2 denominators. Callers can pass teams with pre-allocated slices that never get populated, leaving zero-player teams.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
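The guard described above can be sketched like this. The `team` type, `predictDrawSafe` name, and the placeholder body are hypothetical; only the early-return shape is the point:

```go
package main

import "fmt"

// team is a minimal stand-in for the library's team type (hypothetical).
type team struct{ players []float64 }

// predictDrawSafe sketches the zero-player guard: bail out with a zero value
// before the 1/totalPlayerCount and n*(n-1)/2 divisions can produce NaN/Inf.
func predictDrawSafe(teams []team) float64 {
	total := 0
	for _, t := range teams {
		total += len(t.players)
	}
	if total == 0 || len(teams) < 2 {
		return 0 // zero value instead of dividing by zero below
	}
	// ... the real pairwise draw computation would follow here ...
	return 1.0 / float64(total) // placeholder for the draw-margin base
}

func main() {
	// Pre-allocated but never-populated teams no longer yield NaN.
	fmt.Println(predictDrawSafe([]team{{}, {}})) // 0
}
```

Without the guard, `1.0 / float64(total)` with `total == 0` evaluates to `+Inf` in Go rather than panicking, which silently poisons downstream arithmetic with NaN/Inf.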
Force-pushed from affc8c0 to 12ff0f8
Summary
Fixes three bugs in PredictWin and PredictDraw prediction formulas, verified against the Weng & Lin (2011) paper and the Python openskill.py reference implementation.

Bug 1: Pairwise denominator (PredictWin + PredictDraw)

TeamSigmaSquared (already σ²) was squared again, producing σ⁴. Additionally, the beta coefficient used n (number of teams) instead of 2. Per Weng & Lin (2011) eq. (49): c_iq = sqrt(σ²_i + σ²_q + 2β²) — always 2β², and the sigma values appear directly, not squared.

Before: sqrt(n*β² + σ²_A*σ²_A + σ²_B*σ²_B)
After: sqrt(2*β² + σ²_A + σ²_B)

Bug 2: PredictDraw iteration and draw probability base
Used 1/n (team count) instead of 1/total_player_count for the draw probability base. Rewritten to match openskill.py's itertools.combinations approach.

Bug 3: Incorrect documentation comments
Every default value comment in OpenSkillOptions was wrong (Z said 3.29, not 3; Epsilon said 0.000001, not 0.0001; Beta said 0.5, not Sigma/2; Model said Glicko2, not PlackettLuce; Tau said 0.5, not nil).

Impact
The inflated denominator (~75% too large) compressed all predictions toward 0.5, making skill differences nearly invisible to any system using PredictWin or PredictDraw.

Using real player ratings from a production EchoVR matchmaking server (mu ≈ 1–14, sigma ≈ 3–5):

4v4 team match example:

The old predictions were telling matchmakers the match was basically a coin flip when one team had an overwhelming skill advantage. Any system relying on PredictWin/PredictDraw for match quality assessment was operating nearly blind to skill gaps.

References
Test plan
🤖 Generated with Claude Code