
Fix PredictWin/PredictDraw formulas and documentation#9

Merged
intinig merged 6 commits into intinig:main from thesprockee:fix/bugs-and-parity
Apr 10, 2026

Conversation

@thesprockee
Contributor

@thesprockee thesprockee commented Apr 10, 2026

Summary

Fixes three bugs in PredictWin and PredictDraw prediction formulas, verified against the Weng & Lin (2011) paper and the Python openskill.py reference implementation.

Bug 1: Pairwise denominator (PredictWin + PredictDraw)

TeamSigmaSquared (already σ²) was squared again, producing σ⁴. Additionally, the beta coefficient used n (number of teams) instead of 2.

Per Weng & Lin (2011) eq. (49): c_iq = sqrt(σ²_i + σ²_q + 2β²) — always 2β², and sigma values appear directly, not squared.

```
Before: sqrt(n*β² + σ²_A*σ²_A + σ²_B*σ²_B)
After:  sqrt(2*β² + σ²_A + σ²_B)
```

Bug 2: PredictDraw iteration and draw probability base

  • Used 1/n (team count) instead of 1/total_player_count for draw probability
  • Iterated ordered permutations instead of unordered combinations
  • Inconsistent normalization between 2-team and n-team cases

Rewritten to match openskill.py's itertools.combinations approach.
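The unordered-pair iteration can be sketched in Go (`pairwiseIndices` is a hypothetical helper, the Go analogue of Python's `itertools.combinations(range(n), 2)`; it is not the library's actual code):

```go
package main

import "fmt"

// pairwiseIndices enumerates unordered team pairs (i, j) with i < j.
// For n teams this yields n*(n-1)/2 pairs, not the n*(n-1) ordered
// permutations the old code iterated, so each pair's draw probability
// is counted exactly once before averaging.
func pairwiseIndices(n int) [][2]int {
	pairs := make([][2]int, 0, n*(n-1)/2)
	for i := 0; i < n; i++ {
		for j := i + 1; j < n; j++ {
			pairs = append(pairs, [2]int{i, j})
		}
	}
	return pairs
}

func main() {
	fmt.Println(pairwiseIndices(4)) // 6 unordered pairs for 4 teams
}
```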

Bug 3: Incorrect documentation comments

Every default value comment in OpenSkillOptions was wrong:

  • Z: said 3.29, actual default is 3
  • Epsilon: said 0.000001, actual is 0.0001
  • Beta: said 0.5, actual is Sigma/2
  • Model: said Glicko2, actual is PlackettLuce
  • Tau: said 0.5, actual is nil
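A runnable sketch of the corrected defaults (`defaults` is a hypothetical helper for illustration; in the library the real values are set in NewPlackettLuce and NewWithOptions):

```go
package main

import "fmt"

// defaults returns the corrected default values described in this PR.
func defaults() (z, mu, sigma, beta, epsilon float64) {
	z = 3            // not 3.29
	mu = 25.0
	sigma = mu / z   // 25/3 ≈ 8.333, not 25/3.29 ≈ 7.59
	beta = sigma / 2 // derived from sigma, not a fixed 0.5
	epsilon = 0.0001 // not 0.000001
	return
}

func main() {
	z, mu, sigma, beta, epsilon := defaults()
	fmt.Println(z, mu, sigma, beta, epsilon)
}
```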

Impact

The inflated denominator (~75% too large) compressed all predictions toward 0.5, making skill differences nearly invisible to any system using PredictWin or PredictDraw.

Using real player ratings from a production EchoVR matchmaking server (mu ≈ 1–14, sigma ≈ 3–5):

| Matchup | Old P(win) | New P(win) | Error |
|---|---|---|---|
| μ=1.2 vs μ=9.3 | 0.38 vs 0.62 | 0.10 vs 0.90 | +0.28 |
| μ=1.2 vs μ=13.9 | 0.35 vs 0.65 | 0.04 vs 0.96 | +0.31 |
| μ=9.3 vs μ=13.9 | 0.43 vs 0.57 | 0.24 vs 0.76 | +0.19 |

4v4 team match example:

  • Old: Blue 54.5% win probability (looks balanced)
  • New: Blue 82.4% win probability (clearly lopsided — 11 mu advantage)

The old predictions were telling matchmakers the match was basically a coin flip when one team had an overwhelming skill advantage. Any system relying on PredictWin/PredictDraw for match quality assessment was operating nearly blind to skill gaps.

References

  • Weng, R. C., & Lin, C.-J. (2011). A Bayesian approximation method for online ranking. JMLR, 12, 267–300. Equations (5), (49).
  • Cross-validated against openskill.py v6.2.0

Test plan

  • All existing tests updated and passing
  • Test values cross-validated against Python openskill.py output
  • Formulas verified against Weng & Lin (2011) equations (5), (49)

🤖 Generated with Claude Code

thesprockee and others added 4 commits April 9, 2026 23:44
* upstream/main:
  Bump gonum.org/v1/gonum from 0.16.0 to 0.17.0
  Bump gonum.org/v1/gonum from 0.15.1 to 0.16.0
The pairwise comparison denominator had two errors:

1. TeamSigmaSquared was squared again (sigma^4 instead of sigma^2).
   TeamSigmaSquared is already the sum of player sigma^2 values, so
   multiplying it by itself produces sigma^4.

2. The beta coefficient used n (number of teams) instead of 2. Each
   pairwise comparison involves exactly two teams.

Per Weng & Lin (2011) "A Bayesian Approximation Method for Online
Ranking", JMLR 12, eq. (49): c_iq = sqrt(sigma_i^2 + sigma_q^2 +
2*beta^2). The denominator is always sqrt(sigma_i^2 + sigma_q^2 +
2*beta^2), regardless of how many teams are in the match.

Cross-validated against openskill.py v6.2.0 (PlackettLuce model).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PredictDraw had two structural issues:

1. The draw probability base used 1/n (team count) instead of
   1/total_player_count. With asymmetric teams this produces
   incorrect draw margins.

2. The iteration used ordered permutations (n*(n-1) pairs) instead
   of unordered combinations (n*(n-1)/2 pairs). The normalization
   denominator was also inconsistent between n<=2 and n>2 cases.

Rewritten to iterate unordered pairs and average the pairwise draw
probabilities, matching openskill.py's itertools.combinations
approach.

Cross-validated against openskill.py v6.2.0 (PlackettLuce model).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Every comment about default values was wrong:
- Z: said 3.29, actual default is 3
- Sigma: said 25.0/3.29=7.59, actual is Mu/Z (25/3 ≈ 8.333)
- Epsilon: said 0.000001, actual is 0.0001
- Beta: said "precision" with default 0.5, actual is Sigma/2
- Model: said Glicko2, actual is PlackettLuce
- Tau: said "precision" with default 0.5, actual is nil (disabled)

Corrected all comments to match the actual defaults set in
NewPlackettLuce and NewWithOptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@thesprockee thesprockee marked this pull request as ready for review April 10, 2026 05:26
Copilot AI review requested due to automatic review settings April 10, 2026 05:26

Copilot AI left a comment


Pull request overview

This PR corrects the win/draw prediction math in rating to align with the intended Weng & Lin (2011) pairwise formulation, and updates related option documentation and tests to reflect the corrected behavior.

Changes:

  • Fix PredictWin pairwise denominator to use 2*beta^2 + sigmaSqA + sigmaSqB (instead of effectively using n*beta^2 and squaring variances again).
  • Rewrite PredictDraw to average unordered pairwise draw probabilities and base draw margin on total player count.
  • Update prediction test expectations and refresh OpenSkillOptions field comments.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| types/types.go | Updates OpenSkillOptions documentation comments to match current default behaviors. |
| rating/predict.go | Fixes pairwise denominator in PredictWin and rewrites PredictDraw to use combinations + corrected constants. |
| rating/predict_test.go | Updates expected probabilities to reflect the corrected prediction formulas. |


@thesprockee thesprockee marked this pull request as draft April 10, 2026 06:29
thesprockee added a commit to EchoTools/nakama that referenced this pull request Apr 10, 2026
Point go-openskill to thesprockee/go-openskill which fixes:
- PredictWin/PredictDraw denominator: σ⁴ → σ² and nβ² → 2β²
- PredictDraw: wrong iteration (permutations → combinations) and
  draw probability base (1/n → 1/total_players)
- Documentation comments with incorrect default values

The old denominator was ~75% too large, compressing all predictions
toward 50/50. With the fix, a 1.2 vs 9.3 skill matchup correctly
predicts 10%-90% instead of 38%-62%.

See intinig/go-openskill#9 for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@thesprockee thesprockee force-pushed the fix/bugs-and-parity branch 2 times, most recently from 045285f to affc8c0 Compare April 10, 2026 11:30
thesprockee and others added 2 commits April 10, 2026 06:44
…lfVsSelf

The expected value is ~0.5, not 1.0, so the old name was misleading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Return zero-value results when teams have no players, preventing
divide-by-zero (NaN/Inf) from 1/totalPlayerCount and n*(n-1)/2
denominators. Callers can pass teams with pre-allocated slices
that never get populated, leaving zero-player teams.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
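The guard described in this commit could be sketched as follows (`predictWinSafe` and the `rating` struct are illustrative stand-ins, and the placeholder uniform result stands in for the real pairwise computation):

```go
package main

import "fmt"

type rating struct{ mu, sigmaSq float64 }

// predictWinSafe returns zero-value results when any team has no
// players, so the later 1/totalPlayerCount and n*(n-1)/2 denominators
// never divide by zero (which would produce NaN/Inf).
func predictWinSafe(teams [][]rating) []float64 {
	for _, t := range teams {
		if len(t) == 0 {
			return make([]float64, len(teams)) // zero-value, no NaN
		}
	}
	// ... the real pairwise computation would go here ...
	out := make([]float64, len(teams))
	for i := range out {
		out[i] = 1.0 / float64(len(teams)) // placeholder uniform result
	}
	return out
}

func main() {
	// A pre-allocated but never-populated team triggers the guard.
	fmt.Println(predictWinSafe([][]rating{{}, {{25, 69.4}}})) // [0 0]
}
```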
@thesprockee thesprockee force-pushed the fix/bugs-and-parity branch from affc8c0 to 12ff0f8 Compare April 10, 2026 11:44
@thesprockee thesprockee deleted the fix/bugs-and-parity branch April 10, 2026 11:45
@thesprockee thesprockee restored the fix/bugs-and-parity branch April 10, 2026 11:46
@thesprockee thesprockee reopened this Apr 10, 2026
@thesprockee thesprockee marked this pull request as ready for review April 10, 2026 11:47
@intinig intinig merged commit bdb85a3 into intinig:main Apr 10, 2026
1 check passed
