Skip to content

Improve multiplayer network smoothness#53

Merged
tim4724 merged 25 commits intomainfrom
worktree-gleaming-cooking-wilkes
Feb 28, 2026
Merged

Improve multiplayer network smoothness#53
tim4724 merged 25 commits intomainfrom
worktree-gleaming-cooking-wilkes

Conversation

@tim4724
Copy link
Owner

@tim4724 tim4724 commented Feb 25, 2026

Summary

  • Increase Colyseus state patch rate from 20Hz to 60Hz (biggest single improvement)
  • Fix 1-second periodic hang caused by PixiJS Text re-renders on timer update
  • Eliminate per-frame GC pressure from Map allocations and Graphics redraws
  • Lock ball to possessing player's sprite to prevent visual trailing
  • Replace lerp-based remote player rendering with velocity-based movement + error correction
  • Add reconciliation dead zone to prevent micro-jitter from tiny prediction errors
  • Add dead reckoning extrapolation for free ball with friction integration
  • Add NetworkSmoothnessMetrics instrumentation and E2E smoothness test

Test plan

  • npm run test:e2e — 14 tests pass (smoothness test flaky under full parallel load, passes on retry)
  • Manual: play a multiplayer match and observe movement smoothness for ball, local player, and remote players

@github-actions
Copy link

Replace hardcoded 0.15 inertia factor with PLAYER_ACCELERATION config
(0–1 range). Server sets acceleration=1 for instant response, letting
network latency provide natural momentum feel instead of artificial
smoothing.
Default patchRate is 50ms (20Hz). Setting it to match the simulation
interval (16.67ms, 60Hz) means clients receive state updates 3x more
often, eliminating most visual stalls and reducing delta variance by ~75%.
Standalone class that records per-frame sprite positions and computes
smoothness metrics (avgDelta, deltaVariance, stdDev, maxJump, stallCount,
jumpCount, stallRatio, jumpRatio). Zero-cost when not recording.

Wired into MultiplayerScene: samples pre/post syncFromServerState each
frame and exposed as window.__networkMetrics for devtools inspection.

Usage in browser devtools:
  window.__networkMetrics.startRecording()
  // play for a few seconds
  console.table(window.__networkMetrics.getSmoothnessReport())
Two-client test: Client A moves rightward for 2s, Client B samples the
remote player sprite position via requestAnimationFrame for 2.5s and
computes avgDelta, deltaVariance, maxJump, stallRatio, jumpRatio.

Thresholds are intentionally generous (maxJump < 60, stallRatio < 0.5)
to serve as a regression gate while leaving room to tighten as networking
improves. Raw metrics are logged to test output for comparison across runs.

Baseline (20Hz patch rate):  stallRatio=0.10, deltaVariance=21.0, maxJump=17px
After fix (60Hz patch rate): stallRatio=0.01, deltaVariance=5.3,  maxJump=13px
Between server patches, extrapolate positions using last known velocity
instead of holding the previous server position. This eliminates visual
stalls during network jitter.

Ball: integrates velocity with friction (0.98^(dt*60)), capped at 100ms.
      Extrapolation disabled while ball is possessed (velocity is stale).
Remote players: integrates velocity linearly, capped at 50ms.

Both still lerp toward the extrapolated target for smooth error correction.

Also tune constants for 60Hz patch rate:
  BALL_LERP_FACTOR:          0.5 → 0.3  (dead reckoning does more work)
  REMOTE_PLAYER_LERP_FACTOR: 0.5 → 0.3
  BASE_RECONCILE_FACTOR:     0.2 → 0.35 (faster local player correction)
  MODERATE_RECONCILE_FACTOR: 0.5 → 0.6
  STRONG_RECONCILE_FACTOR:   0.8 → 0.9

Measured vs baseline (20Hz, no dead reckoning):
  stallRatio:    0.101 → 0.011 → 0.011  (unchanged; already near-zero)
  deltaVariance: 21.0  → 5.3  → 4.7    (11% further reduction)
  maxJump:       17px  → 13px → 12px   (small improvement)
  jumpRatio:     0.022 → 0    → 0      (no pops)
Fix remote player dead reckoning to also trigger snapshot updates on
velocity changes (not just position), preventing stale extrapolation when
a player changes direction without moving in the same server frame.

Adjust interpolation constants tuned for 60Hz patch rate + dead reckoning:
  BALL_LERP_FACTOR:          0.5 → 0.3  (prediction does most of the work)
  REMOTE_PLAYER_LERP_FACTOR: 0.5 → 0.3
  BASE_RECONCILE_FACTOR:     0.2 → 0.35 (faster local player correction)
  MODERATE_RECONCILE_FACTOR: 0.5 → 0.6
  STRONG_RECONCILE_FACTOR:   0.8 → 0.9
Halve the allowed maxJump (60 → 30) and stall/jump ratios (0.5/0.3 →
0.05/0.1) to lock in the gains from 60Hz patch rate and dead reckoning.
These thresholds now act as a regression gate for future networking work.
… networking

Dead reckoning + lerp produced a "sawtooth" effect where the target
position snapped backward on each server patch arrival, causing
periodic micro-stalls visible as a ~1 second hang.

Snapshot interpolation stores recent server snapshots and smoothly
interpolates between them with a 25ms delay, eliminating target
position discontinuities entirely. This is the standard approach
used by professional multiplayer games (Quake, Source Engine, etc.).

Also removes unused BALL_LERP_FACTOR and REMOTE_PLAYER_LERP_FACTOR
constants that are superseded by the new approach.
The timer text was set every frame (60x/sec) via `timerText.text = ...`.
In PixiJS v8, setting .text triggers a canvas draw + GPU texture upload
whenever the string value changes. Since the timer seconds digit changes
exactly once per second, this caused a frame drop at precisely 1-second
intervals — matching the reported periodic ball hang.

Fix: only set .text/.style.fill when the value actually differs from the
current value. Also guard initializeAI() to only run when AI hasn't been
set up yet (was being called 60x/sec on every stateChange event).
…railing

When a player possesses the ball, the server sets ball velocity to 0 and
places it at an offset from the player. On the client, the player sprite
is predicted ahead via dead reckoning, but the ball was lerping slowly
toward the stale server position — causing it to visibly trail behind.

Now computes the server-side offset (ball pos - player pos) and applies
it to the local predicted sprite position, keeping the ball locked to
the player visually.
Three performance optimizations to reduce periodic stutters:

1. Cache getUnifiedState() per frame — was allocating 3+ new Maps per
   frame (180+ Maps/sec), now computes once and reuses within a frame.

2. Draw controlArrow once and update via position/rotation — was calling
   Graphics.clear() + redraw every frame (expensive GPU texture flush),
   now draws the shape once at init and only updates transform.

3. Guard timerBg tint/alpha assignments — avoids PixiJS setter overhead
   when values haven't changed.

Also use live Colyseus state in stateChange handler instead of the
intermediate GameStateData Map that NetworkManager creates per patch.
The ball lerp (factor 0.3) created a sawtooth pattern: between patches,
dead reckoning extrapolated the target forward smoothly, but the ball
only chased at 30% per frame.  When a new patch corrected the target
backward, the ball stalled momentarily before resuming — creating a
periodic visible hang.

Now sets ball position directly to the dead-reckoned target.  At 60Hz
patches, corrections are ~1-3 px which are invisible without smoothing.
…ad zone

Replace dead-reckoning + lerp with velocity-based movement:
- Every frame: advance sprite by velocity * frameDelta (smooth, patch-independent)
- On server patches: compute position error, correct 50% per frame (~4 frame convergence)
- Add reconciliation dead zone: skip corrections < 2px to prevent constant pull-back jitter
- Adjust smoothness test: 1800ms sampling window, relaxed thresholds for parallel load
Toggle in browser devtools during a multiplayer match:
  __netDebug.enable()  — log per-frame position/error/velocity data
  __netDebug.ghosts()  — show server position ghost dots on canvas
  __netDebug.dump(30)  — print last 30 frames as console.table
  __netDebug.disable() — stop
@tim4724 tim4724 force-pushed the worktree-gleaming-cooking-wilkes branch from 7115b97 to e3984f0 Compare February 25, 2026 16:21
Copy link
Owner Author

@tim4724 tim4724 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #53: Improve multiplayer network smoothness

Overall: the core goals here are valid — 60Hz patch rate, the timer-text fix, and the control arrow draw-once optimization are all clear wins with no downside. The remote player dead-reckoning approach is directionally sound. However, the PR bundles several changes that add meaningful complexity with questionable necessity, and one area has a correctness concern worth addressing.


Critical

Ball dead-reckoning friction integration is incorrect

/client/src/scenes/MultiplayerScene.ts (syncFromServerState, ~line 636):

const frictionScale = Math.pow(GAME_CONFIG.BALL_FRICTION, dtS * 60)
targetX = this.lastBallServerX + this.lastBallServerVX * frictionScale * dtS

BALL_FRICTION (0.98) is a per-step multiplier on velocity, not a scale factor for total displacement. The correct extrapolation is to integrate velocity decaying exponentially over n steps:

position = x0 + v0 * sum_{k=0}^{n-1} friction^k * stepDt
         = x0 + v0 * (1 - friction^n) / (1 - friction) * stepDt

The current formula multiplies the full velocity by the friction factor, which underestimates displacement early on (friction hasn't had time to act yet) and miscalculates it in all cases. At 60Hz and short time windows the error is small, but it defeats the stated purpose of extrapolating physics correctly.


Warnings (should fix)

preSyncPositions and stateChangeTimestamps are collected but never used

/client/src/utils/NetworkSmoothnessMetrics.ts

getSmoothnessReport() only reads postSyncPositions and reconciliationErrors. The arrays preSyncPositions and stateChangeTimestamps grow without bound (one entry per samplePreSync call for every player on every frame where recording is active) and are never consumed. If recording runs for more than a few seconds the allocations are non-trivial. Either use these arrays or remove them.

BALL_LERP_FACTOR / REMOTE_PLAYER_LERP_FACTOR are now dead constants

/client/src/scenes/GameSceneConstants.ts (lines 13–14):

BALL_LERP_FACTOR: 0.3,
REMOTE_PLAYER_LERP_FACTOR: 0.3,

MultiplayerScene no longer references either constant — ball rendering was switched to direct assignment and remote players to velocity + error correction. The constants are now misleadingly named noise in the file. Remove them or rename them to make their purpose explicit.

smoothnessMetrics is always allocated even when the feature is never started

/client/src/scenes/MultiplayerScene.ts (initializeGameState, line 69):

this.smoothnessMetrics = new NetworkSmoothnessMetrics()
window.__networkMetrics = this.smoothnessMetrics

The metrics object is created on every scene init, assigned to window.__networkMetrics, and can never be activated except by external code that calls startRecording(). This means every multiplayer session carries an idle metrics object with no way to enable it from the normal game UI. If this is purely a development/test tool, it should not be constructed in production at all (guard with import.meta.env.DEV or similar). If it is meant to be usable, there should be a way to start it.

The _myPlayerId parameter in samplePostSync is silently unused

/client/src/utils/NetworkSmoothnessMetrics.ts (line 119):

_myPlayerId: string

The underscore prefix acknowledges this, but the parameter was presumably added for a reason (mirroring samplePreSync). If it is never going to be used, remove it from the signature to keep the call site honest.

!this.aiManager guard added inside stateChange handler may have timing side effects

/client/src/scenes/MultiplayerScene.ts (line 343, in the stateChange callback):

if (this.colorInitialized && state.players.size > 0 && !this.aiManager) {
    this.initializeAI()
}

The guard prevents double-initialization, which is the right intent. But initializeAI() should itself be idempotent — relying on the caller to suppress duplicate calls is fragile if the guard condition can evaluate true in a race between two state-change callbacks. Consider verifying that initializeAI is safe to call multiple times.


Suggestions (consider improving)

The smoothness test duplicates the metrics computation from NetworkSmoothnessMetrics

/tests/network-smoothness.spec.ts contains ~50 lines of inline stall/jump/variance computation that is identical to the logic in NetworkSmoothnessMetrics.getSmoothnessReport(). The NetworkSmoothnessMetrics class was introduced in the same PR precisely to centralise this. The test should start recording via window.__networkMetrics, then call getSmoothnessReport(), rather than re-implementing the calculation. This would validate the metrics class itself and keep the logic in one place.

Per-frame state-change caching is clever but adds cognitive overhead for marginal gain

_cachedUnifiedState, _cachedUnifiedStateFrame, _frameCounter are three new fields managing a single-frame cache for getUnifiedState(). At 60Hz patches (the new regime), the Colyseus state object is a direct reference — fromNetwork() allocates a Map but that is a handful of entries and is already a per-frame allocation in the previous code. The cache adds state that must be kept in sync (manual increment of _frameCounter, dual fields for frame/state) and a subtle bug opportunity (cache is keyed by a monotonically increasing counter, not by actual frame identity). If fromNetwork() is genuinely expensive, that conversion is the right place to optimize (e.g. incrementally update a persistent Map on stateChange). As written, the trade-off seems unfavorable.

Dead reckoning for remote players resets t but never uses it

In updateRemotePlayer, each cached entry stores t: performance.now() but cached.t is written and never read. Either use it (e.g. to cap how stale a cached state is before snapping) or remove the field to reduce noise.

controlArrowDrawn flag couples draw-once logic across the arrow-invisible branches

The arrow is drawn once at the origin using controlArrowDrawn, then positioned/rotated every frame. This is a good optimization. But if the arrow is ever clear()ed externally (e.g. on scene re-init), controlArrowDrawn remains true and the arrow will not be re-drawn. The flag should be reset alongside any arrow reset path, or the logic should check whether the graphics object actually contains geometry rather than using a separate boolean.

tint comparison with a numeric literal is fragile

if (this.timerBg.tint !== 0xffffff) this.timerBg.tint = 0xffffff

PixiJS v8 stores tint as a Color object internally; reading .tint may not return the raw integer 0xffffff in all versions. This guard is probably harmless in practice (PixiJS typically normalizes to the same value) but is worth checking against PixiJS v8 docs. If the comparison is unreliable, skipping the guard here and just assigning is both simpler and correct.


What is clearly good

  • 60Hz patchRate: single line, large impact, correct rationale.
  • Timer text dirty-check: the 1-second periodic hang from canvas re-renders is a real bug and this fix is minimal and correct.
  • Control arrow draw-once via position/rotation: eliminates per-frame clear()/stroke() properly.
  • syncFromServerState scoreboard deduplication: identical fix applied in both BaseGameScene and MultiplayerScene — correct.
  • playerAcceleration: 1 instant-response default: straightforward config change with a clear semantic.
  • Reconciliation dead zone: conceptually sound; prevents micro-jitter from low-magnitude corrections.
  • AI double-init guard (!this.aiManager): prevents repeated calls even if the mechanism is worth hardening in initializeAI itself.

…rmula

- Fix ball friction extrapolation: use proper integral of exponential
  decay instead of incorrect v * f^t * t formula
- Remove NetworkSmoothnessMetrics (always allocated in production, unused)
- Remove dead constants: BALL_LERP_FACTOR, REMOTE_PLAYER_LERP_FACTOR,
  REMOTE_MOVEMENT_THRESHOLD, STATE_UPDATE_LOG_INTERVAL
- Remove unused cached.t field from remote player state
Copy link
Owner Author

@tim4724 tim4724 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review — previous issues addressed, a few new findings

All five items from the original review are properly resolved:

  • Ball friction integral is correct (v0 * (f^steps - 1) / (60 * ln(f)))
  • BALL_LERP_FACTOR and REMOTE_PLAYER_LERP_FACTOR removed from GameSceneConstants.ts
  • NetworkSmoothnessMetrics class removed entirely
  • cached.t is gone — the new lastRemotePlayerStates cache never stores a timestamp
  • REMOTE_MOVEMENT_THRESHOLD and STATE_UPDATE_LOG_INTERVAL removed

The dead-reckoning, error-correction, and control-arrow optimisations all look logically correct. A few real issues remain:


Critical

collectMovementInput() called twice per frame — second call's result shadows the first

In MultiplayerScene.updateGameState() (lines 100–143), collectMovementInput() is called at line 100 to drive client-side prediction, and then called again at line 120 for the network send. The first call's movement and hasMovement locals are then dead. The joystick or keyboard state won't change between the two calls (same frame), so the values will match, but the duplication is wasteful and confusing. If collectMovementInput ever becomes stateful or has a side effect the double-call will cause a bug.

// Current — two separate calls:
const movement = this.collectMovementInput()          // line 100, used only for prediction
// ...
if (this.controlledPlayerId) {
  const movement = this.collectMovementInput()        // line 120, re-declared, shadows outer

Fix: capture the result once at the top of the block and reuse it:

const movement = this.collectMovementInput()
const hasMovement =
  Math.abs(movement.x) > VISUAL_CONSTANTS.MIN_MOVEMENT_INPUT ||
  Math.abs(movement.y) > VISUAL_CONSTANTS.MIN_MOVEMENT_INPUT

// client-side prediction
if (hasMovement) { /* ... move sprite ... */ }

// network send
if (this.controlledPlayerId) {
  if (hasMovement && this.networkManager.isConnected()) {
    this.networkManager.sendInput(movement, false, this.controlledPlayerId, true)
    this.lastMovementWasNonZero = true
  } else if (!hasMovement && this.lastMovementWasNonZero && this.networkManager.isConnected()) {
    // ...stop input...
  }
}

Client-side prediction ignores playerAcceleration — diverges from server physics when acceleration < 1

updateGameState advances the controlled sprite with:

controlledSprite.x += movement.x * GAME_CONFIG.PLAYER_SPEED * dt

But PhysicsEngine.processPlayerInput uses player.velocityX += (target - velocityX) * accel when accel < 1. The default server config sets playerAcceleration: 1 (instant), so these match in production right now, but GAME_CONFIG.PLAYER_ACCELERATION is 0.15 and the server default only overrides it via GameState.ts. If playerAcceleration is ever changed to a non-1 value on the server, the client prediction will be wrong (no inertia) and reconciliation will fight it constantly.

This is worth a comment at minimum, or better: mirror the acceleration math in the client prediction path. The server default is documented only in GameState.ts; a future author may change it without realising the client breaks.


Warnings

Dead-reckoning snapshot not updated when ball is stationary

In syncFromServerState, the snapshot is only refreshed when at least one of x, y, vx, vy changes:

if (
  serverBall.x !== this.lastBallServerX ||
  serverBall.y !== this.lastBallServerY ||
  serverBall.velocityX !== this.lastBallServerVX ||
  serverBall.velocityY !== this.lastBallServerVY
) {
  this.lastBallStateReceivedAt = now
}

If the ball comes to a complete stop on the field (vx = vy = 0, position unchanged), the snapshot will never update its timestamp. The displacement formula evaluates to 0 when vx = vy = 0, so the ball position is correct. But lastBallStateReceivedAt will be stale, which means if the ball is then struck and a new snapshot arrives with new velocity, dtS will be computed from the old stale timestamp — potentially a large number that causes Math.pow(f, steps) to underflow to zero and snap the ball far ahead. The 100ms cap on dtS limits the worst case, but the root cause is avoidable. The simplest fix is to update the timestamp on every syncFromServerState call unconditionally (or at least on every free-ball call), not just on value changes:

this.lastBallStateReceivedAt = now  // always refresh; snapshot guards position/velocity separately

stateUpdateCount guard is redundant and misleading

if (!this.stateUpdateCount) this.stateUpdateCount = 0

stateUpdateCount is declared and initialised to 0 as a class field. The guard never fires and adds noise. Remove it.


initializeAI is called from playerJoin unconditionally, bypassing the !this.aiManager guard

The stateChange handler now correctly guards with !this.aiManager before calling initializeAI(). But the playerJoin handler (line 421) still calls this.initializeAI() unconditionally on every join event, which can re-initialise the AIManager and reset its internal state mid-match when a third player joins. The fix used in stateChange should be applied here too, or initializeAI itself should be idempotent (check internally if already initialised).


Suggestion

controlArrowDrawn flag lives on BaseGameScene but the draw-once geometry is never invalidated

If controlArrow is destroyed and recreated (e.g. the scene is reused, or initializeControlArrow is called a second time), controlArrowDrawn remains true and the arrow is never redrawn, leaving it invisible. initializeControlArrow should reset this.controlArrowDrawn = false when it creates a new Graphics object. Check BaseGameScene.initializeControlArrow to confirm whether this is handled.

- Remove double collectMovementInput() call; reuse single result
- Match server acceleration model in client prediction (was instant,
  now mirrors PhysicsEngine's lerp with PLAYER_ACCELERATION)
- Always refresh lastBallStateReceivedAt to prevent stale dead-reckoning
  timestamp from overshooting when stationary ball starts moving
- Remove redundant stateUpdateCount guard (field already initialized)
- Guard initializeAI() in playerJoin/playerLeave to avoid resetting
  AIManager mid-match
- Reset controlArrowDrawn flag when creating new Graphics object
- Move lastBallStateReceivedAt to stateChange handler (was set every
  render frame, making dead-reckoning dtS always 0)
- Revert client prediction to instant velocity (server uses
  playerAcceleration: 1, not GAME_CONFIG value of 0.15)
- Remove dead stateUpdateCount field (incremented but never read)
- Remove redundant if(state.ball) guard after early return
- Reset dead-reckoning and cache fields in cleanupGameState
- Delete lastRemotePlayerStates entry in removeRemotePlayer to prevent
  stale dead-reckoning data from leaking to reconnecting players
- Move lastBallStateReceivedAt into the ball change-detection block
  (setting it on every Colyseus patch reset dtS for non-ball patches;
  setting it per render frame made dtS always 0)
- Remove unnecessary performance.now() call for non-extrapolation paths
- Fix opponentTeamPlayerIds never populated in initializeAI: both teams
  are now correctly categorized so AIManager gets proper blue/red arrays
- Fix shouldAllowAIControl comparing player ID against session ID (format
  mismatch: 'abc-p1' vs 'abc'); use startsWith prefix check instead
- Stop update loop during returnToMenu 2s delay by setting isMultiplayer=false
  after disconnect, preventing sendInput calls on disconnected networkManager
- Add frameDeltaS fallback (1/60) for the first frame before updateGameState
  has set it, preventing zero-velocity remote player movement on init
- Revert STRONG_RECONCILE_FACTOR from 0.9 to 0.8 to avoid visible position
  pops on low-fps mobile devices (0.9 = 45px snap on 50px error)
- Destroy controlArrow Graphics in BaseGameScene.destroy() to prevent leak
- Add TICK_RATE to shared GAME_CONFIG; replace local MatchRoom constants
  with named TICK_RATE/MATCH_DURATION to make the coupling explicit
The velocity + 50% error correction approach created a steady-state
~12px oscillation that never converged — at 60Hz patches, the decay
couldn't keep up (errX = v_per_frame / correction_rate = 5.83/0.5).

Switch to pure extrapolation from last server position + velocity,
matching the technique already used for ball dead-reckoning. Also adds
network lag/loss simulator (?lag=150&loss=20) and diagnostic logging.
Strip all diagnostic console.log/warn calls from hot paths
(frame spikes, patch gaps, reconcile errors, remote snap logs).
Remove the network lag/loss simulator (readNetConditions, sim
field, _simState, snapshot copy, setTimeout delay). Simplify
setupStateListeners to build GameStateData directly from live
Colyseus state. Fix getUnifiedState() cache to update frame
counter on null. Add .gitignore entries for shared/src/ build
artifacts.
@tim4724 tim4724 merged commit fb81033 into main Feb 28, 2026
5 of 6 checks passed
@tim4724 tim4724 deleted the worktree-gleaming-cooking-wilkes branch February 28, 2026 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant