Skip to content

Fix ECC correction range: ±0.5 ULP → ±1 ULP#88

Open
josejg wants to merge 1 commit intoHomebrewML:mainfrom
josejg:josejg/fix-ecc-correction-range
Open

Fix ECC correction range: ±0.5 ULP → ±1 ULP#88
josejg wants to merge 1 commit intoHomebrewML:mainfrom
josejg:josejg/fix-ecc-correction-range

Conversation

@josejg
Copy link

@josejg josejg commented Mar 16, 2026

With round to nearest the quantization error is bounded by ±1/2 ULP, hence the logULP-1, however when doing stochastic rounding the error can be larger. E.g. if x is representable in FP32 and BF16, then x+eps will be rounded up with some low probability, and the error between the rounded value and x will be 1 ULP instead of 1/2.

So, to handle stochastic rounding correctly (the default in HeavyBall), the ECC needs to be computed using ±1ULP.

Tested with the precision_toy.py script from the blogpost

precision_toy

ECC with RNE produces errors ±0.5 ULP but when doing Stochastic Rounding
the error range increases to ±1 ULP. Since heavyball defaults to SR,
the `- 1` in `_log_ulp(x) - 1` limited the int8 correction to ±0.5 ULP.
This caused frequent clamping in the ECC correction terms, introducing a
per-step bias that accumulates through the EMA feedback loop.

Removing the `- 1` from both decode and compute_correction solves it.
Correction range doubles to ±1 ULP, clamping drops to 0%.
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5b526a141

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


def decode(self, x):
ls = (_log_ulp(x) - 1).float()
ls = _log_ulp(x).float()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve legacy ECC scale when decoding checkpoints

Updating decode to use _log_ulp(x) makes previously saved ECC corrections decode at twice their original magnitude, because older checkpoints encoded correction in ULP/2 units. The load path (ChainOpt._restore_ecc_dtypes) only restores dtypes and does not rescale or recompute existing ::ecc tensors, so resuming training from checkpoints created before this change will restore incorrect optimizer/parameter state values when ECC is enabled.

Useful? React with 👍 / 👎.

narrow_f32 = self._bf16_to_f32(narrow) if narrow.dtype == torch.bfloat16 else narrow.float()
e = fp32 - narrow_f32
ls = (_log_ulp(narrow) - 1).float()
ls = _log_ulp(narrow).float()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep ECC normalization at half ULP for deterministic encode

_ULPState.encode still uses deterministic fp32.to(target.dtype) rounding (not stochastic_round_), so quantization error remains bounded by ±0.5 ULP. Normalizing with _log_ulp(narrow) here doubles the correction quantization step and reduces ECC reconstruction precision for every newly written state (including fp16 ECC modes, which never use stochastic rounding), introducing avoidable accuracy loss.

Useful? React with 👍 / 👎.

@josejg
Copy link
Author

josejg commented Mar 16, 2026

Ah, @ClashLuke I wrote the PR before the recent RNE changes, so i think this is not fully correct after rebasing. Both strategies work, the only difference is whether you prefer the main buffers to be biased or not. My intuition is that RNE+ULP/2 has 1 more bit of effective precision

@ClashLuke
Copy link
Member

Thank you for the fix and the detailed analysis on the ULP bound.
I'm planning to add ECC + SR as a first-class option in HeavyBall. I'll wire through a flag for the rounding mode so ECC can pick the right bound.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants