fix: remove double softplus on dt in non-fused Mamba2 #845

Open
Mr-Neutr0n wants to merge 1 commit into state-spaces:main from Mr-Neutr0n:fix/mamba2-double-softplus-dt

@Mr-Neutr0n

Bug

In the non-fused/reference path of Mamba2Simple, softplus is applied to dt twice: once explicitly before dt is passed to the SSM step, and once inside the step function. This over-smooths the timestep values.
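For context, here is a minimal sketch of what a double application would do numerically if it were present. It uses only the Python standard library rather than the repository's code. The sample values in `raw_dt` are made up for illustration and are not taken from any model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical pre-activation timestep values (illustrative only).
raw_dt = [-4.0, -1.0, 0.0, 1.0, 4.0]

once = [softplus(v) for v in raw_dt]   # intended: a single softplus
twice = [softplus(v) for v in once]    # what a double application would yield

for r, a, b in zip(raw_dt, once, twice):
    print(f"raw={r:+.1f}  once={a:.4f}  twice={b:.4f}")
```

Because a single softplus always returns a positive value, a second softplus can never return less than log(2), roughly 0.693. A genuine double application would therefore put a floor under small timesteps.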

Fix

Removed the redundant softplus application to match the fused kernel behavior.

@albertfgu
Collaborator

Sorry, can you describe the exact code path where the softplus is applied twice?

It seems you're referring to the path starting at line 158? Are you claiming the softplus is applied on line 162 and again on line 182 (mamba_chunk_scan_combined)? But I don't think the latter applies the softplus by default.

@darxradi3nt
Contributor

mamba_chunk_scan_combined defaults to dt_softplus=False, so it receives the already-transformed dt and applies no further softplus. Softplus is therefore applied exactly once.
The proposed change moves the operation inside the Triton kernel via dt_softplus=True, which produces the same result. That may be a cleaner factoring, but there's no double application to fix in the current code.
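The equivalence described above can be sketched with a toy stand-in. `scan_stub` is a hypothetical function written only for this illustration. It is not `mamba_chunk_scan_combined` and models nothing but the `dt_softplus` flag with its default of `False`. The `raw_dt` values are likewise made up.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def scan_stub(dt, dt_softplus=False):
    """Hypothetical stand-in: returns the dt values a scan would actually use.

    Only the dt_softplus flag is modelled; the real scan does far more.
    """
    return [softplus(v) for v in dt] if dt_softplus else list(dt)


raw_dt = [-2.0, 0.0, 3.0]  # illustrative values only

# Current factoring: softplus in the caller, flag left at its default.
current = scan_stub([softplus(v) for v in raw_dt])

# Proposed factoring: raw dt passed in, softplus applied via the flag.
proposed = scan_stub(raw_dt, dt_softplus=True)

# A double application would need softplus in BOTH places.
double = scan_stub([softplus(v) for v in raw_dt], dt_softplus=True)

assert current == proposed   # same result, different place
assert double != current     # only this combination would be a bug
```

Under these assumptions the two factorings give identical timesteps, which is why the change reads as a refactor rather than a behavioral fix.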

I suggest re-labeling this as a refactor/cleanup rather than a bug fix and closing the PR.

@albertfgu
Collaborator

Yes, if you relabel this I can merge it.

@darxradi3nt
Contributor

@Mr-Neutr0n?
