[bug] int32 signed overflow in buffer x #884

@darxradi3nt

Description

An int32 signed overflow in the buffer x offset silently corrupts gradients for large batch/dstate configurations.

The pointer offset into buffer x is computed with an all-int32 multiply in both the forward and backward selective-scan CUDA kernels.
When batch * dim * n_chunks * dstate exceeds INT32_MAX (2^31 - 1), the product overflows and can wrap to a negative value.
The kernel then reads and writes memory located before x_ptr, which either silently corrupts adjacent tensors or triggers a CUDA illegal-address fault.
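
To make the wraparound concrete, here is a minimal host-side sketch of the arithmetic. The helper names `offset_int32` and `offset_int64` and the example sizes are hypothetical, chosen for illustration; they are not taken from the kernel source, and the widening cast shown is one common way to avoid the overflow, not necessarily what PR #883 does. Signed overflow is undefined behavior in C++, so the sketch emulates the 32-bit wrap with unsigned arithmetic.

```cpp
#include <cstdint>

// Hypothetical helper: emulates the all-int32 multiply described above.
// The product is formed in uint32_t so that wraparound is well-defined,
// then converted back to int32_t (modular conversion; guaranteed since
// C++20 and the behavior of mainstream compilers before that).
static int32_t offset_int32(int32_t batch, int32_t dim,
                            int32_t n_chunks, int32_t dstate) {
    uint32_t product = static_cast<uint32_t>(batch)
                     * static_cast<uint32_t>(dim)
                     * static_cast<uint32_t>(n_chunks)
                     * static_cast<uint32_t>(dstate);
    return static_cast<int32_t>(product);
}

// Hypothetical fix: widen the first operand to 64 bits so every
// subsequent multiply in the chain is carried out in int64_t.
static int64_t offset_int64(int32_t batch, int32_t dim,
                            int32_t n_chunks, int32_t dstate) {
    return static_cast<int64_t>(batch) * dim * n_chunks * dstate;
}
```

With illustrative sizes batch = 48, dim = 8192, n_chunks = 32, dstate = 256, the true product is 3 * 2^30 = 3221225472. `offset_int64` returns that value, while `offset_int32` returns -1073741824, an offset that points well before the start of the buffer.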

Fixed in PR #883.
