[Feature]: Add sageattentionV1 support #1503

@luoyu-intel

Feature Description

Dynamically quantize Q to int8, then perform an XMX_INT8 GEMM for Q*K.
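
For illustration only, here is a minimal pure-Python sketch of what the requested computation does numerically. It is not the proposed kernel and not the SageAttention reference code. The function names `quantize_rows_int8` and `int8_qk_scores` are hypothetical, the per-row symmetric scaling is an assumed granularity, and K is also quantized here on the assumption that an int8 GEMM needs both operands in int8. The integer accumulation loop stands in for the XMX_INT8 GEMM.

```python
def quantize_rows_int8(mat):
    """Symmetric dynamic quantization, one scale per row (assumed granularity).

    Returns (rows of ints in [-127, 127], list of per-row float scales).
    """
    q_rows, scales = [], []
    for row in mat:
        amax = max(abs(v) for v in row)
        # The scale is computed from the data at run time, hence "dynamic".
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_qk_scores(q, k):
    """Approximate Q * K^T using int8 operands and integer accumulation."""
    q_i8, q_scale = quantize_rows_int8(q)
    k_i8, k_scale = quantize_rows_int8(k)
    scores = []
    for i, q_row in enumerate(q_i8):
        out_row = []
        for j, k_row in enumerate(k_i8):
            # Integer dot product: the part a hardware int8 GEMM would execute.
            acc = sum(a * b for a, b in zip(q_row, k_row))
            # Dequantize with the product of the two row scales.
            out_row.append(acc * q_scale[i] * k_scale[j])
        scores.append(out_row)
    return scores


if __name__ == "__main__":
    q = [[1.0, -2.0], [0.5, 0.25]]
    k = [[0.5, 1.0], [-1.0, 2.0]]
    for row in int8_qk_scores(q, k):
        print(row)
```

The dequantized scores should be close to the floating-point Q*K^T, with an error bounded by the quantization step of each row.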

Motivation and Use Case

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Alternatives Considered

No response

Definition of Done

No response

Additional Context

No response
