Feature Description
Dynamically quantize Q to int8, then perform an XMX_INT8 GEMM for Q*K.
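Below is a minimal sketch of the idea in plain Python. It makes several assumptions. Scales are symmetric and per-row (per-token). K is also quantized to int8 at runtime; SageAttention quantizes both Q and K, but this request only mentions Q. The function names are hypothetical. On hardware, the inner integer dot product would run as the XMX INT8 GEMM with int32 accumulation, and the per-row scales would be applied to the int32 result afterward. SageAttention's K smoothing and per-block scale granularity are not shown.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_rows_int8(x: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Hypothetical helper: symmetric dynamic int8 quantization, one scale per row (token)."""
    q_rows, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        # Map the row's largest magnitude to 127; an all-zero row keeps scale 1.0.
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_qk(q: Matrix, k: Matrix) -> Matrix:
    """Approximate S = Q @ K^T with int8 operands, integer accumulation, then dequantization."""
    q8, sq = quantize_rows_int8(q)
    k8, sk = quantize_rows_int8(k)  # assumption: K is also quantized to int8
    out = []
    for i, qi in enumerate(q8):
        row = []
        for j, kj in enumerate(k8):
            acc = sum(a * b for a, b in zip(qi, kj))  # the int32 accumulator on real hardware
            row.append(acc * sq[i] * sk[j])  # rescale back to floating point
        out.append(row)
    return out


def fp_qk(q: Matrix, k: Matrix) -> Matrix:
    """Floating-point reference for S = Q @ K^T."""
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
```

Usage: compare `int8_qk(Q, K)` against `fp_qk(Q, K)` for small random Q and K. The per-row scales keep the error bounded by roughly half a quantization step per operand.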
Motivation and Use Case
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Alternatives Considered
No response
Definition of Done
No response
Additional Context
No response