Status:
- support deterministic_hadamard_matrix and deterministic_hadamard_matrix
- support inference with transformers using a Triton MXFP4 kernel
- support RTN and AutoRound tuning (iters>0)
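
For readers unfamiliar with the rotation being discussed, here is a minimal sketch of a deterministic Hadamard construction (Sylvester's recursion) in plain Python. It is only an illustration, not the code in this PR: `sylvester_hadamard` and `rotate` are hypothetical names, and the actual `deterministic_hadamard_matrix` may differ in ordering, dtype, and normalization.

```python
def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix (entries +1/-1), n a power of two.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester step: H_2k = [[H_k, H_k], [H_k, -H_k]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def rotate(vec, h):
    """Apply the orthonormal rotation H / sqrt(n) to a vector."""
    scale = len(h) ** -0.5
    return [scale * sum(a * b for a, b in zip(row, vec)) for row in h]
```

Because the Sylvester matrix is symmetric and H·H = n·I, applying `rotate` twice with the same matrix returns the original vector (up to floating-point error), which is what makes the transform cheap to undo around a quantized layer.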
Todo:
- align the implementation with "SpinQuant: LLM Quantization with Learned Rotations", especially R2/R3, and implement structured fused
- support online transform for vllm inference
- support NVFP4
- verify on multiple cards
Originally posted by @lkk12014402 in #1323