HIP/ROCm fork of llama.cpp optimized for AMD gfx1030/RDNA2 architecture with support for PrismML's Bonsai Q1_0_G128 '1-bit' models, TurboQuant TQ3_0 KV cache, and EAGLE3 speculative decoding.
Updated Apr 7, 2026 - C++
ROCm/HIP fork of SGLang with TurboQuant tq2/tq3/tq4 KV cache, Triton and radix-cache serving, EAGLE3 speculative decoding, P-EAGLE checkpoint support, and PrismML Bonsai 1-bit GGUF compatibility on gfx1030/RDNA2.