This repository aims to implement SOTA efficient token/channel mixers. Contributions of any technology beyond the vanilla Transformer are welcome. If you are interested in this repository, please join our Discord.
Install Torch (>=2.6.0), fla, and xopes first, then install xmixers:

```shell
git clone https://github.com/Doraemonzzz/xmixers.git
cd xmixers
pip install -e .
```
| Paper | Code | Config |
|---|---|---|
| Elucidating the Design Space of Decay in Linear Attention | Link (core code: lines 247–262) | Core method link, Ablation link |
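As a rough illustration of the kind of mixer this repository targets, the sketch below shows a minimal causal linear attention with a scalar decay factor. This is a generic recurrence for pedagogy only, not the paper's exact formulation or the code referenced in the table; the function name and the use of a single scalar `lam` are assumptions.

```python
import numpy as np

def linear_attention_decay(q, k, v, lam):
    """Causal linear attention with a scalar decay `lam`.

    Recurrence over time steps t:
        S_t = lam * S_{t-1} + k_t v_t^T   (decayed state update)
        o_t = q_t @ S_t                   (readout)
    q, k: (T, d) arrays; v: (T, d_v) array; lam in [0, 1].
    """
    T, d = q.shape
    d_v = v.shape[1]
    S = np.zeros((d, d_v))       # running key-value state
    out = np.zeros((T, d_v))
    for t in range(T):
        S = lam * S + np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out

rng = np.random.default_rng(0)
T, d = 5, 4
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
o = linear_attention_decay(q, k, v, lam=0.9)
print(o.shape)
```

With `lam=1.0` this reduces to plain (un-decayed) linear attention; smaller `lam` exponentially down-weights older tokens. Papers in this space typically replace the scalar with learned, per-head or data-dependent decay.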
To reproduce the results of the paper, we trained with the flame framework: set up the environment following flame's requirements, then run flame's training script, replacing "config" with the corresponding config name.