An attempt to implement the TTT paper by making the inner model linear attention + MLP, rather than computing linear attention over the gradients manually and analytically. In other words, torch autograd does the work.
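The idea of letting autograd compute the inner-loop update can be sketched as follows. This is a minimal, hypothetical illustration and not this repo's code. It assumes a simplified inner model (a residual two-layer MLP standing in for the repo's linear attention + MLP), a single gradient step per token, and a reconstruction loss between two learned projections of each token. The class name `TTTLayer` and all parameter names are illustrative. The key point is `torch.autograd.grad(..., create_graph=True)`: it replaces a hand-derived gradient and keeps the inner update differentiable, so the outer loop can still train the projections and the initial inner weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TTTLayer(nn.Module):
    """Hypothetical TTT-style layer: the hidden state is the weights of a small
    inner model, updated once per token by a gradient step computed by autograd."""

    def __init__(self, dim: int, hidden: int = 64, inner_lr: float = 0.1):
        super().__init__()
        # Outer-loop projections (trained by the usual optimizer).
        self.theta_k = nn.Linear(dim, dim, bias=False)  # input view for the inner loss
        self.theta_v = nn.Linear(dim, dim, bias=False)  # target view for the inner loss
        self.theta_q = nn.Linear(dim, dim, bias=False)  # view the output is read from
        # Initial inner-model weights, also learned by the outer loop.
        self.w1_init = nn.Parameter(torch.randn(dim, hidden) * dim ** -0.5)
        self.w2_init = nn.Parameter(torch.zeros(hidden, dim))  # inner model starts as identity
        self.inner_lr = inner_lr

    @staticmethod
    def inner_model(x, w1, w2):
        # Assumed inner model: residual two-layer MLP, f(x) = x + gelu(x W1) W2.
        return x + F.gelu(x @ w1) @ w2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, dim), processed one token at a time (online gradient descent).
        w1, w2 = self.w1_init, self.w2_init
        outputs = []
        for x_t in x:
            k, v, q = self.theta_k(x_t), self.theta_v(x_t), self.theta_q(x_t)
            # Self-supervised inner loss: predict the target view from the input view.
            loss = (self.inner_model(k, w1, w2) - v).pow(2).sum()
            # Autograd computes the inner gradient instead of a hand-derived formula.
            # create_graph=True lets the outer backward pass differentiate through it.
            g1, g2 = torch.autograd.grad(loss, (w1, w2), create_graph=True)
            w1 = w1 - self.inner_lr * g1
            w2 = w2 - self.inner_lr * g2
            # Output: apply the freshly updated inner model to the query view.
            outputs.append(self.inner_model(q, w1, w2))
        return torch.stack(outputs)


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = TTTLayer(dim=16, hidden=32)
    out = layer(torch.randn(8, 16))
    out.sum().backward()  # outer-loop gradients flow through every inner update
    print(out.shape)
```

The per-token Python loop is slow. It trades speed for not having to derive and vectorize the inner update by hand. The sketch also needs autograd enabled, so calling it inside `torch.no_grad()` would make `torch.autograd.grad` fail.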