This repo is a collection of my own implementations of different attention/transformer architectures and algorithms for text, image, and video transformers.

It is also meant to serve as a sandbox environment for implementing different techniques (for example, DeepSeek sparse attention). Everything is designed to be trained on a single A100 GPU instance.
```
# Text
python main_script.py --train text

# Image classifier
python main_script.py --train image

# VideoGPT
python main_script.py --train video
```
- Learn in detail, by doing, how the following attention mechanisms work
    - Matrix-Form Text Attention
    - Image Attention
    - Temporal-Spatial Attention
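The matrix-form attention above boils down to `softmax(QK^T / sqrt(d_k)) V`. A minimal NumPy sketch (names are illustrative, not this repo's actual API):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention in matrix form: softmax(QK^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (T, T) similarity scores
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ V                              # weighted mix of values

Q = np.random.randn(4, 8)  # (seq_len, d_k)
K = np.random.randn(4, 8)
V = np.random.randn(4, 8)
out = attention(Q, K, V)   # (4, 8)
```

Image and temporal-spatial attention reuse the same core operation; only how the patch/frame tokens are laid out into the sequence changes.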
- Train three types of transformers on basic datasets, including handling the masking logic
    - Shakespeare text for the text model
    - MNIST generation for images
    - MNIST video for videos
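The masking logic for autoregressive training is a lower-triangular causal mask: token t may only attend to tokens at or before t. A minimal sketch of the idea:

```python
import numpy as np

def causal_mask(T):
    # True where attention is allowed: position t sees positions <= t
    return np.tril(np.ones((T, T), dtype=bool))

mask = causal_mask(4)
# Before the softmax, disallowed positions are set to -inf so they get zero weight
scores = np.random.randn(4, 4)
scores = np.where(mask, scores, -np.inf)
```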
- Implement KV Caching and the following attention improvements (text only)
    - GQA (Grouped-Query Attention)
    - MLA (Multi-head Latent Attention)
    - DSA (DeepSeek Sparse Attention), TBD
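The common thread of KV caching is appending each decode step's key/value to a growing store so past tokens are never re-encoded. A toy sketch of that bookkeeping (the class name and shapes are assumptions, not this repo's interface):

```python
import numpy as np

class KVCache:
    # Append-only store of past keys/values for autoregressive decoding
    def __init__(self):
        self.k, self.v = [], []

    def append(self, k_t, v_t):
        # Add this step's key/value, return the full history as arrays
        self.k.append(k_t)
        self.v.append(v_t)
        return np.stack(self.k), np.stack(self.v)

cache = KVCache()
for t in range(3):
    K, V = cache.append(np.random.randn(8), np.random.randn(8))
# After 3 steps, K and V each hold the whole history: shape (3, 8)
```

GQA shrinks this cache by sharing one K/V head across a group of query heads; MLA compresses it into a low-rank latent instead.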
- Implement basic top-K routing MoE (Mixture of Experts)
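Top-K routing means each token's gate scores pick its k best experts, and their outputs are mixed with the renormalized gate weights. A minimal NumPy sketch under assumed shapes (experts here are plain linear maps for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topk_route(x, W_gate, experts, k=2):
    # Route each token to its top-k experts, weighted by renormalized gate scores
    logits = x @ W_gate                         # (n_tokens, n_experts)
    idx = np.argsort(logits, axis=-1)[:, -k:]   # top-k expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gate = softmax(logits[t, idx[t]])       # renormalize over chosen experts
        for g, e in zip(gate, idx[t]):
            out[t] += g * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_exp = 8, 4
# Each "expert" is just a random linear map in this toy version
experts = [(lambda W: (lambda v: v @ W))(rng.standard_normal((d, d)))
           for _ in range(n_exp)]
W_gate = rng.standard_normal((d, n_exp))
x = rng.standard_normal((5, d))
y = topk_route(x, W_gate, experts, k=2)  # (5, 8)
```

A real MoE layer would add a load-balancing loss and batch the expert dispatch; the per-token loop here is only for clarity.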
- Implement a VQ-VAE (Vector-Quantized VAE)
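The core of a VQ-VAE is the codebook lookup: each encoder latent is snapped to its nearest codebook vector. A minimal sketch of just that quantization step (the straight-through gradient and codebook losses are omitted):

```python
import numpy as np

def quantize(z, codebook):
    # Nearest-neighbor lookup: replace each latent with its closest code vector
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, K) distances
    idx = d.argmin(-1)                                         # chosen code per latent
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))  # K=16 codes of dimension 4
z = rng.standard_normal((10, 4))         # stand-in for encoder outputs
z_q, idx = quantize(z, codebook)         # quantized latents + code indices
```

The indices `idx` are what a downstream autoregressive model (e.g. the video transformer) is trained to predict.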