danstam17/mini_transformer_toolkit

This repo collects my own implementations of different attention mechanisms, transformer architectures, and algorithms for text, image, and video transformers.

It also serves as a sandbox environment for implementing new techniques (for example, DeepSeek sparse attention). Everything is designed to be trained on a single A100 GPU instance.

Running

Text: python main_script.py --train text
Image classifier: python main_script.py --train image
VideoGPT: python main_script.py --train video

Objectives

  1. Learn in detail, by doing, how the following attention mechanisms work
  • Matrix-form text attention
  • Image attention
  • Temporal-spatial attention
  2. Train three types of transformers on basic datasets, including handling masking logic
  • a Shakespeare text model for text
  • an MNIST generator for images
  • MNIST video for video
  3. Implement KV caching and the following attention improvements (text only)
  • GQA (grouped-query attention)
  • MLA (multi-head latent attention)
  • DSA (DeepSeek Sparse Attention), TBD
  4. Implement basic top-k routing MoE
  5. Implement a VQ-VAE
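The "matrix form text attention" in objective 1 (and the masking logic in objective 2) boil down to scaled dot-product attention with a causal mask. A minimal NumPy sketch, with my own function and variable names rather than anything from this repo:

```python
import numpy as np

def causal_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with a causal mask.

    x: (T, d) token embeddings; Wq/Wk/Wv: (d, d) projection matrices.
    """
    T = x.shape[0]
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # (T, T) similarities
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # strictly above diagonal
    scores[mask] = -np.inf                            # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (T, d) outputs
```

Because of the mask, editing a later token never changes the output at earlier positions, which is the property the training-time masking logic has to preserve.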
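The KV-caching idea in objective 3 can be sketched in a few lines: during autoregressive decoding, keys and values for past tokens are stored once and reused, so each step only projects the newest token. Again an illustrative sketch under my own naming, not the repo's API:

```python
import numpy as np

class KVCache:
    """Toy single-head KV cache for step-by-step decoding."""

    def __init__(self):
        self.K = None  # (t, d) cached keys
        self.V = None  # (t, d) cached values

    def step(self, x_new, Wq, Wk, Wv):
        """Attend the newest token (1, d) over all cached positions."""
        q = x_new @ Wq
        k, v = x_new @ Wk, x_new @ Wv
        self.K = k if self.K is None else np.vstack([self.K, k])
        self.V = v if self.V is None else np.vstack([self.V, v])
        scores = q @ self.K.T / np.sqrt(self.K.shape[-1])  # (1, t)
        w = np.exp(scores - scores.max())
        w /= w.sum()                                       # softmax over the past
        return w @ self.V                                  # (1, d) output
```

Feeding tokens one at a time through the cache reproduces the last row of full causal attention, without recomputing K and V for the whole prefix. GQA and MLA then shrink this cache further by sharing or compressing the K/V heads.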
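For objective 4, top-k routing means a learned gate scores every expert per token, only the k best experts run, and their outputs are mixed with renormalized gate weights. A minimal sketch with hypothetical names (a real implementation would batch the expert dispatch instead of looping per token):

```python
import numpy as np

def topk_moe(x, W_gate, experts, k=2):
    """Top-k routed mixture-of-experts over a batch of token vectors.

    x: (T, d) tokens; W_gate: (d, n_experts) router; experts: list of callables.
    """
    logits = x @ W_gate                          # (T, n_experts) router scores
    chosen = np.argsort(logits, axis=-1)[:, -k:]  # k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, chosen[t]]
        gate = np.exp(sel - sel.max())
        gate /= gate.sum()                       # softmax over the k winners only
        for g, e in zip(gate, chosen[t]):
            out[t] += g * experts[e](x[t])       # weighted expert outputs
    return out
```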
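The core of the VQ-VAE in objective 5 is the quantization bottleneck: each encoder latent is snapped to its nearest codebook vector, and the resulting discrete indices are what a downstream transformer (e.g. for video) models. A sketch of just that lookup, with assumed shapes:

```python
import numpy as np

def vq_quantize(z, codebook):
    """Nearest-neighbour codebook lookup for a VQ-VAE bottleneck.

    z: (N, d) encoder latents; codebook: (K, d) learned code vectors.
    Returns the quantized latents and their discrete code indices.
    """
    # Squared L2 distance from every latent to every code: (N, K)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=-1)      # nearest code per latent
    return codebook[idx], idx
```

In training, the non-differentiable argmin is bypassed with a straight-through estimator plus codebook/commitment losses; this sketch only shows the forward lookup.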
