- Mamba paper: https://arxiv.org/pdf/2312.00752
- Mamba-2 paper: https://arxiv.org/pdf/2405.21060
Install the dependencies:

`pip install -r requirements.txt`
Then import your own training data; a minimal sketch of one possible tokenization step is shown below.
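As one possible starting point, this sketch tokenizes a plain-text file with the GPT-2 BPE tokenizer (via tiktoken) and saves a uint16 token shard. The file names, tokenizer choice, and `.npy` output format are assumptions for illustration and may differ from what train.py actually expects.

```python
# data_prep.py -- minimal sketch for tokenizing your own text corpus.
# Assumptions: GPT-2 BPE via tiktoken, a single .npy shard of uint16 tokens.
# Adapt paths and tokenizer to match what train.py actually loads.
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")

def tokenize_file(input_path: str, output_path: str) -> None:
    with open(input_path, "r", encoding="utf-8") as f:
        text = f.read()
    # Prepend the end-of-text token so documents are delimited.
    tokens = [enc.eot_token] + enc.encode_ordinary(text)
    arr = np.array(tokens, dtype=np.uint16)  # GPT-2 vocab fits in uint16
    np.save(output_path, arr)
    print(f"wrote {arr.size} tokens to {output_path}")

if __name__ == "__main__":
    tokenize_file("my_corpus.txt", "my_corpus_tokens.npy")  # hypothetical paths
```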
The model was trained on a single node with 8 A100 GPUs; training for one epoch takes about 2 hours.
Launch training with:

`torchrun --standalone --nproc_per_node=8 train.py`
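For reference, here is a minimal sketch of the process setup that the `torchrun` launcher drives, assuming a standard PyTorch DistributedDataParallel loop. The linear model and random batch below are placeholders standing in for the actual Mamba-2 model and data loader in train.py.

```python
# ddp_sketch.py -- minimal sketch of the multi-GPU setup torchrun expects.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 512).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step in range(10):  # placeholder loop; the real loop reads token shards
        x = torch.randn(8, 512, device=f"cuda:{local_rank}")
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # DDP all-reduces gradients across the 8 ranks
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```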
## HellaSwag

To evaluate the model, I used the HellaSwag benchmark. The evaluation results are shown in the table below, followed by a sketch of the scoring procedure:
| Model | Parameters | Benchmark | Score |
|---|---|---|---|
| Mamba-2 (Mine) | 280M | HellaSwag | 0.324 |
| Mamba-2 (HF) | 370M | HellaSwag | 0.299 |
| GPT-2 | 124M | HellaSwag | 0.294 |
| GPT-2 | 350M | HellaSwag | 0.375 |
| GPT-3 | 124M | HellaSwag | 0.337 |
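For context, HellaSwag scores a model by how often it assigns the lowest average per-token loss to the correct ending out of four candidates. The sketch below illustrates that scoring rule for a generic causal LM; the `model` and `enc` handles are placeholders, and this is not necessarily the exact evaluation code used here.

```python
# hellaswag_sketch.py -- illustrative HellaSwag scoring rule.
# For each example: render context + each of the 4 candidate endings,
# compute the average cross-entropy over the ending tokens only, and
# predict the ending with the lowest loss.
import torch
import torch.nn.functional as F

@torch.no_grad()
def pick_ending(model, enc, context: str, endings: list) -> int:
    ctx_ids = enc.encode(context)
    losses = []
    for ending in endings:
        end_ids = enc.encode(" " + ending)
        ids = ctx_ids + end_ids
        logits = model(torch.tensor([ids]))  # (1, T, vocab) next-token logits assumed
        # Shift so logits at position t predict token t+1.
        shift_logits = logits[0, :-1]
        shift_targets = torch.tensor(ids[1:])
        token_losses = F.cross_entropy(shift_logits, shift_targets, reduction="none")
        # Average only over the ending tokens.
        losses.append(token_losses[-len(end_ids):].mean().item())
    return min(range(len(endings)), key=lambda i: losses[i])
```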
## Training/Validation Loss
In addition, we can follow the training and validation loss curves to evaluate the model at each step; a minimal plotting sketch is shown below.
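As an illustration, the sketch below plots both curves from a plain-text log. The `step N train X val Y` line format is an assumption, so adapt the parsing to whatever train.py actually writes.

```python
# plot_loss.py -- minimal sketch for plotting train/val loss curves.
# Assumes a log file with lines like "step 100 train 3.21 val 3.45".
import matplotlib.pyplot as plt

steps, train_loss, val_loss = [], [], []
with open("log.txt") as f:  # hypothetical log path
    for line in f:
        parts = line.split()
        if len(parts) == 6 and parts[0] == "step":
            steps.append(int(parts[1]))
            train_loss.append(float(parts[3]))
            val_loss.append(float(parts[5]))

plt.plot(steps, train_loss, label="train")
plt.plot(steps, val_loss, label="val")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_curve.png")
```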
