Alchemy Lab

A Diffusion Model Research Framework

Alchemy Lab is a modular research infrastructure for building, training, and deploying diffusion models for image generation.

It is designed to:

  • Enable fast, configuration-driven experimentation
  • Facilitate evaluation, monitoring, and analysis
  • Bridge research prototypes and scalable systems

It is not intended to compete with high-level libraries from Hugging Face or with serving frameworks such as vLLM and SGLang; it is closer to a personal research platform.

Features

  • Modular diffusion core
  • Composable UNet-style architectures
  • Support for latent diffusion
  • Configuration-driven experiment management
  • Structured training harness with clear separation from model primitives
  • Support for distributed training (DDP)

Repository Structure

Alchemy Lab is organised as a monorepo with three pillars:

  • core - mathematical primitives and model components
  • lab - experiment configuration and training infrastructure
  • runtime - inference and deployment capabilities
```
src/alchemy/
|-- core/        # diffusion primitives, model components
|-- lab/         # training infrastructure
|-- runtime/     # inference (planned)
```

Installation & Usage

Alchemy Lab may be installed using uv after cloning to your local machine:

```shell
git clone https://github.com/j9smith/alchemy-lab
cd alchemy-lab
uv sync
```

Training

Once installed, experiments can be parameterised by amending the configuration files found in lab/configs and then executed via the entrypoint lab/cli/train.py:

```shell
cd alchemy-lab/src/alchemy/lab/cli
uv run python train.py
```

Example config file:

```yaml
defaults:
  - model: unet2d
  - vae: sd_vae_ft_mse
  - data: celeba_256
  - optim: adamw
  - loss: eps_linear
  - logging: default
  - checkpoints: default

train:
  resume: "checkpoint.pt"
  precision: fp32
  max_steps: 50000
  lr: 0.0002
  ema_decay: 0.9999
  log_dir: "./log_dir/"
  experiment_name: "default"
  save_every_n_steps: 5000
  save_path: "./weights/"
  save_prefix: "unet"

dist:
  backend: nccl
```

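The defaults list composes the final experiment from named config groups, with the top-level sections (train, dist) layered on top. Conceptually, composition works as in the simplified sketch below; the group contents shown are invented for illustration, and this is not the framework's actual loader:

```python
# Simplified sketch of config composition: each entry in `defaults`
# selects one named option from a config group, and the selections are
# merged with the top-level sections into a single experiment config.
config_groups = {
    "model": {"unet2d": {"in_channels": 3}},      # hypothetical contents
    "optim": {"adamw": {"betas": [0.9, 0.999]}},  # hypothetical contents
}

defaults = [("model", "unet2d"), ("optim", "adamw")]

# Resolve each group selection, then attach the top-level sections.
config = {group: config_groups[group][name] for group, name in defaults}
config["train"] = {"max_steps": 50000, "lr": 0.0002}

print(config["model"])  # the resolved model group
```

Swapping a single line in the defaults list (e.g. a different model or dataset entry) therefore redefines a whole experiment without touching the training code.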
Sampling

Images can be sampled by loading saved checkpoints via the cli/sample.py script:

```shell
uv run python sample.py --ckpt ./weights/unet_stepXXX.pt --device cuda --use_ema --n 24
```

Sampled images are stored in output/samples.png.
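The --use_ema flag implies that checkpoints carry both the raw model weights and an EMA-averaged copy (consistent with the ema_decay setting in the config). A hedged sketch of the selection logic, with key names that are assumptions rather than the framework's actual checkpoint schema:

```python
def select_state_dict(ckpt: dict, use_ema: bool) -> dict:
    """Prefer EMA-averaged weights when requested and available,
    otherwise fall back to the raw model weights."""
    if use_ema and "ema" in ckpt:
        return ckpt["ema"]
    return ckpt["model"]

# Toy checkpoint standing in for the result of torch.load(...)
ckpt = {"model": {"tag": "raw"}, "ema": {"tag": "ema"}}
state = select_state_dict(ckpt, use_ema=True)
print(state["tag"])  # ema
```

EMA weights are generally preferred at sampling time because the averaged parameters tend to produce smoother, higher-quality generations than the raw training weights.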

Profiling

Training can be profiled using in-code NVTX annotations to produce NVIDIA Nsight Systems reports. To do so, run the dedicated profiling script lab/cli/profiling.py under nsys (ensure that nsys is installed):

```shell
nsys profile \
  --output ~/reports/alchemy_$(date +%Y%m%d_%H%M%S) \
  --trace cuda,nvtx,osrt,cublas,cudnn \
  --capture-range cudaProfilerApi \
  --capture-range-end stop \
  --stats true \
  --gpu-metrics-devices all \
  uv run python profiling.py
```
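NVTX annotations mark code regions so they appear as named ranges on the Nsight Systems timeline (and, with --capture-range cudaProfilerApi above, collection typically starts when the script calls torch.cuda.profiler.start()). Below is a minimal sketch of how such annotations can be written using PyTorch's torch.cuda.nvtx bindings; it is illustrative, not the framework's actual instrumentation:

```python
from contextlib import contextmanager

try:
    import torch
    # NVTX ranges are only meaningful when running on a CUDA device.
    _HAVE_NVTX = torch.cuda.is_available()
except ImportError:
    _HAVE_NVTX = False

@contextmanager
def nvtx_range(name: str):
    """Annotate a code region as a named NVTX range (no-op off-GPU),
    so it shows up as a labelled span on the nsys timeline."""
    if _HAVE_NVTX:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _HAVE_NVTX:
            torch.cuda.nvtx.range_pop()

# Example: annotate one training step so it is visible in the report
with nvtx_range("train_step"):
    loss = 0.0  # forward / backward / optimizer step would run here
```

Wrapping each phase of the step (data loading, forward, backward, optimizer) in its own range makes per-phase GPU utilisation easy to read off the timeline.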

Roadmap

Alchemy Lab is very much a work in progress. Planned extensions include:

  • DiT architecture
  • Mixed precision training
  • Distributed training (FSDP)
  • Performance enhancements
  • ONNX export
  • Generic deployment infrastructure
