A from-scratch tensor compiler in Rust, inspired by tinygrad.
Follows the same pipeline as tinygrad, simplified:
Tensor API → Lazy UOp Graph → Scheduling → Codegen → Compilation → Execution
Each layer is a standalone module you can study independently:
| Module | What it does | Compiler concept |
|---|---|---|
| `tensor` | Lazy tensor API with kernel fusion | Lazy evaluation, scheduling |
| `uop` | DAG-based IR with hash-consing | Intermediate representations |
| `schedule` | Lazy graph → proto-kernels + rangeify | Scheduling, lowering |
| `optimize` | Symbolic simplification, upcast, unroll | Compiler optimizations |
| `codegen` | UOp graph → C source | Code emission |
| `gradient` | Reverse-mode autograd as graph transforms | Automatic differentiation |
| `rewrite` | Fixed-point graph simplification | Term rewriting |
| `nn` | Linear layers and loss functions | Neural network primitives |
| `dtype` | Data types bridging Rust, IR, and C | Type systems |
| `device` | Buffer, Device trait, CPU backend | Hardware abstraction |
| `shape` | Shape metadata and transformations | Tensor algebra |
| `dataset` | MNIST with auto-download and caching | Data loading |
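To make the `uop` row concrete, here is a minimal sketch of hash-consing for a DAG-based IR. The `UOp` and `Interner` types are illustrative only, not ferrograd's actual API: structurally equal nodes are interned once, so repeated subexpressions share a single allocation and the IR forms a DAG rather than a tree.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Illustrative IR node; ferrograd's real UOp carries more (dtype, shape, ...).
#[derive(Hash, PartialEq, Eq, Clone, Debug)]
enum UOp {
    Const(i64),
    Add(Rc<UOp>, Rc<UOp>),
}

// The hash-consing table: structurally equal nodes map to one shared Rc.
#[derive(Default)]
struct Interner {
    table: HashMap<UOp, Rc<UOp>>,
}

impl Interner {
    fn intern(&mut self, op: UOp) -> Rc<UOp> {
        self.table
            .entry(op.clone())
            .or_insert_with(|| Rc::new(op))
            .clone()
    }
}

fn main() {
    let mut ctx = Interner::default();
    let a = ctx.intern(UOp::Const(2));
    let b = ctx.intern(UOp::Const(2));
    // Structurally identical constants are the same node in memory.
    assert!(Rc::ptr_eq(&a, &b));
    let sum = ctx.intern(UOp::Add(a, b));
    println!("{:?}", sum);
}
```

Interning also makes structural equality a pointer comparison, which keeps rewrite passes over the graph cheap.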
Train a two-layer MLP on MNIST from scratch:
```rust
use ferrograd::dataset::MNISTDataset;
use ferrograd::nn::{Linear, Parameters};
use ferrograd::optim::Sgd;
use ferrograd::tensor::{cpu, Tensor};

// Define a model
struct Mlp { l1: Linear, l2: Linear }

impl Mlp {
    fn forward(&self, x: &Tensor) -> Tensor {
        let h = self.l1.forward(x).relu();
        self.l2.forward(&h)
    }
}

impl Parameters for Mlp {
    fn parameters(&self) -> Vec<Tensor> {
        [self.l1.parameters(), self.l2.parameters()].concat()
    }
}

// Train
let dataset = MNISTDataset::load().unwrap();
let model = Mlp { l1: Linear::new(784, 128), l2: Linear::new(128, 10) };
let optim = Sgd::new(model.parameters(), 0.01);
let batch_x = dataset.train_images.narrow(0, 0, 256);
let batch_t = one_hot(&dataset.train_labels[..256], 10);
let loss = model.forward(&batch_x).cross_entropy(&batch_t);
loss.backward(); // reverse-mode autograd
optim.step();    // SGD update
```

Everything is lazy: `forward`, `cross_entropy`, and `backward` just build a graph. `optim.step()` fuses it into kernels, compiles C via clang, and executes.
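The lazy-until-realized pattern can be sketched in a few lines. This is a toy interpreter, not ferrograd's implementation, and all names here (`Lazy`, `realize`) are hypothetical; in the real pipeline, realization triggers scheduling, C codegen, and compilation instead of direct evaluation.

```rust
// Toy lazy tensor: ops only build a graph; nothing runs until `realize`.
#[derive(Clone)]
enum Lazy {
    Data(Vec<f32>),
    Add(Box<Lazy>, Box<Lazy>),
    Mul(Box<Lazy>, Box<Lazy>),
}

impl Lazy {
    fn add(self, other: Lazy) -> Lazy { Lazy::Add(Box::new(self), Box::new(other)) }
    fn mul(self, other: Lazy) -> Lazy { Lazy::Mul(Box::new(self), Box::new(other)) }

    // In a real compiler this is where scheduling, codegen, and compilation
    // would happen; here we simply interpret the graph elementwise.
    fn realize(&self) -> Vec<f32> {
        match self {
            Lazy::Data(v) => v.clone(),
            Lazy::Add(a, b) => a.realize().iter().zip(b.realize()).map(|(x, y)| x + y).collect(),
            Lazy::Mul(a, b) => a.realize().iter().zip(b.realize()).map(|(x, y)| x * y).collect(),
        }
    }
}

fn main() {
    let x = Lazy::Data(vec![1.0, 2.0]);
    let y = Lazy::Data(vec![3.0, 4.0]);
    let z = x.add(y.clone()).mul(y); // no work yet, just graph building
    assert_eq!(z.realize(), vec![12.0, 24.0]);
}
```

Deferring execution this way is what lets the scheduler see the whole graph and fuse multiple ops into one kernel.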
```sh
cargo run --example mnist --release   # full training loop
DEBUG=4 cargo run --example demo      # see generated C source
```

The compiler pipeline is functional end-to-end: lazy tensor graphs, multi-kernel scheduling, reverse-mode autograd, and CPU code generation via clang. The MNIST example trains a small MLP from scratch.
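The `rewrite` module's fixed-point simplification idea can also be shown in miniature. This sketch is illustrative only (the `Expr`, `step`, and `simplify` names are invented here): local rules like `x + 0 → x` and constant folding are applied bottom-up until a pass changes nothing.

```rust
// Toy expression IR for demonstrating fixed-point term rewriting.
#[derive(Clone, PartialEq, Debug)]
enum Expr {
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}
use Expr::*;

// One bottom-up pass of the rule set: x+0 -> x, x*1 -> x, constant folding.
fn step(e: &Expr) -> Expr {
    match e {
        Add(a, b) => match (step(a), step(b)) {
            (x, Const(0)) | (Const(0), x) => x,
            (Const(x), Const(y)) => Const(x + y),
            (x, y) => Add(Box::new(x), Box::new(y)),
        },
        Mul(a, b) => match (step(a), step(b)) {
            (x, Const(1)) | (Const(1), x) => x,
            (Const(x), Const(y)) => Const(x * y),
            (x, y) => Mul(Box::new(x), Box::new(y)),
        },
        other => other.clone(),
    }
}

// Iterate until the graph stops changing: the fixed point.
fn simplify(mut e: Expr) -> Expr {
    loop {
        let next = step(&e);
        if next == e { return e; }
        e = next;
    }
}

fn main() {
    // (3 * 1) + 0 collapses to 3.
    let e = Add(
        Box::new(Mul(Box::new(Const(3)), Box::new(Const(1)))),
        Box::new(Const(0)),
    );
    assert_eq!(simplify(e), Const(3));
}
```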
```sh
cargo clippy   # build + lint (clippy pedantic is on)
cargo test     # run all tests
```

MIT