This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
The Modular Platform is a unified platform for AI development and deployment that includes:
- MAX: High-performance inference server with OpenAI-compatible endpoints for LLMs and AI models
- Mojo: A new programming language that bridges Python and systems programming, optimized for AI workloads
All builds use the `./bazelw` wrapper from the repository root:

```bash
# Build everything
./bazelw build //...

# Build specific targets
./bazelw build //max/kernels/...
./bazelw build //mojo/stdlib/...

# Run tests
./bazelw test //...
./bazelw test //max/kernels/test/linalg:test_matmul
```

Many directories include `pixi.toml` files for environment management. Use Pixi when present:
```bash
# Install Pixi environment (run once per directory)
pixi install

# Run Mojo files through Pixi
pixi run mojo [file.mojo]

# Format Mojo code
pixi run mojo format ./

# Use predefined tasks from pixi.toml
pixi run main   # Run main example
pixi run test   # Run tests
pixi run hello  # Run hello.mojo

# Common Pixi tasks available in different directories:
# - /mojo/: build, tests, examples, benchmarks
# - /max/: llama3, mistral, generate, serve
# - /examples/*/: main, test, hello, dev-server, format

# List available tasks
pixi task list
```

To install MAX directly instead:

```bash
# Install the MAX nightly within a Python virtual environment using pip
pip install modular --index-url https://dl.modular.com/public/nightly/python/simple/ --extra-index-url https://download.pytorch.org/whl/cpu

# Alternatively, install MAX globally using Pixi
pixi global install -c conda-forge -c https://conda.modular.com/max-nightly
```
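For reference, a directory-local `pixi.toml` wiring up tasks like `main`, `test`, and `hello` might look like the sketch below. This is a hypothetical example; the project name, dependency pins, and task commands should be checked against an actual `pixi.toml` in this repository.

```toml
[project]
name = "example"
channels = ["conda-forge", "https://conda.modular.com/max-nightly"]
platforms = ["linux-64", "osx-arm64"]

[dependencies]
max = "*"

[tasks]
main = "mojo main.mojo"
hello = "mojo hello.mojo"
test = "mojo test_main.mojo"
format = "mojo format ./"
```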
```bash
# Start OpenAI-compatible server
max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF

# Run with Docker
docker run --gpus=1 -p 8000:8000 docker.modular.com/modular/max-nvidia-full:latest \
  --model-path modularai/Llama-3.1-8B-Instruct-GGUF
```

Repository layout:

```
modular/
├── mojo/                 # Mojo programming language
│   ├── stdlib/           # Standard library implementation
│   ├── docs/             # User documentation
│   ├── proposals/        # Language proposals (RFCs)
│   └── integration-test/ # Integration tests
├── max/                  # MAX framework
│   ├── kernels/          # High-performance Mojo kernels (GPU/CPU)
│   ├── serve/            # Python inference server (OpenAI-compatible)
│   ├── pipelines/        # Model architectures (Python)
│   └── nn/               # Neural network operators (Python)
├── examples/             # Usage examples
├── benchmark/            # Benchmarking tools
└── bazel/                # Build system configuration
```
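Because the server started with `max serve` exposes OpenAI-compatible endpoints, any OpenAI client can talk to it. Below is a minimal sketch of a chat-completions request body; the `/v1/chat/completions` path follows the OpenAI API convention, and the model name is assumed to match the one served above.

```python
import json

# Chat-completions request body in the OpenAI-compatible format.
# The "model" value must match what the server was started with.
payload = {
    "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

# POST this JSON to http://localhost:8000/v1/chat/completions,
# e.g. with urllib.request or the `openai` client pointed at that base URL.
body = json.dumps(payload)
print(body)
```

The same payload works through the official `openai` Python client by setting its base URL to the local server.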
- Language Separation:
  - Low-level performance kernels in Mojo (`max/kernels/`)
  - High-level orchestration in Python (`max/serve/`, `max/pipelines/`)
- Hardware Abstraction:
  - Platform-specific optimizations via dispatch tables
  - Support for NVIDIA/AMD GPUs and Intel/Apple CPUs
  - Device-agnostic APIs with hardware-specific implementations
- Memory Management:
  - Device contexts for GPU memory management
  - Host/device buffer abstractions
  - Careful lifetime management in Mojo code
- Testing Philosophy:
  - Tests mirror source structure
  - Use the `lit` tool with FileCheck validation
  - Hardware-specific test configurations
  - Migrating to `testing` module assertions
- Work from the `main` branch (synced with nightly builds); the `stable` branch tracks released versions
- Create feature branches for significant changes
```bash
# Run tests before committing
./bazelw test //path/to/your:target

# Run with sanitizers
./bazelw test --config=asan //...

# Multiple test runs
./bazelw test --runs_per_test=10 //...
```

- Use `mojo format` for Mojo code
- Follow existing patterns in the codebase
- Add docstrings to public APIs
- Sign commits with `git commit -s`
```bash
# Run benchmarks with environment variables
./bazelw run //max/kernels/benchmarks/gpu:bench_matmul -- \
  env_get_int[M]=1024 env_get_int[N]=1024 env_get_int[K]=1024

# Use autotune tools
python max/kernels/benchmarks/autotune/kbench.py benchmarks/gpu/bench_matmul.yaml
```

- Use nightly Mojo builds for development
- Install the nightly VS Code extension
- Avoid deprecated types like `Tensor` (use modern alternatives)
- Follow value semantics and ownership conventions
- Use `Reference` types with explicit lifetimes in APIs
- Fine-grained control over memory layout and parallelism
- Hardware-specific optimizations (tensor cores, SIMD)
- Vendor library integration when beneficial
- Performance improvements must include benchmarks
- Always check Mojo function return values for errors
- Ensure coalesced memory access patterns on GPU
- Minimize CPU-GPU synchronization points
- Avoid global state in kernels
- Never commit secrets or large binary files
Many benchmarks and tests use environment variables:
- `env_get_int[param_name]=value`
- `env_get_bool[flag_name]=true/false`
- `env_get_dtype[type]=float16/float32`
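For illustration only, the `getter[name]=value` convention above can be parsed as shown in this hypothetical Python sketch; it is not the actual Mojo mechanism the benchmarks use, just a way to see how each argument decomposes.

```python
import re

def parse_bench_arg(arg: str):
    """Parse a 'getter[name]=value' benchmark argument into (name, value)."""
    m = re.fullmatch(r"(env_get_\w+)\[(\w+)\]=(.+)", arg)
    if m is None:
        raise ValueError(f"not a benchmark parameter: {arg!r}")
    getter, name, raw = m.groups()
    if getter == "env_get_int":
        return name, int(raw)
    if getter == "env_get_bool":
        return name, raw.lower() == "true"
    return name, raw  # e.g. env_get_dtype keeps the string, like "float16"

print(parse_bench_arg("env_get_int[M]=1024"))
print(parse_bench_arg("env_get_dtype[type]=float16"))
```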
Currently accepting contributions for:
- Mojo standard library (`/mojo/stdlib/`)
- MAX AI kernels (`/max/kernels/`)
Other areas are not open for external contributions.
- Linux: x86_64, aarch64
- macOS: ARM64 (Apple Silicon)
- Windows: Not currently supported
- Docs index: https://docs.modular.com/llms.txt
- Mojo API docs: https://docs.modular.com/llms-mojo.txt
- Python API docs: https://docs.modular.com/llms-python.txt
- Comprehensive docs: https://docs.modular.com/llms-full.txt
- Atomic Commits: Keep commits small and focused. Each commit should address a single, logical change. This makes it easier to understand the history and revert changes if needed.
- Descriptive Commit Messages: Write clear, concise, and informative commit messages. Explain the why behind the change, not just what was changed. Use a consistent format (e.g., imperative mood: "Fix bug", "Add feature").
- Commit titles: prefix the title with `[Stdlib]` or `[Kernel]` depending on which component is modified; add the `[GPU]` tag as well when modifying GPU functions.
- Surround the commit message body with `BEGIN_PUBLIC` and `END_PUBLIC`.
- Example commit template:

```
[Kernels] Some new feature

BEGIN_PUBLIC
[Kernels] Some new feature

This adds a new feature for [xyz] to enable [abc]
END_PUBLIC
```