AXL (Architecture eXperimental Lab) — A family of 27 agentic coding models optimized for training and running on consumer CPUs. No GPU required. Train on a Ryzen 5 5600G. Deploy via Python API (full quality) or Ollama (degraded quality).
```bash
git clone https://github.com/Koinic/AXL.git
cd AXL
pip install -e .
```

```bash
# Start the API server (full AXL multi-scale quality)
python AXL/API/serve_model.py --model checkpoints/axl_micro_lion --port 8880 --name axl-micro-lion
```
```bash
# Then call the OpenAI-compatible endpoint:
curl http://localhost:8880/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "def fibonacci(n):", "max_tokens": 100}'
```

This works with any OpenAI-compatible tool (Continue.dev, LlamaIndex, LangChain, Cursor).
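From Python, the same endpoint can be called with nothing but the standard library. This is a sketch against the server started above; the `choices[0]["text"]` response shape is the usual OpenAI completions layout and is assumed here rather than confirmed against AXL's server:

```python
import json
import urllib.request

def build_request(prompt: str, max_tokens: int = 100,
                  url: str = "http://localhost:8880/v1/completions"):
    # Assemble a POST request matching the curl example above
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

def complete(prompt: str) -> str:
    # Assumed OpenAI-style response: {"choices": [{"text": ...}]}
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["text"]

# complete("def fibonacci(n):")  # requires the server to be running
```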
Warning: Ollama GGUF files use only the fine-scale encoder (1/3 of the AXL architecture). The reported PPL values apply to the full multi-scale model. For full quality, use the Python API above.
```bash
cd AXL/HuggingFace/AXL-Micro-Lion
ollama create axl-micro-lion -f Modelfile
ollama run axl-micro-lion "def fibonacci(n):"
```

```bash
# 1. Generate training data
python scripts/generate_all_training_data.py --skip-hf

# 2. Train AXL-Micro with Lion optimizer
python scripts/retrain_all_lion.py --models micro

# 3. Your model is saved in checkpoints/axl_micro_lion/
```

```python
import torch
from multiscale_transformer.model.config import load_config, ModelConfig
from multiscale_transformer.model.model import MultiScaleTransformer
from multiscale_transformer.training.tokenizer import ByteTokenizer

# Load model
ckpt = torch.load("checkpoints/axl_micro_lion/axl_micro_lion.pt", map_location="cpu")
cfg = ckpt["config"]
config = ModelConfig(
    vocab_size=cfg.get("vocab_size", 258),
    d_model=cfg.get("d_model", 256),
    n_heads=cfg.get("n_heads", 4),
    d_ff=cfg.get("d_ff", 688),
    n_layers_per_scale=cfg.get("n_layers_per_scale", 3),
    n_cross_attn_layers=cfg.get("n_cross_attn_layers", 1),
    max_seq_len=cfg.get("max_seq_len", 256),
)
model = MultiScaleTransformer(config)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Generate
tokenizer = ByteTokenizer()
ids = torch.tensor([tokenizer.encode("def fibonacci(n):\n")], dtype=torch.long)
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=100, temperature=0.8, top_k=40)
print(tokenizer.decode(out[0].tolist()))
```

AXL is a multi-scale transformer architecture designed by Koinic for CPU-first training and inference. It processes token sequences at three parallel resolution scales — fine (1x), medium (2x), and coarse (4x) — each with a dedicated encoder stack, cross-scale attention, and adaptive gating fusion.
Why CPU-first? Training a 1B-parameter model on GPU costs $10,000+. AXL trains the same model on a consumer CPU for $0.004 in electricity.
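The $0.004 figure is consistent with back-of-envelope arithmetic. The wattage and electricity tariff below are illustrative assumptions, not measured values from the AXL paper:

```python
# Rough electricity-cost check for one training run.
watts = 100            # assumed average CPU package power while training
hours = 20 / 60        # AXL-Code-1B-Lion trains in ~20 minutes (see table below)
price_per_kwh = 0.12   # assumed tariff, USD per kWh
cost = (watts / 1000) * hours * price_per_kwh
assert round(cost, 3) == 0.004
```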
Key innovations:
- Lion optimizer: 20x faster convergence than SGD, 50% less memory than AdamW
- Byte-level tokenizer (vocab=258): No vocabulary training, works with any language
- GGUF export: Real Q4_K_M quantization via llama.cpp
- Progressive training: Scale to 1B+ params on 16GB RAM
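The byte-level vocabulary needs no training because it is fixed: 256 raw byte values plus two special tokens. A minimal sketch of that scheme — the BOS/EOS ids here are assumptions, not necessarily `ByteTokenizer`'s actual layout:

```python
# 256 byte values + 2 specials = vocab size 258 (special ids assumed)
BOS, EOS = 256, 257

def encode(text: str) -> list[int]:
    # Any language round-trips: UTF-8 bytes are the tokens
    return [BOS] + list(text.encode("utf-8")) + [EOS]

def decode(ids: list[int]) -> str:
    # Drop special tokens, then decode the raw bytes
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

assert decode(encode("def fibonacci(n):")) == "def fibonacci(n):"
```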
```
AXL/                                  # Distribution package
├── GitHub/                           # Code for others to train AXL
│   ├── multiscale_transformer/       # Core library
│   │   ├── model/                    # Attention, blocks, config, model, layers
│   │   ├── training/                 # Lion, GaLore, viz, distill, streaming, dataset
│   │   ├── data/                     # Quality filter, data mixture
│   │   ├── export/                   # GGUF export, HF compat, Ollama server
│   │   ├── axl_v2/                   # Agentic extensions (tool router, self-debug, vision)
│   │   ├── benchmarks/               # HumanEval, MBPP, Perplexity, CodeBLEU
│   │   └── tests/                    # Test suite
│   ├── scripts/                      # All training, export, benchmark, utility scripts
│   ├── configs/                      # YAML training configs + BPE tokenizer
│   ├── data/                         # Training datasets
│   ├── docs/                         # Paper, references, quickstart
│   ├── examples/                     # Inference and training examples
│   ├── llama.cpp/                    # Quantization tools (Windows binaries)
│   ├── README.md                     # This file
│   ├── LICENSE                       # Apache 2.0
│   ├── CHANGELOG.md                  # Version history
│   ├── requirements.txt              # Python dependencies
│   ├── pyproject.toml                # Package config
│   ├── Dockerfile                    # Container build
│   └── docker-compose.yml            # Multi-service orchestration
│
└── HuggingFace/                      # Ready-to-use models
    ├── README.md                     # Organization overview
    ├── paper_axl.tex                 # Research paper
    ├── references.bib                # Bibliography
    ├── AXL_ARCHITECTURE.md           # Architecture documentation
    ├── AXL-Code-1B-Lion/             # 27 model directories
    ├── AXL-Reasoning-Lion/
    ├── AXL-Micro-Lion/
    ├── ...
    └── AXL-Vision-v2/
```
```bash
python scripts/download_training_data.py --max_gb 5
# Downloads Python code from HuggingFace datasets
```

```bash
python scripts/generate_all_training_data.py --skip-hf
# Generates training data for all model types
```

- Place your text files in `data/` (any `.txt` or `.jsonl` file)
- Point the training script to your file:

```bash
python scripts/train_axl_micro.py --data_path data/my_code.txt --max_time 600
```

- Text files (`.txt`): Raw text, one file per dataset
- JSONL files (`.jsonl`): JSON lines with `source_code`/`target_code` fields (for translation)
- Byte-level tokenizer: any file works — no preprocessing needed
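For the translation format, each JSONL line carries one source/target pair. A small illustration of writing and reading that layout — the pair contents are invented examples, only the `source_code`/`target_code` field names come from the format description above:

```python
import json

# Two illustrative Python-2-to-3 translation pairs
records = [
    {"source_code": "print 'hi'", "target_code": "print('hi')"},
    {"source_code": "d.has_key(k)", "target_code": "k in d"},
]
jsonl = "\n".join(json.dumps(r) for r in records)  # one JSON object per line

pairs = [json.loads(line) for line in jsonl.splitlines()]
assert pairs[1]["target_code"] == "k in d"
```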
```python
from multiscale_transformer.data.quality_filter import DataFilter

data_filter = DataFilter(min_lines=2, max_lines=500, require_syntax_valid=True)
clean = data_filter.filter_file("data/raw_code.txt", "data/clean_code.txt")
```

Configs are in `configs/`. Each defines an architecture:

- `axl_micro.yaml` — 12.8M params, fastest training (3 min)
- `axl_code_1b.yaml` — 318M params, largest model (20 min)

Or use the auto-builder:

```python
from multiscale_transformer.training.model_builder import ModelBuilder
```
```bash
# Train one model with Lion optimizer (recommended)
python scripts/retrain_all_lion.py --models micro    # 3 minutes
python scripts/retrain_all_lion.py --models code_1b  # 20 minutes

# Train all models sequentially
python scripts/retrain_all_lion.py  # ~50 minutes total

# Train with custom data
python scripts/train_axl_code_1b_lion.py --max_time 1200
```

```bash
python scripts/quantize_all_models.py --models code_1b_lion
# Produces F16 + Q4_K_M GGUF files in checkpoints/
```

```bash
# Primary: Python API server (full multi-scale quality)
python AXL/API/serve_model.py --model checkpoints/axl_code_1b_lion --port 8880 --name axl-code-1b
# OpenAI-compatible: http://localhost:8880/v1/completions
# Works with Continue.dev, LlamaIndex, LangChain, Cursor

# Alternative: Ollama (degraded quality - uses only the fine-scale encoder)
cd checkpoints/axl_code_1b_lion
ollama create axl-code-1b -f Modelfile
ollama run axl-code-1b "def quicksort(arr):"
```

| Model | Params | Time (Lion) | RAM |
|---|---|---|---|
| AXL-Comment-Lion | 7.2M | 2 min | 30 MB |
| AXL-Micro-Lion | 12.8M | 3 min | 107 MB |
| AXL-Reasoning-Lion | 70M | 10 min | 414 MB |
| AXL-Code-1B-Lion | 318M | 20 min | 2 GB |
Edit `multiscale_transformer/model/config.py` — the `ModelConfig` dataclass controls:

- `d_model`: Model dimension (64–1024)
- `n_heads`: Number of attention heads
- `d_ff`: Feed-forward dimension
- `n_layers_per_scale`: Transformer layers per scale (3 scales total)
- `max_seq_len`: Context window in bytes
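To see how these fields fit together, here is a toy stand-in — not the real dataclass; the defaults are copied from the model-loading example earlier, and the head-divisibility check is a common transformer constraint assumed here rather than confirmed for AXL:

```python
from dataclasses import dataclass

@dataclass
class ConfigSketch:
    vocab_size: int = 258          # byte tokenizer: 256 bytes + 2 specials
    d_model: int = 256
    n_heads: int = 4
    d_ff: int = 688
    n_layers_per_scale: int = 3    # applied at each of the 3 scales
    max_seq_len: int = 256         # context window in bytes

    def __post_init__(self):
        # d_model must split evenly across attention heads
        assert self.d_model % self.n_heads == 0

cfg = ConfigSketch(d_model=512, n_heads=8)
assert cfg.d_model // cfg.n_heads == 64  # per-head dimension
```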
Or use the auto-builder:
```python
from multiscale_transformer.training.model_builder import ModelBuilder, suggest_models

suggestions = suggest_models(available_ram_gb=16.0)
for s in suggestions:
    print(f"{s['name']}: {s['actual_params']/1e6:.0f}M params, {s['estimated_ram_gb']:.1f}GB RAM")
```

- Copy `scripts/train_axl_micro.py` as a template
- Change the `ModelConfig` to your architecture
- Point `get_data()` to your dataset
- Choose an optimizer: `Lion` (recommended) or `torch.optim.SGD`
- Run: `python scripts/my_new_trainer.py`
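The distillation workflow that follows exposes `temperature` and `alpha` knobs. A minimal sketch of the standard objective those names suggest — Hinton-style knowledge distillation; the library's actual loss may differ in detail:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T flattens the distribution
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.7):
    # Soft term: cross-entropy against the temperature-softened teacher
    soft = -sum(t * math.log(s)
                for t, s in zip(softmax(teacher_logits, T),
                                softmax(student_logits, T)))
    # Hard term: ordinary cross-entropy against the true label
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard

loss = kd_loss([2.0, 0.5], [1.5, 0.2], label=0)
assert loss > 0
```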
```python
from multiscale_transformer.training.distill import DistillationTrainer

trainer = DistillationTrainer(
    teacher_path="checkpoints/axl_code_1b_lion/axl_code_1b_lion.pt",
    student_config=your_student_config,
    temperature=2.0,
    alpha=0.7,
)
trainer.train(text_data, max_time=600, output_dir="checkpoints/distilled")
```

```python
from multiscale_transformer.training.viz import TrainLogger

logger = TrainLogger(log_dir="runs/my_experiment", backend="csv")
for step in range(1000):
    logger.log(step, loss=0.5, lr=1e-4)
logger.close()
```

| Model | Params | PPL | tok/s | Q4_K_M | Training |
|---|---|---|---|---|---|
| AXL-Code-1B-Lion | 318M | 1.90 | 6.1 | 188 MB | 20 min |
| AXL-Reasoning-Lion | 70M | 1.03 | 22.4 | 44 MB | 10 min |
| AXL-Refactor-Lion | 19.1M | 1.02 | 52.2 | 12 MB | 3 min |
| AXL-TestGen-Lion | 15.2M | 1.02 | 57.3 | 18 MB | 3 min |
| AXL-Chat-Lion | 9.9M | 1.03 | 73.4 | 7 MB | 3 min |
| AXL-Micro-Lion | 12.8M | 1.04 | 66.2 | 15 MB | 3 min |
| AXL-Secure-Lion | 11.7M | 1.03 | 63.5 | 8 MB | 3 min |
| AXL-Docs-Lion | 9.9M | 1.01 | 72.8 | 7 MB | 2 min |
| AXL-Comment-Lion | 7.2M | 1.02 | 75.8 | 5 MB | 2 min |

| Model | Params | PPL | Focus | GGUF |
|---|---|---|---|---|
| AXL-Micro-600K | 600K | 63.08 | Demo | 1 MB |
| AXL-Micro-8M | 12.8M | 3.13 | Code gen | 25 MB |
| AXL-Coder-15M | 26.0M | 5.97 | Agentic | 50 MB |
| AXL-Debugger-8M | 14.1M | 6.60 | Bug fixing | 27 MB |
| AXL-Fixer-12M | 20.9M | 5.90 | Debug | 40 MB |
| AXL-Reasoning-70M | 70M | 1.93 | CoT | 134 MB |
| AXL-300M | 322M | 5.98 | Flagship | 616 MB |
| AXL-Chat-10M | 9.9M | 1.02 | Dialogue | 19 MB |
| AXL-TestGen-15M | 15.2M | 1.01 | Test gen | 30 MB |
| AXL-Refactor-20M | 19.1M | 1.01 | Refactoring | 37 MB |
| AXL-Docs-8M | 9.9M | 1.03 | Docstrings | 19 MB |
| AXL-Comment-5M | 7.2M | 1.01 | Comments | 14 MB |
| AXL-Secure-10M | 11.7M | 1.01 | Security | 23 MB |

| Model | Params | PPL | Focus | GGUF |
|---|---|---|---|---|
| AXL-Code-1B | 318M | 31.22 | Code gen (SGD) | 606 MB |
| AXL-Chat-Pro | 12.8M | 3.42 | Advanced chat | 25 MB |
| AXL-Translate | 15.2M | 1.86 | Code translation | 29 MB |
| AXL-Vision-0.8M | 1M | — | Vision encoder | — |
| AXL-Vision-v2 | 4.1M | — | UI vision | — |
```bash
# Build and start all services
docker-compose up --build

# Services:
# - api-server (port 8880): OpenAI-compatible API
# - mesh-registry (port 8900): Model mesh discovery
# - axl-coder (port 8901): Coder model server
# - axl-debugger (port 8903): Debugger model server
```

Real Q4_K_M quantization via llama.cpp:

```bash
# Quantize a single model
./llama.cpp/llama-quantize.exe checkpoints/model-f16.gguf checkpoints/model-q4_k_m.gguf Q4_K_M

# Quantize all models
python scripts/quantize_all_models.py
```

AXL processes sequences at three parallel resolution scales:
- Fine (1x): Processes all tokens. Attention cost: O(N^2 d)
- Medium (2x): Tokens grouped in pairs via learned downsampling. Cost: O(N^2 d/4)
- Coarse (4x): Tokens grouped in quadruplets. Cost: O(N^2 d/16)
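These costs follow directly from quadratic attention over the downsampled position counts; a quick arithmetic check (the N and d values are illustrative):

```python
# Per-layer self-attention cost scales as (positions)^2 * width.
def attn_cost(positions: int, d: int) -> int:
    return positions * positions * d

N, d = 256, 256                 # e.g. a 256-byte context at width 256
fine = attn_cost(N, d)          # all N byte positions
medium = attn_cost(N // 2, d)   # pairs merged by learned downsampling
coarse = attn_cost(N // 4, d)   # quadruplets
assert fine == 4 * medium == 16 * coarse
```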
Cross-scale attention connects all scale pairs, and adaptive gating fusion combines representations at fine resolution. The Lion optimizer (sign-based momentum) provides 20x faster convergence than SGD.
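The sign-based update is compact enough to show in full. This sketch follows the published Lion rule (Chen et al., 2023) for a single scalar parameter; the library's own `Lion` class may differ in details:

```python
import math

def lion_step(p, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update for a scalar parameter p with gradient g.

    Only one momentum buffer m is kept (vs. AdamW's two), which is
    where the memory saving comes from; the step direction is the
    sign of an interpolation between momentum and gradient.
    """
    update = math.copysign(1.0, beta1 * m + (1 - beta1) * g)
    p = p - lr * (update + wd * p)          # decoupled weight decay
    m = beta2 * m + (1 - beta2) * g         # momentum tracks the raw gradient
    return p, m

p, m = 1.0, 0.0
p, m = lion_step(p, g=0.5, m=m, lr=0.1)
assert p == 0.9  # step size is lr * sign, independent of gradient magnitude
```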
Full details: Research Paper | Architecture Doc
Full LaTeX paper: docs/paper_axl.tex
Covers: multi-scale architecture, Lion optimizer, related work (MEGABYTE, ByT5, EvaByte), uniqueness comparison, 27-model benchmark results, training cost analysis ($0.004 for Code-1B).
```bibtex
@misc{axl_2026,
  title={AXL: Multi-Scale Agentic Transformer for CPU-Optimized Code Generation},
  author={Koinic},
  year={2026},
  url={https://github.com/Koinic/AXL}
}
```

Apache 2.0 — see LICENSE.