AXL (Architecture eXperimental Lab) — A family of 27 agentic coding models optimized for training and running on consumer CPUs. No GPU required. Train on a Ryzen 5 5600G. Deploy via Python API (full quality) or Ollama (degraded quality).
```bash
git clone https://github.com/Koinic/AXL.git
cd AXL
pip install -e .
```

```bash
# Start the API server (full AXL multi-scale quality)
python AXL/API/serve_model.py --model checkpoints/axl_micro_lion --port 8880 --name axl-micro-lion
```
```bash
# Then call the OpenAI-compatible endpoint:
curl http://localhost:8880/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "def fibonacci(n):", "max_tokens": 100}'
```

This works with any OpenAI-compatible tool (Continue.dev, LlamaIndex, LangChain, Cursor).
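From Python, the same endpoint can be called with nothing but the standard library. This is a sketch against the server started above; the `choices[0]["text"]` response shape is the usual OpenAI completions layout and is assumed here rather than confirmed against AXL's server:

```python
import json
import urllib.request

def build_request(prompt: str, max_tokens: int = 100,
                  url: str = "http://localhost:8880/v1/completions"):
    # Assemble a POST request matching the curl example above
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

def complete(prompt: str) -> str:
    # Assumed OpenAI-style response: {"choices": [{"text": ...}]}
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["text"]

# complete("def fibonacci(n):")  # requires the server to be running
```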
Warning: Ollama GGUF files use only the fine-scale encoder (1/3 of the AXL architecture). The reported PPL values apply to the full multi-scale model. For full quality, use the Python API above.
```bash
cd AXL/HuggingFace/AXL-Micro-Lion
ollama create axl-micro-lion -f Modelfile
ollama run axl-micro-lion "def fibonacci(n):"
```

```bash
# 1. Generate training data
python scripts/generate_all_training_data.py --skip-hf

# 2. Train AXL-Micro with Lion optimizer
python scripts/retrain_all_lion.py --models micro

# 3. Your model is saved in checkpoints/axl_micro_lion/
```

```python
import torch
from multiscale_transformer.model.config import load_config, ModelConfig
from multiscale_transformer.model.model import MultiScaleTransformer
from multiscale_transformer.training.tokenizer import ByteTokenizer

# Load model
ckpt = torch.load("checkpoints/axl_micro_lion/axl_micro_lion.pt", map_location="cpu")
cfg = ckpt["config"]
config = ModelConfig(
    vocab_size=cfg.get("vocab_size", 258),
    d_model=cfg.get("d_model", 256),
    n_heads=cfg.get("n_heads", 4),
    d_ff=cfg.get("d_ff", 688),
    n_layers_per_scale=cfg.get("n_layers_per_scale", 3),
    n_cross_attn_layers=cfg.get("n_cross_attn_layers", 1),
    max_seq_len=cfg.get("max_seq_len", 256),
)
model = MultiScaleTransformer(config)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Generate
tokenizer = ByteTokenizer()
ids = torch.tensor([tokenizer.encode("def fibonacci(n):\n")], dtype=torch.long)
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=100, temperature=0.8, top_k=40)
print(tokenizer.decode(out[0].tolist()))
```

AXL is a multi-scale transformer architecture designed by Koinic for CPU-first training and inference. It processes token sequences at three parallel resolution scales — fine (1x), medium (2x), and coarse (4x) — each with a dedicated encoder stack, cross-scale attention, and adaptive gating fusion.
Why CPU-first? Training a 1B-parameter model on GPU costs $10,000+. AXL trains the same model on a consumer CPU for $0.004 in electricity.
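The $0.004 figure is consistent with back-of-envelope arithmetic. The wattage and electricity tariff below are illustrative assumptions, not measured values from the AXL paper:

```python
# Rough electricity-cost check for one training run.
watts = 100            # assumed average CPU package power while training
hours = 20 / 60        # AXL-Code-1B-Lion trains in ~20 minutes (see table below)
price_per_kwh = 0.12   # assumed tariff, USD per kWh
cost = (watts / 1000) * hours * price_per_kwh
assert round(cost, 3) == 0.004
```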
Key innovations:
- Lion optimizer: 20x faster convergence than SGD, 50% less memory than AdamW
- Byte-level tokenizer (vocab=258): No vocabulary training, works with any language
- GGUF export: Real Q4_K_M quantization via llama.cpp
- Progressive training: Scale to 1B+ params on 16GB RAM
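The byte-level vocabulary needs no training because it is fixed: 256 raw byte values plus two special tokens. A minimal sketch of that scheme — the BOS/EOS ids here are assumptions, not necessarily `ByteTokenizer`'s actual layout:

```python
# 256 byte values + 2 specials = vocab size 258 (special ids assumed)
BOS, EOS = 256, 257

def encode(text: str) -> list[int]:
    # Any language round-trips: UTF-8 bytes are the tokens
    return [BOS] + list(text.encode("utf-8")) + [EOS]

def decode(ids: list[int]) -> str:
    # Drop special tokens, then decode the raw bytes
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

assert decode(encode("def fibonacci(n):")) == "def fibonacci(n):"
```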
```
AXL/                                  # Distribution package
├── GitHub/                           # Code for others to train AXL
│   ├── multiscale_transformer/       # Core library
│   │   ├── model/                    # Attention, blocks, config, model, layers
│   │   ├── training/                 # Lion, GaLore, viz, distill, streaming, dataset
│   │   ├── data/                     # Quality filter, data mixture
│   │   ├── export/                   # GGUF export, HF compat, Ollama server
│   │   ├── axl_v2/                   # Agentic extensions (tool router, self-debug, vision)
│   │   ├── benchmarks/               # HumanEval, MBPP, Perplexity, CodeBLEU
│   │   └── tests/                    # Test suite
│   ├── scripts/                      # All training, export, benchmark, utility scripts
│   ├── configs/                      # YAML training configs + BPE tokenizer
│   ├── data/                         # Training datasets
│   ├── docs/                         # Paper, references, quickstart
│   ├── examples/                     # Inference and training examples
│   ├── llama.cpp/                    # Quantization tools (Windows binaries)
│   ├── README.md                     # This file
│   ├── LICENSE                       # Apache 2.0
│   ├── CHANGELOG.md                  # Version history
│   ├── requirements.txt              # Python dependencies
│   ├── pyproject.toml                # Package config
│   ├── Dockerfile                    # Container build
│   └── docker-compose.yml            # Multi-service orchestration
│
└── HuggingFace/                      # Ready-to-use models
    ├── README.md                     # Organization overview
    ├── paper_axl.tex                 # Research paper
    ├── references.bib                # Bibliography
    ├── AXL_ARCHITECTURE.md           # Architecture documentation
    ├── AXL-Code-1B-Lion/             # 27 model directories
    ├── AXL-Reasoning-Lion/
    ├── AXL-Micro-Lion/
    ├── ...
    └── AXL-Vision-v2/
```
```bash
python scripts/download_training_data.py --max_gb 5
# Downloads Python code from HuggingFace datasets
```

```bash
python scripts/generate_all_training_data.py --skip-hf
# Generates training data for all model types
```

- Place your text files in `data/` (any `.txt` or `.jsonl` file)
- Point the training script to your file:

```bash
python scripts/train_axl_micro.py --data_path data/my_code.txt --max_time 600
```

- Text files (`.txt`): Raw text, one file per dataset
- JSONL files (`.jsonl`): JSON lines with `source_code`/`target_code` fields (for translation)
- Byte-level tokenizer: any file works — no preprocessing needed
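For the translation format, each JSONL line carries one source/target pair. A small illustration of writing and reading that layout — the pair contents are invented examples, only the `source_code`/`target_code` field names come from the format description above:

```python
import json

# Two illustrative Python-2-to-3 translation pairs
records = [
    {"source_code": "print 'hi'", "target_code": "print('hi')"},
    {"source_code": "d.has_key(k)", "target_code": "k in d"},
]
jsonl = "\n".join(json.dumps(r) for r in records)  # one JSON object per line

pairs = [json.loads(line) for line in jsonl.splitlines()]
assert pairs[1]["target_code"] == "k in d"
```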
```python
from multiscale_transformer.data.quality_filter import DataFilter

data_filter = DataFilter(min_lines=2, max_lines=500, require_syntax_valid=True)
clean = data_filter.filter_file("data/raw_code.txt", "data/clean_code.txt")
```

Configs are in `configs/`. Each defines an architecture:

- `axl_micro.yaml` — 12.8M params, fastest training (3 min)
- `axl_code_1b.yaml` — 318M params, largest model (20 min)

Or use the auto-builder:

```python
from multiscale_transformer.training.model_builder import ModelBuilder
```
```bash
# Train one model with Lion optimizer (recommended)
python scripts/retrain_all_lion.py --models micro    # 3 minutes
python scripts/retrain_all_lion.py --models code_1b  # 20 minutes

# Train all models sequentially
python scripts/retrain_all_lion.py  # ~50 minutes total

# Train with custom data
python scripts/train_axl_code_1b_lion.py --max_time 1200
```

```bash
python scripts/quantize_all_models.py --models code_1b_lion
# Produces F16 + Q4_K_M GGUF files in checkpoints/
```

```bash
# Primary: Python API server (full multi-scale quality)
python AXL/API/serve_model.py --model checkpoints/axl_code_1b_lion --port 8880 --name axl-code-1b
# OpenAI-compatible: http://localhost:8880/v1/completions
# Works with Continue.dev, LlamaIndex, LangChain, Cursor

# Alternative: Ollama (degraded quality - uses only the fine-scale encoder)
cd checkpoints/axl_code_1b_lion
ollama create axl-code-1b -f Modelfile
ollama run axl-code-1b "def quicksort(arr):"
```

| Model | Params | Time (Lion) | RAM |
|---|---|---|---|
| AXL-Comment-Lion | 7.2M | 2 min | 30 MB |
| AXL-Micro-Lion | 12.8M | 3 min | 107 MB |
| AXL-Reasoning-Lion | 70M | 10 min | 414 MB |
| AXL-Code-1B-Lion | 318M | 20 min | 2 GB |
Edit `multiscale_transformer/model/config.py` — the `ModelConfig` dataclass controls:

- `d_model`: Model dimension (64–1024)
- `n_heads`: Number of attention heads
- `d_ff`: Feed-forward dimension
- `n_layers_per_scale`: Transformer layers per scale (3 scales total)
- `max_seq_len`: Context window in bytes
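To see how these fields fit together, here is a toy stand-in — not the real dataclass; the defaults are copied from the model-loading example earlier, and the head-divisibility check is a common transformer constraint assumed here rather than confirmed for AXL:

```python
from dataclasses import dataclass

@dataclass
class ConfigSketch:
    vocab_size: int = 258          # byte tokenizer: 256 bytes + 2 specials
    d_model: int = 256
    n_heads: int = 4
    d_ff: int = 688
    n_layers_per_scale: int = 3    # applied at each of the 3 scales
    max_seq_len: int = 256         # context window in bytes

    def __post_init__(self):
        # d_model must split evenly across attention heads
        assert self.d_model % self.n_heads == 0

cfg = ConfigSketch(d_model=512, n_heads=8)
assert cfg.d_model // cfg.n_heads == 64  # per-head dimension
```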
Or use the auto-builder:
```python
from multiscale_transformer.training.model_builder import ModelBuilder, suggest_models

suggestions = suggest_models(available_ram_gb=16.0)
for s in suggestions:
    print(f"{s['name']}: {s['actual_params']/1e6:.0f}M params, {s['estimated_ram_gb']:.1f}GB RAM")
```

- Copy `scripts/train_axl_micro.py` as a template
- Change the `ModelConfig` to your architecture
- Point `get_data()` to your dataset
- Choose an optimizer: `Lion` (recommended) or `torch.optim.SGD`
- Run: `python scripts/my_new_trainer.py`
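The distillation workflow that follows exposes `temperature` and `alpha` knobs. A minimal sketch of the standard objective those names suggest — Hinton-style knowledge distillation; the library's actual loss may differ in detail:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T flattens the distribution
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.7):
    # Soft term: cross-entropy against the temperature-softened teacher
    soft = -sum(t * math.log(s)
                for t, s in zip(softmax(teacher_logits, T),
                                softmax(student_logits, T)))
    # Hard term: ordinary cross-entropy against the true label
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard

loss = kd_loss([2.0, 0.5], [1.5, 0.2], label=0)
assert loss > 0
```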
```python
from multiscale_transformer.training.distill import DistillationTrainer

trainer = DistillationTrainer(
    teacher_path="checkpoints/axl_code_1b_lion/axl_code_1b_lion.pt",
    student_config=your_student_config,
    temperature=2.0,
    alpha=0.7,
)
trainer.train(text_data, max_time=600, output_dir="checkpoints/distilled")
```

```python
from multiscale_transformer.training.viz import TrainLogger

logger = TrainLogger(log_dir="runs/my_experiment", backend="csv")
for step in range(1000):
    logger.log(step, loss=0.5, lr=1e-4)
logger.close()
```

| Model | Params | PPL | tok/s | Q4_K_M | Training |
|---|---|---|---|---|---|
| AXL-Code-1B-Lion | 318M | 1.90 | 6.1 | 188 MB | 20 min |
| AXL-Reasoning-Lion | 70M | 1.03 | 22.4 | 44 MB | 10 min |
| AXL-Refactor-Lion | 19.1M | 1.02 | 52.2 | 12 MB | 3 min |
| AXL-TestGen-Lion | 15.2M | 1.02 | 57.3 | 18 MB | 3 min |
| AXL-Chat-Lion | 9.9M | 1.03 | 73.4 | 7 MB | 3 min |
| AXL-Micro-Lion | 12.8M | 1.04 | 66.2 | 15 MB | 3 min |
| AXL-Secure-Lion | 11.7M | 1.03 | 63.5 | 8 MB | 3 min |
| AXL-Docs-Lion | 9.9M | 1.01 | 72.8 | 7 MB | 2 min |
| AXL-Comment-Lion | 7.2M | 1.02 | 75.8 | 5 MB | 2 min |

| Model | Params | PPL | Focus | GGUF |
|---|---|---|---|---|
| AXL-Micro-600K | 600K | 63.08 | Demo | 1 MB |
| AXL-Micro-8M | 12.8M | 3.13 | Code gen | 25 MB |
| AXL-Coder-15M | 26.0M | 5.97 | Agentic | 50 MB |
| AXL-Debugger-8M | 14.1M | 6.60 | Bug fixing | 27 MB |
| AXL-Fixer-12M | 20.9M | 5.90 | Debug | 40 MB |
| AXL-Reasoning-70M | 70M | 1.93 | CoT | 134 MB |
| AXL-300M | 322M | 5.98 | Flagship | 616 MB |
| AXL-Chat-10M | 9.9M | 1.02 | Dialogue | 19 MB |
| AXL-TestGen-15M | 15.2M | 1.01 | Test gen | 30 MB |
| AXL-Refactor-20M | 19.1M | 1.01 | Refactoring | 37 MB |
| AXL-Docs-8M | 9.9M | 1.03 | Docstrings | 19 MB |
| AXL-Comment-5M | 7.2M | 1.01 | Comments | 14 MB |
| AXL-Secure-10M | 11.7M | 1.01 | Security | 23 MB |

| Model | Params | PPL | Focus | GGUF |
|---|---|---|---|---|
| AXL-Code-1B | 318M | 31.22 | Code gen (SGD) | 606 MB |
| AXL-Chat-Pro | 12.8M | 3.42 | Advanced chat | 25 MB |
| AXL-Translate | 15.2M | 1.86 | Code translation | 29 MB |
| AXL-Vision-0.8M | 1M | — | Vision encoder | — |
| AXL-Vision-v2 | 4.1M | — | UI vision | — |
```bash
# Build and start all services
docker-compose up --build

# Services:
# - api-server (port 8880): OpenAI-compatible API
# - mesh-registry (port 8900): Model mesh discovery
# - axl-coder (port 8901): Coder model server
# - axl-debugger (port 8903): Debugger model server
```

Real Q4_K_M quantization via llama.cpp:

```bash
# Quantize a single model
./llama.cpp/llama-quantize.exe checkpoints/model-f16.gguf checkpoints/model-q4_k_m.gguf Q4_K_M

# Quantize all models
python scripts/quantize_all_models.py
```

AXL processes sequences at three parallel resolution scales:
- Fine (1x): Processes all tokens. Attention cost: O(N^2 d)
- Medium (2x): Tokens grouped in pairs via learned downsampling. Cost: O(N^2 d/4)
- Coarse (4x): Tokens grouped in quadruplets. Cost: O(N^2 d/16)
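These costs follow directly from quadratic attention over the downsampled position counts; a quick arithmetic check (the N and d values are illustrative):

```python
# Per-layer self-attention cost scales as (positions)^2 * width.
def attn_cost(positions: int, d: int) -> int:
    return positions * positions * d

N, d = 256, 256                 # e.g. a 256-byte context at width 256
fine = attn_cost(N, d)          # all N byte positions
medium = attn_cost(N // 2, d)   # pairs merged by learned downsampling
coarse = attn_cost(N // 4, d)   # quadruplets
assert fine == 4 * medium == 16 * coarse
```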
Cross-scale attention connects all scale pairs, and adaptive gating fusion combines representations at fine resolution. The Lion optimizer (sign-based momentum) provides 20x faster convergence than SGD.
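The sign-based update is compact enough to show in full. This sketch follows the published Lion rule (Chen et al., 2023) for a single scalar parameter; the library's own `Lion` class may differ in details:

```python
import math

def lion_step(p, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update for a scalar parameter p with gradient g.

    Only one momentum buffer m is kept (vs. AdamW's two), which is
    where the memory saving comes from; the step direction is the
    sign of an interpolation between momentum and gradient.
    """
    update = math.copysign(1.0, beta1 * m + (1 - beta1) * g)
    p = p - lr * (update + wd * p)          # decoupled weight decay
    m = beta2 * m + (1 - beta2) * g         # momentum tracks the raw gradient
    return p, m

p, m = 1.0, 0.0
p, m = lion_step(p, g=0.5, m=m, lr=0.1)
assert p == 0.9  # step size is lr * sign, independent of gradient magnitude
```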
Full details: Research Paper | Architecture Doc
Full LaTeX paper: docs/paper_axl.tex
Covers: multi-scale architecture, Lion optimizer, related work (MEGABYTE, ByT5, EvaByte), uniqueness comparison, 27-model benchmark results, training cost analysis ($0.004 for Code-1B).
```bibtex
@misc{axl_2026,
  title={AXL: Multi-Scale Agentic Transformer for CPU-Optimized Code Generation},
  author={Koinic},
  year={2026},
  url={https://github.com/Koinic/AXL}
}
```

Apache 2.0 — see LICENSE.