Koinic-Labs/AXL

AXL — Multi-Scale Agentic Transformer for CPU-Optimized Code Generation


AXL (Architecture eXperimental Lab) — A family of 27 agentic coding models optimized for training and running on consumer CPUs. No GPU required. Train on a Ryzen 5 5600G. Deploy via Python API (full quality) or Ollama (degraded quality).


Quick Start

Installation

git clone https://github.com/Koinic/AXL.git
cd AXL
pip install -e .

Run a Model (Full Quality via Python API)

# Start the API server (full AXL multi-scale quality)
python AXL/API/serve_model.py --model checkpoints/axl_micro_lion --port 8880 --name axl-micro-lion

# Then call the OpenAI-compatible endpoint:
curl http://localhost:8880/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "def fibonacci(n):", "max_tokens": 100}'

This works with any OpenAI-compatible tool (Continue.dev, LlamaIndex, LangChain, Cursor).
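The same endpoint can be reached from Python with nothing but the standard library. A minimal sketch (the payload mirrors the curl call above; the model name is whatever you passed to `--name`, and the guard lets the script run even when no server is listening):

```python
import json
import urllib.request
import urllib.error

# Build a /v1/completions request body (OpenAI-compatible schema).
payload = {
    "model": "axl-micro-lion",        # name passed to serve_model.py --name
    "prompt": "def fibonacci(n):",
    "max_tokens": 100,
    "temperature": 0.8,
}
body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8880/v1/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        completion = json.load(resp)
        print(completion["choices"][0]["text"])
except (urllib.error.URLError, OSError):
    print("server not running at localhost:8880")
```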

Run a Model via Ollama (Degraded Quality)

Warning: Ollama GGUF files use only the fine-scale encoder (1/3 of the AXL architecture). The reported PPL values apply to the full multi-scale model. For full quality, use the Python API above.

cd AXL/HuggingFace/AXL-Micro-Lion
ollama create axl-micro-lion -f Modelfile
ollama run axl-micro-lion "def fibonacci(n):"

Train Your Own Model (3 minutes)

# 1. Generate training data
python scripts/generate_all_training_data.py --skip-hf

# 2. Train AXL-Micro with Lion optimizer
python scripts/retrain_all_lion.py --models micro

# 3. Your model is saved in checkpoints/axl_micro_lion/

Python Inference

import torch
from multiscale_transformer.model.config import load_config, ModelConfig
from multiscale_transformer.model.model import MultiScaleTransformer
from multiscale_transformer.training.tokenizer import ByteTokenizer

# Load model
ckpt = torch.load("checkpoints/axl_micro_lion/axl_micro_lion.pt", map_location="cpu")
cfg = ckpt["config"]
config = ModelConfig(
    vocab_size=cfg.get("vocab_size", 258), d_model=cfg.get("d_model", 256),
    n_heads=cfg.get("n_heads", 4), d_ff=cfg.get("d_ff", 688),
    n_layers_per_scale=cfg.get("n_layers_per_scale", 3),
    n_cross_attn_layers=cfg.get("n_cross_attn_layers", 1),
    max_seq_len=cfg.get("max_seq_len", 256),
)
model = MultiScaleTransformer(config)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Generate
tokenizer = ByteTokenizer()
ids = torch.tensor([tokenizer.encode("def fibonacci(n):\n")], dtype=torch.long)
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=100, temperature=0.8, top_k=40)
print(tokenizer.decode(out[0].tolist()))

What is AXL?

AXL is a multi-scale transformer architecture designed by Koinic for CPU-first training and inference. It processes token sequences at three parallel resolution scales — fine (1x), medium (2x), and coarse (4x) — each with a dedicated encoder stack, cross-scale attention, and adaptive gating fusion.

Why CPU-first? Training a 1B-parameter model on GPUs typically costs $10,000+. AXL trains a comparable model on a consumer CPU for roughly $0.004 in electricity.
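That electricity figure is plausible from first principles. A quick back-of-the-envelope check (the wattage and price per kWh below are assumptions for illustration, not measured AXL numbers):

```python
# Sanity-check the ~$0.004 electricity claim with assumed inputs.
watts = 65            # assumed package power of a Ryzen 5 5600G under load
hours = 20 / 60       # largest Lion model trains in about 20 minutes
price_per_kwh = 0.15  # assumed USD per kilowatt-hour

cost = watts * hours / 1000 * price_per_kwh
print(f"${cost:.4f}")  # about a third of a cent, the same order as $0.004
```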

Key innovations:

  • Lion optimizer: 20x faster convergence than SGD, 50% less memory than AdamW
  • Byte-level tokenizer (vocab=258): No vocabulary training, works with any language
  • GGUF export: Real Q4_K_M quantization via llama.cpp
  • Progressive training: Scale to 1B+ params on 16GB RAM

Project Structure

AXL/                              # Distribution package
├── GitHub/                       # Code for others to train AXL
│   ├── multiscale_transformer/   # Core library
│   │   ├── model/                # Attention, blocks, config, model, layers
│   │   ├── training/             # Lion, GaLore, viz, distill, streaming, dataset
│   │   ├── data/                 # Quality filter, data mixture
│   │   ├── export/               # GGUF export, HF compat, Ollama server
│   │   ├── axl_v2/               # Agentic extensions (tool router, self-debug, vision)
│   │   ├── benchmarks/           # HumanEval, MBPP, Perplexity, CodeBLEU
│   │   └── tests/                # Test suite
│   ├── scripts/                  # All training, export, benchmark, utility scripts
│   ├── configs/                  # YAML training configs + BPE tokenizer
│   ├── data/                     # Training datasets
│   ├── docs/                     # Paper, references, quickstart
│   ├── examples/                 # Inference and training examples
│   ├── llama.cpp/                # Quantization tools (Windows binaries)
│   ├── README.md                 # This file
│   ├── LICENSE                   # Apache 2.0
│   ├── CHANGELOG.md              # Version history
│   ├── requirements.txt          # Python dependencies
│   ├── pyproject.toml            # Package config
│   ├── Dockerfile                # Container build
│   └── docker-compose.yml        # Multi-service orchestration
│
└── HuggingFace/                  # Ready-to-use models
    ├── README.md                 # Organization overview
    ├── paper_axl.tex             # Research paper
    ├── references.bib            # Bibliography
    ├── AXL_ARCHITECTURE.md       # Architecture documentation
    ├── AXL-Code-1B-Lion/         # 27 model directories
    ├── AXL-Reasoning-Lion/
    ├── AXL-Micro-Lion/
    ├── ...
    └── AXL-Vision-v2/

How to Create Training Data

Option 1: Download from HuggingFace

python scripts/download_training_data.py --max_gb 5
# Downloads Python code from HuggingFace datasets

Option 2: Generate Synthetic Data

python scripts/generate_all_training_data.py --skip-hf
# Generates training data for all model types

Option 3: Use Your Own Data

  1. Place your text files in data/ (any .txt or .jsonl file)
  2. Point the training script to your file:
python scripts/train_axl_micro.py --data_path data/my_code.txt --max_time 600

Data Format

  • Text files (.txt): Raw text, one file per dataset
  • JSONL files (.jsonl): JSON lines with source_code/target_code fields (for translation)
  • Byte-level tokenizer: Any file works — no preprocessing needed
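The byte-level scheme is simple enough to sketch: 256 byte values plus two special tokens give the vocab_size=258 seen in the configs. Which ids the two extras take is an assumption here, not the library's definition:

```python
class ByteTokenizerSketch:
    """Minimal byte-level tokenizer sketch: 256 raw byte values plus two
    special tokens (BOS/EOS ids below are assumed, giving vocab_size=258)."""

    BOS, EOS = 256, 257

    def encode(self, text: str) -> list[int]:
        # UTF-8 bytes are already integers in [0, 255]; wrap with specials.
        return [self.BOS] + list(text.encode("utf-8")) + [self.EOS]

    def decode(self, ids: list[int]) -> str:
        # Drop special ids, reassemble bytes, decode leniently.
        return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")


tok = ByteTokenizerSketch()
ids = tok.encode("def f():")
print(len(ids), tok.decode(ids))
```

Because every byte maps to itself, any text file in any language round-trips without a trained vocabulary, which is why no preprocessing is needed.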

Data Quality Filtering

from multiscale_transformer.data.quality_filter import DataFilter
code_filter = DataFilter(min_lines=2, max_lines=500, require_syntax_valid=True)
clean = code_filter.filter_file("data/raw_code.txt", "data/clean_code.txt")

How to Train a Model

Step 1: Choose a Config

Configs are in configs/. Each defines architecture:

  • axl_micro.yaml — 12.8M params, fastest training (3 min)
  • axl_code_1b.yaml — 318M params, largest model (20 min)
  • Or use the auto-builder: from multiscale_transformer.training.model_builder import ModelBuilder

Step 2: Train

# Train one model with Lion optimizer (recommended)
python scripts/retrain_all_lion.py --models micro    # 3 minutes
python scripts/retrain_all_lion.py --models code_1b  # 20 minutes

# Train all models sequentially
python scripts/retrain_all_lion.py                    # ~50 minutes total

# Train with custom data
python scripts/train_axl_code_1b_lion.py --max_time 1200

Step 3: Export to GGUF

python scripts/quantize_all_models.py --models code_1b_lion
# Produces F16 + Q4_K_M GGUF files in checkpoints/

Step 4: Deploy

# Primary: Python API server (full multi-scale quality)
python AXL/API/serve_model.py --model checkpoints/axl_code_1b_lion --port 8880 --name axl-code-1b
# OpenAI-compatible: http://localhost:8880/v1/completions
# Works with Continue.dev, LlamaIndex, LangChain, Cursor

# Alternative: Ollama (degraded quality - uses only fine-scale encoder)
cd checkpoints/axl_code_1b_lion
ollama create axl-code-1b -f Modelfile
ollama run axl-code-1b "def quicksort(arr):"

Training Time Estimates

| Model              | Params | Time (Lion) | RAM    |
|--------------------|--------|-------------|--------|
| AXL-Comment-Lion   | 7.2M   | 2 min       | 30 MB  |
| AXL-Micro-Lion     | 12.8M  | 3 min       | 107 MB |
| AXL-Reasoning-Lion | 70M    | 10 min      | 414 MB |
| AXL-Code-1B-Lion   | 318M   | 20 min      | 2 GB   |

How to Modify the Code

Change Model Architecture

Edit multiscale_transformer/model/config.py. The ModelConfig dataclass controls:

  • d_model: Model dimension (64–1024)
  • n_heads: Number of attention heads
  • d_ff: Feed-forward dimension
  • n_layers_per_scale: Transformer layers per scale (3 scales total)
  • max_seq_len: Context window in bytes

Or use the auto-builder:

from multiscale_transformer.training.model_builder import ModelBuilder, suggest_models
suggestions = suggest_models(available_ram_gb=16.0)
for s in suggestions:
    print(f"{s['name']}: {s['actual_params']/1e6:.0f}M params, {s['estimated_ram_gb']:.1f}GB RAM")

Add a New Training Script

  1. Copy scripts/train_axl_micro.py as a template
  2. Change the ModelConfig to your architecture
  3. Point get_data() to your dataset
  4. Choose optimizer: Lion (recommended) or torch.optim.SGD
  5. Run: python scripts/my_new_trainer.py

Use the Distillation Module

from multiscale_transformer.training.distill import DistillationTrainer
trainer = DistillationTrainer(
    teacher_path="checkpoints/axl_code_1b_lion/axl_code_1b_lion.pt",
    student_config=your_student_config,
    temperature=2.0, alpha=0.7,
)
trainer.train(text_data, max_time=600, output_dir="checkpoints/distilled")
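The temperature and alpha knobs follow the standard Hinton-style distillation loss. A self-contained sketch of what such a loss computes (this is the textbook formulation, not the DistillationTrainer internals):

```python
import math


def softmax(logits, T=1.0):
    """Softmax with an optional temperature that softens the distribution."""
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(student_logits, teacher_logits, hard_target, T=2.0, alpha=0.7):
    """Alpha-weighted mix of the softened teacher cross-entropy (scaled by
    T^2, the usual gradient correction) and the hard-label cross-entropy."""
    soft_student = softmax(student_logits, T)
    soft_teacher = softmax(teacher_logits, T)
    soft_ce = -sum(t * math.log(s)
                   for t, s in zip(soft_teacher, soft_student)) * T * T
    hard_ce = -math.log(softmax(student_logits)[hard_target])
    return alpha * soft_ce + (1 - alpha) * hard_ce
```

With alpha=0.7 the student mostly imitates the teacher's softened distribution, and the remaining 0.3 keeps it anchored to the ground-truth labels.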

Use the Training Logger

from multiscale_transformer.training.viz import TrainLogger
logger = TrainLogger(log_dir="runs/my_experiment", backend="csv")
for step in range(1000):
    logger.log(step, loss=0.5, lr=1e-4)
logger.close()

Model Family (27 Models)

Lion-Optimized (Recommended)

| Model              | Params | PPL  | tok/s | Q4_K_M | Training |
|--------------------|--------|------|-------|--------|----------|
| AXL-Code-1B-Lion   | 318M   | 1.90 | 6.1   | 188 MB | 20 min   |
| AXL-Reasoning-Lion | 70M    | 1.03 | 22.4  | 44 MB  | 10 min   |
| AXL-Refactor-Lion  | 19.1M  | 1.02 | 52.2  | 12 MB  | 3 min    |
| AXL-TestGen-Lion   | 15.2M  | 1.02 | 57.3  | 18 MB  | 3 min    |
| AXL-Chat-Lion      | 9.9M   | 1.03 | 73.4  | 7 MB   | 3 min    |
| AXL-Micro-Lion     | 12.8M  | 1.04 | 66.2  | 15 MB  | 3 min    |
| AXL-Secure-Lion    | 11.7M  | 1.03 | 63.5  | 8 MB   | 3 min    |
| AXL-Docs-Lion      | 9.9M   | 1.01 | 72.8  | 7 MB   | 2 min    |
| AXL-Comment-Lion   | 7.2M   | 1.02 | 75.8  | 5 MB   | 2 min    |

SGD Models (Original)

| Model             | Params | PPL   | Focus       | GGUF   |
|-------------------|--------|-------|-------------|--------|
| AXL-Micro-600K    | 600K   | 63.08 | Demo        | 1 MB   |
| AXL-Micro-8M      | 12.8M  | 3.13  | Code gen    | 25 MB  |
| AXL-Coder-15M     | 26.0M  | 5.97  | Agentic     | 50 MB  |
| AXL-Debugger-8M   | 14.1M  | 6.60  | Bug fixing  | 27 MB  |
| AXL-Fixer-12M     | 20.9M  | 5.90  | Debug       | 40 MB  |
| AXL-Reasoning-70M | 70M    | 1.93  | CoT         | 134 MB |
| AXL-300M          | 322M   | 5.98  | Flagship    | 616 MB |
| AXL-Chat-10M      | 9.9M   | 1.02  | Dialogue    | 19 MB  |
| AXL-TestGen-15M   | 15.2M  | 1.01  | Test gen    | 30 MB  |
| AXL-Refactor-20M  | 19.1M  | 1.01  | Refactoring | 37 MB  |
| AXL-Docs-8M       | 9.9M   | 1.03  | Docstrings  | 19 MB  |
| AXL-Comment-5M    | 7.2M   | 1.01  | Comments    | 14 MB  |
| AXL-Secure-10M    | 11.7M  | 1.01  | Security    | 23 MB  |

Specialized Models

| Model           | Params | PPL   | Focus            | GGUF   |
|-----------------|--------|-------|------------------|--------|
| AXL-Code-1B     | 318M   | 31.22 | Code gen (SGD)   | 606 MB |
| AXL-Chat-Pro    | 12.8M  | 3.42  | Advanced chat    | 25 MB  |
| AXL-Translate   | 15.2M  | 1.86  | Code translation | 29 MB  |
| AXL-Vision-0.8M | 1M     |       | Vision encoder   |        |
| AXL-Vision-v2   | 4.1M   |       | UI vision        |        |

Deploy with Docker

# Build and start all services
docker-compose up --build

# Services:
# - api-server (port 8880): OpenAI-compatible API
# - mesh-registry (port 8900): Model mesh discovery
# - axl-coder (port 8901): Coder model server
# - axl-debugger (port 8903): Debugger model server

Quantization with llama.cpp

Real Q4_K_M quantization via llama.cpp:

# Quantize a single model
./llama.cpp/llama-quantize.exe checkpoints/model-f16.gguf checkpoints/model-q4_k_m.gguf Q4_K_M

# Quantize all models
python scripts/quantize_all_models.py

Architecture

AXL processes sequences at three parallel resolution scales:

  • Fine (1x): Processes all tokens. Attention cost: O(N^2 d)
  • Medium (2x): Tokens grouped in pairs via learned downsampling. Cost: O(N^2 d/4)
  • Coarse (4x): Tokens grouped in quadruplets. Cost: O(N^2 d/16)
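A toy version of the grouping makes the cost arithmetic concrete. Plain mean pooling stands in here for the learned downsampling the architecture actually uses:

```python
def downsample(tokens, stride):
    """Group consecutive token values by averaging: a stand-in for the
    learned downsampling that produces the medium and coarse scales."""
    return [
        sum(tokens[i:i + stride]) / stride
        for i in range(0, len(tokens) - stride + 1, stride)
    ]


seq = list(range(16))        # 16 "embeddings" at the fine (1x) scale
medium = downsample(seq, 2)  # 8 positions at the 2x scale
coarse = downsample(seq, 4)  # 4 positions at the 4x scale

# Self-attention cost grows with the square of sequence length, so halving
# the length cuts cost to 1/4 and quartering it cuts cost to 1/16.
print(len(seq), len(medium), len(coarse))
```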

Cross-scale attention connects all scale pairs, and adaptive gating fusion combines representations at fine resolution. The Lion optimizer (sign-based momentum) provides 20x faster convergence than SGD.
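For reference, the Lion update rule itself is tiny. A sketch of one step on plain Python floats (the beta defaults below are the published Lion values, not AXL-specific settings):

```python
import math


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update: the step direction is the SIGN of an interpolation
    between the momentum and the current gradient, plus decoupled weight decay."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        c = beta1 * m + (1 - beta1) * g          # interpolation used for the step
        direction = math.copysign(1.0, c) if c != 0 else 0.0
        new_params.append(p - lr * (direction + wd * p))
        new_momentum.append(beta2 * m + (1 - beta2) * g)  # EMA kept for next step
    return new_params, new_momentum


# Each parameter moves by exactly lr per step, regardless of gradient magnitude.
p, m = lion_step([1.0], [0.5], [0.0], lr=0.1)
print(p, m)
```

The sign operation is also why Lion needs only one momentum buffer per parameter, which is where its memory savings over AdamW come from.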

Full details: Research Paper | Architecture Doc


Research Paper

Full LaTeX paper: docs/paper_axl.tex

Covers: multi-scale architecture, Lion optimizer, related work (MEGABYTE, ByT5, EvaByte), uniqueness comparison, 27-model benchmark results, training cost analysis ($0.004 for Code-1B).


Citation

@misc{axl_2026,
  title={AXL: Multi-Scale Agentic Transformer for CPU-Optimized Code Generation},
  author={Koinic},
  year={2026},
  url={https://github.com/Koinic/AXL}
}

License

Apache 2.0 — see LICENSE.