Vehicle Ontology Resolution Through EXplanatory Compression
A 25MB transformer that learned how VINs encode vehicle identity.
- Make accuracy: 100%
- Model accuracy: 99%
- Trim accuracy: 90%
- 6-layer decoder-only transformer (rough sketch after this list)
- 6.6M parameters
- 256-dim embeddings
- Trained on 50K examples in 8 minutes
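llm/model.py holds the real architecture; purely for orientation, a 6-layer, 256-dim decoder-only model at this scale can be sketched as follows (the class name, head count, and context length here are assumptions, not VortexLLM's actual hyperparameters):

```python
import torch
import torch.nn as nn

class TinyVinDecoder(nn.Module):
    """Hypothetical stand-in for a VortexLLM-scale model: 6 causal blocks, 256-dim embeddings."""
    def __init__(self, vocab_size: int, dim: int = 256, n_layers: int = 6,
                 n_heads: int = 4, max_len: int = 64, pad_id: int = 0):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim, padding_idx=pad_id)
        self.pos_emb = nn.Embedding(max_len, dim)
        block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # ids: (batch, seq_len) token ids -> next-token logits at every position
        seq_len = ids.size(1)
        x = self.tok_emb(ids) + self.pos_emb(torch.arange(seq_len, device=ids.device))
        # Upper-triangular -inf mask blocks attention to future tokens,
        # which is what makes this stack decoder-only.
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=ids.device), diagonal=1)
        x = self.blocks(x, mask=causal)
        return self.lm_head(x)
```

Six blocks of width 256 contribute roughly 12·256² ≈ 0.8M parameters each (about 4.7M total), so with a small vocabulary for the embedding and output head the count lands near the 6.6M reported above. Loading the released checkpoint itself: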
```python
from llm.model import VortexLLM
from llm.tokenizer import VortexTokenizer
import torch

# Load the tokenizer and the trained checkpoint
tokenizer = VortexTokenizer.load("models/vortex_llm.tokenizer.json")
checkpoint = torch.load("models/vortex_llm.pt", map_location="cpu")
model = VortexLLM(
    vocab_size=checkpoint['vocab_size'],
    dim=checkpoint['dim'],
    n_layers=checkpoint['n_layers'],
    pad_id=tokenizer.pad_id,
)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Decode a VIN
nano_vin = "1FTFW1E8*NF"
# ... encode and generate
```

The model learned (see the decoding sketch after this list):
- Position 10 = year (N→2022, P→2023, R→2024)
- VIN prefixes map to makes (1FT=Ford, 5YJ=Tesla)
- Embeddings cluster by vehicle category
- Similar vehicles → similar embeddings (~0.99)
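The first two bullets mirror how North American VINs are actually laid out: position 10 carries the model-year code and the first three characters (the WMI) identify the manufacturer. A plain-Python rendering of just the example codes listed above, using a hypothetical decode_vin_fields helper:

```python
# Only the example codes from the list above; the real tables are much larger.
YEAR_CODES = {"N": 2022, "P": 2023, "R": 2024}      # VIN position 10 (model year)
WMI_MAKES = {"1FT": "Ford", "5YJ": "Tesla"}          # first three characters (WMI)

def decode_vin_fields(vin: str) -> dict:
    """Look up make and model year directly from the VIN's fixed positions."""
    return {
        "make": WMI_MAKES.get(vin[:3], "unknown"),
        "year": YEAR_CODES.get(vin[9], "unknown"),   # position 10 -> index 9
    }

print(decode_vin_fields("1FTFW1E8*NF"))   # {'make': 'Ford', 'year': 2022}
```

The point of the transformer is that it induces this structure, plus the messier model and trim patterns that have no such clean lookup, from its training examples; that is what the bullets above summarize.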
```
vortex/
├── llm/
│   ├── model.py                   # 6-layer transformer
│   ├── tokenizer.py               # VIN tokenizer
│   └── train.py                   # Training script
├── models/
│   ├── vortex_llm.pt              # Trained model (25MB)
│   └── vortex_llm.tokenizer.json
└── vis_explorer.html              # Embedding visualization
```