VORTEX

Vehicle Ontology Resolution Through EXplanatory Compression

A 25MB transformer that learned how VINs encode vehicle identity.

Results

Make accuracy:   100%
Model accuracy:  99%
Trim accuracy:   90%

Architecture

  • 6-layer decoder-only transformer
  • 6.6M parameters
  • 256-dim embeddings
  • Trained on 50K examples in 8 minutes
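
For orientation, here is a minimal sketch of a decoder-only stack at these dimensions. It is illustrative only: the head count, feed-forward width, and context length are assumptions, and the actual implementation lives in llm/model.py.

import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Illustrative 6-layer, 256-dim causal LM (not the actual VortexLLM)."""

    def __init__(self, vocab_size, dim=256, n_layers=6, n_heads=4, max_len=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.pos_emb = nn.Embedding(max_len, dim)
        block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.layers = nn.TransformerEncoder(block, num_layers=n_layers)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, idx):
        t = idx.size(1)
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(t, device=idx.device))
        # A causal mask turns the encoder stack into a decoder-only LM.
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(idx.device)
        return self.lm_head(self.layers(x, mask=mask))

With a 4x feed-forward width, the six blocks account for roughly 4.7M parameters (12 x 256^2 per block), leaving the rest of the 6.6M budget for the embedding and output tables.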

Usage

from llm.model import VortexLLM
from llm.tokenizer import VortexTokenizer
import torch

# Load the tokenizer and the trained checkpoint
tokenizer = VortexTokenizer.load("models/vortex_llm.tokenizer.json")
checkpoint = torch.load("models/vortex_llm.pt", map_location="cpu")

model = VortexLLM(
    vocab_size=checkpoint['vocab_size'],
    dim=checkpoint['dim'],
    n_layers=checkpoint['n_layers'],
    pad_id=tokenizer.pad_id
)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Decode a VIN prefix (WMI + descriptor + masked check digit + year + plant)
nano_vin = "1FTFW1E8*NF"  # Ford truck prefix; * masks the check digit, N = 2022
# ... encode and generate
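
The generation loop depends on VortexLLM's actual interface, which the snippet above elides. A minimal greedy-decoding sketch, assuming tokenizer.encode()/decode() exist and the model returns (batch, seq_len, vocab_size) logits:

ids = torch.tensor([tokenizer.encode(nano_vin)])  # encode() is an assumed API
with torch.no_grad():
    for _ in range(32):                    # cap the number of generated tokens
        logits = model(ids)                # assumed shape: (1, seq_len, vocab)
        next_id = int(logits[0, -1].argmax())
        ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
        if next_id == tokenizer.pad_id:    # stop condition is an assumption
            break

print(tokenizer.decode(ids[0].tolist()))   # decoded make/model/trim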

Key Discovery

The model learned:

  • Position 10 = model year (N→2022, P→2023, R→2024)
  • VIN prefixes (the WMI) map to makes (1FT=Ford, 5YJ=Tesla)
  • Embeddings cluster by vehicle category
  • Similar vehicles → similar embeddings (cosine similarity ≈ 0.99; see the sketch below)
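
As an illustration of the last two points, a pooled-embedding comparison might look like the following. This is a sketch: the embedding attribute name (tok_emb), the encode() method, and mean pooling are all assumptions, not the repo's documented API.

import torch
import torch.nn.functional as F

def vin_embedding(vin: str) -> torch.Tensor:
    # Mean-pool the token embeddings into one vector per VIN
    # (attribute name and pooling choice are assumptions).
    ids = torch.tensor([tokenizer.encode(vin)])
    with torch.no_grad():
        return model.tok_emb(ids).mean(dim=1).squeeze(0)

# Same Ford truck prefix, adjacent model years (position 10: N=2022, P=2023)
a = vin_embedding("1FTFW1E8*NF")
b = vin_embedding("1FTFW1E8*PF")
print(F.cosine_similarity(a, b, dim=0).item())  # expected to land near 0.99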

Files

vortex/
├── llm/
│   ├── model.py      # 6-layer transformer
│   ├── tokenizer.py  # VIN tokenizer
│   └── train.py      # Training script
├── models/
│   ├── vortex_llm.pt           # Trained model (25MB)
│   └── vortex_llm.tokenizer.json
└── vis_explorer.html  # Embedding visualization
