GrillCheese

GrillCheese is a neuromorphic, multimodal AI system built on Grilly with Vulkan-first execution.

The current language core includes a new Grilly-native SSM path inspired by Mamba2, integrated with hippocampal memory, amygdala/endocrine modulation, VSA (vector symbolic architecture) operations, and specialist routing.

Current core architecture

  • Orchestrator and brain routing:
    • thalamic routing
    • specialist selection
    • amygdala + endocrine modulation
    • dream/consolidation phases
  • Memory:
    • hippocampal/capsule memory flow
    • semantic memory backend
    • long-term persistence and replay
  • Language backends:
    • Grilly native transformer
    • Grilly native SSM (native_ssm, Mamba2-inspired)
  • Learning:
    • phase-based training (affect, snn, conversations, instructions, factual, multilanguage, tools_usage, tools_crafting)
    • pretraining pipeline with checkpoints/resume
    • VSA Reasoning Head training with cached hidden states and fused Vulkan shaders

Mamba2-inspired native SSM

Primary implementation:

  • grillcheese/language/grilly_ssm.py
  • grillcheese/language/grilly_native.py
  • grillcheese/pipelines/pretrain_pipeline.py

Model shape:

  1. token embedding
  2. N x selective scan block
  3. final norm
  4. LM head
  5. optional NLMS residual adaptation head

Selective scan block flow:

  1. norm
  2. in_proj to split gate and value
  3. selective scan recurrence with learned decay
  4. out_proj
  5. residual add
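The block flow above can be sketched in plain numpy. Everything here (names, shapes, the sigmoid-derived decay, the RMSNorm) is illustrative only, not the actual code in grillcheese/language/grilly_ssm.py:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm: scale features by their root-mean-square.
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)

def selective_scan_block(x, w_in, w_out):
    """One block: norm -> in_proj (split gate/value) -> scan -> out_proj -> residual.

    x: (seq_len, d_model); w_in: (d_model, 2*d_model); w_out: (d_model, d_model).
    Names and shapes are hypothetical, not the repo's API.
    """
    h = rms_norm(x)
    gate, value = np.split(h @ w_in, 2, axis=-1)
    decay = 1.0 / (1.0 + np.exp(-gate))   # learned, input-dependent decay in (0, 1)
    state = np.zeros(value.shape[-1])
    scanned = np.empty_like(value)
    for t in range(value.shape[0]):       # h_t = a_t * h_{t-1} + (1 - a_t) * v_t
        state = decay[t] * state + (1.0 - decay[t]) * value[t]
        scanned[t] = state
    return x + scanned @ w_out            # residual add
```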

Supported scan implementations:

  • vectorized (default)
  • loop
  • vectorized + fused Vulkan scan math (auto when available)

Set with:

  • GRILLCHEESE_SSM_SCAN_IMPL=vectorized|loop
  • GRILLCHEESE_SSM_USE_FUSED_SCAN=1|0 (default 1)
  • GRILLCHEESE_SSM_FUSED_SCAN_MAX_SEQ_LEN (default 1024)
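The loop and vectorized implementations compute the same first-order linear recurrence. A minimal sketch of that equivalence, using the standard cumulative-product rewrite (this is an assumption about the math, not a copy of native_ssm's code):

```python
import numpy as np

def scan_loop(a, b):
    # Sequential recurrence: h_t = a_t * h_{t-1} + b_t, with h_{-1} = 0.
    h, out = 0.0, np.empty_like(b)
    for t in range(len(b)):
        h = a[t] * h + b[t]
        out[t] = h
    return out

def scan_vectorized(a, b):
    # Same recurrence without a Python loop:
    # h_t = P_t * cumsum(b_i / P_i), where P_t = prod_{i<=t} a_i.
    p = np.cumprod(a)
    return p * np.cumsum(b / p)

rng = np.random.default_rng(1)
a = rng.uniform(0.5, 0.99, 64)   # decays in (0, 1), as a learned decay would be
b = rng.normal(size=64)
assert np.allclose(scan_loop(a, b), scan_vectorized(a, b))
```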

VulkanTensor integration

native_ssm now integrates Grilly VulkanTensor in SSM projection hotspots, with a safe fallback:

  • enabled by default for native SSM path
  • if GPU tensor conversion fails, it falls back to numpy automatically

Toggle:

  • GRILLCHEESE_NATIVE_USE_VULKAN_TENSOR=1|0
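The try-GPU-then-numpy pattern can be sketched as below; gpu_matmul is a hypothetical stand-in for the Grilly VulkanTensor path, which is not reproduced here:

```python
import os
import numpy as np

def project(x, w, gpu_matmul=None):
    """Matmul with an optional GPU path and automatic numpy fallback.

    `gpu_matmul` is a placeholder for the Grilly VulkanTensor projection;
    the real integration lives inside grillcheese.
    """
    use_gpu = os.environ.get("GRILLCHEESE_NATIVE_USE_VULKAN_TENSOR", "1") == "1"
    if use_gpu and gpu_matmul is not None:
        try:
            return gpu_matmul(x, w)
        except Exception:
            pass  # GPU tensor conversion failed; fall back to numpy
    return x @ w
```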

Other relevant native SSM env vars:

  • GRILLCHEESE_NATIVE_SSM_VOCAB_SIZE (default 32768)
  • GRILLCHEESE_NATIVE_SSM_D_MODEL (default 768)
  • GRILLCHEESE_NATIVE_SSM_N_LAYERS (default 12)
  • GRILLCHEESE_NATIVE_USE_SNN_RMSNORM=1|0
  • GRILLCHEESE_NATIVE_USE_NLMS_HEAD=1|0
  • GRILLCHEESE_NATIVE_NLMS_TOPK
  • GRILLCHEESE_NATIVE_NLMS_SCALE
  • GRILLCHEESE_NATIVE_NLMS_MAX_ENTRIES
  • GRILLCHEESE_NATIVE_NLMS_LR
  • GRILLCHEESE_NATIVE_NLMS_MU_DECAY
  • GRILLCHEESE_NATIVE_NLMS_MU_MIN
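The NLMS knobs above correspond to the standard normalized least-mean-squares update. A minimal sketch of that update rule (the repo's residual adaptation head may differ in detail; the lr and eps names are illustrative):

```python
import numpy as np

def nlms_update(w, x, target, lr=0.5, eps=1e-8):
    """One normalized-LMS step: w += lr * e * x / (eps + ||x||^2).

    Textbook NLMS, shown only to explain the LR/decay env vars above.
    """
    e = target - w @ x
    w = w + lr * e * x / (eps + x @ x)
    return w, e

# Repeated updates on a fixed input drive the prediction toward the target.
w = np.zeros(4)
x = np.array([1.0, 2.0, 0.5, -1.0])
for _ in range(50):
    w, e = nlms_update(w, x, target=3.0)
```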

During pretrain --mode native_ssm, the tqdm progress bar now shows scan=... in its postfix so you can verify whether the SSM scan is running on vk_fused_math or the numpy fallback.

Quick start

uv sync
uv run grillcheese doctor
uv run grillcheese chat "hello" --latency-mode instant

Native SSM chat (Vulkan via Grilly):

uv run grillcheese --native --native-mode native_ssm chat "hello"

Training pipeline

Run all standard train phases:

uv run grillcheese train --phase full --batch-size 32

Run one phase:

uv run grillcheese train --phase tools_usage --dataset training_data/jsonl/tool_usage_training_data.jsonl

Live status:

uv run grillcheese train-status

Pretraining (Mamba2-inspired path)

Start native SSM pretraining:

uv run grillcheese pretrain --mode native_ssm --data-dir training_data/unified --batch-size 4 --epochs 1

Resume:

uv run grillcheese pretrain --mode native_ssm --data-dir training_data/unified --resume-latest

Checkpoints are written under:

  • ~/.grillcheese/runtime/checkpoints/pretrain/
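A plausible sketch of what --resume-latest does: pick the newest file in the checkpoint directory by modification time. The .npz extension and function name are assumptions; the real logic lives in grillcheese/pipelines/pretrain_pipeline.py:

```python
from pathlib import Path

def latest_checkpoint(ckpt_dir):
    """Return the most recently modified checkpoint file, or None if empty.

    The *.npz pattern is an assumption about the checkpoint format.
    """
    files = sorted(Path(ckpt_dir).glob("*.npz"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None
```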

VSA Reasoning Head Training

Train the VSA head (projects SSM hidden states to 10,000-dim SVC space):

python scripts/run_vsa_training.py --head-only --epochs 1 --batch-size 64 --lr 5e-4 --lr-schedule cosine

This runs a two-phase pipeline:

  1. Pre-compute hidden states: Forward-only backbone pass, cached as memmap (~1.44 GB)
  2. Cached head training: Fused Linear+tanh Vulkan shader + AdamW GPU optimizer on cached vectors
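The two-phase idea can be sketched in plain numpy: cache backbone hidden states to a disk memmap once, then feed the head from the cache without re-running the backbone. Shapes, file names, and the plain-numpy Linear+tanh head are illustrative; the real pipeline uses fused Vulkan shaders and a GPU AdamW optimizer:

```python
import numpy as np
import os
import tempfile

n, d_model, d_vsa = 256, 768, 10_000  # d_vsa mirrors the 10,000-dim head output
cache_path = os.path.join(tempfile.mkdtemp(), "hidden_cache.dat")

# Phase 1: a forward-only backbone pass writes hidden states to a memmap.
rng = np.random.default_rng(0)
cache = np.memmap(cache_path, dtype=np.float32, mode="w+", shape=(n, d_model))
cache[:] = rng.normal(size=(n, d_model)).astype(np.float32)  # stand-in states
cache.flush()

# Phase 2: the Linear+tanh head reads cached vectors straight from disk.
w = (rng.normal(size=(d_model, d_vsa)) * 0.01).astype(np.float32)
hidden = np.memmap(cache_path, dtype=np.float32, mode="r", shape=(n, d_model))
projected = np.tanh(hidden[:64] @ w)  # one batch of high-dim head outputs
```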

Skip re-computation if cache already exists:

python scripts/run_vsa_training.py --head-only --skip-precompute --skip-hidden-cache

VSA head checkpoints are written under:

  • ~/.grillcheese/runtime/checkpoints/pretrain_vsa/

Detailed pretraining guide:

  • PRETRAINING.md

Unified dataset merge

Merge pretraining JSONL sources:

uv run python scripts/merge_pretraining_dataset.py

Default output/report:

  • tmp/pretrain_ready_merged.v1.jsonl
  • tmp/pretrain_ready_merged.v1.report.json
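The merge step can be sketched as reading each JSONL source, de-duplicating, and emitting a small report. The "text" field and the exact-match dedup policy are assumptions; see scripts/merge_pretraining_dataset.py for the real behaviour:

```python
import json

def merge_jsonl(sources, out_path):
    """Merge JSONL files, dropping records whose 'text' was already seen.

    Returns a report dict; field names are illustrative.
    """
    seen, kept, total = set(), 0, 0
    with open(out_path, "w", encoding="utf-8") as out:
        for src in sources:
            with open(src, encoding="utf-8") as f:
                for line in f:
                    total += 1
                    rec = json.loads(line)
                    key = rec.get("text", "")
                    if key in seen:
                        continue  # duplicate record, skip it
                    seen.add(key)
                    out.write(json.dumps(rec, ensure_ascii=False) + "\n")
                    kept += 1
    return {"total": total, "kept": kept, "dropped": total - kept}
```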

Runtime modes

  • local: all local GPU compute
  • hybrid: local-first, optional cloud offload
  • cloud: thin client mode

Set with:

  • GRILLCHEESE_RUNTIME_MODE=local|hybrid|cloud
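Reading that variable might look like the sketch below; the local default and the error on unknown modes are assumptions about grillcheese's behaviour:

```python
import os

VALID_MODES = ("local", "hybrid", "cloud")

def runtime_mode(default="local"):
    """Read GRILLCHEESE_RUNTIME_MODE with a fallback default.

    The default and the strict validation are illustrative choices.
    """
    mode = os.environ.get("GRILLCHEESE_RUNTIME_MODE", default)
    if mode not in VALID_MODES:
        raise ValueError(f"unknown runtime mode: {mode!r}")
    return mode
```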

Documentation map

  • ARCHITECTURE.md - full system architecture, design decisions, and process diagrams
  • OVERVIEW.md - architecture and system overview
  • TRAINING_PIPELINE_DIAGRAM.md - visual training/pretraining flow
  • TRAINING_PIPELINE_TECHNICAL.md - method-level technical pipeline
  • PRETRAINING.md - detailed pretraining guide
  • BRAIN_ARCHITECTURE_VSA.md - VSA architecture direction
  • BRAIN_ARCHITECTURE_RECOMMENDATIONS.md - architecture recommendations

About

GrillCheese is a local-first AI assistant with a bio-inspired memory architecture. This repository contains the public code to run it.
