GrillCheese is a neuromorphic, multimodal AI system built on Grilly with Vulkan-first execution.
The current language core includes a new Grilly-native SSM path inspired by Mamba2, integrated with hippocampal memory, amygdala/endocrine modulation, VSA, and specialist routing.
- Orchestrator and brain routing:
  - thalamic routing
  - specialist selection
  - amygdala + endocrine modulation
  - dream/consolidation phases
- Memory:
  - hippocampal/capsule memory flow
  - semantic memory backend
  - long-term persistence and replay
- Language backends:
  - Grilly native transformer
  - Grilly native SSM (native_ssm, Mamba2-inspired)
- Learning:
  - phase-based training (affect, snn, conversations, instructions, factual, multilanguage, tools_usage, tools_crafting)
  - pretraining pipeline with checkpoints/resume
  - VSA Reasoning Head training with cached hidden states and fused Vulkan shaders
Primary implementation:
- grillcheese/language/grilly_ssm.py
- grillcheese/language/grilly_native.py
- grillcheese/pipelines/pretrain_pipeline.py
Model shape:
- token embedding
- N x selective scan block
- final norm
- LM head
- optional NLMS residual adaptation head
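The optional NLMS head is not described further here. For orientation only, the textbook normalized least-mean-squares update that the name refers to can be sketched as below. This is a generic illustration, not GrillCheese's implementation; `nlms_step`, `mu`, and `eps` are hypothetical names chosen for the example.

```python
import numpy as np

def nlms_step(w, x, target, mu=0.5, eps=1e-8):
    """One textbook NLMS update for a linear predictor y = w @ x.

    Hypothetical helper for illustration; not taken from the GrillCheese code.
    """
    error = target - w @ x                      # prediction residual
    w_new = w + mu * error * x / (eps + x @ x)  # step normalized by input energy
    return w_new, error

# Toy usage: recover a fixed linear map from noiseless samples.
rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.0, 2.0])
w = np.zeros(3)
for _ in range(200):
    x = rng.normal(size=3)
    w, _ = nlms_step(w, x, w_true @ x)
```

Dividing by the input energy keeps the step size independent of the scale of `x`, which is the main reason to prefer NLMS over plain LMS for an online correction term.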
Selective scan block flow:
- norm
- in_proj to split gate and value
- selective scan recurrence with learned decay
- out_proj
- residual add
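The block flow above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the code in grilly_ssm.py: RMS normalization, a per-channel sigmoid-parameterized decay, and multiplicative sigmoid gating are all assumptions, and every name (`selective_scan_block`, `w_in`, `decay_logit`, `w_out`) is hypothetical.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Assumed normalization; the actual norm layer is not specified in this README.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def selective_scan_block(x, w_in, decay_logit, w_out):
    """x: (seq_len, d_model). Returns an array of the same shape."""
    h = rms_norm(x)                              # norm
    proj = h @ w_in                              # in_proj -> (seq_len, 2 * d_inner)
    gate, value = np.split(proj, 2, axis=-1)     # split gate and value
    decay = sigmoid(decay_logit)                 # learned per-channel decay in (0, 1)
    state = np.zeros(value.shape[-1])
    scanned = np.empty_like(value)
    for t in range(value.shape[0]):              # scan recurrence over time
        state = decay * state + (1.0 - decay) * value[t]
        scanned[t] = state
    y = (sigmoid(gate) * scanned) @ w_out        # assumed gating, then out_proj
    return x + y                                 # residual add

rng = np.random.default_rng(0)
seq_len, d_model, d_inner = 6, 8, 16
x = rng.normal(size=(seq_len, d_model))
w_in = rng.normal(size=(d_model, 2 * d_inner)) * 0.1
w_out = rng.normal(size=(d_inner, d_model)) * 0.1
decay_logit = np.zeros(d_inner)
out = selective_scan_block(x, w_in, decay_logit, w_out)
```

Because the state at step t depends only on steps up to t, perturbing the last token leaves all earlier outputs unchanged, which is a quick sanity check for any scan implementation.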
Supported scan implementations:
- vectorized (default)
- loop
- vectorized + fused Vulkan scan math (auto when available)
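To illustrate how a loop scan and a vectorized scan can compute the same result, here is a small NumPy sketch for the simplified recurrence s_t = decay * s_{t-1} + inputs[t] with a fixed per-channel decay. It is an assumption-laden illustration rather than the project's kernels: `scan_loop` and `scan_vectorized` are hypothetical names, and the vectorized form shown uses a dense (T, T) weight tensor, which is only practical for short sequences.

```python
import numpy as np

def scan_loop(decay, inputs):
    """Reference recurrence: s_t = decay * s_{t-1} + inputs[t]."""
    state = np.zeros(inputs.shape[1])
    out = np.empty_like(inputs)
    for t in range(inputs.shape[0]):
        state = decay * state + inputs[t]
        out[t] = state
    return out

def scan_vectorized(decay, inputs):
    """Same recurrence in closed form: s_t = sum over k <= t of decay**(t-k) * inputs[k]."""
    steps = np.arange(inputs.shape[0])
    lag = steps[:, None] - steps[None, :]            # lag[t, k] = t - k
    causal = lag >= 0                                # only past and present steps contribute
    powers = decay[None, None, :] ** np.maximum(lag, 0)[:, :, None]
    weights = np.where(causal[:, :, None], powers, 0.0)   # (T, T, channels)
    return np.einsum("tkc,kc->tc", weights, inputs)

rng = np.random.default_rng(0)
decay = rng.uniform(0.1, 0.9, size=4)
inputs = rng.normal(size=(10, 4))
loop_out = scan_loop(decay, inputs)
vec_out = scan_vectorized(decay, inputs)
```

The two functions agree to floating-point tolerance, so the loop version can serve as a correctness reference for a faster path.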
Set with:
- GRILLCHEESE_SSM_SCAN_IMPL=vectorized|loop
- GRILLCHEESE_SSM_USE_FUSED_SCAN=1|0 (default 1)
- GRILLCHEESE_SSM_FUSED_SCAN_MAX_SEQ_LEN (default 1024)
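One plausible way these three variables could combine is sketched below. The variable names and defaults come from the list above; everything else is an assumption made for illustration, including the helper names `resolve_scan_config` and `pick_scan_path`, the rule that the fused path applies only to the vectorized implementation, and the inclusive sequence-length limit.

```python
import os

def resolve_scan_config(env=None):
    """Read the scan settings from an environment mapping (defaults as documented)."""
    env = os.environ if env is None else env
    impl = env.get("GRILLCHEESE_SSM_SCAN_IMPL", "vectorized").strip().lower()
    if impl not in ("vectorized", "loop"):
        raise ValueError(f"unsupported scan implementation: {impl!r}")
    return {
        "impl": impl,
        "use_fused": env.get("GRILLCHEESE_SSM_USE_FUSED_SCAN", "1").strip() == "1",
        "fused_max_seq_len": int(env.get("GRILLCHEESE_SSM_FUSED_SCAN_MAX_SEQ_LEN", "1024")),
    }

def pick_scan_path(config, seq_len, fused_available):
    """Return a label for the scan path; the selection rule here is assumed."""
    if (
        config["impl"] == "vectorized"
        and config["use_fused"]
        and fused_available
        and seq_len <= config["fused_max_seq_len"]
    ):
        return "vectorized+fused"
    return config["impl"]

defaults = resolve_scan_config({})
short_path = pick_scan_path(defaults, seq_len=512, fused_available=True)
long_path = pick_scan_path(defaults, seq_len=4096, fused_available=True)
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the resolution logic easy to test.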
native_ssm now integrates Grilly VulkanTensor in SSM projection hotspots with safe fallback:
- enabled by default for native SSM path
- if GPU tensor conversion fails, it falls back to numpy automatically
Toggle:
GRILLCHEESE_NATIVE_USE_VULKAN_TENSOR=1|0
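The fallback behaviour described above follows a common try-then-degrade pattern, sketched here. Grilly's actual VulkanTensor API is not shown in this README, so `gpu_convert` is a hypothetical stand-in for whatever conversion call the project uses, and `to_compute_array` and the float32 cast are likewise assumptions.

```python
import os
import numpy as np

def to_compute_array(array, gpu_convert=None, env=None):
    """Try the GPU conversion first; on any failure return a numpy array instead.

    Returns (tensor, backend_label). `gpu_convert` is a hypothetical callable.
    """
    env = os.environ if env is None else env
    enabled = env.get("GRILLCHEESE_NATIVE_USE_VULKAN_TENSOR", "1") == "1"
    if enabled and gpu_convert is not None:
        try:
            return gpu_convert(array), "vulkan"
        except Exception:
            pass  # conversion failed; fall through to the numpy path
    return np.asarray(array, dtype=np.float32), "numpy"

def broken_convert(_array):
    # Simulates a missing or failing GPU device.
    raise RuntimeError("no Vulkan device")

fallback, fallback_backend = to_compute_array([1.0, 2.0], gpu_convert=broken_convert, env={})
```

Returning the backend label alongside the tensor makes it straightforward to surface which path was taken in logs or progress output.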
Other relevant native SSM env vars:
- GRILLCHEESE_NATIVE_SSM_VOCAB_SIZE (default 32768)
- GRILLCHEESE_NATIVE_SSM_D_MODEL (default 768)
- GRILLCHEESE_NATIVE_SSM_N_LAYERS (default 12)
- GRILLCHEESE_NATIVE_USE_SNN_RMSNORM=1|0
- GRILLCHEESE_NATIVE_USE_NLMS_HEAD=1|0
- GRILLCHEESE_NATIVE_NLMS_TOPK
- GRILLCHEESE_NATIVE_NLMS_SCALE
- GRILLCHEESE_NATIVE_NLMS_MAX_ENTRIES
- GRILLCHEESE_NATIVE_NLMS_LR
- GRILLCHEESE_NATIVE_NLMS_MU_DECAY
- GRILLCHEESE_NATIVE_NLMS_MU_MIN
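As a rough illustration of what the three size variables imply, the sketch below collects them into a small config object and computes the token-embedding size for the documented defaults. `NativeSSMConfig` and its methods are hypothetical names for this example; how the project actually parses these variables is not shown here.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class NativeSSMConfig:
    """Hypothetical container for the documented model-shape variables."""
    vocab_size: int = 32768
    d_model: int = 768
    n_layers: int = 12

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        return cls(
            vocab_size=int(env.get("GRILLCHEESE_NATIVE_SSM_VOCAB_SIZE", cls.vocab_size)),
            d_model=int(env.get("GRILLCHEESE_NATIVE_SSM_D_MODEL", cls.d_model)),
            n_layers=int(env.get("GRILLCHEESE_NATIVE_SSM_N_LAYERS", cls.n_layers)),
        )

    def embedding_params(self):
        # One d_model-sized vector per vocabulary entry.
        return self.vocab_size * self.d_model

config = NativeSSMConfig.from_env({})
# With the documented defaults: 32768 * 768 = 25,165,824 embedding weights.
embedding_weights = config.embedding_params()
```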
During pretrain --mode native_ssm, the tqdm progress bar now shows scan=... in its postfix, so you can verify whether the SSM scan is running on vk_fused_math or on the numpy fallback.
uv sync
uv run grillcheese doctor
uv run grillcheese chat "hello" --latency-mode instant
Native SSM chat (Vulkan via Grilly):
uv run grillcheese --native --native-mode native_ssm chat "hello"
Run all standard train phases:
uv run grillcheese train --phase full --batch-size 32
Run one phase:
uv run grillcheese train --phase tools_usage --dataset training_data/jsonl/tool_usage_training_data.jsonl
Live status:
uv run grillcheese train-status
Start native SSM pretraining:
uv run grillcheese pretrain --mode native_ssm --data-dir training_data/unified --batch-size 4 --epochs 1
Resume:
uv run grillcheese pretrain --mode native_ssm --data-dir training_data/unified --resume-latest
Checkpoints are written under:
~/.grillcheese/runtime/checkpoints/pretrain/
Train the VSA head (projects SSM hidden states to 10,000-dim SVC space):
python scripts/run_vsa_training.py --head-only --epochs 1 --batch-size 64 --lr 5e-4 --lr-schedule cosine
This runs a two-phase pipeline:
- Pre-compute hidden states: Forward-only backbone pass, cached as memmap (~1.44 GB)
- Cached head training: Fused Linear+tanh Vulkan shader + AdamW GPU optimizer on cached vectors
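The two phases can be sketched on CPU with NumPy as below. This is a toy illustration under stated assumptions, not what scripts/run_vsa_training.py does internally: the backbone is a stand-in function, the head is trained with plain gradient descent on a mean-squared error instead of the fused Vulkan shader and AdamW, and all names and sizes are hypothetical.

```python
import os
import tempfile
import numpy as np

def precompute_hidden_states(backbone, inputs, cache_path, d_model, batch_size=4):
    """Phase 1: forward-only pass, writing hidden states into a memmap-backed .npy file."""
    cache = np.lib.format.open_memmap(
        cache_path, mode="w+", dtype=np.float32, shape=(len(inputs), d_model)
    )
    for start in range(0, len(inputs), batch_size):
        cache[start:start + batch_size] = backbone(inputs[start:start + batch_size])
    cache.flush()
    return cache_path

def train_head_on_cache(cache_path, targets, out_dim, epochs=200, lr=0.5):
    """Phase 2: fit a Linear+tanh head on the cached vectors only (backbone never re-run)."""
    hidden = np.load(cache_path, mmap_mode="r")
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=(hidden.shape[1], out_dim))
    b = np.zeros(out_dim)
    losses = []
    for _ in range(epochs):
        pred = np.tanh(hidden @ w + b)
        err = pred - targets
        losses.append(float(np.mean(err ** 2)))
        grad_pre = 2.0 * err * (1.0 - pred ** 2) / err.size  # d(loss)/d(pre-activation)
        w -= lr * (hidden.T @ grad_pre)
        b -= lr * grad_pre.sum(axis=0)
    return w, b, losses

rng = np.random.default_rng(1)
tokens = rng.normal(size=(32, 8)).astype(np.float32)   # stand-in inputs
backbone = lambda batch: np.tanh(batch)                 # stand-in frozen backbone
targets = np.sign(rng.normal(size=(32, 16)))            # stand-in targets in {-1, +1}
cache_path = os.path.join(tempfile.mkdtemp(), "hidden.npy")
precompute_hidden_states(backbone, tokens, cache_path, d_model=8)
w, b, losses = train_head_on_cache(cache_path, targets, out_dim=16)
```

Caching pays off because the expensive backbone pass happens once, while the cheap head can then be trained for many epochs or hyperparameter settings against the same file.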
Skip re-computation if cache already exists:
python scripts/run_vsa_training.py --head-only --skip-precompute --skip-hidden-cache
VSA head checkpoints are written under:
~/.grillcheese/runtime/checkpoints/pretrain_vsa/
Detailed pretraining guide:
PRETRAINING.md
Merge pretraining JSONL sources:
uv run python scripts/merge_pretraining_dataset.py
Default output/report:
- tmp/pretrain_ready_merged.v1.jsonl
- tmp/pretrain_ready_merged.v1.report.json
Runtime modes:
- local: all local GPU compute
- hybrid: local-first, optional cloud offload
- cloud: thin client mode
Set with:
GRILLCHEESE_RUNTIME_MODE=local|hybrid|cloud
- ARCHITECTURE.md - full system architecture, design decisions, and process diagrams
- OVERVIEW.md - architecture and system overview
- TRAINING_PIPELINE_DIAGRAM.md - visual training/pretraining flow
- TRAINING_PIPELINE_TECHNICAL.md - method-level technical pipeline
- PRETRAINING.md - new detailed pretraining guide
- BRAIN_ARCHITECTURE_VSA.md - VSA architecture direction
- BRAIN_ARCHITECTURE_RECOMMENDATIONS.md - architecture recommendations