Remove LLM dependency from cognitive pipeline — heuristic-first architecture #369
Open
Labels
component:llm (LLM provider layer), enhancement (New feature or request), epic (Multi-phase project tracking), refactor (Code cleanup, deduplication, modularization), v1.0 (Required for v1.0 public release)
Description
Summary
Remove all 12 generative LLM (Complete) calls from the 8 cognitive agents, replacing them with heuristic/algorithmic Go implementations. Replace llm.Provider with a new embedding.Provider interface that only handles vector embeddings. The result: mnemonic runs with zero LLM inference, using only embeddings + heuristics + FTS5 + spread activation.
Motivation
Three irreconcilable problems with the current LLM-dependent architecture:
- Speed: Local inference (Felix-LM 100M, Qwen 2B spokes) is too slow for production on consumer hardware (6-9s per encoding on CPU)
- Air-gap: Calling Gemini/cloud APIs breaks the local-first, air-gapped promise
- Quality: Tiny local models produce poor encodings — hallucinations, inconsistent salience, high failure rates
The key insight: mnemonic's primary consumers are AI agents (Claude Code via MCP). LLM-generated summaries, narratives, and insights are re-interpreted by a more powerful LLM at the consumer level. The value is in fast, deterministic indexing and retrieval, not prose generation.
Results (measured on production DB, 34K memories)
| Metric | LLM (Gemini) | Heuristic (bow) | Change |
|---|---|---|---|
| Encoding latency | 39,426ms | 6ms | 6,571x faster |
| Recall latency | 8,876ms avg | ~6,200ms avg | ~30% faster |
| Encoding failures | ~20% | 0% | eliminated |
| Network calls | 2+ per memory | 0 | fully air-gapped |
| External deps | Gemini API key + internet | none | zero |
What Changed
New internal/embedding/ package
- Provider interface: Embed, BatchEmbed, Health (no Complete)
- BowProvider: production bag-of-words embeddings (promoted from stubllm)
- InstrumentedProvider: usage tracking wrapper
- APIProvider: OpenAI-compatible /v1/embeddings HTTP client
- LLMAdapter: transitional wrapper for incremental migration
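A minimal sketch of what the embedding-only interface might look like, with a toy hashing bag-of-words provider standing in for BowProvider. The method signatures and the toyBowProvider type are assumptions — the issue only names the Embed, BatchEmbed, and Health methods:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// Provider is a sketch of the embedding-only interface: vector
// embeddings and a health check, deliberately no Complete method.
type Provider interface {
	Embed(text string) ([]float32, error)
	BatchEmbed(texts []string) ([][]float32, error)
	Health() error
}

// toyBowProvider is a hypothetical stand-in for BowProvider: it hashes
// whitespace-separated tokens into a fixed-size bag-of-words vector.
type toyBowProvider struct{ dim int }

func (p toyBowProvider) Embed(text string) ([]float32, error) {
	vec := make([]float32, p.dim)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(tok))
		vec[int(h.Sum32())%p.dim]++ // bump the bucket this token hashes to
	}
	return vec, nil
}

func (p toyBowProvider) BatchEmbed(texts []string) ([][]float32, error) {
	out := make([][]float32, len(texts))
	for i, t := range texts {
		v, err := p.Embed(t)
		if err != nil {
			return nil, err
		}
		out[i] = v
	}
	return out, nil
}

// Health always succeeds: a local bag-of-words model has no dependencies.
func (p toyBowProvider) Health() error { return nil }

func main() {
	var prov Provider = toyBowProvider{dim: 128}
	v, _ := prov.Embed("zero LLM inference, only embeddings")
	fmt.Println(len(v)) // 128
}
```

Because every provider is local and deterministic, Health can be a no-op for bow while APIProvider would probe the remote endpoint.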
Agent-by-agent Complete call replacement
| Agent | Current LLM Use | Heuristic Replacement |
|---|---|---|
| Perception | LLM gate (worth remembering?) | Heuristic scoring already primary — disable LLM gate |
| Encoding | Compression + concept extraction | Promote fallbackCompression() + vocabulary-aware ExtractTopConcepts() |
| Encoding | Association classification | Heuristic classification already default |
| Episoding | Episode synthesis | Algorithmic: time clustering + top concepts |
| Consolidation | Gist creation | Pick highest-salience memory as representative |
| Consolidation | Pattern detection | Statistical concept co-occurrence |
| Retrieval | Synthesis | Drop entirely (consuming agent synthesizes) |
| Dreaming | Insight generation | Graph bridge detection in association network |
| Abstraction | Principle synthesis | Hierarchical concept clustering |
| Abstraction | Axiom synthesis | Hierarchical concept clustering (level 2) |
| Reactor | @mention response | Static personality + agent data |
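The Consolidation row above replaces LLM pattern detection with statistical concept co-occurrence. A minimal sketch of that idea, assuming each memory carries a list of extracted concepts (all names here are hypothetical; the real implementation may also weight by salience or recency):

```go
package main

import "fmt"

// pair is an unordered concept pair, normalized so {a,b} == {b,a}.
type pair struct{ a, b string }

func key(a, b string) pair {
	if a > b {
		a, b = b, a
	}
	return pair{a, b}
}

// coOccurringPairs counts how often two concepts appear in the same
// memory's concept list and keeps pairs seen at least minCount times —
// a purely statistical stand-in for LLM pattern detection.
func coOccurringPairs(memories [][]string, minCount int) map[pair]int {
	counts := map[pair]int{}
	for _, concepts := range memories {
		for i := 0; i < len(concepts); i++ {
			for j := i + 1; j < len(concepts); j++ {
				counts[key(concepts[i], concepts[j])]++
			}
		}
	}
	for k, c := range counts {
		if c < minCount {
			delete(counts, k)
		}
	}
	return counts
}

func main() {
	mems := [][]string{
		{"sqlite", "fts5", "search"},
		{"sqlite", "fts5"},
		{"embedding", "search"},
	}
	fmt.Println(coOccurringPairs(mems, 2)) // map[{fts5 sqlite}:2]
}
```

The O(n·k²) pair count is cheap at realistic concept-list sizes, and the output is deterministic — no hallucinated patterns, no failure mode.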
Config: embedding.provider
- bow = built-in bag-of-words (128-dim, fully air-gapped, zero dependencies)
- api = OpenAI-compatible endpoint
- Omitted = auto-detect from llm config (backward compatible)
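Assuming mnemonic reads a YAML config file (the exact layout is not shown in this issue, so keys beyond embedding.provider are hypothetical), the three modes above might look like:

```yaml
# Built-in bag-of-words: fully air-gapped, zero dependencies.
embedding:
  provider: bow

# Or point at an OpenAI-compatible /v1/embeddings endpoint
# (base_url is an illustrative key name, not confirmed by the issue):
# embedding:
#   provider: api
#   base_url: http://localhost:8080/v1

# Or omit the embedding block entirely to auto-detect
# from the existing llm config (backward compatible).
```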
Implementation Phases
- Phase A: Foundation — new internal/embedding/ package
- Phase B: Replace Complete calls (10 agents, incremental)
- B1: Perception (remove LLM gate)
- B2: Metacognition (embed only)
- B3: Orchestrator (health only)
- B4: Retrieval (drop synthesis)
- B5: Episoding (algorithmic synthesis)
- B6: Dreaming (graph bridge detection)
- B7: Consolidation (heuristic gist + patterns)
- B8: Encoding (promote fallback to primary)
- B9: Abstraction (concept clustering)
- B10: Reactor (static responses)
- Phase C: Wiring (serve.go, runtime.go, API, config)
- Phase D: Tests + lifecycle validation
- Phase E: Cleanup (follow-up — remove dead LLM code, update docs)
Verification
- go build ./... — clean
- go test ./... — all pass
- go vet ./... — clean
- Daemon starts with embedding.provider: bow (zero LLM config)
- MCP tools work (recall, remember, status, check_memory)
- Dashboard loads (HTTP 200)
- Encoding pipeline processes events
- Dedup working (Tier 3 cosine similarity)
- E2E: remember → encode → recall round-trip verified
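The Tier 3 dedup check above relies on cosine similarity between embedding vectors. A minimal sketch — the threshold value and function names are illustrative, not taken from the issue:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 when either vector has zero magnitude.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// isDuplicate is an illustrative Tier 3 check: treat two memories as
// duplicates when their embeddings are nearly parallel.
func isDuplicate(a, b []float32, threshold float64) bool {
	return cosine(a, b) >= threshold
}

func main() {
	a := []float32{1, 0, 1}
	b := []float32{1, 0, 1}
	c := []float32{0, 1, 0}
	fmt.Println(isDuplicate(a, b, 0.95), isDuplicate(a, c, 0.95)) // true false
}
```

With bag-of-words vectors this check is pure arithmetic, so dedup needs no network call and stays deterministic across runs.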
Follow-up Work (Separate Issues)
- ONNX Runtime embedded embedding (MiniLM-L6-v2 INT8)
- TurboQuant vector compression (QJL + PolarQuant in Go)
- Enhanced concept extraction (RAKE/YAKE)
- Pure Go word2vec / hugot transformers
- Phase E cleanup (remove dead LLM code paths, update docs)