Remove LLM dependency from cognitive pipeline — heuristic-first architecture #369

@CalebisGross

Description

Summary

Remove all 12 generative LLM (`Complete`) calls from the 8 cognitive agents, replacing them with heuristic/algorithmic Go implementations. Replace `llm.Provider` with a new `embedding.Provider` interface that handles only vector embeddings. The result: mnemonic runs with zero LLM inference, using only embeddings + heuristics + FTS5 + spread activation.

Motivation

Three irreconcilable problems with the current LLM-dependent architecture:

  1. Speed: Local inference (Felix-LM 100M, Qwen 2B spokes) is too slow for production on consumer hardware (6-9s per encoding on CPU)
  2. Air-gap: Calling Gemini/cloud APIs breaks the local-first, air-gapped promise
  3. Quality: Tiny local models produce poor encodings — hallucinations, inconsistent salience, high failure rates

The key insight: mnemonic's primary consumers are AI agents (Claude Code via MCP). LLM-generated summaries, narratives, and insights are re-interpreted by a more powerful LLM at the consumer level. The value is in fast, deterministic indexing and retrieval, not prose generation.

Results (measured on production DB, 34K memories)

| Metric | LLM (Gemini) | Heuristic (bow) | Change |
|---|---|---|---|
| Encoding latency | 39,426 ms | 6 ms | 6,571× faster |
| Recall latency | 8,876 ms avg | ~6,200 ms avg | ~30% faster |
| Encoding failures | ~20% | 0% | eliminated |
| Network calls | 2+ per memory | 0 | fully air-gapped |
| External deps | Gemini API key + internet | none | zero |

What Changed

New `internal/embedding/` package

  • `Provider` interface: `Embed`, `BatchEmbed`, `Health` (no `Complete`)
  • `BowProvider`: production bag-of-words embeddings (promoted from stubllm)
  • `InstrumentedProvider`: usage-tracking wrapper
  • `APIProvider`: OpenAI-compatible `/v1/embeddings` HTTP client
  • `LLMAdapter`: transitional wrapper for incremental migration

Agent-by-agent Complete call replacement

| Agent | Current LLM Use | Heuristic Replacement |
|---|---|---|
| Perception | LLM gate (worth remembering?) | Heuristic scoring already primary — disable LLM gate |
| Encoding | Compression + concept extraction | Promote `fallbackCompression()` + vocabulary-aware `ExtractTopConcepts()` |
| Encoding | Association classification | Heuristic classification already default |
| Episoding | Episode synthesis | Algorithmic: time clustering + top concepts |
| Consolidation | Gist creation | Pick highest-salience memory as representative |
| Consolidation | Pattern detection | Statistical concept co-occurrence |
| Retrieval | Synthesis | Drop entirely (consuming agent synthesizes) |
| Dreaming | Insight generation | Graph bridge detection in association network |
| Abstraction | Principle synthesis | Hierarchical concept clustering |
| Abstraction | Axiom synthesis | Hierarchical concept clustering (level 2) |
| Reactor | @mention response | Static personality + agent data |

Config: embedding.provider

  • `bow` = built-in bag-of-words (128-dim, fully air-gapped, zero dependencies)
  • `api` = OpenAI-compatible endpoint
  • Omitted = auto-detect from `llm` config (backward compatible)

Implementation Phases

  • Phase A: Foundation — new internal/embedding/ package
  • Phase B: Replace Complete calls (10 agents, incremental)
    • B1: Perception (remove LLM gate)
    • B2: Metacognition (embed only)
    • B3: Orchestrator (health only)
    • B4: Retrieval (drop synthesis)
    • B5: Episoding (algorithmic synthesis)
    • B6: Dreaming (graph bridge detection)
    • B7: Consolidation (heuristic gist + patterns)
    • B8: Encoding (promote fallback to primary)
    • B9: Abstraction (concept clustering)
    • B10: Reactor (static responses)
  • Phase C: Wiring (serve.go, runtime.go, API, config)
  • Phase D: Tests + lifecycle validation
  • Phase E: Cleanup (follow-up — remove dead LLM code, update docs)

Verification

  • `go build ./...` — clean
  • `go test ./...` — all pass
  • `go vet ./...` — clean
  • Daemon starts with `embedding.provider: bow` (zero LLM config)
  • MCP tools work (`recall`, `remember`, `status`, `check_memory`)
  • Dashboard loads (HTTP 200)
  • Encoding pipeline processes events
  • Dedup works (Tier 3 cosine similarity)
  • E2E: remember → encode → recall round-trip verified

Follow-up Work (Separate Issues)

  1. ONNX Runtime embedded embedding (MiniLM-L6-v2 INT8)
  2. TurboQuant vector compression (QJL + PolarQuant in Go)
  3. Enhanced concept extraction (RAKE/YAKE)
  4. Pure Go word2vec / hugot transformers
  5. Phase E cleanup (remove dead LLM code paths, update docs)

Labels

component:llm (LLM provider layer), enhancement (New feature or request), epic (Multi-phase project tracking), refactor (Code cleanup, deduplication, modularization), v1.0 (Required for v1.0 public release)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions