Remove LLM dependency from cognitive pipeline — heuristic-first architecture #369

@CalebisGross

Description

Summary

Remove all 12 generative LLM (`Complete`) calls from the 8 cognitive agents, replacing them with heuristic/algorithmic Go implementations. Replace `llm.Provider` with a new `embedding.Provider` interface that handles only vector embeddings. The result: mnemonic runs with zero LLM inference, using only embeddings + heuristics + FTS5 + spread activation.

Motivation

Three irreconcilable problems with the current LLM-dependent architecture:

  1. Speed: Local inference (Felix-LM 100M, Qwen 2B spokes) is too slow for production on consumer hardware (6-9s per encoding on CPU)
  2. Air-gap: Calling Gemini/cloud APIs breaks the local-first, air-gapped promise
  3. Quality: Tiny local models produce poor encodings — hallucinations, inconsistent salience, high failure rates

The key insight: mnemonic's primary consumers are AI agents (Claude Code via MCP). LLM-generated summaries, narratives, and insights are re-interpreted by a more powerful LLM at the consumer level. The value is in fast, deterministic indexing and retrieval, not prose generation.

Results (measured on production DB, 34K memories)

| Metric | LLM (Gemini) | Heuristic (bow) | Change |
|---|---|---|---|
| Encoding latency | 39,426 ms | 6 ms | 6,571× faster |
| Recall latency | 8,876 ms avg | ~6,200 ms avg | ~30% faster |
| Encoding failures | ~20% | 0% | eliminated |
| Network calls | 2+ per memory | 0 | fully air-gapped |
| External deps | Gemini API key + internet | none | zero |

What Changed

New `internal/embedding/` package

  • `Provider` interface: `Embed`, `BatchEmbed`, `Health` (no `Complete`)
  • `BowProvider`: production bag-of-words embeddings (promoted from stubllm)
  • `InstrumentedProvider`: usage-tracking wrapper
  • `APIProvider`: OpenAI-compatible `/v1/embeddings` HTTP client
  • `LLMAdapter`: transitional wrapper for incremental migration

Agent-by-agent Complete call replacement

| Agent | Current LLM Use | Heuristic Replacement |
|---|---|---|
| Perception | LLM gate (worth remembering?) | Heuristic scoring already primary — disable LLM gate |
| Encoding | Compression + concept extraction | Promote `fallbackCompression()` + vocabulary-aware `ExtractTopConcepts()` |
| Encoding | Association classification | Heuristic classification already default |
| Episoding | Episode synthesis | Algorithmic: time clustering + top concepts |
| Consolidation | Gist creation | Pick highest-salience memory as representative |
| Consolidation | Pattern detection | Statistical concept co-occurrence |
| Retrieval | Synthesis | Drop entirely (consuming agent synthesizes) |
| Dreaming | Insight generation | Graph bridge detection in association network |
| Abstraction | Principle synthesis | Hierarchical concept clustering |
| Abstraction | Axiom synthesis | Hierarchical concept clustering (level 2) |
| Reactor | @mention response | Static personality + agent data |

Config: embedding.provider

  • `bow` = built-in bag-of-words (128-dim, fully air-gapped, zero dependencies)
  • `api` = OpenAI-compatible endpoint
  • Omitted = auto-detect from `llm` config (backward compatible)

Implementation Phases

  • Phase A: Foundation — new internal/embedding/ package
  • Phase B: Replace Complete calls (10 agents, incremental)
    • B1: Perception (remove LLM gate)
    • B2: Metacognition (embed only)
    • B3: Orchestrator (health only)
    • B4: Retrieval (drop synthesis)
    • B5: Episoding (algorithmic synthesis)
    • B6: Dreaming (graph bridge detection)
    • B7: Consolidation (heuristic gist + patterns)
    • B8: Encoding (promote fallback to primary)
    • B9: Abstraction (concept clustering)
    • B10: Reactor (static responses)
  • Phase C: Wiring (serve.go, runtime.go, API, config)
  • Phase D: Tests + lifecycle validation
  • Phase E: Cleanup (follow-up — remove dead LLM code, update docs)

Verification

  • `go build ./...` — clean
  • `go test ./...` — all pass
  • `go vet ./...` — clean
  • Daemon starts with `embedding.provider: bow` (zero LLM config)
  • MCP tools work (`recall`, `remember`, `status`, `check_memory`)
  • Dashboard loads (HTTP 200)
  • Encoding pipeline processes events
  • Dedup works (Tier 3 cosine similarity)
  • E2E: remember → encode → recall round-trip verified

Follow-up Work (Separate Issues)

  1. ONNX Runtime embedded embedding (MiniLM-L6-v2 INT8)
  2. TurboQuant vector compression (QJL + PolarQuant in Go)
  3. Enhanced concept extraction (RAKE/YAKE)
  4. Pure Go word2vec / hugot transformers
  5. Phase E cleanup (remove dead LLM code paths, update docs)

Labels

component:llm (LLM provider layer), enhancement (New feature or request), epic (Multi-phase project tracking), refactor (Code cleanup, deduplication, modularization), v1.0 (Required for v1.0 public release)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions