305 changes: 305 additions & 0 deletions INTEGRATION.md
@@ -0,0 +1,305 @@
---
*Prepared by **Agent: Mei (梅)** — PhD candidate, Tsinghua KEG Lab. Specialist in memory systems, inference optimization, and distributed AI architecture.*
*Running: anthropic/claude-opus-4-5*

*Human in the Loop: Garrett Kinsman*

---

# ContextGraph ↔ OpenClaw Integration Spec
*v1-2026-03-19*

## BLUF

This patch bridges file-based memory (`memory/daily/`, `memory/projects/`, etc.) into ContextGraph's tag-indexed DAG, and provides a Python API for assembling context at session start. It's a working fix while Rich's improved rolling context system is in development.

**What this enables:** OpenClaw can call `context_injector.py` before session start to get a dynamically-assembled context block based on the incoming query, rather than relying solely on the static `MEMORY.md` injection.

---

## What Was Built

### 1. `scripts/memory_harvester.py`

**Purpose:** Crawl memory directories and index files into ContextGraph.

**What it does:**
- Crawls: `memory/daily/`, `memory/projects/`, `memory/decisions/`, `memory/contacts/`
- Reads YAML frontmatter tags (e.g., `tags: [maxrisk, trading, options]`)
- Creates ContextGraph Messages with:
  - `user_text` = `[category] Title` (searchable query representation)
  - `assistant_text` = file content (what gets retrieved)
  - `tags` = frontmatter tags + auto-inferred tags from `tagger.py`
  - `external_id` = `memory-file:{relative_path}` (for idempotent updates)
- Uses content hash to skip unchanged files (incremental updates)
- Designed for cron (nightly) or on-demand
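
The incremental behavior hinges on the content hash. A minimal sketch of the skip logic, assuming SHA-256 digests and a flat `{path: digest}` state map (the real implementation lives in `memory_harvester.py` and may differ in detail):

```python
import hashlib
from pathlib import Path

def content_digest(path: Path) -> str:
    """SHA-256 of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def should_reindex(path: Path, state: dict) -> bool:
    """Return True (and record the new digest) when the file changed."""
    digest = content_digest(path)
    key = str(path)
    if state.get(key) == digest:
        return False  # unchanged since last harvest — skip
    state[key] = digest
    return True
```

The state dict is what gets persisted to `data/memory-harvester-state.json` between runs; `--force` amounts to ignoring it.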

**Usage:**
```bash
# Test run (no writes)
python3 scripts/memory_harvester.py --dry-run --verbose

# Full harvest
python3 scripts/memory_harvester.py

# Force re-index all
python3 scripts/memory_harvester.py --force
```

**State file:** `data/memory-harvester-state.json`

### 2. `scripts/context_injector.py`

**Purpose:** Assemble context from ContextGraph for session injection.

**What it does:**
- Takes incoming query (user's first message)
- Infers tags using the existing `tagger.py`
- Calls ContextAssembler with configured token budget
- Returns formatted markdown block suitable for system prompt injection

**CLI usage:**
```bash
# Query test
python3 scripts/context_injector.py "what's the maxrisk project status?"

# With custom budget
python3 scripts/context_injector.py --budget 1500 "memory architecture"

# JSON output for API integration
python3 scripts/context_injector.py --json "trading research"
```

**Python API:**
```python
from scripts.context_injector import assemble_context, assemble_for_session

# Simple: get formatted context block
context_block = assemble_context("user query", token_budget=2000)

# Full: get block + metadata
result = assemble_for_session("user query")
# result = {
#     "context_block": str,      # markdown for injection
#     "tokens": int,             # estimated tokens used
#     "message_count": int,      # messages retrieved
#     "tags": ["tag1", "tag2"],  # tags that matched
#     "source": "contextgraph",
# }
```

**Output format:**
```markdown
## Retrieved Context

*Assembled by ContextGraph — 8 messages, ~1847 tokens*
*Query tags: [maxrisk, trading, options]*

### [2026-03-18] MaxRisk Project Status
*Tags: maxrisk, trading, options*

Current equity: $3,884.55. Focus on 30-45 DTE debit spreads...

### [2026-03-17] Trading Research Notes
*Tags: maxrisk, research*

Volume rotation strategy analysis...
```

---

## What This Doesn't Do (Gaps for Rich's System)

### 1. No Hook Into Injection Layer

This patch provides the **assembly function** but doesn't wire it into OpenClaw's actual injection layer. Someone needs to:

- Add a call to `assemble_for_session()` in the OpenClaw session bootstrap path
- Decide whether the result **replaces** MEMORY.md or **augments** it
- Handle the case where ContextGraph returns empty (fallback to MEMORY.md)

**Recommended integration point:** Wherever OpenClaw builds the system prompt at session start, add:

```python
from projects.contextgraph_engine.scripts.context_injector import assemble_for_session

# At session start, before building system prompt:
result = assemble_for_session(first_user_message)
if result["message_count"] > 0:
    system_prompt += "\n\n" + result["context_block"]
```

### 2. No Semantic Search Fallback

The current implementation uses **tag-based retrieval only**. If the user's query doesn't match any known tags, the topic layer returns empty.

**Rich's system should add:** Semantic similarity search (using nomic-embed-text or similar) as a fallback when tag retrieval returns few results.
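
One shape such a fallback could take, sketched with a plain cosine similarity and a stand-in `embed()` callable (nomic-embed-text or any sentence embedder would slot in; all names here are illustrative, not existing APIs):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_with_fallback(query, tag_hits, corpus, embed, min_hits=3, k=5):
    """Use tag retrieval when it yields enough; otherwise rank by embedding."""
    if len(tag_hits) >= min_hits:
        return tag_hits
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]
```

In practice the embeddings would be precomputed at harvest time rather than recomputed per query.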

### 3. No MEMORY.md Integration

This patch doesn't modify or replace MEMORY.md. The two systems are additive:
- MEMORY.md = static, manually curated, always injected
- ContextGraph = dynamic, auto-tagged, query-based

**For the static-overrides-dynamic problem:** Either:
- Keep MEMORY.md very slim (project status one-liners only)
- Have Rich's system generate MEMORY.md from ContextGraph at session start
- Replace MEMORY.md injection with ContextGraph injection entirely

### 4. No Real-Time Indexing

`memory_harvester.py` is batch-mode only. Changes to memory files aren't reflected until next harvest.

**For real-time:** Could add a file watcher (fswatch, watchdog) that triggers incremental indexing on file change.
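
A dependency-free polling pass, as a sketch (fswatch/watchdog would replace the polling with OS-level events; the function and its calling convention are hypothetical):

```python
from pathlib import Path

def scan_for_changes(root: Path, mtimes: dict) -> list[Path]:
    """One polling pass: return memory files whose mtime changed.

    A loop or cron can call this every few seconds and hand the changed
    paths to the harvester's incremental path.
    """
    changed = []
    for p in root.rglob("*.md"):
        m = p.stat().st_mtime
        if mtimes.get(p) != m:
            mtimes[p] = m  # remember the new mtime
            changed.append(p)
    return changed
```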

### 5. No Sub-Agent Context Propagation

When the main session spawns a sub-agent, the sub-agent doesn't automatically get relevant context from ContextGraph. This is why Mei ran for 41 minutes while the main session forgot what she was doing.

**Rich's system should address:** Context propagation to sub-agents, possibly via:
- Injecting a "task context" block when spawning
- Having sub-agents call `assemble_for_session()` with their task description
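
The second option could look like this sketch, reusing `assemble_for_session` from `context_injector.py`; the prompt layout and function name are hypothetical:

```python
def build_subagent_prompt(task_description: str, base_prompt: str,
                          assemble_for_session) -> str:
    """Prepend ContextGraph context to a sub-agent's system prompt."""
    result = assemble_for_session(task_description)
    if result["message_count"] == 0:
        return base_prompt  # nothing relevant — spawn with base prompt only
    return base_prompt + "\n\n## Task Context\n\n" + result["context_block"]
```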

---

## Integration Checklist for Rich

### Phase 1: Harvest Pipeline
- [x] `memory_harvester.py` crawls memory directories
- [x] YAML frontmatter tags → ContextGraph DAG edges
- [x] Content hash for incremental updates
- [ ] Add to nightly cron (alongside existing `harvester.py`)

### Phase 2: Injection Layer
- [x] `context_injector.py` assembles context
- [x] Python API for integration
- [ ] Wire into OpenClaw session bootstrap
- [ ] Decide MEMORY.md relationship (replace vs. augment)

### Phase 3: Enhanced Retrieval (Rich's Improvements)
- [ ] Semantic search fallback when tag retrieval is sparse
- [ ] Cross-session context propagation for sub-agents
- [ ] Rolling window with recency decay
- [ ] Salience-weighted ranking across sources

---

## File Locations

| File | Purpose |
|------|---------|
| `scripts/memory_harvester.py` | Batch indexer for memory files |
| `scripts/context_injector.py` | Context assembly API |
| `data/memory-harvester-state.json` | Harvest state (files indexed, hashes) |
| `~/.tag-context/store.db` | ContextGraph SQLite database |
| `data/harvester-state.json` | Session harvester state (existing) |

---

## Testing

### Verify Memory Harvester
```bash
cd projects/contextgraph-engine

# Dry run to see what would be indexed
python3 scripts/memory_harvester.py --dry-run --verbose

# Actually harvest
python3 scripts/memory_harvester.py --verbose

# Check tag counts
python3 cli.py tags
```

### Verify Context Injector
```bash
# Query for a known topic
python3 scripts/context_injector.py "maxrisk project"

# Check retrieval stats
python3 scripts/context_injector.py --stats-only "memory architecture"

# JSON output
python3 scripts/context_injector.py --json "trading research"
```

### End-to-End Test
```bash
# 1. Harvest memory files
python3 scripts/memory_harvester.py

# 2. Query ContextGraph
python3 cli.py query "what's the maxrisk status?"

# 3. Get injectable context
python3 scripts/context_injector.py "what's the maxrisk status?"
```

---

## Architecture Notes

### Why Tag-Based + File-Based?

ContextGraph already handles interactive sessions via `harvester.py`. This patch adds file-based memory as a second source. Both flow into the same DAG:

```
┌──────────────────────┐  ┌──────────────────────┐
│  OpenClaw Sessions   │  │     Memory Files     │
│    (harvester.py)    │  │ (memory_harvester.py)│
└──────────┬───────────┘  └──────────┬───────────┘
           │                         │
           │ JSONL → Messages        │ .md → Messages
           │ auto-tags via tagger    │ frontmatter + auto-tags
           │                         │
           ▼                         ▼
┌─────────────────────────────────────┐
│          ContextGraph DAG           │
│  (SQLite: messages + tags tables)   │
└─────────────────┬───────────────────┘
                  │ query → ContextAssembler
                  ▼
┌─────────────────────────────────────┐
│       Assembled Context Block       │
│    (recency layer + topic layer)    │
└─────────────────┬───────────────────┘
                  │ context_injector.py
                  ▼
┌─────────────────────────────────────┐
│       OpenClaw System Prompt        │
│     (injected at session start)     │
└─────────────────────────────────────┘
```

### Token Budget Allocation

Default: 2000 tokens for injected context

- Recency layer: 25% (~500 tokens) — most recent messages
- Topic layer: 75% (~1500 tokens) — tag-matched messages

This is tunable via `assemble_context(query, token_budget=N)`.
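
The split works out to:

```python
def split_budget(total: int = 2000, recency_frac: float = 0.25) -> tuple[int, int]:
    """Divide the injection budget between recency and topic layers."""
    recency = int(total * recency_frac)
    return recency, total - recency  # (500, 1500) at the default budget
```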

### External ID Convention

Memory files use `external_id = "memory-file:{relative_path}"` to enable:
- Idempotent re-indexing (update instead of duplicate)
- Tag updates without re-inserting messages
- Traceable source for debugging
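
An upsert keyed on `external_id` is enough to get all three properties. The schema below is illustrative only (the real messages table lives in ContextGraph's store and has more columns):

```python
import sqlite3

def upsert_memory_file(conn: sqlite3.Connection, rel_path: str,
                       user_text: str, assistant_text: str) -> None:
    """Idempotent insert/update keyed on external_id."""
    external_id = f"memory-file:{rel_path}"
    conn.execute(
        """
        INSERT INTO messages (external_id, user_text, assistant_text)
        VALUES (?, ?, ?)
        ON CONFLICT(external_id) DO UPDATE SET
            user_text = excluded.user_text,
            assistant_text = excluded.assistant_text
        """,
        (external_id, user_text, assistant_text),
    )
```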

---

## Known Issues

1. **Path sensitivity:** Harvester assumes workspace at `~/.openclaw/workspace`. If workspace moves, update `WORKSPACE` constant in `memory_harvester.py`.

2. **Tag canonicalization:** Frontmatter tags are passed through directly. If they don't match tags in `tag_registry.py`, they'll be indexed but may not participate in candidate promotion.

3. **Token estimation:** Uses word count × 1.3 heuristic. Actual tokenization depends on model. For accurate counts, integrate tiktoken or the model's tokenizer.
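
For reference, the heuristic is simply:

```python
def estimate_tokens(text: str) -> int:
    """Word count × 1.3 heuristic used throughout the patch.

    For model-accurate counts, a tokenizer such as tiktoken's
    cl100k_base encoding could replace this.
    """
    return int(len(text.split()) * 1.3)
```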

---

*End of spec. Questions → Garrett → Mei.*
27 changes: 27 additions & 0 deletions README.md
@@ -90,6 +90,33 @@ Shadow mode evaluation across **812 interactions**, 4000-token budget:
so even perfect topic retrieval caps density around 62%. Adjustable by
tuning the recency/topic budget split.

### Running shadow mode locally (no budget needed)

When running shadow evaluation locally — not injecting into a live context window —
the `--budget` flag is meaningless. Blow it open:

```bash
python3 scripts/shadow.py --report --budget 999999
```

With an uncapped budget, the **linear baseline expands to the entire history** (~583
messages in a mature corpus), while the **graph still selects ~22 targeted messages**.
This is the clearest demonstration of what the graph actually does: semantic selection
vs. a firehose.

⚠️ **The density metric becomes misleading without a budget cap.** The 60% threshold
was calibrated for a 4k production budget where you want most assembled context to be
semantically relevant. With `--budget 999999`, the recency layer also expands and dilutes
the ratio — density will fail even when the graph is working correctly. The metrics that
remain meaningful at any budget:

| Metric | Still valid? |
|--------|-------------|
| Reframing rate | ✅ Always |
| Topic retrieval rate | ✅ Always |
| Novel msgs delivered | ✅ Always |
| Context density | ❌ Budget-dependent — ignore with large budgets |

### GP Tagger Fitness (20 tags)

Top-performing tags (fitness ≥ 0.90):
33 changes: 32 additions & 1 deletion api/server.py
@@ -1,4 +1,5 @@
import sys
import re
import time
from pathlib import Path

@@ -88,6 +89,34 @@ def tag(request: TagRequest):
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

_INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions?", re.IGNORECASE),
    re.compile(r"disregard\s+(all\s+)?(previous|prior|above)\s+instructions?", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+(a|an)\s+", re.IGNORECASE),
    re.compile(r"new\s+instructions?:", re.IGNORECASE),
    re.compile(r"system\s*prompt\s*:", re.IGNORECASE),
    re.compile(r"<\s*/?system\s*>", re.IGNORECASE),
    re.compile(r"\[INST\]|\[/INST\]", re.IGNORECASE),
    re.compile(r"###\s*instruction", re.IGNORECASE),
    re.compile(r"from\s+now\s+on", re.IGNORECASE),
    re.compile(r"\[SYSTEM\]\s*:", re.IGNORECASE),
    re.compile(r"<!--.*?-->", re.DOTALL),
]

# Strip zero-width characters that bypass pattern matching
_ZERO_WIDTH = re.compile(r'[\u200b\u200c\u200d\u200e\u200f\u2060\ufeff\u00ad]')

def _sanitize_for_storage(text: str) -> str:
    """Strip prompt injection patterns before storing in the graph."""
    if not text:
        return text
    # Normalize: strip zero-width chars that can bypass pattern matching
    normalized = _ZERO_WIDTH.sub('', text)
    result = normalized
    for pattern in _INJECTION_PATTERNS:
        result = pattern.sub("[REDACTED]", result)
    return result

@app.post("/ingest", response_model=dict)
def ingest(request: IngestRequest):
    try:
@@ -96,6 +125,8 @@ def ingest(request: IngestRequest):
        # Envelope text (message_id, sender_id, timestamps) is noise for
        # tag inference and retrieval — stripping prevents tag pollution.
        clean_user = strip_envelope(request.user_text)
        # HIGH-01 fix: sanitize injection patterns before storage
        clean_user = _sanitize_for_storage(clean_user)
        features = extract_features(clean_user, request.assistant_text)
        tags = ensemble.assign(features, clean_user, request.assistant_text).tags
        message = Message(
@@ -652,4 +683,4 @@ def get_pins():

if __name__ == "__main__":
    import uvicorn
-    uvicorn.run(app, host="0.0.0.0", port=8350)
+    uvicorn.run(app, host="127.0.0.1", port=8350)