Agent memory that finds the right context. Every time. Locally.
```bash
pip install aingram
```
Most AI memory systems are filing cabinets with a search bar. Store everything, search later, hope the right thing comes back.
Aingram is different. It runs three retrieval signals simultaneously — full-text search, semantic vector search, and knowledge graph traversal — and fuses them into a single ranked result using Reciprocal Rank Fusion. Everything lives in one SQLite file on your machine. No cloud. No API key. No vendor to trust with your agents' memory.
On LongMemEval — the most rigorous public benchmark for AI memory — Aingram's retrieval pipeline finds the correct context in the top 3 results for every single query when the evidence is present. On the real benchmark with full noisy conversation histories, it surfaces the right sessions in the top 10 for 95.5% of queries.
```python
from aingram import MemoryStore

with MemoryStore('./agent_memory.db') as mem:
    mem.remember('The API rate limit is 100 req/min. Exceeding it causes silent drops.')
    mem.remember('Deployment takes ~3 min. Always run migrations before the container swap.')

    results = mem.recall('what do I need to know before deploying?', limit=5)
    for r in results:
        print(r.score, r.entry.content)
```

Benchmarked on LongMemEval (Wu et al., ICLR 2025) — 500 hand-curated questions across multi-session conversation histories averaging 115,000 tokens. All runs on an RTX 4060 (8 GB). No reranking model. No LLM in the retrieval loop.
| Benchmark | Metric | Score |
|---|---|---|
| LongMemEval (oracle split) | recall_any@3 | 1.000 |
| LongMemEval (oracle split) | ndcg@10 | 0.994 |
| LongMemEval-S (real retrieval) | recall_any@10 | 0.955 |
| LongMemEval-S (real retrieval) | recall_any@3 | 0.902 |
| LongMemEval-S (real retrieval) | ndcg@10 | 0.836 |
| BEIR SciFact | nDCG@10 | 0.703 |
What these numbers mean:
- recall_any@3 = 1.000 (oracle): When the evidence exists, Aingram puts the right session in the top 3 results for every query across 500 instances. The correct context is always available to your agent.
- recall_any@10 = 0.955 (real): On real noisy conversation histories — ~40 sessions of noise per query — the right sessions appear in the top 10 for 95.5% of queries. This sets the ceiling for any downstream LLM accuracy.
- 22ms median retrieval latency — pure local pipeline, no network round-trip.
Retrieval speed scales with corpus size. Vector search is the dominant cost at scale — Aingram's QJL two-pass compression keeps it manageable:
| Entries | Full recall | Embedding | Vector search |
|---|---|---|---|
| 1K | ~16ms | ~8ms | ~3ms |
| 10K | ~47ms | ~9ms | ~34ms |
| 50K | ~222ms | ~11ms | ~160ms |
| 100K | ~347ms | ~11ms | ~320ms |
QJL's two-pass approach (compressed candidates → float32 rerank) breaks even against brute-force at ~30K entries and delivers meaningful speedup above that threshold.
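The two-pass idea can be sketched in a few lines of NumPy. This is an illustrative toy, not Aingram's implementation — the projection width, shortlist size, and sign-bit quantization scheme here are made-up parameters:

```python
import numpy as np

# Toy two-pass search: sign-quantized JL projections for the first pass,
# exact float32 cosine for the rerank. All parameters here (projection
# width 256, shortlist of 100) are illustrative, not Aingram's.
rng = np.random.default_rng(0)
n, dim = 5000, 64
db = rng.standard_normal((n, dim)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)
query = db[42] + 0.01 * rng.standard_normal(dim).astype(np.float32)

# Pass 1: random projection, keep only the sign bits, score candidates
# by how many signs they share with the query (a Hamming-style proxy).
proj = rng.standard_normal((dim, 256)).astype(np.float32)
db_bits = (db @ proj) > 0
q_bits = (query @ proj) > 0
shortlist = np.argsort(-(db_bits == q_bits).sum(axis=1))[:100]

# Pass 2: exact cosine rerank over the shortlist only.
scores = db[shortlist] @ (query / np.linalg.norm(query))
best = shortlist[np.argsort(-scores)][:10]
print(best[0])  # 42 — the near-duplicate entry wins the rerank
```

The first pass touches every row but only compares bits; the expensive float32 dot products run over 100 candidates instead of 5,000, which is where the speedup at scale comes from.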
Single-signal retrieval misses. Semantic similarity is powerful but breaks on:
- Exact terminology — a query about "the 100 req/min rate limit" might not semantically match a memory that says "hard cap: 100/min" without FTS
- Entity relationships — "what did Alice decide about auth?" needs graph traversal, not cosine similarity
- Keyword-first queries — agents often search with specific technical terms where BM25 outperforms dense retrieval
Aingram runs all three signals and fuses them. The hybrid consistently outperforms any single signal, especially on the kinds of precise, domain-specific queries agents actually make.
```text
Agent query
  │
  ├──▶ FTS5 (keyword) ──────────────────────┐
  ├──▶ sqlite-vec + QJL two-pass (semantic) ┼──▶ RRF fusion ──▶ ranked results
  └──▶ Knowledge graph (entity) ────────────┘
```
**FTS5 full-text search** — SQLite's native full-text index. Fast, no embedding required, excellent for exact terminology and technical strings.
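For instance, FTS5's default tokenizer splits `100/min` into the tokens `100` and `min`, so a phrase query recovers the exact-terminology match that pure embeddings can miss. A standalone SQLite illustration — the table name `mem` is hypothetical, not Aingram's schema:

```python
import sqlite3

# Hypothetical illustration of FTS5 exact-terminology matching.
con = sqlite3.connect(':memory:')
con.execute("CREATE VIRTUAL TABLE mem USING fts5(content)")
con.executemany("INSERT INTO mem VALUES (?)", [
    ('hard cap: 100/min on the public API',),
    ('deploys take about three minutes',),
])

# The default tokenizer treats '/' as a separator, so the phrase
# "100 min" matches the consecutive tokens in '100/min'.
rows = con.execute(
    'SELECT content FROM mem WHERE mem MATCH ?', ('"100 min"',)
).fetchall()
print(rows)  # the rate-limit entry only
```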
**sqlite-vec vector search** — Dense semantic retrieval using nomic-embed-text-v1.5 running locally via ONNX. 768-dimensional embeddings, CPU or GPU. No external API.

**QJL two-pass vector search** — At larger corpus sizes, vector search dominates retrieval latency. Aingram uses a Quantized Johnson-Lindenstrauss (QJL) two-pass approach: a fast first pass over compressed quantized vectors narrows the candidate pool, then a precise second pass over full float32 vectors reranks the survivors. This trades a small fraction of recall for significantly lower latency at scale — the break-even point is around 30K entries, above which QJL is faster than brute-force float32 search with no meaningful quality loss.
**Knowledge graph traversal** — Entities and relationships extracted from memory entries. Multi-hop queries resolved via CTE. "What did Alice decide about auth?" finds the entity, traverses relationships, returns relevant entries — even if the query didn't match the entry verbatim.
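The traversal can be illustrated with a recursive CTE over a toy edge table (the schema here is hypothetical; Aingram's actual tables and query differ):

```python
import sqlite3

# Toy multi-hop traversal with a recursive CTE. The edge table is
# illustrative — not Aingram's actual graph schema.
con = sqlite3.connect(':memory:')
con.executescript("""
    CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT);
    INSERT INTO edges VALUES
      ('Alice', 'approved', 'migration'),
      ('migration', 'uses', 'Clerk');
""")

# Walk outward from 'Alice', collecting everything reachable.
rows = con.execute("""
    WITH RECURSIVE reach(node, depth) AS (
        SELECT 'Alice', 0
        UNION
        SELECT e.dst, r.depth + 1
        FROM edges e JOIN reach r ON e.src = r.node
    )
    SELECT node, depth FROM reach WHERE depth > 0 ORDER BY depth
""").fetchall()
print(rows)  # [('migration', 1), ('Clerk', 2)]
```

A query about Alice thus reaches the Clerk entry two hops away, even though that entry never mentions her.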
**Reciprocal Rank Fusion** — Results from all three signals are combined and re-ranked. Each signal's rank position, not raw score, contributes to the final order. This makes the fusion robust to scale differences between signals.
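A minimal version of RRF, using the k = 60 constant from Cormack et al.'s original formulation (Aingram's exact constant and any per-signal weighting are not shown here):

```python
# Minimal Reciprocal Rank Fusion: each signal contributes 1/(k + rank),
# so rank position — not raw score — drives the fused order.

def rrf(rankings, k=60):
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fts    = ['m7', 'm2', 'm9']   # keyword ranking
vector = ['m2', 'm7', 'm4']   # semantic ranking
graph  = ['m2', 'm5']         # entity ranking

fused = rrf([fts, vector, graph])
print(fused[0])  # m2 — ranked highly by all three signals
```

Because only ranks are summed, a signal whose raw scores live on a wildly different scale (BM25 vs. cosine similarity) can't dominate the fusion.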
Your entire agent memory — entries, embeddings, entity graph, signing chain — lives in a single SQLite file. No separate vector database to manage. No graph database. No external embedding service. No Docker containers.
```text
agent_memory.db   ← that's it
```
Copy it. Back it up with cp. Inspect it with any SQLite client. Share it between agents. Export it to JSON. Import it somewhere else.
This is a deliberate design choice. Memory that requires infrastructure to operate is fragile. Memory that's a file is durable.
Every memory entry is Ed25519-signed and linked in a tamper-evident hash chain. You can verify that a memory hasn't been modified since it was written — useful when memory is shared between agents or stored across trust boundaries.
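The chaining idea, minus the signatures, fits in a few lines (illustrative only — Aingram additionally Ed25519-signs each record, and its on-disk format differs):

```python
import hashlib

# Sketch of a tamper-evident hash chain: every record commits to the
# hash of its predecessor, so editing any entry breaks every hash after it.
GENESIS = b'\x00' * 32

def append(chain, content):
    prev = chain[-1]['hash'] if chain else GENESIS
    digest = hashlib.sha256(prev + content.encode()).digest()
    chain.append({'content': content, 'prev': prev, 'hash': digest})

def verify_chain(chain):
    prev = GENESIS
    for rec in chain:
        recomputed = hashlib.sha256(prev + rec['content'].encode()).digest()
        if rec['prev'] != prev or rec['hash'] != recomputed:
            return False
        prev = rec['hash']
    return True

log = []
append(log, 'API rate limit is 100 req/min')
append(log, 'Always run migrations before deploy')
print(verify_chain(log))   # True

log[0]['content'] = 'rate limit is 999'   # tamper with history...
print(verify_chain(log))   # ...and verification fails: False
```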
```python
result = mem.verify()
# {'chain_valid': True, 'entries_verified': 1247, 'broken_at': None}
```

Or via the CLI:

```bash
aingram --db ./agent_memory.db verify
```

Install `aingram[extraction]` to enable GLiNER-based entity extraction. Entities and relationships are automatically extracted from memory entries and stored in the knowledge graph as you write.
```python
# After 'User Alice approved the migration to Clerk on Jan 15.'
# is stored, the graph contains:
#   Alice ─[approved]─▶ migration   (valid_from: 2026-01-15)
#   migration ─[uses]─▶ Clerk

results = mem.recall('what did Alice decide?')
# Returns entries linked to Alice via graph traversal,
# not just entries that mention "Alice" by text
```

Query the graph directly:

```bash
aingram --db ./agent_memory.db graph "Alice"
aingram --db ./agent_memory.db entities
```

Install `aingram[mcp]` and connect any MCP-compatible agent (Claude, Cursor, Windsurf, Cline) to your memory store.
```bash
aingram --db ./agent_memory.db mcp
```

Tools exposed: `remember`, `recall`, `reference`, `verify`, `get_experiment_context`, and more. Optional bearer-token auth middleware included.
Add to your MCP config:
```json
{
  "mcpServers": {
    "aingram": {
      "command": "aingram",
      "args": ["--db", "/path/to/agent_memory.db", "mcp"]
    }
  }
}
```

```bash
pip install aingram
```

Python API:
```python
from aingram import MemoryStore

with MemoryStore('./agent_memory.db') as mem:
    # Store a memory
    mem.remember('Deploy always requires a migration run first.')

    # Recall with hybrid search
    results = mem.recall('deployment checklist', limit=5)
    for r in results:
        print(f'{r.score:.3f}  {r.entry.content}')
```

CLI:
```bash
aingram --db ./agent_memory.db status
aingram --db ./agent_memory.db add "API rate limit is 100 req/min"
aingram --db ./agent_memory.db search "rate limiting"
aingram --db ./agent_memory.db entities
aingram --db ./agent_memory.db graph "Alice"
aingram --db ./agent_memory.db export ./backup.json
aingram --db ./agent_memory.db import ./backup.json
aingram --db ./agent_memory.db verify
```

GPU embeddings (optional):
```bash
pip uninstall -y onnxruntime
pip install onnxruntime-gpu
pip install "aingram[gpu]"
export AINGRAM_ONNX_PROVIDER=cuda
```

```bash
pip install aingram                  # core — CPU embeddings
pip install "aingram[extraction]"    # + GLiNER entity extraction
pip install "aingram[mcp]"           # + MCP server
pip install "aingram[llm]"           # + Ollama/local LLM client
pip install "aingram[api]"           # + Anthropic API extractor
pip install "aingram[gpu]"           # + CUDA ONNX Runtime wheels
pip install "aingram[all]"           # everything
```

Precedence: constructor kwargs → env vars → `~/.aingram/config.toml` → defaults.
| Env var | Default | Meaning |
|---|---|---|
| `AINGRAM_MODELS_DIR` | `~/.aingram/models` | Model cache directory |
| `AINGRAM_EMBEDDING_DIM` | `768` | Embedding width for new DBs |
| `AINGRAM_LLM_URL` | `http://localhost:11434` | Ollama base URL |
| `AINGRAM_LLM_MODEL` | — | Default LLM model name |
| `AINGRAM_ONNX_PROVIDER` | `auto` | `cpu`, `cuda`, or `npu` |
| `AINGRAM_EXTRACTOR_MODE` | `none` | `none`, `local`, or `sonnet` |
| `AINGRAM_WORKER_ENABLED` | `true` | Background consolidation worker |
| `AINGRAM_TELEMETRY_ENABLED` | `true` | Anonymous CLI usage (opt-out below) |
`~/.aingram/config.toml` example:

```toml
embedding_dim = 768
worker_enabled = true
llm_url = "http://localhost:11434"
llm_model = "mistral"
extractor_mode = "local"
telemetry_enabled = false
```

No memory content ever leaves your machine. The SQLite database, embeddings, and entity graph stay local.
By default, the CLI sends anonymous usage events: a random install ID, the top-level command name (e.g. `search`, `add`), and the package version. No memory text, query content, file paths, or personal data is ever sent.
Opt out:
- `--no-telemetry` on any command
- `telemetry_enabled = false` in `~/.aingram/config.toml`
- `AINGRAM_TELEMETRY_ENABLED=false` environment variable
|  | Aingram | Mem0 | Zep | MemPalace |
|---|---|---|---|---|
| Retrieval signals | FTS5 + vector + graph | Vector only | KG + vector | Vector + heuristics |
| Vector compression | ✅ QJL two-pass | ✗ | ✗ | ✗ |
| Storage | SQLite (one file) | Cloud / self-host | Neo4j / cloud | ChromaDB |
| Cryptographic signing | ✅ Ed25519 + hash chains | ✗ | ✗ | ✗ |
| Knowledge graph | ✅ Built-in | ✗ | ✅ (Neo4j) | ✗ |
| Local-only | ✅ | Optional | Optional | ✅ |
| No API key required | ✅ | ✗ | ✗ | ✅ |
| Python package | ✅ | ✅ | ✅ | ✅ |
| MCP server | ✅ | ✗ | ✗ | ✅ |
| License | Apache 2.0 | Commercial | Commercial | MIT |
```python
# Export everything — entries, graph, vectors
mem.export_json('./backup.json')

# Import into a fresh database
with MemoryStore('./new_memory.db') as fresh:
    fresh.import_json('./backup.json')

# Merge into an existing database (skips duplicates)
with MemoryStore('./existing.db') as existing:
    existing.import_json('./backup.json', merge=True)
```

Reproduce the retrieval benchmarks from a clone of this repo:
```bash
# Create synthetic benchmark databases
python scripts/seed_bench_db.py

# Run retrieval and embedding timing
python scripts/bench.py
```

The `scripts/bench.py` output gives per-database timing breakdowns for embedding cost, vector search, FTS5, and full hybrid recall at 1K, 10K, 50K, and 100K entries.
```bash
git clone https://github.com/bozbuilds/AIngram
cd AIngram
pip install -e ".[dev,all]"
pytest
ruff check aingram/ && ruff format --check aingram/
```

Python 3.11+. See CONTRIBUTING.md for guidelines.
Aingram Lite is the open-source retrieval and storage foundation. Aingram Pro adds systems built on top of it — GPU-resident neural caching, biological memory consolidation with spaced-repetition scheduling, and multi-agent synchronization primitives, among other additions. Join the waitlist, or watch this repo for updates.
Discord · aingram.dev · Discussions