gemlift is a RAG-based library upgrade assistant that crawls documentation, stores it in a vector database, and uses an LLM to generate upgrade guidance grounded in that documentation. It learns from past failures through an error memory store and can autonomously upgrade a Rails application, verifying the result with RSpec.
```
┌───────────────────────────────────────────────────────────────────┐
│  Knowledge Pipeline                                               │
│                                                                   │
│  WebCrawler ──► DocumentChunker ──► ChromaDB                      │
│  (crawl4ai)     (langchain)         upgrade_knowledge             │
│                                                                   │
│                                     ChromaDB                      │
│                                     upgrade_errors ◄── CI/CD hook │
│                                     (error memory)                │
└───────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│  Upgrade Agent                                                    │
│                                                                   │
│  User Request ──► HybridRetriever ──► LLM ──► Response            │
│                   (KB + errors,                                   │
│                    verified-fix boost)                            │
└───────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌───────────────────────────────────────────────────────────────────┐
│  Upgrader (autonomous)                                            │
│                                                                   │
│  RepoReader ──► plan_agent (LLM+RAG) ──► UpgradePlan              │
│                                               │                   │
│                                        apply_to(repo)             │
│                                               │                   │
│                                         RailsExecutor             │
│                                        (bundle, rspec)            │
└───────────────────────────────────────────────────────────────────┘
```
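In the Upgrader box, `plan_agent` produces an `UpgradePlan` that is then applied with `apply_to(repo)`. Below is a minimal sketch of what the `UpgradePlan` and `FileChange` dataclasses could look like. It assumes the LLM replies with JSON using hypothetical `summary` and `changes[].path` / `changes[].new_content` keys, and that every change is a whole-file rewrite. The real `plan.py` schema may differ.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FileChange:
    path: str          # repo-relative path, e.g. "Gemfile" (hypothetical field name)
    new_content: str   # full replacement content (assumes whole-file rewrites)


@dataclass
class UpgradePlan:
    summary: str
    changes: list[FileChange] = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: str) -> "UpgradePlan":
        """Parse the LLM's JSON reply (hypothetical schema)."""
        data = json.loads(raw)
        changes = [FileChange(c["path"], c["new_content"]) for c in data.get("changes", [])]
        return cls(summary=data.get("summary", ""), changes=changes)

    def apply_to(self, repo: Path) -> list[Path]:
        """Write every change under `repo`, refusing paths that escape it."""
        root = repo.resolve()
        written: list[Path] = []
        for change in self.changes:
            target = (root / change.path).resolve()
            if not target.is_relative_to(root):
                raise ValueError(f"refusing to write outside repo: {change.path}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(change.new_content)
            written.append(target)
        return written
```

Guarding against `..` paths matters here because the file list comes straight from model output.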
```
src/gemlift/
├── config.py                  # All settings via pydantic-settings + .env
├── interfaces.py              # Protocol definitions for all components
│
├── ingestion/                 # Crawl → chunk → store
│   ├── crawler.py             # crawl4ai async crawler (SOURCES registry)
│   ├── chunker.py             # Markdown-header + recursive text splitting
│   └── pipeline.py            # Orchestrates crawl→chunk→embed→upsert
│
├── indexing/                  # Embedding + vector storage
│   ├── embeddings.py          # SentenceTransformer (local, 384-dim)
│   └── vector_store.py        # ChromaDB wrapper (cosine similarity, upsert dedup)
│
├── memory/
│   └── error_store.py         # Stores upgrade failures; marks verified fixes
│
├── retrieval/
│   └── retriever.py           # HybridRetriever: KB + error memory, score boosting
│
├── agent/                     # LangGraph upgrade agent
│   ├── state.py               # UpgradeState TypedDict
│   ├── nodes.py               # retrieve_node + generate_node factories
│   └── graph.py               # StateGraph: retrieve → generate → END
│
├── upgrader/                  # Autonomous repo upgrader
│   ├── repo_reader.py         # Reads Gemfile, .ruby-version, config files
│   ├── plan.py                # UpgradePlan + FileChange dataclasses
│   ├── plan_agent.py          # LLM+RAG → structured JSON upgrade plan
│   └── executor.py            # RailsExecutor: bundle/rspec via subprocess + rbenv
│
├── scheduler/
│   └── refresh.py             # Daily re-crawl daemon
│
└── cli.py                     # Click CLI entry point
```
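The `agent/` package describes a two-step LangGraph `StateGraph` (retrieve → generate → END) built from node factories. As a hedged illustration of that shape only, the sketch below wires two equivalent factories together in plain Python, without LangGraph. The state fields, the `search` signature and `run_linear_graph` are hypothetical stand-ins, not the project's code.

```python
from collections.abc import Callable
from typing import TypedDict


# Hypothetical fields; the real agent/state.py may differ.
class UpgradeState(TypedDict, total=False):
    question: str
    library: str
    from_version: str
    to_version: str
    context: list[str]
    answer: str


# A node takes the current state and returns a partial update.
Node = Callable[[UpgradeState], dict]


def make_retrieve_node(search: Callable[[str, str], list[str]]) -> Node:
    """Factory binding a search function (hypothetical signature: query, library -> docs)."""
    def retrieve_node(state: UpgradeState) -> dict:
        return {"context": search(state["question"], state.get("library", ""))}
    return retrieve_node


def make_generate_node(llm: Callable[[str], str]) -> Node:
    """Factory binding any prompt -> text callable as a stand-in for the LLM."""
    def generate_node(state: UpgradeState) -> dict:
        context = "\n\n".join(state.get("context", []))
        prompt = (
            f"Upgrade {state.get('library', '')} from {state.get('from_version', '?')} "
            f"to {state.get('to_version', '?')}.\n\n"
            f"Context:\n{context}\n\nQuestion: {state['question']}"
        )
        return {"answer": llm(prompt)}
    return generate_node


def run_linear_graph(nodes: list[Node], state: UpgradeState) -> UpgradeState:
    """Run retrieve -> generate -> END in order, merging each node's partial update."""
    current = UpgradeState(**state)
    for node in nodes:
        current.update(node(current))
    return current
```

Because each node returns only the keys it changes, the retriever and the LLM can be swapped for fakes in unit tests.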
| Layer | Tool |
|---|---|
| Crawling | crawl4ai (async, JS-capable, playwright) |
| Chunking | langchain MarkdownHeaderTextSplitter + RecursiveCharacterTextSplitter |
| Embeddings | sentence-transformers all-MiniLM-L6-v2 (local, free) |
| Vector DB | ChromaDB (embedded, persistent) |
| Agent | LangGraph StateGraph |
| LLM | LiteLLM proxy or direct Anthropic (langchain-openai / langchain-anthropic) |
| CLI | click |
| Package manager | uv |
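To show what the two-stage chunking in the Chunking row does, here is a dependency-free approximation. It is not LangChain's implementation. It recognises only `#` to `###` headers, uses no chunk overlap, and the default `chunk_size=1000` is an assumption. Stage 1 splits on headers and tags each section with its header path. Stage 2 recursively splits oversized sections on paragraph, line, then word boundaries.

```python
import re
from dataclasses import dataclass

HEADER_RE = re.compile(r"^(#{1,3})\s+(.*)$")  # only h1-h3 (assumption)
SEPARATORS = ["\n\n", "\n", " "]                 # coarsest first


@dataclass
class Chunk:
    text: str
    headers: dict[str, str]  # e.g. {"h1": "Upgrading", "h2": "Rails 8.0"}


def split_by_headers(markdown: str) -> list[Chunk]:
    """Stage 1: one section per header, tagged with its enclosing header path."""
    sections: list[Chunk] = []
    path: dict[str, str] = {}
    lines: list[str] = []

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:
            sections.append(Chunk(body, dict(path)))
        lines.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces itself and any deeper headers in the path.
            path = {k: v for k, v in path.items() if int(k[1:]) < level}
            path[f"h{level}"] = match.group(2).strip()
        else:
            lines.append(line)
    flush()
    return sections


def split_recursive(text: str, chunk_size: int, separators: list[str] = SEPARATORS) -> list[str]:
    """Stage 2: split on the coarsest separator, recursing into pieces that are still too big."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separator left: hard-cut into fixed-size pieces.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        if len(part) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_recursive(part, chunk_size, finer))
        elif not current:
            current = part
        elif len(current) + len(sep) + len(part) <= chunk_size:
            current += sep + part  # greedily merge neighbours back up to chunk_size
        else:
            chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


def chunk_markdown(markdown: str, chunk_size: int = 1000) -> list[Chunk]:
    return [
        Chunk(piece, section.headers)
        for section in split_by_headers(markdown)
        for piece in split_recursive(section.text, chunk_size)
    ]
```

Keeping the header path on every chunk lets a retrieved snippet still say which guide section (for example, which Rails version) it came from.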
- Python 3.11+
- `uv` (package manager)
- Ruby + `rbenv` (for e2e Rails upgrade tests)
```bash
git clone https://github.com/nhatbui-eh/gemlift
cd gemlift
uv sync
uv run crawl4ai-setup
```

Copy `.env.example` to `.env` and fill in credentials:
```bash
cp .env.example .env
```

```bash
# Option A: LiteLLM proxy (preferred)
LITELLM_API_KEY=sk-...
LITELLM_BASE_URL=https://your-litellm-proxy.com
LITELLM_MODEL=claude-sonnet-4-5-20250929

# Option B: Direct Anthropic
ANTHROPIC_API_KEY=sk-ant-...
```

The `real_llm` test fixture auto-detects which credentials are set and prefers LiteLLM when both are present.
```bash
# Ingest all known sources (rails, ruby, bundler)
uv run gemlift ingest

# Ingest a specific library
uv run gemlift ingest --library rails

# Ingest a custom URL
uv run gemlift ingest --url https://guides.rubyonrails.org/upgrading_ruby_on_rails.html

# Ask an upgrade question
uv run gemlift ask "How do I upgrade from Rails 7.1 to 8.0?" \
  --library rails --from-version 7.1 --to-version 8.0

# Record a failed upgrade in error memory
uv run gemlift report-error \
  --library rails --from-version 7.1 --to-version 8.0 \
  --error "ActiveRecord::StrictLoadingViolationError in production" \
  --context "has_many :posts, strict_loading: true"

# Mark a stored error as fixed, with the fix that worked
uv run gemlift confirm-fix <error-id> "Add strict_loading: false to the association"

# Re-crawl documentation sources
uv run gemlift refresh
```

```
tests/
├── conftest.py                      # Shared fixtures (real_embedding_fn, real_llm, …)
├── unit/                            # 42 tests — no I/O, mocks only
│   ├── test_crawler.py
│   ├── test_chunker.py
│   ├── test_embeddings.py
│   ├── test_vector_store.py
│   ├── test_error_store.py
│   └── test_retriever.py
├── integration/                     # 12 tests — real ChromaDB + sentence-transformers
│   ├── test_ingestion_pipeline.py   # Crawls real URLs (marked network)
│   └── test_query_pipeline.py       # Retrieval + error memory
└── e2e/                             # 7 tests — real LLM + real Rails subprocess
    ├── test_upgrade_workflow.py     # gemlift agent end-to-end
    └── test_rails_upgrade.py        # Upgrade hw-rails-intro app Rails 7.1→8.0
```
```bash
# Unit tests only (fast, no network, no LLM)
uv run pytest tests/unit/ -v

# Integration: real embeddings + ChromaDB, no crawling
uv run pytest -m "integration and not network" -v

# Integration: includes real web crawling
uv run pytest -m integration -v

# E2E: no LLM (error memory + retrieval only)
uv run pytest -m "e2e and not network" -v

# E2E: full (needs LITELLM_API_KEY or ANTHROPIC_API_KEY, network)
uv run pytest -m e2e -v

# Everything
uv run pytest -v
```

| Marker | Meaning |
|---|---|
| `unit` | Pure unit tests, no external dependencies |
| `integration` | Real ChromaDB + sentence-transformers |
| `e2e` | Full pipeline including LLM |
| `network` | Makes real HTTP requests (crawler or LLM) |
| `slow` | Takes more than a few seconds |
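Custom markers like these must be registered, or pytest emits `PytestUnknownMarkWarning` (and fails under `--strict-markers`). One way to register them is a `pytest_configure` hook in `conftest.py`. This is an assumption about how the project does it; it may instead list them under `markers` in `[tool.pytest.ini_options]`.

```python
# conftest.py (sketch): register the markers from the table above.
MARKERS = {
    "unit": "Pure unit tests, no external dependencies",
    "integration": "Real ChromaDB + sentence-transformers",
    "e2e": "Full pipeline including LLM",
    "network": "Makes real HTTP requests (crawler or LLM)",
    "slow": "Takes more than a few seconds",
}


def pytest_configure(config):
    """pytest hook: register custom markers so `-m` filters work without warnings."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```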
`tests/e2e/test_rails_upgrade.py` runs a complete end-to-end upgrade of the bundled `hw-rails-intro` Rails app:

- Copies the app to a temp directory
- Runs `bundle install` + `db:setup` with Ruby 3.3.9 / Rails 7.1
- Asserts all 15 RSpec specs pass (baseline)
- Calls the gemlift `plan_agent` with RAG context; the LLM returns a structured JSON upgrade plan
- Applies the file changes (`Gemfile`, `.ruby-version`, `config/application.rb`)
- Runs `bundle update` with Ruby 3.4.4
- Asserts all 15 RSpec specs still pass on Rails 8.0
```bash
# Requires rbenv with Ruby 3.3.9 + 3.4.4 and LLM credentials
uv run pytest tests/e2e/test_rails_upgrade.py -v -s
```

Every failed upgrade attempt can be stored in the `upgrade_errors` ChromaDB collection. During hybrid retrieval, verified fixes receive a +0.2 score boost, which ranks them above unverified errors of similar relevance.
```
CI pipeline fails
        │
        ▼
gemlift report-error --library rails --error "..." --context "..."
        │
        ▼
Error stored in ChromaDB (UNVERIFIED)
        │
Developer fixes it
        │
        ▼
gemlift confirm-fix <id> "the fix that worked"
        │
        ▼
Document updated → FIX STATUS: VERIFIED WORKING
Future queries retrieve this with +0.2 boost
```
| Path | Contents |
|---|---|
| `data/chroma_db/` | Persistent ChromaDB (knowledge + error memory) |
| `data/models/` | Local sentence-transformer model (optional; falls back to a Hugging Face download) |
| `data/raw/` | Raw crawl outputs (optional) |
| `data/eval/` | Evaluation datasets (optional) |
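The `data/models/` fallback could be resolved as shown below. This sketch assumes the model is stored in a subdirectory named after it, which is a hypothetical layout. `SentenceTransformer` accepts either a local path or a Hugging Face model ID, so the result can be passed straight to it.

```python
from pathlib import Path

DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def resolve_embedding_model(
    models_dir: Path = Path("data/models"), name: str = "all-MiniLM-L6-v2"
) -> str:
    """Return a local model directory if it exists and is non-empty, else the Hugging Face ID."""
    local = models_dir / name
    if local.is_dir() and any(local.iterdir()):
        return str(local)
    return DEFAULT_MODEL
```

Usage, assuming `sentence-transformers` is installed: `SentenceTransformer(resolve_embedding_model())`.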