Gemlift

Gemlift is a RAG-based library upgrade assistant. It crawls documentation, stores it in a vector database, and uses an LLM to generate upgrade guidance grounded in the retrieved docs. It learns from past failures through an error memory store, and it can autonomously upgrade a Rails application and verify the result with RSpec.


System Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Knowledge Pipeline                       │
│                                                              │
│  WebCrawler ──► DocumentChunker ──► ChromaDB                 │
│  (crawl4ai)     (langchain)         upgrade_knowledge        │
│                                                              │
│                              ChromaDB                        │
│                              upgrade_errors  ◄── CI/CD hook  │
│                              (error memory)                  │
└─────────────────────────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                    Upgrade Agent                             │
│                                                              │
│  User Request ──► HybridRetriever ──► LLM ──► Response      │
│                   (KB + errors,                              │
│                    verified-fix boost)                       │
└─────────────────────────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                  Upgrader (autonomous)                       │
│                                                              │
│  RepoReader ──► plan_agent (LLM+RAG) ──► UpgradePlan        │
│                                              │               │
│                                         apply_to(repo)       │
│                                              │               │
│                                         RailsExecutor        │
│                                         (bundle, rspec)      │
└─────────────────────────────────────────────────────────────┘
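
The Upgrader stage above turns an LLM response into an UpgradePlan and applies it to the repository. Here is a minimal sketch of that hand-off, assuming a plan is a list of whole-file replacements. The field names (`path`, `content`, `summary`), the JSON shape, and the `from_json` helper are illustrative assumptions, not gemlift's actual schema.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FileChange:
    """One whole-file replacement (hypothetical fields)."""
    path: str      # path relative to the repo root
    content: str   # full new file content


@dataclass
class UpgradePlan:
    """A structured upgrade plan (hypothetical shape)."""
    summary: str
    changes: list[FileChange] = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: str) -> "UpgradePlan":
        """Parse the JSON text returned by the LLM into a plan."""
        data = json.loads(raw)
        changes = [FileChange(**c) for c in data.get("changes", [])]
        return cls(summary=data.get("summary", ""), changes=changes)

    def apply_to(self, repo: Path) -> list[Path]:
        """Write every change under repo and return the written paths."""
        root = repo.resolve()
        written = []
        for change in self.changes:
            target = (root / change.path).resolve()
            # Refuse paths that escape the repo (e.g. "../outside").
            if not target.is_relative_to(root):
                raise ValueError(f"path escapes repo: {change.path}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(change.content, encoding="utf-8")
            written.append(target)
        return written
```

Rejecting paths outside the repository root is a cautious default for LLM-generated plans, since a model-produced path should not be trusted to stay inside the working copy.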

Module Overview

src/gemlift/
├── config.py               # All settings via pydantic-settings + .env
├── interfaces.py           # Protocol definitions for all components
│
├── ingestion/              # Crawl → chunk → store
│   ├── crawler.py          # crawl4ai async crawler (SOURCES registry)
│   ├── chunker.py          # Markdown-header + recursive text splitting
│   └── pipeline.py         # Orchestrates crawl→chunk→embed→upsert
│
├── indexing/               # Embedding + vector storage
│   ├── embeddings.py       # SentenceTransformer (local, 384-dim)
│   └── vector_store.py     # ChromaDB wrapper (cosine similarity, upsert dedup)
│
├── memory/
│   └── error_store.py      # Stores upgrade failures; marks verified fixes
│
├── retrieval/
│   └── retriever.py        # HybridRetriever: KB + error memory, score boosting
│
├── agent/                  # LangGraph upgrade agent
│   ├── state.py            # UpgradeState TypedDict
│   ├── nodes.py            # retrieve_node + generate_node factories
│   └── graph.py            # StateGraph: retrieve → generate → END
│
├── upgrader/               # Autonomous repo upgrader
│   ├── repo_reader.py      # Reads Gemfile, .ruby-version, config files
│   ├── plan.py             # UpgradePlan + FileChange dataclasses
│   ├── plan_agent.py       # LLM+RAG → structured JSON upgrade plan
│   └── executor.py         # RailsExecutor: bundle/rspec via subprocess + rbenv
│
├── scheduler/
│   └── refresh.py          # Daily re-crawl daemon
│
└── cli.py                  # Click CLI entry point
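
The tree above describes vector_store.py as deduplicating on upsert. One common way to get that behavior is to derive each chunk's ID from its source URL and content, so re-ingesting an unchanged page overwrites the same records instead of adding new ones. The sketch below illustrates the idea only: `chunk_id`, `upsert`, and the dict standing in for the collection are hypothetical and not taken from gemlift.

```python
import hashlib


def chunk_id(source_url: str, text: str) -> str:
    """Derive a stable ID from the source URL and chunk text."""
    digest = hashlib.sha256()
    digest.update(source_url.encode("utf-8"))
    digest.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    digest.update(text.encode("utf-8"))
    return digest.hexdigest()


def upsert(store: dict, source_url: str, chunks: list) -> int:
    """Insert or overwrite chunks in store; return how many IDs were new."""
    added = 0
    for text in chunks:
        key = chunk_id(source_url, text)
        if key not in store:
            added += 1
        store[key] = text
    return added
```

With content-derived IDs, a daily re-crawl of an unchanged page adds no records, while an edited page produces new IDs for the changed chunks only.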

Stack

Layer            Tool
Crawling         crawl4ai (async, JS-capable, playwright)
Chunking         langchain MarkdownHeaderTextSplitter + RecursiveCharacterTextSplitter
Embeddings       sentence-transformers all-MiniLM-L6-v2 (local, free)
Vector DB        ChromaDB (embedded, persistent)
Agent            LangGraph StateGraph
LLM              LiteLLM proxy or direct Anthropic (langchain-openai / langchain-anthropic)
CLI              click
Package manager  uv

Setup

Prerequisites

  • Python 3.11+
  • uv
  • Ruby + rbenv (for e2e Rails upgrade tests)

Install

git clone https://github.com/nhatbui-eh/gemlift
cd gemlift
uv sync

Playwright (required for web crawling)

uv run crawl4ai-setup

Environment

Copy .env.example to .env and fill in credentials:

cp .env.example .env

# Option A — LiteLLM proxy (preferred)
LITELLM_API_KEY=sk-...
LITELLM_BASE_URL=https://your-litellm-proxy.com
LITELLM_MODEL=claude-sonnet-4-5-20250929

# Option B — Direct Anthropic
ANTHROPIC_API_KEY=sk-ant-...

The real_llm fixture auto-detects which one is set, preferring LiteLLM.
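
The preference order described above can be sketched as a small helper. This is an illustration only: `detect_llm_provider` is a hypothetical name, and the real fixture in tests/conftest.py may be implemented differently.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def detect_llm_provider(env: Mapping[str, str] | None = None) -> str | None:
    """Return "litellm", "anthropic", or None based on which key is set."""
    env = os.environ if env is None else env
    # LiteLLM is checked first, so it wins when both keys are present.
    if env.get("LITELLM_API_KEY"):
        return "litellm"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return None
```

Returning None when neither key is set lets a test fixture skip LLM-dependent tests instead of failing them.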


CLI Usage

Ingest documentation

# Ingest all known sources (rails, ruby, bundler)
uv run gemlift ingest

# Ingest a specific library
uv run gemlift ingest --library rails

# Ingest a custom URL
uv run gemlift ingest --url https://guides.rubyonrails.org/upgrading_ruby_on_rails.html

Ask the upgrade agent

uv run gemlift ask "How do I upgrade from Rails 7.1 to 8.0?" \
  --library rails --from-version 7.1 --to-version 8.0

Report a CI/CD failure to error memory

uv run gemlift report-error \
  --library rails --from-version 7.1 --to-version 8.0 \
  --error "ActiveRecord::StrictLoadingViolationError in production" \
  --context "has_many :posts, strict_loading: true"
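
In a CI/CD hook, the same command can be assembled programmatically. The sketch below only builds the argument list from the flags shown above. `build_report_command` is a hypothetical helper, and both running through `uv run` and treating `--context` as optional are assumptions about the setup.

```python
import shlex


def build_report_command(library, from_version, to_version, error, context=""):
    """Build the argv for `gemlift report-error` using the flags shown above."""
    argv = [
        "uv", "run", "gemlift", "report-error",
        "--library", library,
        "--from-version", from_version,
        "--to-version", to_version,
        "--error", error,
    ]
    if context:  # assumption: --context may be omitted
        argv += ["--context", context]
    return argv


if __name__ == "__main__":
    cmd = build_report_command(
        "rails", "7.1", "8.0",
        "ActiveRecord::StrictLoadingViolationError in production",
        "has_many :posts, strict_loading: true",
    )
    # Print a shell-quoted form for logging; to execute, pass the list
    # itself to subprocess.run(cmd, check=True) so no shell is involved.
    print(shlex.join(cmd))
```

Passing the arguments as a list keeps error messages containing quotes or spaces intact, which matters when the text comes straight from a CI log.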

Confirm a fix (promotes it with a score boost)

uv run gemlift confirm-fix <error-id> "Add strict_loading: false to the association"

Start the daily refresh daemon

uv run gemlift refresh

Testing

Test layout

tests/
├── conftest.py                     # Shared fixtures (real_embedding_fn, real_llm, …)
├── unit/                           # 42 tests — no I/O, mocks only
│   ├── test_crawler.py
│   ├── test_chunker.py
│   ├── test_embeddings.py
│   ├── test_vector_store.py
│   ├── test_error_store.py
│   └── test_retriever.py
├── integration/                    # 12 tests — real ChromaDB + sentence-transformers
│   ├── test_ingestion_pipeline.py  # Crawls real URLs (marked network)
│   └── test_query_pipeline.py      # Retrieval + error memory
└── e2e/                            # 7 tests — real LLM + real Rails subprocess
    ├── test_upgrade_workflow.py     # gemlift agent end-to-end
    └── test_rails_upgrade.py        # Upgrade hw-rails-intro app Rails 7.1→8.0

Run by scope

# Unit tests only (fast, no network, no LLM)
uv run pytest tests/unit/ -v

# Integration — real embeddings + ChromaDB, no crawling
uv run pytest -m "integration and not network" -v

# Integration — includes real web crawling
uv run pytest -m integration -v

# E2E — no LLM (error memory + retrieval only)
uv run pytest -m "e2e and not network" -v

# E2E — full (needs LITELLM_API_KEY or ANTHROPIC_API_KEY, network)
uv run pytest -m e2e -v

# Everything
uv run pytest -v

Markers

Marker       Meaning
unit         Pure unit tests, no external dependencies
integration  Real ChromaDB + sentence-transformers
e2e          Full pipeline including LLM
network      Makes real HTTP requests (crawler or LLM)
slow         Takes more than a few seconds
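
Custom markers like these are normally registered so that pytest does not warn about unknown marks. Here is a minimal sketch of doing that in conftest.py through the `pytest_configure` hook. Whether gemlift registers them this way or in pyproject.toml is not stated here, so treat the placement as an assumption.

```python
# Marker names and descriptions copied from the table above.
MARKERS = {
    "unit": "Pure unit tests, no external dependencies",
    "integration": "Real ChromaDB + sentence-transformers",
    "e2e": "Full pipeline including LLM",
    "network": "Makes real HTTP requests (crawler or LLM)",
    "slow": "Takes more than a few seconds",
}


def pytest_configure(config):
    """Register each custom marker with its description."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

Once registered, expressions such as `-m "integration and not network"` select tests by combining these names.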

The Rails upgrade e2e test

tests/e2e/test_rails_upgrade.py runs a complete end-to-end upgrade of the bundled hw-rails-intro Rails app:

  1. Copies the app to a temp directory
  2. Runs bundle install + db:setup with Ruby 3.3.9 / Rails 7.1
  3. Asserts all 15 RSpec specs pass (baseline)
  4. Calls the gemlift plan_agent with RAG context → LLM returns a structured JSON upgrade plan
  5. Applies the file changes (Gemfile, .ruby-version, config/application.rb)
  6. Runs bundle update with Ruby 3.4.4
  7. Asserts all 15 RSpec specs still pass on Rails 8.0

# Requires rbenv with Ruby 3.3.9 + 3.4.4 and LLM credentials
uv run pytest tests/e2e/test_rails_upgrade.py -v -s
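
Steps 2 and 6 run the same tooling under two different Ruby versions. With rbenv, the `RBENV_VERSION` environment variable selects the interpreter per process, so a subprocess wrapper can switch versions for a single command. The sketch below is an assumption about how such a wrapper might look. `ruby_env` and `run_with_ruby` are hypothetical names and not RailsExecutor's API.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def ruby_env(ruby_version: str, base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy the environment and pin the Ruby version for rbenv."""
    env = dict(os.environ if base is None else base)
    env["RBENV_VERSION"] = ruby_version
    return env


def run_with_ruby(ruby_version: str, argv: list[str], cwd: Path) -> subprocess.CompletedProcess:
    """Run argv through `rbenv exec` in cwd with the given Ruby version."""
    return subprocess.run(
        ["rbenv", "exec", *argv],
        cwd=cwd,
        env=ruby_env(ruby_version),
        capture_output=True,
        text=True,
        check=False,  # the caller inspects returncode to decide pass/fail
    )
```

For example, `run_with_ruby("3.4.4", ["bundle", "exec", "rspec"], repo)` would run the spec suite under Ruby 3.4.4, and a zero `returncode` would indicate that the suite passed.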

Error Memory & Learning

Every failed upgrade attempt can be stored in the upgrade_errors ChromaDB collection. Verified fixes receive a +0.2 score boost during hybrid retrieval, surfacing them above unverified errors.

CI pipeline fails
      │
      ▼
gemlift report-error --library rails --error "..." --context "..."
      │
      ▼
Error stored in ChromaDB (UNVERIFIED)
      │
Developer fixes it
      │
      ▼
gemlift confirm-fix <id> "the fix that worked"
      │
      ▼
Document updated → FIX STATUS: VERIFIED WORKING
Future queries retrieve this with +0.2 boost
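
The boost described above can be sketched as a re-ranking step. Assumptions: each hit carries a similarity `score` where higher is better, plus a `verified` flag. The `Hit` type and `rerank` function are hypothetical, and only the +0.2 value comes from this README.

```python
from dataclasses import dataclass

VERIFIED_BOOST = 0.2  # value stated in this README


@dataclass
class Hit:
    text: str
    score: float           # similarity, higher is better (assumed)
    verified: bool = False  # True once a fix has been confirmed


def rerank(hits, top_k=5):
    """Sort hits by boosted score, highest first, and keep top_k."""
    def boosted(hit):
        return hit.score + (VERIFIED_BOOST if hit.verified else 0.0)

    return sorted(hits, key=boosted, reverse=True)[:top_k]
```

An additive boost lets a verified fix overtake an unverified hit that is up to 0.2 more similar, while a clearly more relevant unverified hit still ranks first.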

Data

Path             Contents
data/chroma_db/  Persistent ChromaDB (knowledge + error memory)
data/models/     Local sentence-transformer model (optional, falls back to HuggingFace download)
data/raw/        Raw crawl outputs (optional)
data/eval/       Evaluation datasets (optional)
