Add ChromaDB as a persistent Docker service #173
Merged
- Add congress_chromadb service (chromadb/chroma:0.6.3) to dev compose with a named volume (chromadb-volume) for data persistence
- Add healthcheck against /api/v1/heartbeat
- Add restart: unless-stopped in prod compose overlay
- Wire CHROMA_HOST=congress_chromadb into congress_parser_fastapi so it finds ChromaDB via the Docker network instead of a bare IP
- Add CHROMA_HOST env var support in uscode.py handler, falling back to LLM_HOST then 10.0.0.120 to preserve existing prod behaviour
https://claude.ai/code/session_011LABnV4F5UKzgwKhWj5ND6
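The fallback order described for the uscode.py handler (CHROMA_HOST, then LLM_HOST, then 10.0.0.120) could look roughly like the sketch below. The function name `resolve_chroma_host` and the choice to treat an empty variable as unset are assumptions for illustration, not taken from the handler itself.

```python
import os

# Last-resort address named in the commit message.
DEFAULT_CHROMA_HOST = "10.0.0.120"


def resolve_chroma_host(env=None):
    """Return CHROMA_HOST, else LLM_HOST, else the legacy IP.

    `env` is any mapping of environment variables; it defaults to
    os.environ. This helper is hypothetical.
    """
    env = os.environ if env is None else env
    # `or` skips variables that are missing or set to an empty string
    # (treating empty as unset is an assumption of this sketch).
    return env.get("CHROMA_HOST") or env.get("LLM_HOST") or DEFAULT_CHROMA_HOST
```

With CHROMA_HOST=congress_chromadb set by the compose file, the container resolves ChromaDB by service name, while a deployment that sets neither variable keeps using the old address.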
Adds a system for users to describe their policy interests in natural language and automatically maps them to relevant USC sections via ChromaDB semantic search. Bills that amend those sections are then surfaced throughout the UI.
Backend (Python/FastAPI):
- Add UserInterest and UserInterestUscContent SQLAlchemy models (sensitive schema) with Alembic migration
- New interest.py handler: save interest text, run ChromaDB search (search_chroma, n=50), upsert auto-matched sections, toggle/add sections manually, query legislation via USCContentDiff join chain
- Add interest routes to user.py router (GET/POST /user/interest, PATCH/POST /user/interest/section, GET /user/interest/legislation)
- search_chroma now includes usc_ident in each result dict
Frontend (hillstack Next.js / tRPC):
- Add user_interest and user_interest_usc_content Prisma models
- Six new tRPC procedures on userRouter: interestGet, interestSave (calls FastAPI /uscode/search for ChromaDB then stores via Prisma), interestToggleSection, interestAddSection, interestLegislation ($queryRawUnsafe multi-join), interestBillMatch
- Dashboard widget: InterestFeed replaces USC Tracking placeholder, shows up to 8 bills touching matched sections; prompts login/setup
- Bill layout: InterestBadge client component shows green chip when the bill touches any of the user's active interest sections
- New page /user/interests: textarea + save button, grouped section list with checkbox-toggle and manual-add support
- Add FASTAPI_URL env var to congress_hillstack Docker service
https://claude.ai/code/session_011LABnV4F5UKzgwKhWj5ND6
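As a rough illustration of the "upsert auto-matched sections" step, the sketch below merges search results into already stored sections keyed by usc_ident without overwriting a section the user has toggled. The function name and the `is_active` / `match_source` fields are hypothetical; the real models may use different columns, and only the usc_ident key comes from the description above.

```python
def upsert_auto_matches(existing, results):
    """Merge search results into stored section rows keyed by usc_ident.

    existing: dict mapping usc_ident -> row dict (hypothetical shape)
    results:  iterable of result dicts that each carry a "usc_ident" key
    Returns a new dict and leaves `existing` unmodified.
    """
    merged = {ident: dict(row) for ident, row in existing.items()}
    for result in results:
        ident = result.get("usc_ident")
        if not ident:
            # Skip results that could not be resolved to a section.
            continue
        # setdefault keeps an existing row as-is, so a section the user
        # switched off is not re-enabled by a later search.
        merged.setdefault(ident, {"is_active": True, "match_source": "auto"})
    return merged
```

Keying on usc_ident makes the merge idempotent, so saving the same interest text twice does not create duplicate rows in this sketch.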
Creates backend/congress_parser/importers/chroma_uscode.py, a standalone
async script that reads top-level US Code sections from PostgreSQL and
upserts them into the ChromaDB 'uscode' collection, enabling the
interest-based semantic search feature to find relevant sections.
Features:
- Auto-detects the latest USC release version_id from usc_release table
(or accepts --version-id for an explicit override)
- Filters to top-level section identifiers (/us/usc/t{n}/s{identifier})
matching the resolution path used in search_chroma()
- Builds rich document text: title name + section heading + content_str
(truncated to 8 000 chars) for high-quality embeddings
- Stores metadata: title number, section number, display label, heading
- Idempotent: uses collection.upsert() so safe to re-run
- --reset flag to wipe and rebuild the collection from scratch
- --dry-run flag to count eligible sections without writing
- --batch-size to tune throughput (default 200)
- Creates the 'congress-dev' tenant and 'usc-chat' database via the
ChromaDB REST API if they don't already exist
- Graceful error messages when ChromaDB is unreachable or DB has no data
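Three of the features above (the top-level identifier filter, the document text, and batching) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the script's code: the helper names are hypothetical, the regex assumes section identifiers begin with a digit, and the 8 000-character limit is assumed to apply to content_str rather than to the whole document.

```python
import re

# Matches /us/usc/t{n}/s{identifier} with nothing after the section part.
# Requiring a leading digit in the identifier is an assumption.
TOP_LEVEL_SECTION = re.compile(r"^/us/usc/t[^/]+/s\d[^/]*$")

MAX_CONTENT_CHARS = 8000   # truncation limit from the feature list
DEFAULT_BATCH_SIZE = 200   # default --batch-size from the feature list


def is_top_level_section(usc_ident):
    """True for section identifiers, False for subsections and other levels."""
    return bool(TOP_LEVEL_SECTION.match(usc_ident))


def build_document(title_name, heading, content_str, max_chars=MAX_CONTENT_CHARS):
    """Join title name, heading, and truncated content, skipping empty parts."""
    parts = [title_name, heading, (content_str or "")[:max_chars]]
    return "\n".join(part for part in parts if part)


def batched(items, size=DEFAULT_BATCH_SIZE):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch would then be passed to a single collection.upsert() call, which is what makes re-running the import safe.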
Usage:
python3 -m congress_parser.importers.chroma_uscode
python3 -m congress_parser.importers.chroma_uscode --reset
python3 -m congress_parser.importers.chroma_uscode --dry-run
python3 -m congress_parser.importers.chroma_uscode --version-id 74573
https://claude.ai/code/session_011LABnV4F5UKzgwKhWj5ND6
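For reference, the flags shown in the usage lines could be declared with argparse roughly as follows. The help strings, the `build_parser` name, and the integer type for --version-id are assumptions; only the flag names and the batch-size default of 200 come from the commit message.

```python
import argparse


def build_parser():
    """Build a parser for the importer flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        prog="chroma_uscode",
        description="Upsert top-level US Code sections into ChromaDB.",
    )
    # None means: auto-detect the latest release from usc_release.
    parser.add_argument("--version-id", type=int, default=None,
                        help="explicit USC release version_id")
    parser.add_argument("--reset", action="store_true",
                        help="wipe and rebuild the collection")
    parser.add_argument("--dry-run", action="store_true",
                        help="count eligible sections without writing")
    parser.add_argument("--batch-size", type=int, default=200,
                        help="number of sections per upsert call")
    return parser
```

argparse converts the dashes to underscores, so the parsed values are read as args.version_id, args.dry_run, and args.batch_size.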