triage tool for repos drowning in PRs. finds dupes, ranks quality, checks vision alignment across repos.
built this because i saw someone staring at 3000+ open PRs and losing their mind trying to figure out which ones were duplicates. turns out 40% of them were. ran pr-prism on 6K+ items across a real repo, found 594 duplicate clusters.
who this is for: maintainers, triage teams, anyone staring at a 4-digit PR count and wondering where to start.
brew tap stresstestor/tap && brew install prism-triage
prism init
prism scan
prism triageor with npm: npm install -g prism-triage
scan ─── fetch PRs + issues via GraphQL
│
embed ── vectorize titles + bodies
│
├── dupes ── cluster by cosine similarity
├── rank ─── score by quality signals
├── vision ─ check alignment with VISION.md
└── review ─ LLM deep review
│
triage ─ run everything in one shot
- zero cost default: ollama + opencode zen + github PAT. $0 to run
- local-first: sqlite + sqlite-vec, everything stays on your machine
- multi-repo:
repos: [a/b, c/d]in config, cross-repo dupe detection - multi-provider: ollama, jina, openai, voyage, kimi for embeddings. opencode, openai, anthropic, kimi, ollama for LLM
- cross-repo dupes: the thing no other free tool does. finds duplicates across different repos
- incremental: only re-embeds new/changed items, crash-recoverable
- read-only by default: won't touch your repo unless you pass
--apply-labels
| command | what it does |
|---|---|
prism scan |
fetch PRs + issues into local db (GraphQL default, --rest fallback) |
prism dupes |
cluster duplicates, show best picks |
prism rank |
score and rank by quality signals |
prism vision |
check alignment against VISION.md or README |
prism review <n> |
LLM review of a specific PR or issue |
prism triage |
full pipeline in one shot |
prism report |
generate markdown triage report |
prism stats |
database stats, embedding coverage, provider info |
prism doctor |
check config, providers, db health |
prism init |
auto-detect providers, generate config |
prism re-embed |
re-embed with current provider (no github fetch) |
prism compare <n1> <n2> |
compare two PRs/issues for similarity |
prism benchmark |
compare embedding models for quality + speed |
prism reset |
wipe database and start fresh |
prism scan --state all # open + closed items
prism scan --since 7d # only items updated in last 7 days
prism dupes --threshold 0.9 # stricter similarity
prism dupes --cluster 3 # inspect specific cluster
prism rank --top 50 --explain # top 50 with signal breakdown
prism vision --stats # histogram + section breakdown
prism vision --detail # per-item alignment table
prism review --top 10 # batch review top 10
prism review 42 --type issue # review an issue
prism review --show 42 # show saved review
prism triage --output markdown # markdown output for github issues
prism dupes --json | jq '.bestPick' # machine-readable NDJSON
prism compare 42 99 # check similarity between two items
prism benchmark --repo sst/opencode # compare default models
prism benchmark --models nomic-embed-text,qwen3-embedding:0.6b # specify modelsrun the whole thing for free:
- embeddings: ollama +
nomic-embed-text(local, 768 dims, 3.7x faster than qwen3) - LLM: opencode zen (kimi-k2.5-free, $0)
- github: 5000 GraphQL points/hr with a PAT (~36 queries for 3500+ PRs)
brew install ollama
ollama pull nomic-embed-textcloud alternative: jina gives 10M free tokens per key, no account needed.
version: 1
# single repo
repo: owner/repo
# or multi-repo
repos:
- owner/repo1
- owner/repo2
- owner/repo3
# per-repo vision docs
vision_docs:
owner/repo1: ./VISION_1.md
owner/repo2: ./VISION_2.mdcross-repo dupe display: [owner/repo1] #1234 <-> [owner/repo2] #567
| type | provider | cost | notes |
|---|---|---|---|
| embedding | ollama | free | local, default (nomic-embed-text) |
| embedding | jina | free tier | 10M tokens/key |
| embedding | openai | paid | text-embedding-3-small |
| embedding | voyageai | paid | |
| embedding | kimi | free tier | |
| LLM | opencode | free | kimi-k2.5-free, default |
| LLM | openai | paid | gpt-4o-mini |
| LLM | anthropic | paid | |
| LLM | kimi | free tier | |
| LLM | ollama | free | local |
prism can label your github PRs/issues but won't unless you say so:
prism dupes --apply-labels # mark dupes + best picks
prism vision --apply-labels # aligned/drifting/off-vision
prism dupes --dry-run # preview firstread-only by default. always.
run pr-prism automatically on every PR:
# .github/workflows/prism-triage.yml
name: PR Triage
on:
pull_request:
types: [opened, reopened]
schedule:
- cron: '0 0 * * 1' # weekly full scan
jobs:
triage:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: StressTestor/pr-prism@main
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
embedding-provider: jina
embedding-api-key: ${{ secrets.JINA_API_KEY }}on pull_request events, prism checks if the new PR is a duplicate and comments with matches. on schedule, it does a full triage scan.
docker build -t pr-prism .
docker run --rm -v $(pwd):/work -w /work --env-file .env pr-prism triagerun pr-prism as a GitHub App that auto-triages every new issue and PR in real time.
what it does:
- comments on new issues with duplicate matches and source-of-truth links
- optionally auto-closes obvious dupes (>95% similarity, opt-in)
- suggests code owners via CODEOWNERS parsing
- posts a weekly triage digest
- runs a full backlog scan when first installed on a repo
self-hosted setup:
- register a GitHub App at github.com/settings/apps/new
- permissions: issues (read+write), pull requests (read), metadata (read), contents (read)
- webhook URL: your server's public URL + /webhook
- subscribe to events: issues, pull_request, installation, installation_repositories
- provision a server (Oracle ARM free tier works)
- run the setup script:
git clone https://github.com/StressTestor/pr-prism.git /opt/prism-bot cd /opt/prism-bot && ./server/deploy/setup.sh
- copy your .env and private key, then restart the service
per-repo config: drop a .prism.json in your repo root to customize:
{
"autoClose": true,
"autoCloseThreshold": 0.95,
"similarityThreshold": 0.85,
"weeklyDigest": true,
"smartRouting": true
}embeds every PR/issue title+body into a vector, stores in sqlite-vec, computes cosine similarity across all pairs. anything above 0.85 gets clustered. each cluster picks a "best" based on quality score (tests, CI, diff size, reviews, recency, description quality). rest get flagged as dupes.
for repos with 5000+ items, automatically switches from brute-force to ANN pre-filtering via sqlite-vec, then verifies with exact cosine similarity.
pipeline functions are independently importable:
import { createPipelineContext, runScan, runDupes, runRank } from "prism-triage";
const ctx = await createPipelineContext();
await runScan(ctx, { json: true });
const clusters = await runDupes(ctx, { json: true });
ctx.store.close();- matryoshka truncation:
EMBEDDING_DIMENSIONS=512in .env halves storage with 91% clustering agreement - ANN pre-filtering kicks in at 5000+ items automatically
- incremental scan: only embeds new/changed items
- crash recovery: resumes from last embedded item
- batch size: configurable via
batch_sizein config (default 50)
- first scan of ~3500 PRs via GraphQL: ~3 min (pagination + author history)
- embedding ~10000 items with nomic-embed-text locally: ~13 min on M1 Air (was ~47 min with qwen3)
- after that it's incremental
- switching providers:
prism re-embedorprism reset - sqlite-vec pinned at 0.1.7-alpha.2 (alpha but stable in our testing)
see CONTRIBUTING.md
MIT