Multi-agent AI system that autonomously solves CTF challenges and hunts bug bounties.
Claude Code-native core with Codex/OMX + Gemini coordination — 25 specialized agents orchestrated through sequential pipelines, shared coordination/ state, and digest-first context compaction.
| CTF Solved | Bug Bounty Targets | AI Agents | MCP Servers | Pipeline Skills | Knowledge Docs | Security Tools |
|---|---|---|---|---|---|---|
| 23 | 30+ | 25 | 14 | 8 | 280K+ | 40+ |
You: "Solve pwnable.kr fd. SSH: fd@pwnable.kr -p2222 (pw: guest)"
Terminator:
-> spawns @reverser -> analyzes binary, produces attack map
-> spawns @chain -> builds exploit from attack map
-> spawns @critic -> cross-verifies offsets with gdb
-> spawns @verifier -> runs exploit 3x locally, then remote
-> FLAG_FOUND: mama, are you prout of me?
You: "Hunt high-critical bugs on Immunefi until you find one"
Terminator:
-> spawns @target-evaluator -> scores ROI, returns GO
-> spawns @scout + @analyst -> parallel recon + CVE matching
-> spawns @exploiter -> develops working PoC
-> spawns @critic -> fact-checks report
-> spawns @triager-sim -> attacks report before submission
-> SUBMIT: CWE-306 ATO chain, CVSS 7.4 High
Terminator is not a single model prompt. It is a team of 25 AI agents coordinated by an orchestrator through sequential pipelines.
- Adaptive pipeline selection -- the orchestrator picks the right agent sequence based on challenge type (pwn, reversing, web, firmware, smart contract)
- Structured handoffs -- each agent produces a typed artifact (attack map, trigger report, exploit script) that feeds into the next stage
- Verification-first -- every exploit is tested 3x locally before remote execution; every bug bounty report requires a working PoC
- Anti-hallucination -- a dedicated critic agent cross-verifies all addresses, offsets, and constants with independent tool runs (gdb)
- Crash recovery -- checkpoint protocol lets agents resume from exact point of failure after context compaction
- Automated quality gates -- 6 pipeline skills (v6) automatically block OOS findings, weak PoCs, unrealistic threat models, and AI-generated template language before submission
Terminator now keeps Claude Code, Codex/OMX, and Gemini on the same state contract instead of re-reading the same long context on every handoff.
- coordination/ is the shared source of truth -- manifests, digests, artifacts, checkpoints, and handoffs live under coordination/sessions/<session_id>/
- Claude remains native -- .claude/hooks/*.sh publish session knowledge, checkpoints, and artifact validation into coordination/
- Codex/OMX remains native -- .omx/hooks/*.mjs bootstrap Codex sessions and mirror .omx/state, notepad, and plans into the same session record
- Gemini stays helper-only -- tools/context_digest.py --prefer-gemini compacts large files, directories, and logs into reusable digests
- Leader switches are structured -- write-handoff/consume-handoff replace freeform "re-read everything" transfers
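The "reusable digests" idea can be illustrated with a small content-addressed cache: a large file or log is summarized once, keyed by its SHA-256, and every later agent that asks for the same content gets the stored digest instead of re-reading it. This is a sketch under stated assumptions, not tools/context_digest.py: DigestCache, head_tail, and max_chars are hypothetical, and head_tail is only a placeholder for the Gemini-backed compaction the real tool can use via --prefer-gemini.

```python
import hashlib
from typing import Callable


class DigestCache:
    """Content-addressed digest cache: summarize once, reuse while the input is unchanged."""

    def __init__(self, summarize: Callable[[str], str], max_chars: int = 2000):
        self.summarize = summarize  # hypothetical stand-in for the Gemini-backed compaction step
        self.max_chars = max_chars  # cap on digest size (assumed value)
        self._store: dict[str, str] = {}  # sha256(content) -> digest
        self.calls = 0  # number of times summarize() actually ran

    def digest(self, text: str) -> str:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.calls += 1
            self._store[key] = self.summarize(text)[: self.max_chars]
        return self._store[key]


def head_tail(text: str, keep: int = 200) -> str:
    """Placeholder summarizer: keep only the head and tail of a long log."""
    if len(text) <= 2 * keep:
        return text
    return text[:keep] + "\n...[compacted]...\n" + text[-keep:]
```

Keying on content rather than file path means an edited file automatically gets a fresh digest, while an unchanged one is never summarized twice across a leader switch.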
One-time install to make plain omx auto-enable repo hooks:
./scripts/install_omx_wrapper.sh
omx hooks status  # In this repo: Plugins enabled: yes
Outside repos that expose both .omx/hooks/ and tools/coordination_cli.py, the wrapper falls back to the real OMX binary unchanged.
Validated on March 6, 2026 in this repository with real claude, codex, and plain omx sessions.
- Claude custom agents -- reverser, target-evaluator, triager-sim, and fw-profiler were spawned live and completed successfully
- Knowledge injection -- the Task|Agent hook path produced task_knowledge digests and task_knowledge_injected coordination events during live subagent runs
- Claude skills -- the ctf skill was loaded through the native Skill tool and returned the expected pipeline instructions
- Codex/OMX -- plain omx booted with repo hook plugins enabled; Codex read the repo instructions and returned coordination/ as the shared source of truth
- MCP / knowledge -- live mcp__git__git_status and mcp__knowledge-fts__knowledge_stats calls succeeded from Claude; Knowledge FTS responded with the indexed corpus metadata
- Optional failure tolerated -- pentest-thinking may still fail during startup, but it is treated as non-blocking, so core Terminator workflows continue
                             ┌─────────────────────────┐
                             │     Claude Code CLI     │
                             │   Orchestrator (Lead)   │
                             └────────────┬────────────┘
                                          │
                   ┌──────────────────────┴──────────────────────┐
                   │                                             │
          ┌────────▼────────┐                           ┌────────▼────────┐
          │  CTF Pipeline   │                           │ Bug Bounty v12  │
          │  (Sequential)   │                           │   (Kill Gate)   │
          └────────┬────────┘                           └────────┬────────┘
                   │                                             │
     ┌─────────────┼─────────────┐                 ┌─────────────┼─────────────┐
┌────▼────┐   ┌────▼────┐   ┌────▼────┐       ┌────▼────┐   ┌────▼────┐   ┌────▼────┐
│Reverser │ → │ Chain/  │ → │ Critic  │       │  Scout  │ → │Exploiter│ → │ Triager │
│         │   │ Solver  │   │         │       │+Analyst │   │         │   │   Sim   │
└─────────┘   └─────────┘   └────┬────┘       └─────────┘   └─────────┘   └─────────┘
                            ┌────▼────┐
                            │Verifier │→ FLAG_FOUND
                            └─────────┘
┌──────────────────────────────────────────────────────────┐
│                   Infrastructure Layer                   │
├──────────┬──────────┬───────────┬──────────┬─────────────┤
│  14 MCP  │Knowledge │ Dashboard │   40+    │    Anti-    │
│ Servers  │ DB 280K+ │ (Web UI)  │  Tools   │ Hallucinate │
└──────────┴──────────┴───────────┴──────────┴─────────────┘
Agents communicate through structured artifact passing -- no context is lost between stages:
[HANDOFF from @reverser to @chain]
- Artifact: reversal_map.md
- Confidence: PASS
- Key Result: BOF in read_input(), 64-byte overflow, canary disabled
- Next Action: Build leak + ROP chain targeting system("/bin/sh")
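Because each handoff has a fixed shape, the receiving side can parse it into a typed record and refuse incomplete handoffs instead of guessing. A minimal sketch, assuming the text layout shown above; the Handoff dataclass and parse_handoff are hypothetical names, not part of the repository:

```python
from dataclasses import dataclass

HEADER_PREFIX = "[HANDOFF from @"
REQUIRED = {"artifact", "confidence", "key_result", "next_action"}


@dataclass
class Handoff:
    source: str
    target: str
    artifact: str
    confidence: str
    key_result: str
    next_action: str


def parse_handoff(text: str) -> Handoff:
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0]
    if not (header.startswith(HEADER_PREFIX) and header.endswith("]")):
        raise ValueError(f"not a handoff header: {header!r}")
    # "[HANDOFF from @reverser to @chain]" -> ("reverser", "chain")
    source, _, target = header[len(HEADER_PREFIX):-1].partition(" to @")
    fields = {}
    for ln in lines[1:]:
        # Split on the first colon only, so values like system("/bin/sh") survive intact.
        key, sep, value = ln.lstrip("- ").partition(":")
        if sep:
            fields[key.strip().lower().replace(" ", "_")] = value.strip()
    missing = REQUIRED - set(fields)
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    return Handoff(source, target, fields["artifact"], fields["confidence"],
                   fields["key_result"], fields["next_action"])
```

Rejecting a handoff with a missing field is what keeps a stage from silently starting without, say, the attack map it depends on.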
- Claude Code CLI with Anthropic API key
- Codex CLI + oh-my-codex (omx) for Codex-native sessions
- Python 3.10+ with pwntools, z3-solver, angr
- gdb with pwndbg or GEF
- Docker (optional, for full infrastructure stack)
# One-time wrapper install for plain `omx`
cd Terminator && ./scripts/install_omx_wrapper.sh
# Codex/OMX native (wrapper auto-enables repo hook plugins)
cd Terminator && omx
# Claude Code native
cd Terminator && claude
# CTF:
# "Solve pwnable.kr fd. SSH: fd@pwnable.kr -p2222 (pw: guest)"
# Bug Bounty:
# "Hunt high-critical bugs on Immunefi"
./terminator.sh ctf /path/to/challenge.zip # CTF
./terminator.sh bounty https://target.com # Bug bounty
./terminator.sh status # Monitor
# Standalone (no Docker)
cd web && pip install -r requirements.txt && uvicorn app:app --port 3000
# Full stack (6 Docker services)
docker compose up -d
# Open http://localhost:3000
| Condition | Pipeline | Agents |
|---|---|---|
| Trivial -- source provided, 1-3 line bug | Direct solve | 0 |
| Reversing / Crypto -- math inverse needed | reverser -> solver -> critic -> verifier -> reporter | 5 |
| Pwn (clear vuln) -- obvious overflow/fmt | reverser -> chain -> critic -> verifier -> reporter | 5 |
| Pwn (unclear vuln) -- crash discovery needed | reverser -> trigger -> chain -> critic -> verifier -> reporter | 6 |
| Web -- injection, SSRF, auth bypass | scout -> analyst -> exploiter -> reporter | 4 |
| Firmware -- ARM binary diff, emulated PoC | fw-profiler -> fw-inventory -> fw-surface -> fw-validator -> reporter | 5 |
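The selection table above can be read as a lookup from challenge category to agent sequence. A minimal sketch, assuming the orchestrator has already classified the challenge; the category keys and the vuln_obvious flag are hypothetical, since the real classification is made by the orchestrator model following its rules:

```python
# Hypothetical category keys; agent sequences copied from the table above.
PIPELINES: dict[str, list[str]] = {
    "trivial": [],
    "reversing": ["reverser", "solver", "critic", "verifier", "reporter"],
    "crypto": ["reverser", "solver", "critic", "verifier", "reporter"],
    "pwn_clear": ["reverser", "chain", "critic", "verifier", "reporter"],
    "pwn_unclear": ["reverser", "trigger", "chain", "critic", "verifier", "reporter"],
    "web": ["scout", "analyst", "exploiter", "reporter"],
    "firmware": ["fw-profiler", "fw-inventory", "fw-surface", "fw-validator", "reporter"],
}


def select_pipeline(category: str, vuln_obvious: bool = True) -> list[str]:
    """Return the agent sequence for a challenge category."""
    if category == "pwn":
        # Unclear pwn bugs get a trigger stage for crash discovery first.
        category = "pwn_clear" if vuln_obvious else "pwn_unclear"
    if category not in PIPELINES:
        raise ValueError(f"unknown challenge category: {category!r}")
    return list(PIPELINES[category])  # copy so callers can't mutate the table
```

An empty list for trivial challenges means "solve directly, spawn nothing".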
[!IMPORTANT] Iron Rule: No Exploit, No Report. Findings without a working PoC are automatically discarded.
Kill Gate Pipeline Details (v12)
Phase 0    @target-evaluator      GO / NO-GO assessment + Hard NO-GO rules
           oos-check skill        OOS pattern pre-screening (12 patterns)
--- GO gate --------------------------------------------------------
Phase 0.2  bb_preflight.py        Program rules generation + validation (MANDATORY)
Phase 0.5  @scout                 Automated tool scan (Slither, Semgrep, Mythril)
Phase 1    @scout + @analyst      Parallel recon + OOS cross-check per finding
           @threat-modeler        Trust boundary mapping, role matrix, state machine extraction (parallel)
           @patch-hunter          Incomplete fix + variant hunting from security commits (parallel)
           coverage-gate skill    80%+ endpoint coverage required
Phase 1.5  @analyst (N parallel)  OWASP-category hunting (large codebases only)
           @workflow-auditor      Business workflow state transition mapping + anomaly detection
           @web-tester            Web application workflow pack testing
           workflow-check         Workflow state coverage validation (v12 NEW)
           fresh-surface-check    New attack surface from recent commits (v12 NEW)
★ Gate 1   @triager-sim (sonnet)  Finding viability: 5-Question Destruction Test (KILL/GO)
                                  risk-weighted coverage + workflow-check + fresh-surface-check required
Phase 2    @exploiter             PoC development + poc-tier skill (Tier 1-2 only) + Evidence Tier (E1-E4)
           threat-model-check     Attack prerequisite validation
           evidence-tier-check    Evidence quality classification (E1-E4) (v12 NEW)
★ Gate 2   @triager-sim (opus)    PoC destruction: evidence quality + triager objections + duplicate-graph-check (KILL/GO)
           duplicate-graph-check  Semantic duplicate detection via knowledge graph (v12 NEW)
Phase 3    @reporter              Report draft + CVSS
Phase 4    @critic                Fact-check (streamlined, Gate 2 handles viability)
Phase 4.5  @triager-sim           Final consistency check (KILL here = Gate bug → feedback loop)
           slop-check skill       AI slop score (<=2 PASS, 3-5 STRENGTHEN, >5 KILL)
Phase 5    @reporter              Final report + ZIP packaging
Phase 6    TeamDelete             Cleanup
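The gate sequence above amounts to "run checks in order, stop at the first one that does not pass". A minimal sketch of that control flow, assuming each gate returns a verdict string; run_gates, the Gate alias, and the finding fields used in any checks are hypothetical, and only the slop-check thresholds are taken from the pipeline listing:

```python
from typing import Callable

PASSING = {"GO", "PASS"}


def slop_verdict(score: int) -> str:
    """Map a slop-check score to its verdict: <=2 PASS, 3-5 STRENGTHEN, >5 KILL."""
    if score <= 2:
        return "PASS"
    if score <= 5:
        return "STRENGTHEN"
    return "KILL"


Gate = tuple[str, Callable[[dict], str]]


def run_gates(finding: dict, gates: list[Gate]) -> tuple[str, str]:
    """Run gates in order; stop at the first non-passing verdict.

    Returns (verdict, gate_name); gate_name is "" when every gate passed.
    """
    for name, check in gates:
        verdict = check(finding)
        if verdict not in PASSING:
            # KILL discards the finding; STRENGTHEN would go back for rework (assumed).
            return verdict, name
    return "GO", ""
```

Stopping at the first failure is the point of a kill gate: a finding that fails Gate 1 never consumes PoC-development effort in Phase 2.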
6 automated pipeline skills + 2 Kill Gates (v12):
| Skill | Gate | Blocks |
|---|---|---|
| oos-check | Phase 0 + per-finding | OOS patterns (oracle staleness, admin-gated, etc.) |
| coverage-gate | Phase 1->2 | <80% endpoint coverage (100% for small targets) |
| workflow-check | Gate 1->2 | Incomplete workflow state coverage (v12 NEW) |
| fresh-surface-check | Gate 1->2 | Missed attack surface from recent commits (v12 NEW) |
| poc-tier | Phase 2->3 | Tier 3-4 PoC (no live execution capture) |
| evidence-tier-check | Phase 2->3 | Evidence below E2 tier (v12 NEW) |
| threat-model-check | Phase 2 | Unrealistic attack prerequisites (2+ controlled) |
| duplicate-graph-check | Gate 2 | Semantic duplicates via knowledge graph (v12 NEW) |
| slop-check | Phase 4.5 | AI template language score >5 |
| checkpoint-validate | Any phase | Fake idle / fake completion detection |
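As an example of how one of these gates reduces to a simple check, the coverage-gate rule (block below 80% endpoint coverage, 100% for small targets) might look like this. A minimal sketch: coverage_gate and the small_target flag are hypothetical, because the README does not define what counts as a small target, and the real check lives in bb_preflight.py:

```python
def coverage_gate(tested: set[str], endpoints: set[str],
                  small_target: bool = False) -> tuple[bool, float]:
    """Return (passes, ratio). Only tested endpoints that are in scope count."""
    if not endpoints:
        raise ValueError("no endpoints enumerated; coverage is undefined")
    ratio = len(tested & endpoints) / len(endpoints)
    required = 1.0 if small_target else 0.80  # thresholds from the table above
    return ratio >= required, ratio
```

Intersecting with the enumerated endpoint set prevents out-of-scope requests from inflating the coverage number.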
Additional quality gates:
- Phase 0 Hard NO-GO: 3+ audits, 2+ reputable audits, 100+ reports, 3yr+, source inaccessible
- Phase 0.2 Program rules must pass validation before any agent spawns
- Phase 4.5 triager-sim outputs structured JSON for automated reporter feedback loop
- ★ Gate 1 + Gate 2: Kill Gates block findings before PoC dev and before report writing
- ★ v12 Explore Lane: threat-modeler + patch-hunter run parallel in Phase 1; workflow-auditor + web-tester in Phase 1.5
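The Phase 0 Hard NO-GO rules can be expressed as a list of independent predicates. A minimal sketch, assuming that any single rule firing is enough for NO-GO and that "3yr+" refers to how long the program has been live; both readings are assumptions, and the Program fields are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Program:
    audits: int               # total public audits
    reputable_audits: int     # audits by well-known firms
    reports: int              # prior reports on the program
    years_live: float         # assumed meaning of "3yr+": time the program has been live
    source_accessible: bool


def hard_no_go_reasons(p: Program) -> list[str]:
    """List every Hard NO-GO rule that fires; any single hit means NO-GO (assumed)."""
    rules = [
        (p.audits >= 3, "3+ audits"),
        (p.reputable_audits >= 2, "2+ reputable audits"),
        (p.reports >= 100, "100+ reports"),
        (p.years_live >= 3, "3yr+"),
        (not p.source_accessible, "source inaccessible"),
    ]
    return [label for fired, label in rules if fired]
```

Returning all fired rules rather than a bare boolean gives target-evaluator a reason list it can cite in target_assessment.md.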
25 specialized agents defined in .claude/agents/ (~8,100 lines of definitions including reference docs).
CTF Agents (8)
| Agent | Role | Model | Output |
|---|---|---|---|
| reverser | Binary analysis, protection detection, attack surface mapping | Sonnet | reversal_map.md |
| trigger | Crash discovery, input minimization, primitive identification | Sonnet | trigger_report.md |
| solver | Reverse computation for reversing/crypto challenges | Opus | solve.py |
| chain | Multi-stage exploit: leak -> overwrite -> shell | Opus | solve.py |
| critic | Security Council deliberation (5 archetypes) + cross-verification | Opus | critic_review.md |
| verifier | Local 3x reproduction -> remote execution | Sonnet | FLAG_FOUND |
| reporter | Writeup with failed attempts and techniques | Sonnet | knowledge/challenges/<name>.md |
| ctf-solver | Legacy single-agent for trivial challenges | Sonnet | solve.py |
Bug Bounty Agents (10)
| Agent | Role | Model | Output |
|---|---|---|---|
| target-evaluator | Program ROI scoring, GO/NO-GO gate | Sonnet | target_assessment.md |
| scout | Recon + duplicate pre-screen + automated tool scanning | Sonnet | recon_report.json |
| analyst | CVE matching, source->sink tracing, confidence scoring | Sonnet | vulnerability_candidates.md |
| threat-modeler | Trust boundary mapping, role matrix, state machine and invariant extraction | Sonnet | threat_model.md |
| patch-hunter | Incomplete fix and variant vulnerability hunting from security commits | Sonnet | patch_analysis.md |
| exploiter | PoC development, quality tier classification, Evidence Tier (E1-E4) | Opus | PoC scripts + evidence |
| workflow-auditor | Business workflow state transition mapping and anomaly detection | Sonnet | workflow_audit.md |
| triager-sim | Adversarial triage -- 3 modes: finding-viability (Gate 1), PoC-destruction (Gate 2), report-review | Sonnet/Opus | SUBMIT / STRENGTHEN / KILL |
| source-auditor | Deep source code audit, cross-file taint analysis | Opus | audit_findings.md |
| defi-auditor | Smart contract analysis, DeFi-specific vulnerability patterns | Opus | defi_audit.md |
Firmware Agents (4)
| Agent | Role | Model | Output |
|---|---|---|---|
| fw-profiler | Firmware image profiling, architecture detection | Sonnet | firmware_profile.md |
| fw-inventory | Binary inventory, version extraction, CVE matching | Sonnet | firmware_inventory.md |
| fw-surface | Attack surface mapping, binary diff analysis | Sonnet | attack_surface.md |
| fw-validator | QEMU emulation, dynamic PoC validation | Sonnet | validation_results.md |
Specialized Agents (3)
| Agent | Role | Model | Output |
|---|---|---|---|
| mobile-analyst | Android/iOS app analysis, API interception | Sonnet | mobile_findings.md |
| recon-scanner | Automated reconnaissance, subdomain/port discovery | Sonnet | recon_results.json |
| web-tester | Web application testing, auth bypass, injection | Sonnet | web_findings.md |
Agent Resilience -- Checkpoint Protocol
All work agents implement a checkpoint protocol for crash/compaction recovery:
- checkpoint.json -- Agent writes state at every phase transition (status, completed steps, critical facts)
- Fake Idle Detection -- Orchestrator reads checkpoint status; in_progress + idle = re-spawn with context
- Resume on Re-spawn -- New agent reads existing checkpoint, skips completed phases
- Error Reporting -- status: "error" with description; orchestrator fixes environment before re-spawn
[!NOTE] Never assume "artifact exists = work complete." Only checkpoint.status == "completed" is trustworthy.
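The protocol above can be sketched as two small pieces: an agent-side writer that records state at each phase transition, and an orchestrator-side decision that trusts only the status field. This is an illustration, not the agents' actual checkpoint schema; the field names completed_steps, critical_facts, and updated_at and the returned action strings are hypothetical:

```python
import json
import time
from pathlib import Path
from typing import Optional


def write_checkpoint(path: Path, status: str, completed: list[str], facts: dict) -> None:
    """Persist agent state at a phase transition (field names are illustrative)."""
    data = {
        "status": status,  # "in_progress" | "completed" | "error"
        "completed_steps": completed,
        "critical_facts": facts,
        "updated_at": time.time(),
    }
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(data, indent=2))
    tmp.replace(path)  # rename (atomic on POSIX) so readers never see a half-written file


def read_checkpoint(path: Path) -> Optional[dict]:
    return json.loads(path.read_text()) if path.exists() else None


def next_action(path: Path, agent_idle: bool) -> str:
    """Orchestrator-side decision; only status == "completed" counts as done."""
    cp = read_checkpoint(path)
    if cp is None:
        return "spawn"
    if cp["status"] == "completed":
        return "done"
    if cp["status"] == "error":
        return "fix-environment-then-respawn"
    if cp["status"] == "in_progress" and agent_idle:
        return "respawn-with-context"  # fake idle: claims progress but stopped working
    return "wait"


def remaining_steps(path: Path, plan: list[str]) -> list[str]:
    """What a re-spawned agent still has to do: skip phases already completed."""
    cp = read_checkpoint(path) or {"completed_steps": []}
    done = set(cp["completed_steps"])
    return [step for step in plan if step not in done]
```

Note that next_action never looks at whether output artifacts exist, which is exactly the rule in the note above.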
A unified full-text search over 280K+ security documents -- zero external dependencies, built on SQLite FTS5 with BM25 ranking and progressive query relaxation.
| Source | Documents | Content |
|---|---|---|
| Internal techniques | 82 | Attack patterns, CTF writeups |
| External repos (47) | 12,896 | HackTricks, GTFOBins, PayloadsAllTheThings, how2heap, OWASP, SecLists, InternalAllTheThings + 40 more |
| ExploitDB | 46,960 | Exploit descriptions, platforms, CVEs |
| Nuclei templates | 14,871 | Vulnerability detection templates with severity |
| PoC-in-GitHub | 18,235 | CVE proof-of-concept repositories |
| trickest-cve | 155,121 | CVE details with products, CWE, PoC links |
| Web articles | 30+ | Crawled security blog posts, writeups, OWASP cheatsheets |
Agents query via the knowledge-fts MCP server:
smart_search("QNAP buffer overflow strcpy") # RECOMMENDED: auto-relaxes (AND → OR → top-terms)
technique_search("heap tcache poisoning") # top 5 technique docs
technique_search("IDOR") # auto-expands to "insecure direct object reference"
exploit_search("CVE-2021-44228") # CVE routed to trickest-cve + PoC first
search_all("race condition double spend") # all 7 tables, cross-table ranked
33 security abbreviations auto-expand: uaf, bof, sqli, ssrf, toctou, xxe, ssti, idor, rce, lpe, cmdinjection, etc.
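The core of smart_search can be reproduced with Python's built-in sqlite3 module and an FTS5 table: expand abbreviations, try a query requiring every term, and fall back to any term if that returns nothing, ranking rows with FTS5's bm25() (lower is better). A minimal sketch assuming a single docs(title, body) table and a three-entry abbreviation map, both hypothetical; the real knowledge.db has 7 tables, and the final "top-terms" relaxation stage is omitted here. It also assumes the local SQLite build includes FTS5, as most do:

```python
import re
import sqlite3

# Hypothetical subset of the 33 abbreviation expansions.
ABBREVIATIONS = {
    "idor": "insecure direct object reference",
    "uaf": "use after free",
    "ssrf": "server side request forgery",
}


def to_terms(query: str) -> list[str]:
    """Expand abbreviations, then keep plain alphanumeric tokens (no FTS5 operators)."""
    words = [ABBREVIATIONS.get(w, w) for w in query.lower().split()]
    return re.findall(r"[a-z0-9]+", " ".join(words))


def smart_search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple]:
    """Try all terms (AND), then any term (OR); rows are BM25-ranked, best first."""
    terms = [f'"{t}"' for t in to_terms(query)]
    if not terms:
        return []
    for match_expr in (" AND ".join(terms), " OR ".join(terms)):
        rows = conn.execute(
            "SELECT title, bm25(docs) AS score FROM docs "
            "WHERE docs MATCH ? ORDER BY score LIMIT ?",
            (match_expr, limit),
        ).fetchall()
        if rows:
            return rows
    return []
```

Quoting every token keeps user input from being interpreted as FTS5 syntax (a stray "-" or "NEAR" in a query would otherwise change its meaning).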
Auto-Rebuild, Web Fetcher, and CLI
A PostToolUse hook automatically re-indexes when knowledge/techniques/ or knowledge/challenges/ files are modified. Full rebuild: ~60 seconds. Incremental update: <1 second.
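The hook's trigger condition is a path filter on file-editing tool calls. A minimal sketch, assuming the hook receives the Claude Code hook payload with tool_name and tool_input.file_path for Write/Edit-style tools; needs_reindex and main are hypothetical names, and the sketch calls the documented full rebuild because the incremental-update command is not shown here:

```python
import subprocess
import sys

EDIT_TOOLS = {"Write", "Edit", "MultiEdit"}
WATCHED_DIRS = ("knowledge/techniques/", "knowledge/challenges/")


def needs_reindex(event: dict) -> bool:
    """True if a file-editing tool touched one of the watched knowledge directories."""
    if event.get("tool_name") not in EDIT_TOOLS:
        return False
    path = str(event.get("tool_input", {}).get("file_path", "")).replace("\\", "/")
    # Prefix with "/" so absolute and repo-relative paths both match on a directory boundary.
    return any(("/" + d) in ("/" + path) for d in WATCHED_DIRS)


def main(event: dict) -> int:
    """Hook entry point: run the documented full rebuild when needed."""
    if not needs_reindex(event):
        return 0
    result = subprocess.run([sys.executable, "tools/knowledge_indexer.py", "build"], check=False)
    return result.returncode
```

Wired up as a hook, main() would be fed json.load(sys.stdin). Matching on a "/"-anchored directory prefix avoids false positives such as a sibling directory named myknowledge/techniques/.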
python tools/knowledge_indexer.py build # Full rebuild
python tools/knowledge_indexer.py smart-search "heap uaf" # Relaxed cross-table search
python tools/knowledge_indexer.py stats # Row counts per table
# Web content fetcher (adds to web_articles table)
python tools/knowledge_fetcher.py fetch <url> # Single URL via jina.ai
python tools/knowledge_fetcher.py bulk knowledge/sources/blogs.md # Bulk from URL list
python tools/knowledge_fetcher.py update # Re-fetch stale (>30 days)
python tools/knowledge_fetcher.py stats # Web articles breakdown
14 MCP servers give agents direct programmatic access to security tools.
Optional user-level MCPs may appear in local claude/omx startup logs; if pentest-thinking is unavailable, core Terminator pipelines still run.
All 14 MCP Servers
| Server | Capability |
|---|---|
| mcp-gdb | Breakpoints, memory inspection, stepping, backtrace |
| radare2-mcp | Disassembly, decompilation, xrefs, function analysis |
| ghidra-mcp | Headless decompilation, structures, enums |
| frida-mcp | Dynamic instrumentation, hooking, process spawning |
| pentest-mcp | nmap, gobuster, nikto, john, hashcat |
| nuclei-mcp | 12K+ vulnerability detection templates |
| codeql-mcp | Semantic taint tracking, variant analysis |
| semgrep-mcp | Pattern-based static analysis |
| playwright | Browser automation for web exploitation |
| context7 | Up-to-date library documentation lookup |
| graphrag-security | Security knowledge graph: exploit lookup, similar findings, drift detection |
| knowledge-fts | 280K+ document BM25 search with smart_search relaxation, 33 synonyms, web_articles, cross-table ranking |
| lightpanda | Lightweight headless browser (9x less memory, 11x faster): page fetch, markdown, links, JS eval, semantic tree |
| browser-use | AI-driven browser automation: natural language web tasks, data extraction, screenshots |
| opendataloader-pdf | PDF parsing: markdown, JSON with bounding boxes, table extraction, HTML |
Security Tools (40+)
Reverse Engineering & Exploit Dev
- Disassembly: radare2, objdump, strings, readelf, nm
- Decompilation: Ghidra (MCP), jadx
- Debugging: gdb + pwndbg + GEF (93 commands), strace
- Symbolic: angr, unicorn, z3-solver, keystone
- Exploit: pwntools, ROPgadget, ropper, one_gadget
- Crypto: pycryptodome, sympy, z3-solver
Web Security
- Injection: sqlmap, commix, dalfox (XSS)
- SSRF: SSRFmap (18+ modules)
- Recon: ffuf, subfinder, katana, httpx, gau, waybackurls, arjun
- Scanning: nuclei (12K+ templates), trufflehog (800+ secret types)
- Crawling: crawl4ai (Playwright-based, JS rendering, stealth mode)
Code Analysis & Smart Contracts
- Semantic: CodeQL (taint tracking, variant analysis)
- Static: Semgrep (custom rule authoring)
- Smart Contract: Slither (100+ detectors), Mythril (EVM symbolic), Foundry 1.5.1
- AI: Gemini CLI (gemini-3-pro-preview)
Firmware Analysis
- QEMU ARM user-mode emulation, rootfs mounting
- Binary diff across firmware versions
- Architecture detection, library inventory
Reference Databases
- ExploitDB (47K+ exploits), PoC-in-GitHub (18K+ CVE PoCs)
- PayloadsAllTheThings (70+ vuln categories), trickest-cve (154K+ CVE PoCs)
- HackTricks + GTFOBins, SecLists
Skill Plugins (Trail of Bits, Sentry, Anthropic)
| Plugin | Skills | Purpose |
|---|---|---|
| static-analysis | semgrep, codeql, sarif-parsing | Automated static analysis |
| variant-analysis | variant-analysis | CVE variant pattern search |
| testing-handbook | aflpp, libfuzzer, harness-writing + 12 more | Fuzzing (Trail of Bits) |
| insecure-defaults | insecure-defaults | Hardcoded secrets, weak auth |
| sharp-edges | sharp-edges | Dangerous API detection |
| audit-context | audit-context-building | Pre-audit architecture mapping |
| dwarf-expert | dwarf-expert | DWARF debug format |
| yara-authoring | yara-rule-authoring | YARA rule creation |
| differential-review | differential-review | Git diff security review |
| sentry-skills | find-bugs, security-review, code-review | Bug detection |
| Category | Count | Techniques Used |
|---|---|---|
| Pwn (heap, stack, ROP) | 10 | pwntools, ROP chains, GOT overwrite, shellcode |
| Reversing (VM, obfuscation) | 6 | GDB Oracle, DFA extraction, z3, custom VM analysis |
| Crypto | 2 | AES-ECB, z3 constraint solving |
| Misc (logic, filter bypass) | 2 | Operator precedence, binary search |
| Metric | Count |
|---|---|
| Programs assessed | 30+ |
| Platforms | Immunefi, HackerOne, Bugcrowd, PSIRT |
| Categories | Smart Contract (DeFi), Web App, VPN, IoT/Firmware, AI/SDK |
| Smart contracts analyzed | 50+ |
| Vulnerability leads investigated | 100+ |
| Findings with working PoC | 15+ |
Specific targets and findings are kept private until disclosure is complete.
Research Foundations
v7 -- LLM Prompting Research Applied (13 papers)
Agent prompts were systematically improved based on 13 LLM prompting research papers. Core 12 agents reduced from 5,391 to 3,515 lines (35% reduction) while adding higher-quality reasoning structures:
| Technique | Paper | Applied To | Effect |
|---|---|---|---|
| IRON RULES primacy+recency | Lost in the Middle (Liu et al.) | All 12 agents | Critical rules at top + recap at bottom, +22 percentage points recall |
| Structured Reasoning (OBSERVED/INFERRED/ASSUMED/RISK/DECISION) | Chain-of-Thought (Wei et al.) | All 12 agents | Replaces unstructured Think-Before-Act |
| Self-Verification (CoVe) | Chain-of-Verification (Dhuliawala et al.) | chain, solver, exploiter, trigger | Independent fact-check before submission, -77% hallucination |
| Few-Shot examples | The Prompt Report + CoT | critic, reverser, triager-sim, solver | APPROVED/REJECTED, reversal_map, SUBMIT/KILL, z3 modeling examples |
| Tree of Thoughts branching | Tree of Thoughts (Yao et al.) | chain, solver | Top-3 strategy evaluation before coding |
| ReAct loops (THOUGHT→ACTION→OBSERVATION) | ReAct (Yao et al.) | reverser, scout, trigger | Forced strategy updates on contradicting observations |
| Self-Consistency (multi-solution detection) | Self-Consistency (Wang et al.) | solver | Detects under-constrained z3 models |
| Aggressive pruning + reference split | APE (Zhou et al.) | scout (-76%), analyst (-61%), exploiter (-45%) | Content moved to _reference/ directory |
Dual-Approach trigger reduced from 3 to 2 failures (ToT evaluates alternatives on first attempt).
Agent definitions also incorporate patterns from 10+ LLM security frameworks:
| Pattern | Origin | Implemented In |
|---|---|---|
| Variant Analysis -- CVE patch diffs as seeds | Google Big Sleep (Project Zero + DeepMind) | analyst |
| LLM-first PoV Generation | RoboDuck (AIxCC 3rd place) | chain, solver |
| Symbolic + Neural Hybrid | ATLANTIS (AIxCC 1st place) | solver |
| No Exploit, No Report | Shannon, XBOW | Orchestrator gate |
| Iterative Context Gathering -- 3-pass backtracing | Vulnhuntr | analyst |
| Dual-Approach Parallel -- 2 strategies after 2 failures | RoboDuck | Orchestrator |
| OWASP Parallel Hunters | Shannon | analyst (Phase 1.5) |
| PoC Quality Tier Gate (1-4) | XBOW | exploiter |
| Adversarial Triage Simulation | Internal | triager-sim |
| Prompt Injection Guardrails | CAI (300+ LLM agents) | All agents |
| 4-Layer Validation | NeuroSploit | critic, triager-sim |
| Security-Aware Compression | CyberStrikeAI | All agents (context preservation) |
| Exploit Chain Rules | NeuroSploit | exploiter (web targets) |
| Security Council (5-archetype deliberation) | Consciousness Council (K-Dense) | critic |
Anti-Hallucination System -- The critic agent runs a Security Council deliberation with 5 adversarial archetypes before any verdict:
| Archetype | Role |
|---|---|
| The Interrogator | Adversarial triager -- demands live evidence for every claim ("Show me the GDB output, or it didn't happen") |
| The Empiricist | Data-driven verification -- no evidence = no approval |
| The Architect | Structural analysis -- does the chain design hold under all conditions? |
| The Triager | Platform reviewer simulation -- "What's the first reason I'd close this?" |
| The Historian | Pattern matching against past failures from knowledge base |
The Interrogator has asymmetric veto power: any critical claim without live evidence = automatic REJECT. Combined with a 6-point validation:
- Evidence Check -- Every claim must cite specific output (exact string, header, timing)
- Negative Controls -- Baseline comparison mandatory (normal vs payload response)
- Proof of Execution -- Per-vuln-type: XSS must fire JS, SQLi must extract DB content
- Speculative Language Detection -- "could be", "might be", "potentially" auto-flagged
- Severity Calibration -- 200 OK without data is not High
- Confidence Score -- 0-100, below 70 = REJECT
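Two of these checks, speculative-language detection and the confidence threshold, combine naturally with the Interrogator's veto. A minimal sketch, assuming claims arrive pre-labelled as critical and as backed (or not) by live tool output; Claim, speculative_phrases, and critic_verdict are hypothetical names, not the critic agent's implementation:

```python
import re
from typing import NamedTuple

SPECULATIVE = re.compile(r"\b(?:could be|might be|potentially)\b", re.IGNORECASE)


class Claim(NamedTuple):
    text: str
    critical: bool
    live_evidence: bool  # backed by captured tool output (gdb, HTTP response, ...)


def speculative_phrases(text: str) -> list[str]:
    """Phrases the critic should flag for rewording or proof."""
    return [m.group(0).lower() for m in SPECULATIVE.finditer(text)]


def critic_verdict(claims: list[Claim], confidence: int) -> tuple[str, list[str]]:
    """Interrogator veto first, then the 0-100 confidence threshold (below 70 = REJECT)."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be in 0-100")
    reasons = [f"no live evidence: {c.text}" for c in claims
               if c.critical and not c.live_evidence]
    if confidence < 70:
        reasons.append(f"confidence {confidence} < 70")
    return ("REJECT" if reasons else "APPROVED"), reasons
```

The veto is asymmetric by construction: a single unsupported critical claim rejects the review no matter how high the overall confidence is.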
Competitor-Adopted Patterns -- Ported from 10 open-source security AI frameworks:
| Pattern | Source | Implementation |
|---|---|---|
| Web Exploit Chain Engine | NeuroSploit | tools/web_chain_engine.py -- SSRF->internal, SQLi->DB-type auto-chain |
| Flag Pattern Detector | PentestGPT | tools/flag_detector.py -- 8+ regex patterns, strict validation |
| Anti-Hallucination Prompts | NeuroSploit | tools/validation_prompts.py -- 8 composable prompts, 0-100 confidence |
| MITRE Auto-Mapping | RedAmon | tools/mitre_mapper.py -- 36 CWE->CAPEC->ATT&CK mappings |
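A flag detector of the kind listed above boils down to a few regexes plus rejection of obvious placeholders. A minimal sketch; the two patterns and the placeholder set are illustrative assumptions, not the 8+ patterns shipped in tools/flag_detector.py:

```python
import re

# Illustrative patterns only; not the ones shipped in tools/flag_detector.py.
FLAG_PATTERNS = [
    re.compile(r"\b(?:flag|ctf)\{[^{}\s]{4,128}\}", re.IGNORECASE),  # flag{...} / CTF{...}
    re.compile(r"\b[A-Z][A-Z0-9_]{1,15}\{[^{}\s]{4,128}\}"),         # event prefixes, e.g. DUCTF{...}
]
PLACEHOLDERS = {"flag{test}", "flag{fake_flag}", "flag{example}", "ctf{test}"}


def find_flags(output: str) -> list[str]:
    """Return unique flag candidates (pattern order, then position), skipping placeholders."""
    found: list[str] = []
    for pattern in FLAG_PATTERNS:
        for match in pattern.finditer(output):
            flag = match.group(0)
            if flag.lower() in PLACEHOLDERS or flag in found:
                continue
            found.append(flag)
    return found
```

Wrapped formats are not universal: the pwnable.kr flag in the opening example is a plain sentence that no brace-based pattern would catch, so a real detector needs per-platform rules.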
Project Structure
Terminator/
├── .claude/
│ ├── agents/ # 25 agent definitions (~8,100 lines)
│ │ ├── reverser.md # Binary analysis
│ │ ├── chain.md # Exploit chain building
│ │ ├── critic.md # Cross-verification + Security Council
│ │ ├── target_evaluator.md # GO/NO-GO + Hard NO-GO rules
│ │ ├── triager_sim.md # Adversarial triage + JSON feedback
│ │ ├── threat-modeler.md # Trust boundary mapping, role matrix, state machine extraction
│ │ ├── workflow-auditor.md # Business workflow state transition mapping
│ │ ├── patch-hunter.md # Incomplete fix + variant hunting from security commits
│ │ ├── fw_*.md # Firmware analysis (4 agents)
│ │ ├── _reference/ # Shared reference docs (commands, patterns, tools)
│ │ │ └── workflow_packs.md # Workflow pack definitions (v12 NEW)
│ │ └── ... # + 16 more specialists
│ ├── rules/ # Pipeline procedure documents
│ │ ├── bb_pipeline_v12.md # Bug Bounty v12 Kill Gate + Explore Lane (NEW)
│ │ └── ctf_pipeline.md # CTF pipeline procedure
│ └── skills/ # 8 pipeline skills (v6 NEW)
│ ├── bounty/ # Bug bounty pipeline orchestration
│ ├── ctf/ # CTF pipeline orchestration
│ ├── oos-check/ # Out-of-Scope pre-screening (12 patterns)
│ ├── poc-tier/ # PoC quality classification (Tier 1-4)
│ ├── coverage-gate/ # Endpoint coverage gate (80%+)
│ ├── threat-model-check/ # Attack prerequisite validation
│ ├── slop-check/ # AI slop detection (0-10 score)
│ └── checkpoint-validate/ # Agent idle/completion verification
├── knowledge/ # Accumulated experience
│ ├── index.md # Master index
│ ├── knowledge.db # FTS5 search DB (280K+ docs, 7 tables, ~259MB)
│ ├── challenges/ # Per-challenge writeups
│ ├── techniques/ # Reusable attack patterns + competitor analysis
│ └── triage_objections/ # Triager objection patterns by vuln category (v12 NEW)
├── research/ # LLM security framework analysis (14 docs)
├── tools/ # Pipeline tooling
│ ├── bb_preflight.py # Pipeline gate validator (rules, coverage, workflow-check, fresh-surface-check, evidence-tier-check, duplicate-graph-check, --json)
│ ├── knowledge_indexer.py # FTS5 DB builder (7 tables, smart_search, zero dependencies)
│ ├── knowledge_fetcher.py # Web content fetcher (jina.ai → web_articles table)
│ ├── web_chain_engine.py # Web exploit chain engine (10 rules)
│ ├── flag_detector.py # CTF flag pattern detector (8+ formats)
│ ├── validation_prompts.py # Anti-hallucination prompt library
│ ├── mitre_mapper.py # CVE->CWE->CAPEC->ATT&CK (36 CWEs)
│ ├── attack_graph/ # Neo4j + filesystem attack surface graphs
│ ├── dag_orchestrator/ # DAG pipeline scheduling + Claude CLI handler
│ ├── sarif_generator.py # SARIF 2.1.0 output
│ └── mcp-servers/ # nuclei, codeql, semgrep, knowledge-fts, graphrag
├── web/ # FastAPI + D3 dashboard (standalone + Docker)
│ ├── app.py # REST API + WebSocket backend
│ └── static/index.html # Single-page dashboard (5 tabs)
├── targets/ # Bug bounty workspaces (30+ missions)
├── tests/ # CTF files + E2E replay benchmarks
├── CLAUDE.md # Orchestrator instructions (v12)
├── terminator.sh # Autonomous mode launcher
├── docker-compose.yml # Full stack infrastructure
└── README.md
This system is designed exclusively for authorized security work:
- CTF / Wargames -- Practice environments designed for learning
- Bug bounty programs -- Only targets with explicit authorization
- Security research -- Controlled environments with proper scope
All findings follow responsible disclosure. Prompt injection guardrails protect agents from malicious code in analysis targets.