Every finding has a hash-anchored evidence chain.
SCOUT does not emit a finding without a file path, byte offset, SHA-256 hash, and rationale. Artifacts are immutable and traceable from firmware blob to final verdict. No black-box scoring.
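As a sketch of what re-checking that chain looks like, the recorded hashes can be re-verified with nothing but the standard library. The `artifacts` mapping below is an illustrative manifest layout, not the exact stage.json schema:

```python
import hashlib
from pathlib import Path

def verify_manifest(run_dir: Path, manifest: dict) -> list[str]:
    """Return the relative paths of artifacts whose on-disk SHA-256
    does not match the hash recorded in the manifest.

    Assumes `manifest["artifacts"]` maps relative paths to hex digests;
    the real stage.json layout may differ.
    """
    mismatches = []
    for rel_path, expected in manifest.get("artifacts", {}).items():
        digest = hashlib.sha256((run_dir / rel_path).read_bytes()).hexdigest()
        if digest != expected:
            mismatches.append(rel_path)
    return mismatches
```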
Static-only findings capped at 0.60 -- we don't inflate.
If a vulnerability hasn't been dynamically validated, its confidence is hard-capped. Promotion to `confirmed` requires at least one dynamic verification artifact. Honest confidence beats high numbers.
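The cap rule fits in a few lines. This is a minimal sketch of the policy as stated above; `effective_confidence` is a hypothetical name for illustration, not SCOUT's actual API:

```python
STATIC_CONFIDENCE_CAP = 0.60  # documented hard cap for static-only findings

def effective_confidence(raw: float, dynamic_artifacts: int) -> float:
    """Without at least one dynamic verification artifact,
    confidence never exceeds the static cap of 0.60."""
    if dynamic_artifacts < 1:
        return min(raw, STATIC_CONFIDENCE_CAP)
    return raw
```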
SARIF + CycloneDX VEX + SLSA provenance -- not another custom format.
Findings export to SARIF 2.1.0 for GitHub Code Scanning and VS Code. SBOM ships with CycloneDX 1.6 + Vulnerability Exploitability eXchange. Analysis artifacts carry SLSA Level 2 in-toto attestations.
| Feature | Description |
|---|---|
| SARIF 2.1.0 Export | Standard findings output for GitHub Code Scanning, VS Code SARIF Viewer, and CI/CD integration |
| CycloneDX VEX | Vulnerability Exploitability eXchange states (exploitable / affected / not_affected) embedded in SBOM |
| Precise .dynstr Detection | ELF dynamic import table parsing replaces naive byte-scan; FORTIFY_SOURCE coverage detection |
| 40+ SBOM Signatures | wolfSSL, mbedTLS, GoAhead, miniUPnPd, SQLite, U-Boot, lighttpd, and 30+ more (up from 8) |
| Ghidra Headless Scripts | 4 analysis scripts: decompile_all, xref_graph, dataflow_trace, string_refs |
| AFL++ Performance | CMPLOG, persistent mode, NVRAM faker, multi-instance campaigns, AFL_ENTRYPOINT support |
| Reachability-Aware CVE | CVE confidence auto-adjusted by BFS network reachability analysis |
| SLSA L2 Provenance | in-toto attestation for analysis artifacts, cosign-ready verification |
| Benchmark Runner | Corpus-based quality measurement with precision / recall / FPR tracking |
| Quality Gate Overrides | Configurable thresholds via environment variables for CI/CD pipelines |
| GitHub Actions CI | Automated pytest (3.10-3.12), ruff lint, and pyright type checking on every push/PR |
| Findings SHA-256 Manifest | stages/findings/stage.json now carries per-artifact SHA-256 hashes for full evidence chain coverage |
| Handoff Validation | firmware_handoff.json is validated via validate_handoff() before write -- missing keys are caught early |
| Exploit Stage Isolation | Each exploit stage has independent import error handling; a single missing dependency no longer skips all five |
| v2.0: 8 New Analysis Stages | Enhanced source detection, semantic classification, taint propagation, FP verification, adversarial triage, PoC refinement, chain construction, C-source identification (34 -> 42 stages) |
| v2.1: Known CVE Signatures | known_cve_signatures.py: 13 CVE patterns (NETGEAR, D-Link, Linksys, ASUS, TP-Link, TRENDnet, Zyxel, Belkin) -- vendor/model/binary matching without SBOM |
| v2.1: Web Server Auto-Detection | enhanced_source.py auto-identifies httpd/lighttpd/boa binaries; HTTP input sources classified as source_type: "http_input" for prioritized taint analysis |
| v2.1: Ghidra Auto-Detection | ./scout wrapper and ghidra_bridge.py probe /opt/ghidra_*, /usr/local/ghidra*, /usr/share/ghidra* -- AIEDGE_GHIDRA_HOME no longer required |
| v2.0: CLI Modularization | __main__.py split from ~4500 lines into 7 focused modules (~660 lines entry point) |
| v2.0: FirmAE Benchmarking | benchmark_firmae.sh for SCOUT vs FirmAE comparison; unpack_firmae_dataset.sh for dataset classification |
1. Drop 2. Analyze 3. Collect 4. Review
───────── ────────── ────────── ────────
firmware.bin ──> 42-stage pipeline ──> SARIF findings ──> Web viewer
runs automatically CycloneDX SBOM+VEX VS Code (SARIF)
Evidence chain GitHub Code Scanning
SLSA attestation TUI dashboard
Step 1 -- Point SCOUT at any firmware blob (or pre-extracted rootfs).
Step 2 -- The 42-stage pipeline runs end-to-end: unpacking, profiling, binary analysis, enhanced source detection, semantic classification, C-source identification, SBOM generation, CVE scanning, reachability analysis, taint propagation, FP verification, adversarial triage, security assessment, attack surface mapping, exploit chain construction, PoC refinement, optional Ghidra decompilation, optional AFL++ fuzzing.
Step 3 -- Outputs land in a structured run directory: SARIF 2.1.0 findings, CycloneDX 1.6 SBOM with VEX annotations, hash-anchored evidence chain, SLSA L2 provenance attestation, and executive Markdown report.
Step 4 -- Review results in the built-in web viewer, import SARIF into VS Code or GitHub Code Scanning, query artifacts via MCP server from Claude Code/Desktop, or inspect via TUI dashboard.
# Full analysis (all features enabled by default)
./scout analyze firmware.bin
# Deterministic only (no LLM)
./scout analyze firmware.bin --no-llm
# Pre-extracted rootfs (bypasses weak unpacking)
./scout analyze firmware.img --rootfs /path/to/extracted/rootfs
# Analysis-only profile (no exploit chain)
./scout analyze firmware.bin --profile analysis --no-llm
# SARIF export for CI/CD
./scout analyze firmware.bin --no-llm
# -> aiedge-runs/<run_id>/stages/findings/sarif.json
# MCP server for AI agents
./scout mcp --project-id aiedge-runs/<run_id>
# Web viewer
./scout serve aiedge-runs/<run_id> --port 8080

| Feature | SCOUT | EMBA | FACT | FirmAE |
|---|---|---|---|---|
| SBOM (CycloneDX 1.6) | Yes + VEX | Yes | No | No |
| SARIF 2.1.0 Export | Yes | No | No | No |
| Hash-Anchored Evidence Chain | Yes | No | No | No |
| SLSA L2 Provenance | Yes | No | No | No |
| Reachability-Aware CVE | Yes | No | No | No |
| Confidence Caps (honest scoring) | Yes | No | No | No |
| Ghidra Headless Integration | Yes | Yes | No | No |
| AFL++ Fuzzing Pipeline | Yes | No | No | No |
| 3-Tier Emulation | Yes | Partial | No | Yes |
| MCP Server (AI agent integration) | Yes | No | No | No |
| LLM Triage + Synthesis | Yes | No | No | No |
| Web Report Viewer | Yes | Yes | Yes | No |
| Adversarial FP Reduction | Yes | No | No | No |
| Taint Propagation (LLM) | Yes | No | No | No |
| Zero pip Dependencies | Yes | No | No | No |
| | Feature | Description |
|---|---|---|
| 📦 | SBOM & CVE | CycloneDX 1.6 SBOM (40+ signatures) + NVD API 2.0 CVE scanning with VEX and reachability-aware confidence |
| 🔍 | Binary Analysis | ELF hardening audit (NX/PIE/RELRO/Canary) + precise .dynstr symbol detection + FORTIFY_SOURCE + optional Ghidra headless decompilation |
| 🎯 | Attack Surface | Source-to-sink tracing, IPC detection (5 types), credential auto-mapping |
| 🛡️ | Security Assessment | X.509 certificate scanning, boot service auditing, filesystem permission checks |
| 🧪 | Fuzzing (optional) | AFL++ pipeline with CMPLOG, persistent mode, NVRAM faker, binary scoring, harness generation, crash triage — requires Docker + AFL++ image |
| 🐛 | Emulation | 3-tier (FirmAE / QEMU user-mode / rootfs inspection) + GDB remote debugging |
| 🤖 | MCP Server | 12 tools exposed via Model Context Protocol for Claude Code/Desktop integration |
| 🧠 | LLM Drivers | Codex CLI + Claude API + Ollama -- with cost tracking and budget limits |
| 📊 | Web Viewer | Glassmorphism dashboard with KPI bar, IPC map, risk heatmap, graph visualization |
| 🔗 | Evidence Chain | Hash-anchored artifacts, confidence caps, exploit tiering, verified chain gating |
| 📜 | SARIF Export | SARIF 2.1.0 findings for GitHub Code Scanning, VS Code SARIF Viewer, CI/CD |
| 🔒 | SLSA Provenance | Level 2 in-toto attestation for analysis artifacts, cosign-ready |
| 📋 | Executive Reports | Auto-generated Markdown reports with top risks, SBOM/CVE tables, attack surface |
| 🔄 | Firmware Diff | Compare two analysis runs -- filesystem, hardening, and config security changes |
| 📈 | Benchmark Runner | Corpus-based quality measurement with precision/recall/FPR tracking |
| 🔌 | Cross-Binary IPC Chains | 5 IPC types (unix_socket, dbus, shm, pipe, exec_chain); shared .rodata string-based cross-binary communication detection |
| 🏷️ | Known CVE Signatures | 13 built-in CVE patterns (NETGEAR, D-Link, Linksys, ASUS, TP-Link, TRENDnet, Zyxel, Belkin) matched by vendor/model/binary without SBOM |
Firmware --> Unpack --> Profile --> Inventory --> [Ghidra] --> Semantic Classification
--> SBOM --> CVE Scan --> Reachability --> Endpoints --> Surfaces
--> Enhanced Source --> C-Source Identification --> Taint Propagation
--> FP Verification --> Adversarial Triage
--> Security Assessment --> Graph --> Attack Surface --> Findings
--> LLM Triage --> LLM Synthesis --> Emulation (3-tier) --> [Fuzzing]
--> PoC Refinement --> Chain Construction --> Exploit Chain --> PoC --> Verification
New in v2.0: enhanced_source, semantic_classification, taint_propagation, fp_verification, adversarial_triage, poc_refinement, chain_construction, csource_identification.
v2.0 Stage Details:
| Stage | Module | Purpose | LLM? | Cost |
|---|---|---|---|---|
| enhanced_source | enhanced_source.py | Web server auto-detection + INPUT_APIS scan (21 APIs) | No | $0 |
| semantic_classification | semantic_classifier.py | 3-pass function classifier (static, haiku, sonnet) | Yes | Low |
| taint_propagation | taint_propagation.py | HTTP-aware inter-procedural taint with call chain | Yes | Medium |
| fp_verification | fp_verification.py | 3-pattern FP removal (sanitizer/non-propagating/sysfile) | No | $0 |
| adversarial_triage | adversarial_triage.py | Advocate/Critic LLM debate for FPR reduction | Yes | Medium |
| poc_refinement | poc_refinement.py | Iterative PoC generation from fuzzing seeds (5 attempts) | Yes | Medium |
| chain_construction | chain_constructor.py | Same-binary + cross-binary IPC exploit chains | No | $0 |
| csource_identification | csource_identification.py | HTTP input source identification via static sentinel + QEMU | No | $0 |
Stages in [brackets] require optional external tools (Ghidra, AFL++/Docker).
+------------------------------------------------------------------+
| SCOUT (Evidence Engine) |
| |
| Firmware --> Unpack --> Profile --> Inventory --> SBOM --> CVE |
| (+ hardening) (NVD 2.0) |
| | |
| --> Security Assessment --> Surfaces --> Reachability --> Find |
| (cert/init/fs-perm) (BFS graph) |
| |
| --> [Ghidra] --> LLM Triage --> LLM Synthesis |
| --> Emulation --> [Fuzzing] --> Exploit --> PoC --> Verify |
| |
| 42 stages . stage.json manifests . SHA-256 hashed artifacts |
| Outputs: SARIF 2.1.0 + CycloneDX 1.6+VEX + SLSA L2 provenance |
+------------------------------------------------------------------+
| Handoff (firmware_handoff.json) |
+------------------------------------------------------------------+
| Terminator (Orchestrator) |
| Tribunal --> Validator --> Exploit Dev --> Verified Chain |
| (LLM judge) (emulation) (lab-gated) (dynamic evidence) |
+------------------------------------------------------------------+
| Layer | Role | Deterministic? |
|---|---|---|
| SCOUT | Evidence production (extraction, profiling, inventory, surfaces, findings) | Yes |
| Handoff | JSON contract between engine and orchestrator | Yes |
| Terminator | LLM tribunal, dynamic validation, exploit development, report promotion | No (auditable) |
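The handoff contract can be checked with a plain required-keys test before the orchestrator consumes it. This is a sketch only: the key names below are hypothetical placeholders, not the real firmware_handoff.json schema:

```python
# Illustrative key set -- the real contract is defined by validate_handoff()
REQUIRED_KEYS = {"run_id", "findings", "artifacts"}

def missing_handoff_keys(handoff: dict) -> list[str]:
    """Return required keys absent from a handoff payload;
    an empty list means the contract check passes."""
    return sorted(REQUIRED_KEYS - handoff.keys())
```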
Iron rule: no Confirmed without dynamic evidence.
| Level | Requirements | Placement |
|---|---|---|
| dismissed | Critic rebuttal strong or confidence < 0.5 | Appendix only |
| candidate | Confidence 0.5-0.8, evidence exists but chain incomplete | Report (flagged) |
| high_confidence_static | Confidence >= 0.8, strong static evidence, no dynamic | Report (highlighted) |
| confirmed | Confidence >= 0.8 AND >= 1 dynamic verification artifact | Report (top) |
| verified_chain | Confirmed AND PoC reproduced 3x in sandbox, complete chain | Exploit report |
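The tier rules above compose into a single decision function. A sketch of the documented rules, not the actual implementation:

```python
def tier(confidence: float, dynamic_artifacts: int,
         critic_rebuttal_strong: bool, poc_reproduced: int,
         chain_complete: bool) -> str:
    """Classify a finding into the promotion tiers from the table above."""
    if critic_rebuttal_strong or confidence < 0.5:
        return "dismissed"
    if confidence < 0.8:
        return "candidate"
    if dynamic_artifacts < 1:
        return "high_confidence_static"  # strong static evidence, no dynamic
    if poc_reproduced >= 3 and chain_complete:
        return "verified_chain"
    return "confirmed"
```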
CLI Reference
| Command | Description |
|---|---|
| ./scout analyze <firmware> | Full firmware analysis pipeline |
| ./scout analyze-8mb <firmware> | Truncated 8MB canonical track |
| ./scout stages <run_dir> | Rerun specific stages on existing run |
| ./scout mcp --project-id <id> | Start MCP stdio server |
| ./scout serve <run_dir> | Launch web report viewer |
| ./scout tui <run_dir> | Terminal UI dashboard |
| ./scout ti | TUI --interactive mode (latest run) |
| ./scout tw <run_dir> -t 2 | TUI --watch mode (auto-refresh) |
| ./scout to | TUI --mode once (latest run) |
| ./scout t | TUI latest run (default mode) |
| ./scout corpus-validate <run_dir> | Validate corpus manifest |
| ./scout quality-metrics <run_dir> | Compute quality metrics |
| ./scout quality-gate <run_dir> | Check quality thresholds |
| ./scout release-quality-gate <run_dir> | Unified release gate |
Exit codes: 0 success, 10 partial, 20 fatal, 30 policy violation
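CI scripts can branch on those exit codes. A minimal sketch; `label_exit` and `run_gate` are hypothetical helper names, and only the `./scout quality-gate` invocation comes from the docs:

```python
import subprocess

# Documented exit codes for scout commands
EXIT_LABELS = {0: "success", 10: "partial", 20: "fatal", 30: "policy violation"}

def label_exit(code: int) -> str:
    """Map a scout exit code to its documented meaning;
    unknown codes are treated as fatal."""
    return EXIT_LABELS.get(code, "fatal")

def run_gate(run_dir: str) -> str:
    """Invoke the quality gate and label the result."""
    proc = subprocess.run(["./scout", "quality-gate", run_dir])
    return label_exit(proc.returncode)
```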
Environment Variables
| Variable | Default | Description |
|---|---|---|
| AIEDGE_LLM_DRIVER | codex | LLM provider: codex / claude / ollama |
| ANTHROPIC_API_KEY | -- | API key for Claude driver |
| AIEDGE_OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| AIEDGE_LLM_BUDGET_USD | -- | LLM cost budget limit |
| AIEDGE_PRIV_RUNNER | -- | Privileged command prefix for dynamic stages |
| AIEDGE_FEEDBACK_DIR | aiedge-feedback | Terminator feedback directory |
| Variable | Default | Description |
|---|---|---|
| AIEDGE_NVD_API_KEY | -- | NVD API key (optional, improves rate limits) |
| AIEDGE_NVD_CACHE_DIR | aiedge-nvd-cache | Cross-run NVD response cache |
| AIEDGE_SBOM_MAX_COMPONENTS | 500 | Maximum SBOM components |
| AIEDGE_CVE_SCAN_MAX_COMPONENTS | 50 | Maximum components to CVE-scan |
| AIEDGE_CVE_SCAN_TIMEOUT_S | 30 | Per-request NVD API timeout |
| Variable | Default | Description |
|---|---|---|
| AIEDGE_LLM_CHAIN_TIMEOUT_S | 180 | LLM synthesis timeout |
| AIEDGE_LLM_CHAIN_MAX_ATTEMPTS | 5 | LLM synthesis max retries |
| AIEDGE_AUTOPOC_LLM_TIMEOUT_S | 180 | Auto-PoC LLM timeout |
| AIEDGE_AUTOPOC_LLM_MAX_ATTEMPTS | 4 | Auto-PoC max retries |
| Variable | Default | Description |
|---|---|---|
| AIEDGE_GHIDRA_HOME | -- | Ghidra installation path (auto-detected if not set) |
| AIEDGE_GHIDRA_MAX_BINARIES | 20 | Max binaries to analyze |
| AIEDGE_GHIDRA_TIMEOUT_S | 300 | Per-binary analysis timeout |
| Variable | Default | Description |
|---|---|---|
| AIEDGE_AFLPP_IMAGE | aflplusplus/aflplusplus | AFL++ Docker image |
| AIEDGE_FUZZ_BUDGET_S | 3600 | Fuzzing time budget (seconds) |
| AIEDGE_FUZZ_MAX_TARGETS | 5 | Max fuzzing target binaries |
| Variable | Default | Description |
|---|---|---|
| AIEDGE_EMULATION_IMAGE | scout-emulation:latest | Tier 1 Docker image |
| AIEDGE_FIRMAE_ROOT | /opt/FirmAE | FirmAE installation path |
| AIEDGE_QEMU_GDB_PORT | 1234 | QEMU GDB remote port |
| Variable | Default | Description |
|---|---|---|
| AIEDGE_MCP_MAX_OUTPUT_KB | 512 | MCP response max size |
| AIEDGE_PORTSCAN_TOP_K | 1000 | Top-K ports to scan |
| AIEDGE_PORTSCAN_WORKERS | 128 | Concurrent scan workers |
| AIEDGE_PORTSCAN_BUDGET_S | 120 | Port scan time budget |
| Variable | Default | Description |
|---|---|---|
| AIEDGE_QG_PRECISION_MIN | 0.9 | Minimum precision threshold |
| AIEDGE_QG_RECALL_MIN | 0.6 | Minimum recall threshold |
| AIEDGE_QG_FPR_MAX | 0.1 | Maximum false positive rate |
| AIEDGE_QG_ABSTAIN_MAX | 0.25 | Maximum abstention rate |
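Reading those overrides with documented fallbacks is straightforward; a stdlib-only sketch (the helper name is hypothetical, the variable names and defaults are from the table above):

```python
import os

def gate_thresholds() -> dict[str, float]:
    """Read quality-gate thresholds from the environment,
    falling back to the documented defaults when unset."""
    defaults = {
        "AIEDGE_QG_PRECISION_MIN": 0.9,
        "AIEDGE_QG_RECALL_MIN": 0.6,
        "AIEDGE_QG_FPR_MAX": 0.1,
        "AIEDGE_QG_ABSTAIN_MAX": 0.25,
    }
    return {k: float(os.environ.get(k, v)) for k, v in defaults.items()}
```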
Run Directory Structure
aiedge-runs/<run_id>/
├── manifest.json
├── firmware_handoff.json
├── provenance.intoto.jsonl # SLSA L2 attestation
├── input/firmware.bin
├── stages/
│ ├── tooling/
│ ├── extraction/
│ ├── firmware_profile/
│ ├── inventory/
│ │ └── binary_analysis.json # per-binary hardening data
│ ├── sbom/
│ │ ├── sbom.json # CycloneDX 1.6 + CPE index
│ │ └── vex.json # VEX exploitability annotations
│ ├── cve_scan/
│ │ └── cve_scan.json # NVD API CVE matches
│ ├── reachability/
│ │ └── reachability.json # BFS reachability classification
│ ├── surfaces/
│ │ └── source_sink_graph.json
│ ├── ghidra_analysis/ # optional
│ ├── findings/
│ │ ├── stage.json # SHA-256 manifest (evidence chain)
│ │ ├── pattern_scan.json
│ │ ├── credential_mapping.json
│ │ ├── chains.json
│ │ └── sarif.json # SARIF 2.1.0 export
│ ├── fuzzing/ # optional
│ │ └── fuzz_results.json
│ └── graph/
│ └── communication_graph.json
└── report/
├── report.json
├── analyst_digest.json
└── executive_report.md
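Because the layout is fixed, downstream tooling can pull findings straight out of a run directory. A minimal sketch that reads the SARIF export at its documented path (the function name is illustrative):

```python
import json
from pathlib import Path

def sarif_results(run_dir: str) -> list[dict]:
    """Load SARIF 2.1.0 results from the findings stage; returns an
    empty list when the export is absent (e.g. the stage was skipped)."""
    path = Path(run_dir, "stages", "findings", "sarif.json")
    if not path.exists():
        return []
    sarif = json.loads(path.read_text())
    # SARIF nests results under one or more "runs"
    return [r for run in sarif.get("runs", []) for r in run.get("results", [])]
```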
Verification Scripts
# Evidence chain integrity
python3 scripts/verify_analyst_digest.py --run-dir aiedge-runs/<run_id>
python3 scripts/verify_verified_chain.py --run-dir aiedge-runs/<run_id>
# Report schema compliance
python3 scripts/verify_aiedge_final_report.py --run-dir aiedge-runs/<run_id>
python3 scripts/verify_aiedge_analyst_report.py --run-dir aiedge-runs/<run_id>
# Security invariants
python3 scripts/verify_run_dir_evidence_only.py --run-dir aiedge-runs/<run_id>
python3 scripts/verify_network_isolation.py --run-dir aiedge-runs/<run_id>
python3 scripts/verify_exploit_meaningfulness.py --run-dir aiedge-runs/<run_id>
# SLSA provenance verification
cosign verify-attestation --type slsaprovenance \
aiedge-runs/<run_id>/provenance.intoto.jsonl
# Quality gates
./scout quality-gate aiedge-runs/<run_id>
./scout release-quality-gate aiedge-runs/<run_id>
# FirmAE benchmarking (1,124 firmware images)
scripts/benchmark_firmae.sh --parallel 8 --time-budget 300 --cleanup
# Options: --dataset-dir, --results-dir, --parallel N, --time-budget S,
# --stages STAGES, --max-images N, --8mb, --full, --cleanup, --dry-run
scripts/unpack_firmae_dataset.sh   # FirmAE dataset classifier

| Document | Purpose |
|---|---|
| Blueprint | Full pipeline architecture and design rationale |
| Status | Current implementation status |
| Artifact Schema | Profiling + inventory artifact contracts |
| Adapter Contract | Terminator-SCOUT handoff protocol |
| Report Contract | Report structure and governance rules |
| Analyst Digest | Digest schema and verdict semantics |
| Verified Chain | Evidence requirements for verified chains |
| Duplicate Gate | Cross-run duplicate suppression rules |
| Determinism Policy | Replay gate rules and relaxation policy |
| Quality SLO | Precision, recall, FPR thresholds |
| Runbook | Operator flow for digest-first review |
| 8MB Track Runbook | 8MB truncated track operator guide |
| Known CVE Ground Truth | Known CVE ground truth for validation |
| Upgrade Plan v2 | Full v2.0 upgrade plan with appendices |
| LLM Agent Roadmap | LLM integration roadmap and strategy |
Authorized environments only.
SCOUT is intended for use in controlled environments with proper authorization:
- Contracted security audits -- vendor-coordinated firmware assessments
- Vulnerability research -- responsible disclosure with coordinated timelines
- CTF and training -- designated targets in lab environments
Dynamic validation runs in network-isolated sandbox containers. Exploit profile and lab attestation are enabled by default. No weaponized payloads are included.
Contributions are welcome. Before submitting a pull request:
- Read Blueprint for architecture context
- Run `pytest -q` -- all tests must pass
- Lint `ruff check src/` -- zero lint violations
- Check `pyright src/` -- zero type errors
- Follow the existing stage protocol (see `Stage` in `src/aiedge/stage.py`)
- Zero pip dependencies -- stdlib only for core modules
CI runs these checks automatically on every push and pull request via GitHub Actions.
For new pipeline stages, see the "Adding a New Pipeline Stage" section in CLAUDE.md.
MIT