
SCOUT

Firmware-to-Exploit Evidence Engine

Drop a firmware blob. Get SARIF findings, CycloneDX SBOM+VEX, and a hash-anchored evidence chain.


English (this file) | Korean


Why SCOUT?

Every finding has a hash-anchored evidence chain.

SCOUT does not emit a finding without a file path, byte offset, SHA-256 hash, and rationale. Artifacts are immutable and traceable from firmware blob to final verdict. No black-box scoring.
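To make the four required fields concrete, here is a minimal Python sketch (standard library only, in line with the project's zero-dependency rule) of an evidence anchor that records a path, byte offset, SHA-256, and rationale, and later detects tampering. `EvidenceRef`, `anchor`, and `still_valid` are hypothetical names for illustration, not SCOUT's actual schema or API.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class EvidenceRef:
    """Hypothetical evidence anchor; field names are illustrative only."""

    path: str       # file inside the extracted firmware
    offset: int     # byte offset of the matched bytes
    sha256: str     # SHA-256 of the whole file at analysis time
    rationale: str  # why these bytes support the finding


def sha256_file(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def anchor(path: Path, needle: bytes, rationale: str) -> Optional[EvidenceRef]:
    """Anchor a finding to the first occurrence of `needle`, or return None."""
    data = path.read_bytes()
    offset = data.find(needle)
    if offset < 0:
        return None  # nothing to point at, so no finding is emitted
    return EvidenceRef(str(path), offset, hashlib.sha256(data).hexdigest(), rationale)


def still_valid(ref: EvidenceRef) -> bool:
    """Re-hash the file; any change to it breaks the anchor."""
    return sha256_file(Path(ref.path)) == ref.sha256
```

Hashing the whole file rather than only the matched bytes means any modification of the file, not just of the matched region, invalidates the anchor.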

Static-only findings capped at 0.60 -- we don't inflate.

If a vulnerability hasn't been dynamically validated, its confidence is hard-capped at 0.60. Promotion to confirmed requires at least one dynamic verification artifact. Honest confidence beats high numbers.
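A minimal sketch of the cap, assuming confidence is a float in [0, 1] and that the number of dynamic verification artifacts is known; `capped_confidence` is a hypothetical helper, not a SCOUT function.

```python
STATIC_ONLY_CAP = 0.60  # cap stated in this README


def capped_confidence(raw: float, dynamic_artifacts: int) -> float:
    """Clamp to [0, 1]; without dynamic evidence, never exceed the static cap."""
    value = min(max(raw, 0.0), 1.0)
    if dynamic_artifacts < 1:
        value = min(value, STATIC_ONLY_CAP)
    return value
```

For example, a static-only finding scored at 0.95 reports 0.60, while the same score backed by one dynamic artifact keeps 0.95.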

SARIF + CycloneDX VEX + SLSA provenance -- not another custom format.

Findings export to SARIF 2.1.0 for GitHub Code Scanning and VS Code. The SBOM is emitted as CycloneDX 1.6 with Vulnerability Exploitability eXchange (VEX) annotations. Analysis artifacts carry SLSA Level 2 in-toto attestations.
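For a quick look at the VEX side of the SBOM, a sketch like the following tallies vulnerabilities by exploitability state. It assumes `stages/sbom/vex.json` is a standard CycloneDX document whose top-level `vulnerabilities` entries carry `analysis.state`; in CycloneDX 1.6 that field takes values such as `exploitable`, `in_triage`, `false_positive`, `not_affected`, `resolved`, and `resolved_with_pedigree`. The `unanalyzed` label and both helper names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def vex_states(vex_path: Path) -> Counter:
    """Count vulnerabilities by CycloneDX `analysis.state`."""
    doc = json.loads(vex_path.read_text(encoding="utf-8"))
    counts: Counter = Counter()
    for vuln in doc.get("vulnerabilities", []):
        # Entries without an analysis block are tallied under a placeholder label.
        state = vuln.get("analysis", {}).get("state", "unanalyzed")
        counts[state] += 1
    return counts


def exploitable_ids(vex_path: Path) -> list[str]:
    """Sorted IDs of vulnerabilities marked exploitable."""
    doc = json.loads(vex_path.read_text(encoding="utf-8"))
    return sorted(
        v["id"]
        for v in doc.get("vulnerabilities", [])
        if v.get("analysis", {}).get("state") == "exploitable"
    )
```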


What's New

Feature                          Description
-------                          -----------
SARIF 2.1.0 Export               Standard findings output for GitHub Code Scanning, VS Code SARIF Viewer, and CI/CD integration
CycloneDX VEX                    Vulnerability Exploitability eXchange states (exploitable / affected / not_affected) embedded in the SBOM
Precise .dynstr Detection        ELF dynamic import table parsing replaces a naive byte scan; adds FORTIFY_SOURCE coverage detection
40+ SBOM Signatures              wolfSSL, mbedTLS, GoAhead, miniUPnPd, SQLite, U-Boot, lighttpd, and 30+ more (up from 8)
Ghidra Headless Scripts          4 analysis scripts: decompile_all, xref_graph, dataflow_trace, string_refs
AFL++ Performance                CMPLOG, persistent mode, NVRAM faker, multi-instance campaigns, AFL_ENTRYPOINT support
Reachability-Aware CVE           CVE confidence auto-adjusted by BFS network reachability analysis
SLSA L2 Provenance               in-toto attestation for analysis artifacts, cosign-ready verification
Benchmark Runner                 Corpus-based quality measurement with precision / recall / FPR tracking
Quality Gate Overrides           Configurable thresholds via environment variables for CI/CD pipelines
GitHub Actions CI                Automated pytest (Python 3.10-3.12), ruff lint, and pyright type checking on every push/PR
Findings SHA-256 Manifest        stages/findings/stage.json now carries per-artifact SHA-256 hashes for full evidence chain coverage
Handoff Validation               firmware_handoff.json is validated via validate_handoff() before it is written -- missing keys are caught early
Exploit Stage Isolation          Each exploit stage has independent import error handling; a single missing dependency no longer skips all five
v2.0: 8 New Analysis Stages      Enhanced source detection, semantic classification, taint propagation, FP verification, adversarial triage, PoC refinement, chain construction, C-source identification (34 -> 42 stages)
v2.1: Known CVE Signatures       known_cve_signatures.py: 13 CVE patterns (NETGEAR, D-Link, Linksys, ASUS, TP-Link, TRENDnet, Zyxel, Belkin) -- vendor/model/binary matching without SBOM
v2.1: Web Server Auto-Detection  enhanced_source.py auto-identifies httpd/lighttpd/boa binaries; HTTP input sources are classified as source_type: "http_input" for prioritized taint analysis
v2.1: Ghidra Auto-Detection      ./scout wrapper and ghidra_bridge.py probe /opt/ghidra_*, /usr/local/ghidra*, /usr/share/ghidra* -- AIEDGE_GHIDRA_HOME no longer required
v2.0: CLI Modularization         __main__.py split from ~4500 lines into 7 focused modules (entry point now ~660 lines)
v2.0: FirmAE Benchmarking        benchmark_firmae.sh for SCOUT vs FirmAE comparison; unpack_firmae_dataset.sh for dataset classification
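The Reachability-Aware CVE entry relies on breadth-first search from network-facing entry points. The sketch below shows the idea on a toy graph; the edge format, the `network:80` entry node, and the halve-if-unreachable policy are hypothetical illustrations, not SCOUT's graph schema or actual weights.

```python
from collections import deque


def reachable_from(edges: dict[str, list[str]], entry_points: list[str]) -> set[str]:
    """Breadth-first search over a directed 'invokes / feeds data to' graph."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def adjust_cve_confidence(base: float, component: str, reachable: set[str]) -> float:
    # Hypothetical policy: keep confidence if reachable, halve it otherwise.
    return base if component in reachable else round(base * 0.5, 2)


edges = {
    "network:80": ["httpd"],
    "httpd": ["libssl.so", "cgi-bin/login"],
    "cgi-bin/login": ["busybox"],
    "telnetd": ["busybox"],  # not linked to any entry point in this toy graph
}
reach = reachable_from(edges, ["network:80"])
```

A CVE in `libssl.so` keeps its confidence here, while one in `telnetd` is downgraded because no path from the network entry point reaches it.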

How It Works

  1. Drop            2. Analyze              3. Collect               4. Review
  ─────────          ──────────              ──────────               ────────
  firmware.bin  ──>  42-stage pipeline  ──>  SARIF findings      ──>  Web viewer
                     runs automatically      CycloneDX SBOM+VEX      VS Code (SARIF)
                                             Evidence chain           GitHub Code Scanning
                                             SLSA attestation         TUI dashboard

Step 1 -- Point SCOUT at any firmware blob (or a pre-extracted rootfs).

Step 2 -- The 42-stage pipeline runs end-to-end: unpacking, profiling, binary analysis, enhanced source detection, semantic classification, C-source identification, SBOM generation, CVE scanning, reachability analysis, taint propagation, FP verification, adversarial triage, security assessment, attack surface mapping, exploit chain construction, PoC refinement, optional Ghidra decompilation, and optional AFL++ fuzzing.

Step 3 -- Outputs land in a structured run directory: SARIF 2.1.0 findings, a CycloneDX 1.6 SBOM with VEX annotations, a hash-anchored evidence chain, a SLSA L2 provenance attestation, and an executive Markdown report.

Step 4 -- Review results in the built-in web viewer, import the SARIF file into VS Code or GitHub Code Scanning, query artifacts through the MCP server from Claude Code/Desktop, or inspect them in the TUI dashboard.


Quick Start

# Full analysis (all features enabled by default)
./scout analyze firmware.bin

# Deterministic only (no LLM)
./scout analyze firmware.bin --no-llm

# Pre-extracted rootfs (use when automatic unpacking falls short)
./scout analyze firmware.img --rootfs /path/to/extracted/rootfs

# Analysis-only profile (no exploit chain)
./scout analyze firmware.bin --profile analysis --no-llm

# SARIF export for CI/CD
./scout analyze firmware.bin --no-llm
# -> aiedge-runs/<run_id>/stages/findings/sarif.json

# MCP server for AI agents
./scout mcp --project-id aiedge-runs/<run_id>

# Web viewer
./scout serve aiedge-runs/<run_id> --port 8080
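For a CI job, the SARIF output can gate a build. This sketch counts results by SARIF `level` and fails on any `error`; the fail-on-error policy and the helper names are illustrative assumptions, and the path follows the run-directory layout shown in this README.

```python
import json
from collections import Counter
from pathlib import Path


def sarif_levels(sarif_path: Path) -> Counter:
    """Count results per SARIF level across all runs in the log."""
    log = json.loads(sarif_path.read_text(encoding="utf-8"))
    counts: Counter = Counter()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # SARIF 2.1.0 treats a missing level (with no rule override) as "warning".
            counts[result.get("level", "warning")] += 1
    return counts


def ci_exit_code(run_dir: Path) -> int:
    """Return 1 (fail the job) if any error-level finding exists, else 0."""
    levels = sarif_levels(run_dir / "stages" / "findings" / "sarif.json")
    return 1 if levels["error"] else 0
```

In a CI step, `sys.exit(ci_exit_code(Path("aiedge-runs/<run_id>")))` would turn any error-level finding into a failed job.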

Comparison

Feature                            SCOUT      EMBA     FACT  FirmAE
-------                            -----      ----     ----  ------
SBOM (CycloneDX 1.6)               Yes + VEX  Yes      No    No
SARIF 2.1.0 Export                 Yes        No       No    No
Hash-Anchored Evidence Chain       Yes        No       No    No
SLSA L2 Provenance                 Yes        No       No    No
Reachability-Aware CVE             Yes        No       No    No
Confidence Caps (honest scoring)   Yes        No       No    No
Ghidra Headless Integration        Yes        Yes      No    No
AFL++ Fuzzing Pipeline             Yes        No       No    No
3-Tier Emulation                   Yes        Partial  No    Yes
MCP Server (AI agent integration)  Yes        No       No    No
LLM Triage + Synthesis             Yes        No       No    No
Web Report Viewer                  Yes        Yes      Yes   No
Adversarial FP Reduction           Yes        No       No    No
Taint Propagation (LLM)            Yes        No       No    No
Zero pip Dependencies              Yes        No       No    No

Key Features

📦 SBOM & CVE: CycloneDX 1.6 SBOM (40+ signatures) + NVD API 2.0 CVE scanning with VEX and reachability-aware confidence
🔍 Binary Analysis: ELF hardening audit (NX/PIE/RELRO/Canary) + precise .dynstr symbol detection + FORTIFY_SOURCE + optional Ghidra headless decompilation
🎯 Attack Surface: Source-to-sink tracing, IPC detection (5 types), credential auto-mapping
🛡️ Security Assessment: X.509 certificate scanning, boot service auditing, filesystem permission checks
🧪 Fuzzing (optional): AFL++ pipeline with CMPLOG, persistent mode, NVRAM faker, binary scoring, harness generation, crash triage -- requires Docker + AFL++ image
🐛 Emulation: 3-tier (FirmAE / QEMU user-mode / rootfs inspection) + GDB remote debugging
🤖 MCP Server: 12 tools exposed via Model Context Protocol for Claude Code/Desktop integration
🧠 LLM Drivers: Codex CLI + Claude API + Ollama -- with cost tracking and budget limits
📊 Web Viewer: Glassmorphism dashboard with KPI bar, IPC map, risk heatmap, graph visualization
🔗 Evidence Chain: Hash-anchored artifacts, confidence caps, exploit tiering, verified chain gating
📜 SARIF Export: SARIF 2.1.0 findings for GitHub Code Scanning, VS Code SARIF Viewer, CI/CD
🔒 SLSA Provenance: Level 2 in-toto attestation for analysis artifacts, cosign-ready
📋 Executive Reports: Auto-generated Markdown reports with top risks, SBOM/CVE tables, attack surface
🔄 Firmware Diff: Compare two analysis runs -- filesystem, hardening, and config security changes
📈 Benchmark Runner: Corpus-based quality measurement with precision/recall/FPR tracking
🔌 Cross-Binary IPC Chains: 5 IPC types (unix_socket, dbus, shm, pipe, exec_chain); shared .rodata string-based cross-binary communication detection
🏷️ Known CVE Signatures: 13 built-in CVE patterns (NETGEAR, D-Link, Linksys, ASUS, TP-Link, TRENDnet, Zyxel, Belkin) matched by vendor/model/binary without SBOM
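The cross-binary IPC idea (two binaries that embed the same endpoint string probably talk to each other) can be sketched with a byte-level regex. SCOUT covers five IPC types; this hypothetical heuristic only looks for socket and FIFO paths under `/var/run` and `/tmp`, and the pattern is an assumption, not SCOUT's detector.

```python
import re
from pathlib import Path

# Hypothetical heuristic: byte strings that look like socket or FIFO paths.
IPC_PATH = re.compile(rb"/(?:var/run|tmp)/[\w./-]+\.(?:sock|fifo|pipe)")


def ipc_strings(binary: Path) -> set[str]:
    """Collect IPC-looking path strings embedded anywhere in the file."""
    data = binary.read_bytes()
    return {m.group().decode("ascii") for m in IPC_PATH.finditer(data)}


def shared_endpoints(a: Path, b: Path) -> set[str]:
    """Endpoints referenced by both binaries suggest a communication link."""
    return ipc_strings(a) & ipc_strings(b)
```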

Pipeline (42 Stages)

Firmware --> Unpack --> Profile --> Inventory --> [Ghidra] --> Semantic Classification
    --> SBOM --> CVE Scan --> Reachability --> Endpoints --> Surfaces
    --> Enhanced Source --> C-Source Identification --> Taint Propagation
    --> FP Verification --> Adversarial Triage
    --> Security Assessment --> Graph --> Attack Surface --> Findings
    --> LLM Triage --> LLM Synthesis --> Emulation (3-tier) --> [Fuzzing]
    --> PoC Refinement --> Chain Construction --> Exploit Chain --> PoC --> Verification

New in v2.0: enhanced_source, semantic_classification, taint_propagation, fp_verification, adversarial_triage, poc_refinement, chain_construction, csource_identification.

v2.0 Stage Details:

Stage                    Module                     LLM?  Cost    Purpose
-----                    ------                     ----  ----    -------
enhanced_source          enhanced_source.py         No    $0      Web server auto-detection + INPUT_APIS scan (21 APIs)
semantic_classification  semantic_classifier.py     Yes   Low     3-pass function classifier (static, Haiku, Sonnet)
taint_propagation        taint_propagation.py       Yes   Medium  HTTP-aware inter-procedural taint with call chain
fp_verification          fp_verification.py         No    $0      3-pattern FP removal (sanitizer/non-propagating/sysfile)
adversarial_triage       adversarial_triage.py      Yes   Medium  Advocate/Critic LLM debate for FPR reduction
poc_refinement           poc_refinement.py          Yes   Medium  Iterative PoC generation from fuzzing seeds (5 attempts)
chain_construction       chain_constructor.py       No    $0      Same-binary + cross-binary IPC exploit chains
csource_identification   csource_identification.py  No    $0      HTTP input source identification via static sentinel + QEMU

Stages in [brackets] require optional external tools (Ghidra, AFL++/Docker).


Architecture

+----------------------------------------------------------------------+
|                       SCOUT (Evidence Engine)                        |
|                                                                      |
|  Firmware --> Unpack --> Profile --> Inventory --> SBOM --> CVE      |
|                                      (+ hardening)       (NVD 2.0)   |
|                                                              |       |
|  --> Security Assessment --> Surfaces --> Reachability --> Findings  |
|      (cert/init/fs-perm)                  (BFS graph)                |
|                                                                      |
|  --> [Ghidra] --> LLM Triage --> LLM Synthesis                       |
|  --> Emulation --> [Fuzzing] --> Exploit --> PoC --> Verify          |
|                                                                      |
|  42 stages . stage.json manifests . SHA-256 hashed artifacts         |
|  Outputs: SARIF 2.1.0 + CycloneDX 1.6+VEX + SLSA L2 provenance       |
+----------------------------------------------------------------------+
|                   Handoff (firmware_handoff.json)                    |
+----------------------------------------------------------------------+
|                      Terminator (Orchestrator)                       |
|  Tribunal --> Validator --> Exploit Dev --> Verified Chain           |
|  (LLM judge)  (emulation)   (lab-gated)     (dynamic evidence)       |
+----------------------------------------------------------------------+

Layer       Deterministic?  Role
-----       --------------  ----
SCOUT       Yes             Evidence production (extraction, profiling, inventory, surfaces, findings)
Handoff     Yes             JSON contract between engine and orchestrator
Terminator  No (auditable)  LLM tribunal, dynamic validation, exploit development, report promotion
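Because the handoff is a JSON contract, it helps to refuse incomplete documents and never expose a half-written file. A minimal sketch, assuming the caller supplies the required key set; the key names in the usage below are placeholders (the Adapter Contract document defines the real ones), and `write_validated` is not SCOUT's `validate_handoff()`.

```python
import json
import os
import tempfile
from pathlib import Path


def write_validated(path: Path, doc: dict, required: set[str]) -> None:
    """Refuse to write a handoff that lacks required keys; write atomically."""
    missing = sorted(required - set(doc))
    if missing:
        raise ValueError(f"handoff missing keys: {missing}")
    # Write to a temp file in the same directory, then rename over the target,
    # so a reader never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(doc, fh, indent=2, sort_keys=True)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

For example, `write_validated(Path("firmware_handoff.json"), doc, {"run_id", "findings"})` raises `ValueError` before touching disk if either placeholder key is missing.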

Exploit Promotion Policy

Iron rule: no Confirmed without dynamic evidence.

Level                   Placement             Requirements
-----                   ---------             ------------
dismissed               Appendix only         Strong Critic rebuttal, or confidence < 0.5
candidate               Report (flagged)      0.5 <= confidence < 0.8; evidence exists but chain incomplete
high_confidence_static  Report (highlighted)  Confidence >= 0.8, strong static evidence, no dynamic evidence
confirmed               Report (top)          Confidence >= 0.8 AND >= 1 dynamic verification artifact
verified_chain          Exploit report        Confirmed AND PoC reproduced 3x in sandbox, complete chain
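Read top to bottom, the table is a decision ladder. The sketch below encodes the thresholds as listed; the function and parameter names are hypothetical, and it simplifies by treating every finding with 0.5 <= confidence < 0.8 as a candidate regardless of chain completeness.

```python
def promotion_level(*, confidence: float, dynamic_artifacts: int,
                    sandbox_repros: int, chain_complete: bool,
                    critic_rebuttal_strong: bool) -> str:
    """Map evidence to a promotion level; the strictest gates are checked last."""
    if critic_rebuttal_strong or confidence < 0.5:
        return "dismissed"
    if confidence < 0.8:
        return "candidate"
    if dynamic_artifacts < 1:
        return "high_confidence_static"  # iron rule: no dynamic evidence, no confirmed
    if chain_complete and sandbox_repros >= 3:
        return "verified_chain"
    return "confirmed"
```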

CLI Reference

Command                                 Description
-------                                 -----------
./scout analyze <firmware>              Full firmware analysis pipeline
./scout analyze-8mb <firmware>          Truncated 8MB canonical track
./scout stages <run_dir>                Rerun specific stages on an existing run
./scout mcp --project-id <id>           Start the MCP stdio server
./scout serve <run_dir>                 Launch the web report viewer
./scout tui <run_dir>                   Terminal UI dashboard
./scout ti                              TUI in --interactive mode (latest run)
./scout tw <run_dir> -t 2               TUI in --watch mode (auto-refresh)
./scout to                              TUI in --mode once (latest run)
./scout t                               TUI for the latest run (default mode)
./scout corpus-validate <run_dir>       Validate corpus manifest
./scout quality-metrics <run_dir>       Compute quality metrics
./scout quality-gate <run_dir>          Check quality thresholds
./scout release-quality-gate <run_dir>  Unified release gate

Exit codes: 0 success, 10 partial, 20 fatal, 30 policy violation
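A wrapper script can translate these exit codes for CI. In this sketch the meanings come from the line above, while `run_scout`, `ci_should_pass`, and the choice to fail on `partial` by default are hypothetical.

```python
import subprocess

EXIT_MEANINGS = {0: "success", 10: "partial", 20: "fatal", 30: "policy violation"}


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unknown ({code})")


def ci_should_pass(code: int, allow_partial: bool = False) -> bool:
    """Example CI policy: success passes; partial passes only when allowed."""
    return code == 0 or (allow_partial and code == 10)


def run_scout(*args: str) -> int:
    """Run ./scout from the repository root and return its exit code."""
    return subprocess.run(["./scout", *args], check=False).returncode
```

For example, `ci_should_pass(run_scout("analyze", "firmware.bin", "--no-llm"))` runs the deterministic pipeline and passes only on a clean exit.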

Environment Variables

Core

Variable                         Default                  Description
--------                         -------                  -----------
AIEDGE_LLM_DRIVER                codex                    LLM provider: codex / claude / ollama
ANTHROPIC_API_KEY                (none)                   API key for the Claude driver
AIEDGE_OLLAMA_URL                http://localhost:11434   Ollama server URL
AIEDGE_LLM_BUDGET_USD            (none)                   LLM cost budget limit (USD)
AIEDGE_PRIV_RUNNER               (none)                   Privileged command prefix for dynamic stages
AIEDGE_FEEDBACK_DIR              aiedge-feedback          Terminator feedback directory

SBOM & CVE

Variable                         Default                  Description
--------                         -------                  -----------
AIEDGE_NVD_API_KEY               (none)                   NVD API key (optional; raises the NVD rate limit)
AIEDGE_NVD_CACHE_DIR             aiedge-nvd-cache         Cross-run NVD response cache
AIEDGE_SBOM_MAX_COMPONENTS       500                      Maximum SBOM components
AIEDGE_CVE_SCAN_MAX_COMPONENTS   50                       Maximum components to CVE-scan
AIEDGE_CVE_SCAN_TIMEOUT_S        30                       Per-request NVD API timeout (seconds)

LLM Timeouts

Variable                         Default                  Description
--------                         -------                  -----------
AIEDGE_LLM_CHAIN_TIMEOUT_S       180                      LLM synthesis timeout (seconds)
AIEDGE_LLM_CHAIN_MAX_ATTEMPTS    5                        Maximum LLM synthesis attempts
AIEDGE_AUTOPOC_LLM_TIMEOUT_S     180                      Auto-PoC LLM timeout (seconds)
AIEDGE_AUTOPOC_LLM_MAX_ATTEMPTS  4                        Maximum Auto-PoC LLM attempts

Ghidra

Variable                         Default                  Description
--------                         -------                  -----------
AIEDGE_GHIDRA_HOME               (none)                   Ghidra installation path (auto-detected if not set)
AIEDGE_GHIDRA_MAX_BINARIES       20                       Maximum binaries to analyze
AIEDGE_GHIDRA_TIMEOUT_S          300                      Per-binary analysis timeout (seconds)

Fuzzing (AFL++)

Variable                         Default                  Description
--------                         -------                  -----------
AIEDGE_AFLPP_IMAGE               aflplusplus/aflplusplus  AFL++ Docker image
AIEDGE_FUZZ_BUDGET_S             3600                     Fuzzing time budget (seconds)
AIEDGE_FUZZ_MAX_TARGETS          5                        Maximum number of fuzzing target binaries

Emulation

Variable                         Default                  Description
--------                         -------                  -----------
AIEDGE_EMULATION_IMAGE           scout-emulation:latest   Tier 1 Docker image
AIEDGE_FIRMAE_ROOT               /opt/FirmAE              FirmAE installation path
AIEDGE_QEMU_GDB_PORT             1234                     QEMU GDB remote port

MCP & Port Scanning

Variable                         Default                  Description
--------                         -------                  -----------
AIEDGE_MCP_MAX_OUTPUT_KB         512                      Maximum MCP response size (KB)
AIEDGE_PORTSCAN_TOP_K            1000                     Top-K ports to scan
AIEDGE_PORTSCAN_WORKERS          128                      Concurrent scan workers
AIEDGE_PORTSCAN_BUDGET_S         120                      Port scan time budget (seconds)

Quality Gate Overrides

Variable                         Default                  Description
--------                         -------                  -----------
AIEDGE_QG_PRECISION_MIN          0.9                      Minimum precision threshold
AIEDGE_QG_RECALL_MIN             0.6                      Minimum recall threshold
AIEDGE_QG_FPR_MAX                0.1                      Maximum false positive rate
AIEDGE_QG_ABSTAIN_MAX            0.25                     Maximum abstention rate
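To see how these thresholds interact, here is a sketch that computes the four rates from confusion counts and checks them against the environment overrides, falling back to the defaults above. Precision, recall, and FPR use the standard definitions; treating the abstention rate as abstained cases over all cases is an assumption, and the Quality SLO document is the authority.

```python
import os

# Defaults mirror the table above; environment variables override them.
GATES = {
    "precision": ("AIEDGE_QG_PRECISION_MIN", 0.9, "min"),
    "recall": ("AIEDGE_QG_RECALL_MIN", 0.6, "min"),
    "fpr": ("AIEDGE_QG_FPR_MAX", 0.1, "max"),
    "abstain": ("AIEDGE_QG_ABSTAIN_MAX", 0.25, "max"),
}


def metrics(tp: int, fp: int, fn: int, tn: int, abstained: int) -> dict[str, float]:
    total = tp + fp + fn + tn + abstained
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "abstain": abstained / total if total else 0.0,  # assumed definition
    }


def gate_failures(m: dict[str, float]) -> list[str]:
    """Describe every metric that violates its (possibly overridden) bound."""
    failures = []
    for name, (env_var, default, kind) in GATES.items():
        bound = float(os.environ.get(env_var, default))
        if (kind == "min" and m[name] < bound) or (kind == "max" and m[name] > bound):
            failures.append(f"{name}={m[name]:.3f} violates {kind} {bound}")
    return failures
```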

Run Directory Structure

aiedge-runs/<run_id>/
├── manifest.json
├── firmware_handoff.json
├── provenance.intoto.jsonl          # SLSA L2 attestation
├── input/firmware.bin
├── stages/
│   ├── tooling/
│   ├── extraction/
│   ├── firmware_profile/
│   ├── inventory/
│   │   └── binary_analysis.json     # per-binary hardening data
│   ├── sbom/
│   │   ├── sbom.json                # CycloneDX 1.6 + CPE index
│   │   └── vex.json                 # VEX exploitability annotations
│   ├── cve_scan/
│   │   └── cve_scan.json            # NVD API CVE matches
│   ├── reachability/
│   │   └── reachability.json        # BFS reachability classification
│   ├── surfaces/
│   │   └── source_sink_graph.json
│   ├── ghidra_analysis/             # optional
│   ├── findings/
│   │   ├── stage.json               # SHA-256 manifest (evidence chain)
│   │   ├── pattern_scan.json
│   │   ├── credential_mapping.json
│   │   ├── chains.json
│   │   └── sarif.json               # SARIF 2.1.0 export
│   ├── fuzzing/                     # optional
│   │   └── fuzz_results.json
│   └── graph/
│       └── communication_graph.json
└── report/
    ├── report.json
    ├── analyst_digest.json
    └── executive_report.md
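As an illustration of walking the evidence chain by hand, this sketch re-hashes the artifacts listed in a stage's `stage.json`. The manifest layout it expects (an `artifacts` list of `path`/`sha256` entries relative to the run directory) is a hypothetical assumption; the repository's verification scripts remain the authoritative checks.

```python
import hashlib
import json
from pathlib import Path


def verify_stage_manifest(run_dir: Path, stage: str) -> list[str]:
    """Return manifest entries whose file is missing or whose SHA-256 changed.

    Assumes stage.json holds {"artifacts": [{"path": ..., "sha256": ...}]}
    with paths relative to the run directory (hypothetical layout).
    """
    manifest_path = run_dir / "stages" / stage / "stage.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    bad = []
    for entry in manifest.get("artifacts", []):
        target = run_dir / entry["path"]
        if not target.is_file():
            bad.append(entry["path"])
        elif hashlib.sha256(target.read_bytes()).hexdigest() != entry["sha256"]:
            bad.append(entry["path"])
    return bad
```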

Verification Scripts

# Evidence chain integrity
python3 scripts/verify_analyst_digest.py --run-dir aiedge-runs/<run_id>
python3 scripts/verify_verified_chain.py --run-dir aiedge-runs/<run_id>

# Report schema compliance
python3 scripts/verify_aiedge_final_report.py --run-dir aiedge-runs/<run_id>
python3 scripts/verify_aiedge_analyst_report.py --run-dir aiedge-runs/<run_id>

# Security invariants
python3 scripts/verify_run_dir_evidence_only.py --run-dir aiedge-runs/<run_id>
python3 scripts/verify_network_isolation.py --run-dir aiedge-runs/<run_id>
python3 scripts/verify_exploit_meaningfulness.py --run-dir aiedge-runs/<run_id>

# SLSA provenance verification
cosign verify-attestation --type slsaprovenance \
  aiedge-runs/<run_id>/provenance.intoto.jsonl

# Quality gates
./scout quality-gate aiedge-runs/<run_id>
./scout release-quality-gate aiedge-runs/<run_id>

# FirmAE benchmarking (1,124 firmware images)
scripts/benchmark_firmae.sh --parallel 8 --time-budget 300 --cleanup
# Options: --dataset-dir, --results-dir, --parallel N, --time-budget S,
#          --stages STAGES, --max-images N, --8mb, --full, --cleanup, --dry-run
scripts/unpack_firmae_dataset.sh                   # FirmAE dataset classifier
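The cosign command checks signatures; independently of that, the digests inside the attestation can be re-checked against the run directory. This sketch assumes each line of `provenance.intoto.jsonl` is either a bare in-toto Statement or a DSSE envelope with a base64 `payload`, and that subject names are paths relative to the run directory (an assumption about SCOUT's output). It does not verify signatures.

```python
import base64
import hashlib
import json
from pathlib import Path


def load_statements(jsonl: Path) -> list[dict]:
    """Parse one in-toto Statement per line, unwrapping DSSE envelopes."""
    statements = []
    for line in jsonl.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        if "payload" in obj:  # DSSE envelope: the Statement is base64-encoded
            obj = json.loads(base64.b64decode(obj["payload"]))
        statements.append(obj)
    return statements


def check_subjects(run_dir: Path) -> dict[str, bool]:
    """Map each subject name to whether its sha256 digest matches the file."""
    results = {}
    for stmt in load_statements(run_dir / "provenance.intoto.jsonl"):
        for subject in stmt.get("subject", []):
            expected = subject.get("digest", {}).get("sha256")
            path = run_dir / subject["name"]  # assumed: run-dir-relative name
            actual = hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
            results[subject["name"]] = actual is not None and actual == expected
    return results
```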

Documentation

Document                Purpose
--------                -------
Blueprint               Full pipeline architecture and design rationale
Status                  Current implementation status
Artifact Schema         Profiling + inventory artifact contracts
Adapter Contract        Terminator-SCOUT handoff protocol
Report Contract         Report structure and governance rules
Analyst Digest          Digest schema and verdict semantics
Verified Chain          Evidence requirements for verified chains
Duplicate Gate          Cross-run duplicate suppression rules
Determinism Policy      Replay gate rules and relaxation policy
Quality SLO             Precision, recall, FPR thresholds
Runbook                 Operator flow for digest-first review
8MB Track Runbook       8MB truncated track operator guide
Known CVE Ground Truth  Known CVE ground truth for validation
Upgrade Plan v2         Full v2.0 upgrade plan with appendices
LLM Agent Roadmap       LLM integration roadmap and strategy

Security & Ethics

Authorized environments only.

SCOUT is intended for use in controlled environments with proper authorization:

  • Contracted security audits -- vendor-coordinated firmware assessments
  • Vulnerability research -- responsible disclosure with coordinated timelines
  • CTF and training -- designated targets in lab environments

Dynamic validation runs in network-isolated sandbox containers. Exploit profile and lab attestation are enabled by default. No weaponized payloads are included.


Contributing

Contributions are welcome. Before submitting a pull request:

  1. Read the Blueprint for architecture context
  2. Run pytest -q -- all tests must pass
  3. Run ruff check src/ -- zero lint violations
  4. Run pyright src/ -- zero type errors
  5. Follow the existing stage protocol (see Stage in src/aiedge/stage.py)
  6. Add no pip dependencies -- core modules use the standard library only

CI runs these checks automatically on every push and pull request via GitHub Actions.
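To run the same checks locally before opening a pull request, a small runner along these lines works; it is a convenience sketch, not a script shipped in the repository.

```python
import subprocess

# The contributor checks listed above, in order.
CHECKS = [
    ("tests", ["pytest", "-q"]),
    ("lint", ["ruff", "check", "src/"]),
    ("types", ["pyright", "src/"]),
]


def run_checks(checks=CHECKS) -> list[str]:
    """Run each check from the repository root; return names of failed checks."""
    failed = []
    for name, cmd in checks:
        try:
            returncode = subprocess.run(cmd, check=False).returncode
        except FileNotFoundError:  # tool is not installed or not on PATH
            returncode = 127
        if returncode != 0:
            failed.append(name)
    return failed
```

An empty list from `run_checks()` means all three checks passed.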

For new pipeline stages, see the "Adding a New Pipeline Stage" section in CLAUDE.md.


License

MIT


Built for the security research community. Not for unauthorized access.


github.com/R00T-Kim/SCOUT