🦅 FALCON : Force-multiplying Adaptive Learning & Command Overmatch Network

Ontology-driven combat simulation and decision-support research stack combining simulation, GNN uncertainty modeling, RL training, HITL controls, and evaluation tooling.


Core Values: “We do not increase the number of troops. Instead, we redesign the probabilities of the battlefield.”


📒 TL;DR / Executive Summary

FALCON is an end-to-end experimentation repository for military-domain AI research workflows:

  • Knowledge modeling: ontology-backed scenario and doctrine representation (ontology/).
  • Environment dynamics: combat engines with fog-of-war, maneuver, missile, naval, and resource constraints (simulator/).
  • Learning stack: Bayesian/temporal GNN components and multiple RL paradigms (gnn_model/, rl_agent/).
  • Decision governance: ROE, constraints, preference modeling, and HITL intervention with NL interface (hitl/, ontology/roe_ethics.py).
  • Evaluation and reporting: Monte Carlo, adversarial/fog/historical benchmarks, metrics, demo artifacts, and test coverage (evaluation/, demo/, tests/).
  • Experiment utilities: reproducibility, config loading, run registry, multidomain runner, and security helpers (utils/).

This repository is structured for research-to-prototype iteration rather than a single model benchmark.

📌 Highlights / At a Glance

| Area | What is present in this repository |
| --- | --- |
| Core scripts | train.py, evaluate.py, demo.py, generate_data.py |
| Configuration | Phase/evaluation/simulator/scenario YAMLs under configs/ |
| Models/agents | Bayesian GNN, PPO variants, MAPPO/MAT/NFSP/PSRO/hierarchical RL/IRL modules |
| Human oversight | Constraint parsing, preference learning, NL interface, Pareto/reranking modules |
| Evaluation outputs | JSON/CSV summaries, plots, AAR HTML (demo path), adversarial/fog/historical benchmarks |
| Utilities | Experiment registry, multidomain runner, reproducibility, security helpers (utils/) |
| Reproducibility | Seeded CLI flows, pytest suite, GitHub Actions CI |

🎯 Why this repository matters

Many repositories focus on isolated algorithm performance. FALCON instead keeps scenario modeling, simulation realism, agent training, decision constraints, and evaluation artifacts in one codebase. That organization is useful for:

  1. testing ideas across full pipelines,
  2. comparing algorithmic variants under common simulation assumptions,
  3. producing inspectable artifacts suitable for review and iteration.

🏗️ System architecture / core modules

```mermaid
flowchart LR
  O[ontology/] --> S[simulator/]
  O --> H[hitl/]
  S --> G[gnn_model/]
  S --> R[rl_agent/]
  G --> R
  R --> E[evaluation/]
  H --> E
  E --> X[explainability/ + visualization/ + demo/]
  U[utils/] --> R
  U --> E
```

Module responsibilities

  • ontology/: combat schema, doctrine encoding, intelligence modeling, joint operations, military units, multidomain links, temporal extensions, scenario presets/loaders, ROE/ethics.
  • simulator/: Lanchester and mixed combat engines, maneuver/fog/weather/cyber/resource effects, missile modeling, naval engine, adversarial scenario composer.
  • gnn_model/: Bayesian HGT, temporal GNN, uncertainty decomposition, calibration, and temperature scaling.
  • rl_agent/: blue/red agents, self-play, RARL, MAPPO, MAT, NFSP, PSRO, hierarchical RL, inverse RL, league training, tier-C trainer.
  • hitl/: constraint parser, preference learning/adaptation, bandit preference, MC-Pareto validator, Pareto candidate generation, replanning, natural-language interface, web interface.
  • evaluation/: Monte Carlo evaluation, adversarial benchmark, fog A/B protocol, historical benchmark, metric helpers.
  • explainability/ + visualization/: AAR/counterfactual/attention and runtime dashboard support.
  • utils/: config loader, reproducibility utilities, experiment registry, experiment tracker, multidomain runner, security helpers.
  • demo/: compact runnable pipeline and lightweight evaluation/reporting path.

🧠 Key capabilities

Implemented (verified from code layout and scripts)

  • Ontology-based scenario creation with schema abstractions, intelligence modeling, joint operations, and temporal extensions.
  • Multi-engine simulation: Lanchester, mixed Lanchester, maneuver, naval, missile, fog-of-war, weather, cyber effects, and resource management.
  • Six built-in scenario presets: air superiority, amphibious assault, cyber/EW, Korea defense, multidomain contest, urban warfare.
  • Phase-oriented training entrypoint (--phase in train.py) with optional algorithm comparison in phase 2.
  • Rich RL algorithm suite: MAPPO, MAT, NFSP, PSRO, RARL, hierarchical RL, inverse RL, league self-play, tier-C trainer.
  • HITL with natural-language constraint interface, bandit preference, MC-Pareto validation, and real-time replanning.
  • Three evaluation surfaces:
    • root-level evaluator (evaluate.py),
    • demo evaluation suites (python -m demo.evaluate),
    • adversarial, fog A/B, and historical benchmark protocols (evaluation/).
  • Experiment tracking via registry, multidomain domain-transfer runner, and reproducibility utilities (utils/).
  • Data generation pipeline producing scenario/episode/IRL summary datasets.
  • Artifact-producing demo flow (summary.json, metrics.csv, fig_episode.png, aar.html).
  • Automated tests and CI lint/test workflow.
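The seeded CLI flows above rely on the reproducibility utilities in utils/; their exact API is not reproduced here, but the underlying pattern can be sketched as follows (the function name and the optional numpy/torch handling are illustrative, not the repository's actual code):

```python
import os
import random

def seed_everything(seed: int) -> int:
    """Seed the common sources of randomness for a reproducible run.

    The numpy/torch blocks are guarded so the helper also works in
    environments where those libraries are not installed.
    """
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)  # affects hash randomization in subprocesses
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
    return seed

# Re-seeding before a run makes repeated invocations produce identical draws.
seed_everything(42)
first = random.random()
seed_everything(42)
assert random.random() == first
```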

Research-oriented but maturity varies

Some modules are clearly prototyping-oriented (large single-file trainers, mixed Korean/English comments, evolving packaging conventions). Treat the repository as a serious experimental platform, not a finalized product package.

📁 Repository structure

falcon/
├── README.md
├── README_KOR.md
├── CONTRIBUTING.md
├── train.py
├── evaluate.py
├── demo.py
├── generate_data.py
├── requirements.txt
├── requirements-dev.txt
├── pyproject.toml
├── setup.py
├── configs/
│   ├── default.yaml
│   ├── phase1.yaml
│   ├── phase2.yaml
│   ├── phase3.yaml
│   ├── evaluation.yaml
│   ├── simulator.yaml
│   ├── simulator_full.yaml
│   └── scenarios/
│       ├── air_superiority.yaml
│       ├── amphibious_assault.yaml
│       ├── cyber_ew.yaml
│       ├── korea_defense.yaml
│       ├── multidomain_contest.yaml
│       └── urban_warfare.yaml
├── ontology/
├── simulator/
├── gnn_model/
├── rl_agent/
├── hitl/
├── evaluation/
├── explainability/
├── visualization/
├── utils/
├── demo/
├── notebook/
│   └── FALCON.ipynb
├── tests/
├── docs/
└── .github/workflows/ci.yml

⚙️ Installation

1) Clone and create a virtual environment

git clone https://github.com/Navy10021/falcon
cd falcon
python -m venv .venv
source .venv/bin/activate

2) Install dependencies

pip install --upgrade pip
pip install -r requirements.txt

3) Optional development dependencies

pip install -r requirements-dev.txt

🛠️ Quick Start

🔰 New to FALCON?
For a structured, step-by-step walkthrough of the full pipeline,
start with 👉 notebook/FALCON.ipynb.

The notebook demonstrates the complete end-to-end workflow —
from data generation and phased training to evaluation —
with explanations and visualizations.


🚀 Root demo (Fastest Way to Run)

python demo.py --seed 42

📦 Package-style demo pipeline

python -m demo.demo --scenario urban_defense --seed 42 --policy rule --out runs/demo_urban

⚡ Fast evaluation

If you just want to quickly validate model behavior:

python evaluate.py --fast
python -m demo.evaluate --suite small --mc 20 --seed 42 --out outputs/eval_small

✅ End-to-end workflow

📌 After reviewing the notebook, you can reproduce the full experimental pipeline via CLI:

# 1) Generate data artifacts
python generate_data.py --quick

# 2) Train by phase
python train.py --phase 1 --config configs/phase1.yaml
python train.py --phase 2 --config configs/phase2.yaml
python train.py --phase 3 --hitl --config configs/phase3.yaml

# 3) Evaluate
python evaluate.py --monte-carlo 200 --fog-level moderate --output-json runs/eval_report.json

# 4) Optional demo suite eval
python -m demo.evaluate --suite standard --mc 100 --seed 0 --out outputs/eval_standard

⚙️ Configuration

  • Core defaults: configs/default.yaml
  • Phase defaults: configs/phase1.yaml, configs/phase2.yaml, configs/phase3.yaml
  • Evaluation defaults: configs/evaluation.yaml
  • Scenario presets: configs/scenarios/*.yaml

train.py supports --config plus CLI overrides for key hyperparameters (episodes, lr, seed, intervals, algorithm mode, etc.).
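The "--config plus CLI overrides" behavior typically follows a merge pattern like the sketch below (this illustrates the idea only; it is not the actual utils/ config loader, and the flag set is trimmed to the hyperparameters named above):

```python
import argparse

# Defaults that would normally be read from a YAML file such as configs/phase1.yaml.
DEFAULTS = {"episodes": 1000, "lr": 3e-4, "seed": 42}

def load_config(argv=None):
    """Merge base defaults with any CLI overrides the user supplies."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--episodes", type=int)
    parser.add_argument("--lr", type=float)
    parser.add_argument("--seed", type=int)
    args = parser.parse_args(argv)
    config = dict(DEFAULTS)
    # Only flags the user explicitly passed override the file-backed defaults.
    config.update({k: v for k, v in vars(args).items() if v is not None})
    return config

print(load_config(["--episodes", "500"]))
# → {'episodes': 500, 'lr': 0.0003, 'seed': 42}
```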

📊 Evaluation / metrics

Root evaluator (evaluate.py)

Key options include:

  • --monte-carlo, --workers, --max-steps
  • --fog-level {clear,moderate,maximum}
  • --fast / --full
  • --benchmark historical with --benchmark-runs
  • --output-json <path>

Demo evaluator (demo.evaluate)

  • Suites: small, standard, stress
  • Outputs:
    • leaderboard.csv
    • metrics_aggregate.json

Metric helpers

evaluation/metrics.py contains reusable functions for force reduction, exchange ratio, mission efficiency, and trend-style summaries.
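The exact signatures in evaluation/metrics.py are not reproduced here, but two of the headline metrics can be sketched under their usual definitions (illustrative formulas, not the repository's implementation):

```python
def force_reduction(initial: float, remaining: float) -> float:
    """Fraction of a force lost over an engagement (0 = intact, 1 = destroyed)."""
    if initial <= 0:
        raise ValueError("initial strength must be positive")
    return 1.0 - remaining / initial

def exchange_ratio(enemy_losses: float, own_losses: float) -> float:
    """Enemy casualties inflicted per friendly casualty taken (higher is better)."""
    if own_losses == 0:
        return float("inf") if enemy_losses > 0 else 0.0
    return enemy_losses / own_losses

# Blue starts with 100 units, ends with 80, while inflicting 40 red losses.
print(round(force_reduction(100, 80), 3))  # → 0.2
print(exchange_ratio(40, 20))              # → 2.0
```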

🔬 Explainability / HITL / ontology components

  • Explainability (explainability/): attention visualization, counterfactual tools, AAR helpers.
  • HITL (hitl/): constraint parser, bandit preference, preference learner/adapters, MC-Pareto validator, Pareto generators, real-time replanning, natural-language interface, web interface.
  • Ontology (ontology/): combat schema, doctrine encoder, intelligence, joint operations, military units, multidomain structures, temporal extensions, scenario presets/loaders, ROE/ethics validators.

These modules support policy outputs that can be constrained, interpreted, and reviewed, rather than consumed as opaque model scores.
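The "constrain before select" idea behind the HITL modules can be illustrated with a minimal sketch (hypothetical names and data; this is not the hitl/ API):

```python
def constrained_argmax(actions, score, constraints):
    """Pick the highest-scoring action that satisfies every constraint.

    Returns None when no admissible action exists, signalling the need
    for replanning or human intervention instead of silently falling
    back to an unconstrained choice.
    """
    admissible = [a for a in actions if all(c(a) for c in constraints)]
    if not admissible:
        return None
    return max(admissible, key=score)

# Hypothetical example: a parsed constraint forbids high-risk options.
actions = [{"id": "strike_a", "risk": 0.9, "value": 10},
           {"id": "strike_b", "risk": 0.2, "value": 7},
           {"id": "hold", "risk": 0.0, "value": 1}]
no_high_risk = lambda a: a["risk"] < 0.5
best = constrained_argmax(actions, lambda a: a["value"], [no_high_risk])
print(best["id"])  # → strike_b
```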

🧪 Example outputs or expected artifacts

python -m demo.demo ...

  • summary.json
  • metrics.csv
  • fig_episode.png
  • aar.html

python -m demo.evaluate ...

  • leaderboard.csv
  • metrics_aggregate.json

python generate_data.py ...

  • data/scenarios.json
  • data/episodes.json
  • data/irl_demos_summary.json
  • data/data_stats.json
  • data/ontology_stats.html

Development & testing

ruff check .
black --check .
pytest -q

Helper scripts:

bash scripts/format.sh
bash scripts/test.sh

CI is defined in .github/workflows/ci.yml and runs lint + tests on push/PR.
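The actual workflow lives in .github/workflows/ci.yml; a minimal workflow consistent with the lint + test description would look roughly like the following (action versions and the Python version are assumptions, not copied from the repository):

```yaml
name: CI
on: [push, pull_request]
jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: ruff check .
      - run: black --check .
      - run: pytest -q
```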

Documentation

  • English primary README: README.md (this file).
  • Korean README: README_KOR.md (Korean-language project narrative and deeper context).
  • Contributing guide: CONTRIBUTING.md.
  • Demo-specific guide: demo/DEMO_README.md.
  • Structure policy: docs/PROJECT_STRUCTURE.md.
  • Additional reports: docs/report/, docs/reports/, docs/proposal_assets/.

🤝 Contributing

Please follow CONTRIBUTING.md for contribution expectations, test discipline, and PR workflow.

Practical high-impact contribution areas:

  • simulation fidelity and calibration,
  • RL algorithm stability and benchmarking,
  • HITL policy and constraint design,
  • test coverage and experiment reproducibility,
  • documentation cleanup and packaging consistency.

🗺️ Roadmap

Implemented baseline

  • End-to-end scripts for training/evaluation/demo/data generation.
  • Modular domains for ontology, simulation, GNN, RL, HITL, evaluation, explainability.
  • Multi-layer test suite and CI integration.

Near-term improvements (inferred from current structure/docs)

  • Package naming consistency (root scripts vs package-style invocation patterns).
  • More explicit experiment cards (seed grids, config snapshots, artifact schema standards).
  • Additional baseline comparators and standardized benchmark tables.
  • Continued refactoring of large training/evaluation files into smaller modules.

⚖️ License

This project is licensed under the MIT License. See the LICENSE file for details.

🛡️ Responsible Use Notice

FALCON is developed as a research and simulation framework for AI-driven decision support and force optimization modeling.

It is NOT intended for operational deployment in real-world combat, offensive military action, or targeting of specific entities.

Any use of this repository should comply with:

  • International humanitarian law
  • AI ethics and safety standards
  • Responsible research and innovation principles

The authors disclaim responsibility for misuse or unlawful application.
