---
title: Disaster Relief Coordination Env
emoji: π
colorFrom: red
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: Multi-zone disaster relief AI env for OpenEnv
---
Meta PyTorch OpenEnv Hackathon by Scaler – an OpenEnv-compliant AI training environment
Every year, delayed disaster response costs lives. DRC-Env puts an AI agent in the role of an emergency coordinator managing real triage decisions: deploy rescue teams, route supplies, and call airlifts across multiple disaster zones, all under time pressure, resource scarcity, and deliberately injected false SOS signals designed to drain resources from zones that actually need them.
Live demo: https://huggingface.co/spaces/krishpotanwar/disaster-relief-env
Most benchmark environments test agents on clean, well-labeled observations. DRC-Env deliberately breaks that assumption in three ways:
1. False SOS signals that look real.
Zones H, I, and J in task_3 broadcast genuine-looking SOS signals with non-zero severity scores, yet have zero casualties. No flag in the observation space marks a signal as false; `is_false_sos` is hidden and only visible to the grader. An agent that cannot distinguish signal noise from genuine distress wastes scarce airlifts and misses the zones that matter. A reward penalty of -0.05 per false-SOS resource deployment reinforces this.
2. Cascading failures mid-episode. At step 7 of task_3, a dam breaks and floods Zone E with 60 new casualties. Road conditions shift with weather. The agent must continuously replan; a static resource allocation computed at step 0 fails catastrophically. This tests generalization, not memorization.
3. Native PyTorch integration in the decision loop.
Stage 1 of the baseline pipeline is a trained PyTorch MLP (`ZoneScorerNet`) that runs inference every step at under 1 ms. Its output, a priority score per zone, feeds directly into the LLM triage prompt. This is not a toy PyTorch import; it replaces what would otherwise require the LLM to reason from raw numbers, demonstrably improving score on task_2 and task_3.
| Task | Name | Difficulty | Zones | Steps | Target Score |
|---|---|---|---|---|---|
| task_1 | Single Zone Flood Response | Easy | 1 | 10 | 0.70–0.85 |
| task_2 | Multi-Zone Earthquake Response | Medium | 5 | 15 | 0.40–0.60 |
| task_3 | Cyclone + Cascading Failures + False SOS | Hard | 10 | 20 | 0.20–0.40 |
task_3 in detail: Zones H, I, J send plausible SOS signals with zero real casualties. At step 7 a dam breaks, flooding Zone E (+60 casualties). Weather shifts drive dynamic road blocks. Optimal play requires false-alarm detection, immediate dam-break redeployment, and airlift precision.
The included baseline fulfills the hackathon's PyTorch requirement through a production-style architecture, not a trivial import:
Stage 1: PyTorch ZoneScorerNet – local MLP, <1 ms inference, runs every step
Stage 2: Triage Agent – LLM, false SOS detection + deadline alerts
Stage 3: Planner Agent – LLM, 3-step lookahead resource allocation
Stage 4: Action Agent – LLM + hard constraint validator + deterministic fallback
ZoneScorerNet (Stage 1):
Architecture: MLP 6→16→1 with sigmoid output. Input features: severity, casualty_ratio, supply_ratio, road_blocked, unattended, time_pressure. Trained on 50K synthetic examples generated from the environment's own reward dynamics. False SOS zones consistently score near 0.0: the network learns to ignore severity noise when casualty and supply ratios are zero. Training time: ~8 seconds on CPU. Pre-trained weights ship inside the Docker image so judges see no setup friction.
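The described architecture fits in a few lines of PyTorch. This is a minimal reconstruction from the description above, not the shipped `agents/zone_scorer.py`:

```python
import torch
import torch.nn as nn

class ZoneScorerNet(nn.Module):
    """Minimal 6 -> 16 -> 1 MLP with sigmoid output, as described above."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, 16),  # features: severity, casualty_ratio, supply_ratio,
            nn.ReLU(),         #           road_blocked, unattended, time_pressure
            nn.Linear(16, 1),
            nn.Sigmoid(),      # priority score in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# One batched forward pass scores all zones at once; well under 1 ms on CPU.
model = ZoneScorerNet().eval()
with torch.no_grad():
    scores = model(torch.rand(10, 6))  # 10 zones x 6 features
```

Because the sigmoid bounds every output to [0, 1], the scores can be dropped straight into the triage prompt without further normalization.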
Anti-hallucination strategy (Stage 4): LLMs can hallucinate invalid zone IDs and over-commit resources. Three layers prevent this:
- Constraint injection – every prompt lists valid zone IDs, blocked roads, and exact resource counts
- Post-LLM hard validator (`_validate_and_fix()`) – rejects any action that violates game rules
- Deterministic fallback heuristic – executes if LLM output fails validation, ensuring the episode always completes
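The validator-plus-fallback pattern can be sketched in plain Python. This is a hypothetical stand-in for `_validate_and_fix()`; the real rule set lives in the environment:

```python
# Hypothetical sketch of the Stage 4 hard validator; field names follow the action schema.
VALID_ZONES = set("ABCDEFGHIJ")

def validate_and_fix(action: dict, resources: dict, blocked_roads: set) -> dict:
    """Pass through rule-abiding actions; otherwise return a deterministic safe action."""
    kind = action.get("action")
    zone = action.get("to_zone")
    units = action.get("units", 0)
    if kind == "deploy_team":
        if zone in VALID_ZONES and zone not in blocked_roads \
                and 0 < units <= resources["teams_available"]:
            return action
    elif kind == "send_supplies":
        if zone in VALID_ZONES and 0 < units <= resources["supply_stock"]:
            return action
    elif kind == "airlift":
        if zone in VALID_ZONES and resources["airlifts_remaining"] > 0:
            return action
    # Deterministic fallback: a wait action always completes the step.
    return {"action": "wait"}

resources = {"teams_available": 5, "supply_stock": 10, "airlifts_remaining": 1}
fixed = validate_and_fix({"action": "deploy_team", "to_zone": "Z", "units": 3},
                         resources, blocked_roads=set())
# Zone "Z" does not exist, so the validator falls back to wait.
```

The key design choice is that the fallback is deterministic: whatever the LLM emits, the episode never stalls on a malformed action.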
LLM: llama-3.3-70b-versatile via Groq (free tier, high rate limits)
The /grader endpoint scores all 8 dimensions defined in the OpenEnv spec:
| Dimension | How It Is Measured |
|---|---|
| Task completion | Casualties rescued + supply gaps closed across all zones |
| Resource efficiency | Penalty for idle teams, supply over-delivery, and false SOS waste |
| Time performance | Urgency decay for high-severity zones left unattended |
| Decision quality | Critical rescues (severity ≥ 0.75) as a fraction of total |
| Adaptability | Score delta before vs. after the dam-break event (task_3) |
| False signal handling | Resources deployed to is_false_sos zones (hidden grader field) |
| Airlift precision | Airlifts used only on blocked + critical zones vs. total airlifts used |
| Episode score | (tanh(cumulative_reward / max_steps * 2) + 1) / 2, normalized to [0, 1] |
zones[]:

| Field | Type | Description |
|---|---|---|
| zone_id | string | Zone identifier (A–J) |
| casualties_remaining | int | Remaining casualties needing rescue |
| supply_gap | int | Remaining supply deficit |
| severity | float [0,1] | Urgency score (hides casualties_critical) |
| road_blocked | bool | Whether ground deployment is blocked |
| teams_present | int | Rescue teams currently in zone |
| sos_active | bool | SOS signal active (real or false; indistinguishable to the agent) |

resources:

| Field | Type | Description |
|---|---|---|
| teams_available | int | Teams at HQ ready to deploy |
| supply_stock | int | Supply units available |
| airlifts_remaining | int | Airlifts left (scarce) |
| teams_in_transit | dict[str,int] | Teams currently traveling (1-step delay) |

Top-level fields:

| Field | Type | Description |
|---|---|---|
| step_number | int | Current step index |
| steps_remaining | int | Steps left in the episode |
| weather | string | One of: clear, storm, flood |
| last_action_result | string | One of: success, invalid, blocked, insufficient_resources, none |

Hidden from the agent, visible to the grader only: casualties_critical, is_false_sos
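Putting the schema together, a single observation might look like the following. All values are illustrative, not taken from a real episode:

```python
# Illustrative observation matching the schema above; every value is made up.
observation = {
    "zones": [
        {
            "zone_id": "A",
            "casualties_remaining": 42,
            "supply_gap": 15,
            "severity": 0.81,          # hides casualties_critical
            "road_blocked": True,
            "teams_present": 0,
            "sos_active": True,        # real or false: the agent cannot tell
        },
    ],
    "resources": {
        "teams_available": 6,
        "supply_stock": 40,
        "airlifts_remaining": 2,
        "teams_in_transit": {"B": 2},  # arrive after the 1-step delay
    },
    "step_number": 3,
    "steps_remaining": 12,
    "weather": "storm",
    "last_action_result": "success",
}

# Note what is absent: casualties_critical and is_false_sos never appear here.
```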
| Action | Parameters | Description |
|---|---|---|
| deploy_team | to_zone, units | Move teams to zone. Fails if road blocked. |
| send_supplies | to_zone, units | Route supply units to zone. |
| airlift | to_zone, type (rescue/supply) | Bypass road block. Consumes 1 airlift. |
| recall_team | from_zone, units | Pull teams back to HQ (1-step transit delay). |
| wait | – | Do nothing. Penalized every step. |
```
step_reward   = clamp(R_positive - R_negative, -1.0, 1.0)
episode_score = (tanh(cumulative_reward / max_steps * 2) + 1) / 2
```
| Component | Weight | Description |
|---|---|---|
| rescue progress | +0.40 | Normalized casualties rescued |
| supply gap closed | +0.20 | Supply deficit addressed |
| zone completion | +0.15 | Zone fully rescued + supplied |
| critical rescues | +0.15 | Rescues from severity ≥ 0.75 zones |
| airlift precision | +0.10 | Smart airlift use on blocked+critical zones |
| critical deaths | −0.40 | Critical casualties expired |
| urgency decay | −0.15 | High-severity zones left unattended |
| overcommitment | −0.10 | Teams idling in completed zones |
| supply waste | −0.05 | Over-delivery of supplies |
| false SOS | −0.05 | Resources deployed to false alarm zones |
| wait penalty | −0.05 | Flat penalty per wait action |
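The two formulas can be checked directly. This is a sketch; in the real grader the weighted components above feed R_positive and R_negative:

```python
import math

def step_reward(r_positive: float, r_negative: float) -> float:
    # clamp(R_positive - R_negative, -1.0, 1.0)
    return max(-1.0, min(1.0, r_positive - r_negative))

def episode_score(cumulative_reward: float, max_steps: int) -> float:
    # (tanh(cumulative_reward / max_steps * 2) + 1) / 2 -> always in (0, 1)
    return (math.tanh(cumulative_reward / max_steps * 2) + 1) / 2

# A perfect run (step reward 1.0 on all 20 steps of task_3) tops out near 0.982,
# while a do-nothing run with zero cumulative reward sits at exactly 0.5.
perfect = episode_score(20.0, 20)
neutral = episode_score(0.0, 20)
```

The tanh squashing is why the target score bands above never reach 1.0: even flawless play asymptotes below it.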
- Python 3.11+
- `GROQ_API_KEY` (free at console.groq.com) or `OPENAI_API_KEY`
```bash
# Install dependencies
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt

# Pre-train ZoneScorerNet (one-time, ~8 seconds)
python agents/train_zone_scorer.py

# Start the FastAPI server
python main.py
# Server runs at http://localhost:7860

# Run the 4-stage baseline agent on all tasks
export GROQ_API_KEY=your_key_here
python inference_v2.py
```

| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness check |
| GET | /tasks | List all tasks + action schema |
| POST | /reset | Start new episode → {session_id, observation} |
| POST | /step | Submit action → {observation, reward, done, info} |
| GET | /state/{session_id} | Full state including hidden fields (for graders) |
| POST | /grader | Score a completed episode (all 8 dimensions) |
| POST | /baseline | Run baseline agent on all 3 tasks (requires API key) |
```bash
# Start episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "task_1"}'

# Take action (use session_id from reset response)
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"session_id": "<id>", "action": {"action": "deploy_team", "to_zone": "A", "units": 3}}'
```

```bash
# Build (ZoneScorerNet trains automatically at build time)
docker build -t drc-env .

# Run
docker run -p 7860:7860 -e GROQ_API_KEY=your_key drc-env
```

See DEPLOY_TO_HF.txt for the full Hugging Face Spaces deployment guide.
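The same request/response loop can be driven from Python using only the standard library. This is a hypothetical client sketch; endpoint paths and response fields are taken from the API table above:

```python
import json
import urllib.request

BASE = "http://localhost:7860"

def post(path: str, payload: dict) -> dict:
    """POST JSON to the environment server and decode the JSON response."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def run_episode(task_id: str = "task_1") -> float:
    """Play one episode with a trivial wait-only policy; return cumulative reward."""
    reset = post("/reset", {"task_id": task_id})
    session_id = reset["session_id"]
    total, done = 0.0, False
    while not done:
        step = post("/step", {"session_id": session_id,
                              "action": {"action": "wait"}})
        total += step["reward"]
        done = step["done"]
    return total
```

A wait-only policy will just accumulate the per-step wait penalty; it is only meant to show the request/response shape, with a real policy plugged in where the wait action is built.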
| Task | Score Range | Notes |
|---|---|---|
| task_1 | 0.75–0.90 | Easy, single zone |
| task_2 | 0.45–0.65 | Medium, multi-zone with resource scarcity |
| task_3 | 0.25–0.45 | Hard, false SOS + cascading failures |
Scores are reproducible at temperature=0.
```
disasterman-v2/
├── main.py                    # FastAPI server (OpenEnv API)
├── environment.py             # DisasterEnv: world state + step logic
├── models.py                  # Pydantic models (ActionModel, ObservationModel)
├── reward.py                  # Reward function components
├── graders.py                 # Episode grader: returns [0.0, 1.0] across 8 dimensions
├── tasks.py                   # Task configurations (task_1, task_2, task_3)
├── inference_v2.py            # 4-stage agent pipeline runner
├── test_env.py                # Environment unit tests
├── requirements.txt
├── Dockerfile
├── openenv.yaml               # OpenEnv spec
├── README.md
├── DEPLOY_TO_HF.txt           # HF Spaces deployment guide
└── agents/
    ├── zone_scorer.py         # PyTorch ZoneScorerNet (Stage 1)
    ├── zone_scorer_weights.pt # Pre-trained weights (ships in Docker image)
    ├── train_zone_scorer.py   # Training script (runs at Docker build time)
    ├── triage_agent.py        # Triage Agent (Stage 2)
    ├── planner_agent.py       # Planner Agent (Stage 3)
    └── action_agent.py        # Action Agent + validator (Stage 4)
```
Agentic Apocalypse – built for the Meta PyTorch OpenEnv Hackathon by Scaler.