Fire Risk Prevention and Response Optimization Engine City of San Francisco Fire Department Operations
Built on the Databricks Lakehouse Platform | April 2026
Pyxis is a production-grade data intelligence platform that transforms 7.2 million raw municipal records from the San Francisco Open Data Portal into actionable fire risk analytics. The platform ingests five distinct city datasets (fire incidents, 911 calls, fire violations, building inspections, and building permits), resolves them into a unified property registry of 210,359 buildings, and scores every property in the city on a 0-to-100 risk scale using a hybrid Explainable AI ensemble.
The system directly addresses four critical operational needs of the San Francisco Fire Department:
- Risk Prediction: Identifying which specific properties and neighborhoods face the highest fire risk before incidents occur
- Response Optimization: Quantifying station-level NFPA 1710 compliance and recommending resource reallocation based on geospatial risk density
- Compliance Tracking: Measuring violation follow-up effectiveness and exposing "dark properties" that have slipped through every inspection queue
- Operational Intelligence: Delivering pre-computed, decision-ready metrics through a real-time geospatial dashboard
For complete technical details, refer to the Technical Documentation.
Pyxis implements the Databricks Medallion Architecture pattern with three data layers:
San Francisco Open Data Portal
(5 Datasets | 7.2M Records)
|
v
+-------------------------------------------------------+
| BRONZE LAYER |
| Raw CSV ingestion with metadata tagging |
| 5 Delta tables preserving original data fidelity |
+-------------------------------------------------------+
|
v
+-------------------------------------------------------+
| SILVER LAYER |
| Entity resolution across 5 datasets via H3 indexing |
| Feature engineering: per property + per hexagon |
| Response time computation per station |
| 210,359 unified property entities |
+-------------------------------------------------------+
|
v
+-------------------------------------------------------+
| GOLD LAYER |
| 8 business ready analytical tables |
| Heuristic risk scoring (0 to 100, 6 components) |
| ML ensemble scoring (sklearn GBT + SHAP) |
| Crisis zone mapping, dark property flagging |
| NFPA compliance, fairness analysis, drift monitoring |
+-------------------------------------------------------+
|
v
+-------------------------------------------------------+
| PRESENTATION LAYER |
| JSON export via Databricks Volumes |
| React/Vite geospatial dashboard |
+-------------------------------------------------------+
The pipeline consists of 13 Databricks notebooks executed in strict dependency order:
| Stage | Notebook | Purpose | Output |
|---|---|---|---|
| Bronze | 01_bronze_ingest | Raw CSV to Delta conversion | 5 Bronze tables |
| Silver | 02_silver_entity_resolution | Cross-dataset entity resolution with H3 spatial indexing | address_entity_master |
| Silver | 03_silver_property_and_h3_features | Per-property and per-hexagon feature engineering | property_features, h3_features |
| Silver | 04_silver_response_performance | Station-level response time metrics | response_performance |
| Gold | 05_gold_property_risk_twin | Heuristic risk scoring with explainable factor decomposition | property_risk_twin |
| Gold | 05b_gold_ml_risk_model | ML model training, SHAP analysis, and ensemble scoring | property_risk_twin (updated) |
| Gold | 06_gold_h3_risk_surface | H3 hexagonal risk aggregation and crisis zone flagging | h3_risk_surface |
| Gold | 07_gold_dark_property_discovery | Dark property extraction and ranking | dark_property_discovery |
| Gold | 08_gold_nfpa_response_compliance | Per-station NFPA 1710 compliance with risk exposure | nfpa_response_compliance |
| Gold | 09_gold_compliance_funnel | Violation resolution tracking by district | compliance_funnel |
| Gold | 10_gold_fairness_coverage | Inspection equity analysis across districts | fairness_coverage |
| Gold | 11_gold_model_health_drift | Feature distribution drift monitoring via PSI | model_health_drift |
| Export | 99_export_gold_to_json | Hybrid sampled JSON export for frontend | 8 JSON files |
Pyxis uses a deliberate two-layer scoring architecture designed for legal auditability in government operations.
Layer 1: Transparent Heuristic (90% of final score)
Every property receives a deterministic score from 0 to 100 based on six verifiable components:
| Component | Max Points | Signal |
|---|---|---|
| Violations | 30 | Open and severe fire code violations on the property |
| Incidents | 25 | Structure fires and emergency incidents in the past 3 years |
| Inspections | 25 | Time elapsed since last inspection (exponential decay function) |
| Call Frequency | 10 | 911 call volume in the surrounding area over 12 months |
| Permit Risk | 10 | Building permit type indicating high risk occupancy |
| Dark Penalty | 8 | Property has never been inspected despite active records |
Each score is accompanied by three human-readable explanation lines (e.g., "21 open violations unresolved", "1,235 days since last inspection") that provide legal justification for inspection prioritization.
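As an illustration of how a capped heuristic with inspection-recency decay can be computed, here is a minimal Python sketch. Only the six component caps come from the table above; the per-unit point values, the 365-day decay constant, and the function name are assumptions for illustration, not the production weights.

```python
import math

def heuristic_risk_score(open_violations, incidents_3yr, days_since_inspection,
                         calls_12mo, high_risk_permits, never_inspected):
    """Illustrative 0-100 heuristic combining the six capped components."""
    score = 0.0
    score += min(open_violations * 3, 30)    # Violations (max 30 points)
    score += min(incidents_3yr * 8, 25)      # Incidents, past 3 years (max 25)
    # Inspection recency: exponential decay toward the full 25-point penalty
    score += 25 * (1 - math.exp(-days_since_inspection / 365))
    score += min(calls_12mo, 10)             # 911 call frequency (max 10)
    score += min(high_risk_permits * 5, 10)  # Permit risk (max 10)
    score += 8 if never_inspected else 0     # Dark-property penalty (max 8)
    return round(min(score, 100), 1)
```

For instance, a property with 21 open violations and 1,235 days since its last inspection hits the full violation cap plus most of the recency penalty.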
Layer 2: Machine Learning Refinement (10% of final score)
An sklearn Gradient Boosted Tree classifier trained on 210,000 properties predicts structure fire probability using 8 non-leaking features. SHAP TreeExplainer provides per-property feature attribution, showing exactly which factors the model weighted most heavily.
Ensemble Formula: ensemble_risk_score = 0.9 * heuristic_score + 0.1 * ml_probability * 100
During development, we tested a 60/40 ensemble weighting. The ML model produced near-universal high fire probabilities, reclassifying 83.3% of all properties as CRITICAL (compared to 1.6% under the pure heuristic). This is the same false-positive saturation phenomenon documented in the failures of LAPD's PredPol (discontinued in 2020) and the COMPAS recidivism scoring tool. The ML model was detecting "data density" (properties with more city paperwork) rather than genuine fire physics.
We deliberately reweighted to 90/10, positioning the ML as a precision tie-breaker rather than a primary classifier. This reduced the CRITICAL share to 8.6%, while the ML still meaningfully up-tiered 293 borderline properties from HIGH to CRITICAL by detecting non-linear interaction patterns invisible to the static heuristic.
| Tier | Heuristic Only | 60/40 Ensemble (Rejected) | 90/10 Ensemble (Final) |
|---|---|---|---|
| CRITICAL | 3,408 (1.6%) | 175,209 (83.3%) | 18,065 (8.6%) |
| HIGH | 185,307 (88.1%) | 32,169 (15.3%) | 171,912 (81.7%) |
| MEDIUM | 21,240 (10.1%) | 2,279 (1.1%) | 20,114 (9.6%) |
| LOW | 404 (0.2%) | 702 (0.3%) | 268 (0.1%) |
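The 90/10 blend is the formula stated above; the tier cutoffs in the sketch below are illustrative placeholders, since the actual tier thresholds are not given here.

```python
def ensemble_risk_score(heuristic_score, ml_probability):
    """90/10 blend from the document: the heuristic anchors, the ML refines."""
    return 0.9 * heuristic_score + 0.1 * (ml_probability * 100)

def risk_tier(score, cuts=(80, 50, 25)):
    """Map a 0-100 score to a tier. The cutoffs are assumed, not documented."""
    critical, high, medium = cuts
    if score >= critical:
        return "CRITICAL"
    if score >= high:
        return "HIGH"
    if score >= medium:
        return "MEDIUM"
    return "LOW"
```

Because the ML term can add at most 10 points, it can only move properties that already sit near a tier boundary, which is exactly the "precision tie-breaker" role described above.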
The Gold layer contains 8 optimized Delta tables, each designed to power a specific operational decision:
property_risk_twin
Decision it drives: Which building should an inspector visit today, and why?
One row per property in San Francisco (210,359 total). Contains the heuristic score, ML fire probability, ensemble score, three heuristic explanation lines, three SHAP driver attributions, dark property flags, and a recommended action string. This is the primary table powering the geospatial dashboard.
h3_risk_surface
Decision it drives: Where should idle fire trucks pre-position for fastest crisis response?
One row per H3 resolution 9 hexagon (~175-meter cells, ~1,200 total). Aggregates property risk scores, critical property counts, dark property density, and NFPA compliance rates per hexagon. Cells meeting multi-factor crisis thresholds are flagged as Crisis Zones.
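Assuming each property row already carries its precomputed H3 cell ID, the per-hexagon rollup can be sketched in plain Python without the H3 library itself. Field names and the crisis-zone thresholds here are illustrative, not the production values.

```python
from collections import defaultdict

def h3_risk_surface(properties, critical_cutoff=80,
                    crisis_min_score=70, crisis_min_dark=1):
    """Roll per-property rows up to their (precomputed) H3 cell."""
    cells = defaultdict(list)
    for p in properties:
        cells[p["h3_cell"]].append(p)
    surface = {}
    for cell, props in cells.items():
        avg = sum(p["risk_score"] for p in props) / len(props)
        critical = sum(p["risk_score"] >= critical_cutoff for p in props)
        dark = sum(p["is_dark"] for p in props)
        surface[cell] = {
            "avg_risk": round(avg, 1),
            "critical_count": critical,
            "dark_count": dark,
            # Crisis Zones require multiple factors to converge, not one alone
            "is_crisis_zone": avg >= crisis_min_score and dark >= crisis_min_dark,
        }
    return surface
```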
dark_property_discovery
Decision it drives: Which buildings are invisible to the inspection system?
Extracts and ranks properties that appear in city records (permits, violations, 911 calls) but have zero recorded fire inspections. Four classification types identify the specific bureaucratic failure mode (e.g., "Active permit, zero inspections" or "12 open violations, no follow up").
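A rule-based classifier of this kind might look like the sketch below. The first two labels echo the examples above; the other two labels and the rule ordering are assumptions, not the production logic.

```python
def dark_property_type(has_permit, open_violations, calls_12mo):
    """Classify why a never-inspected property still surfaced in city records.

    Rules are checked most-severe first; labels 3 and 4 are illustrative."""
    if open_violations > 0:
        return f"{open_violations} open violations, no follow-up"
    if has_permit:
        return "Active permit, zero inspections"
    if calls_12mo > 0:
        return "Repeated 911 activity, never inspected"
    return "Present in records, no inspection history"
```

Keeping the logic as explicit, ordered rules (rather than a model) is what makes each flag defensible when a property owner asks why their building was prioritized.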
nfpa_response_compliance
Decision it drives: Which stations need more resources to meet their response time mandate?
Per-station analysis of NFPA 1710 compliance (a 5-minute-20-second response target for 90% of calls). Cross-references the count of high-risk properties in each station's coverage area to quantify the gap between risk exposure and response capability.
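Compliance against the 5:20 (320-second) target reduces to a simple proportion per station; a minimal sketch, with the function name assumed:

```python
def nfpa_1710_compliance(response_seconds, target_seconds=320):
    """Return (compliance rate, meets the 90% NFPA 1710 bar) for one station."""
    within = sum(t <= target_seconds for t in response_seconds)
    rate = within / len(response_seconds)
    return rate, rate >= 0.90
```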
compliance_funnel
Decision it drives: Are we actually resolving the violations we cite?
Per-district tracking of the violation filing-to-closure pipeline. Computes closure rates and identifies districts with systemic backlogs of unresolved violations.
fairness_coverage
Decision it drives: Are inspection resources allocated proportionally to risk?
Compares the actual number of inspections each district receives against the inspection volume it should receive based on its share of citywide risk. Districts are flagged as UNDER_SERVED or OVER_SERVED with a coverage gap percentage.
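The coverage-gap arithmetic can be sketched as follows. The function name and the example numbers are illustrative, not a district's actual figures.

```python
def coverage_gap(actual_inspections, risk_share, total_inspections):
    """Gap between a district's actual and risk-proportional inspection volume."""
    expected = risk_share * total_inspections  # inspections its risk share warrants
    gap_pct = (actual_inspections - expected) / expected * 100
    status = "UNDER_SERVED" if gap_pct < 0 else "OVER_SERVED"
    return round(gap_pct, 1), status
```

A district holding 10% of citywide risk but receiving 241 of 10,000 inspections would show a gap of about -75.9%, i.e. UNDER_SERVED.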
model_health_drift
Decision it drives: Is our scoring model still accurate, or has the underlying data shifted?
Monitors the Population Stability Index (PSI) for all key features in the scoring model. A PSI above 0.25 indicates significant distribution drift, signaling that model retraining is needed. All features currently show PSI of 0.01 (STABLE).
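PSI compares the binned distribution of a feature in a reference period against the current period; a minimal sketch (the epsilon guard against empty bins is an implementation detail, not from this document):

```python
import math

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index over matching bins of two distributions.

    Inputs are per-bin fractions that each sum to 1. PSI > 0.25 is the
    conventional threshold for significant drift."""
    total = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        e, a = max(e, eps), max(a, eps)  # guard log/division against zero bins
        total += (a - e) * math.log(a / e)
    return total
```

Identical distributions yield a PSI of 0; a large shift (e.g. two bins swapping from 70/30 to 30/70) lands well above the 0.25 retraining threshold.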
Decision it drives: What are the headline numbers for a Fire Chief's daily briefing?
Pre-computed summary statistics: total properties, critical count, dark property count, crisis zone count, average risk score, overall NFPA compliance rate, and station failure counts.
| Layer | Technology | Purpose |
|---|---|---|
| Data Governance | Databricks Unity Catalog | Schema management, access control, data lineage |
| Storage Format | Delta Lake | ACID transactions, time travel, schema enforcement |
| Compute | Databricks Serverless | Elastic, zero-management notebook execution |
| Processing | PySpark SQL | Distributed transformation of 7.2M records |
| Spatial Indexing | Uber H3 (Databricks native) | Hexagonal geospatial indexing at ~175m resolution |
| ML Training | sklearn GradientBoostingClassifier | Gradient Boosted Trees on driver node (210K rows) |
| ML Explainability | SHAP TreeExplainer | Per-property Shapley value attribution |
| Data Export | Databricks Volumes | Governed file storage for JSON frontend payload |
| Frontend | React + Vite + TypeScript | Interactive geospatial dashboard |
| Version Control | GitHub via Databricks Repos | CI/CD integration and reproducibility |
Based on analysis of 210,359 properties across 5 municipal datasets:
- 18,065 properties (8.6%) are classified as CRITICAL risk under the ensemble model
- 15 Crisis Zones identified where high-risk density, dark properties, and slow response times converge
- District 10 is the most severely underserved district, receiving 75.9% fewer inspections than its risk volume warrants
- 293 properties were up-tiered from HIGH to CRITICAL by the ML model, catching non-linear risk patterns the heuristic alone would miss
- All monitored features show PSI of 0.01 (STABLE), confirming the scoring model is not experiencing data drift
- Databricks workspace with Unity Catalog enabled
- Serverless compute access
- Raw CSV data files from the SF Open Data Portal placed in /Volumes/sf_fire_prod/bronze/raw_data/
- Node.js 18+ for the frontend
- Clone the repository into Databricks Repos
- Upload raw CSV files to the Bronze volume
- Execute notebooks 01 through 99 in order (see pipeline flow table above)
- Download the exported JSON files from /Volumes/sf_fire_prod/gold/exports/
    cd frontend/MIIT
    npm install
    npm install -g tsx
    npm run dev

Place the 8 exported JSON files in frontend/MIIT/public/data/ before launching the development server.
The prototype architecture is designed for direct graduation to production on the Databricks platform:
| Aspect | Prototype (Current) | Production |
|---|---|---|
| Data Ingestion | Manual CSV upload | Databricks Auto Loader with Socrata API integration |
| Processing Mode | Full overwrite | Incremental Delta Lake MERGE (upsert) |
| Orchestration | Manual notebook execution | Databricks Workflows DAG with dependency chaining |
| ML Lifecycle | Train and score in one notebook | MLflow Model Registry with human approval gates |
| Drift Detection | PSI printed to console | Automated Slack/PagerDuty alerts on PSI threshold breach |
| Serving Layer | Static JSON files | Databricks SQL Endpoint or Delta Sharing |
| Pipeline Runtime | 35 to 55 minutes (full rebuild) | ~90 seconds (incremental) |
| Monthly Cost | $0 (hackathon credits) | ~$195/month (Serverless auto suspend) |
For the complete production architecture including DAG design, MLflow integration, logging and observability plans, and cost analysis, see Part III of the Technical Documentation.
HackBricks_P3/
README.md This file
TECHNICAL_DOCUMENTATION.md Comprehensive technical documentation (3 viewpoints)
context.md Project context and design decisions
notebooks/
01_bronze_ingest.py Raw CSV to Delta ingestion
02_silver_entity_resolution.py Cross dataset entity resolution + H3 indexing
03_silver_property_and_h3_features.py Feature engineering
04_silver_response_performance.py Response time metrics
05_gold_property_risk_twin.py Heuristic risk scoring
05b_gold_ml_risk_model.py ML model + SHAP + ensemble scoring
06_gold_h3_risk_surface.py H3 hexagonal risk aggregation
07_gold_dark_property_discovery.py Dark property extraction
08_gold_nfpa_response_compliance.py NFPA compliance analysis
09_gold_compliance_funnel.py Violation resolution tracking
10_gold_fairness_coverage.py Inspection equity analysis
11_gold_model_health_drift.py Feature drift monitoring
99_export_gold_to_json.py JSON export for frontend
frontend/
MIIT/
public/data/ JSON data files for the dashboard
src/ React/Vite application source code
server.ts Development server
vite.config.ts Build configuration
| Document | Audience | Contents |
|---|---|---|
| Technical Documentation: Part I | IT/Tech Team | System architecture, pipeline execution, troubleshooting, schema reference |
| Technical Documentation: Part II | Firefighters and Inspectors | Risk tier explanations, dark property guide, dashboard usage |
| Technical Documentation: Part III | Engineers | Notebook deep dives, ML model details, tradeoffs, production architecture, logging plans |
| Decision | What We Chose | Why |
|---|---|---|
| Entity resolution | H3 + address grouping | Deterministic, fast, sufficient for macro analysis |
| Spatial indexing | H3 Resolution 9 (~175m) | Balances property granularity with statistical stability |
| ML framework | sklearn on driver | Enables SHAP TreeExplainer (incompatible with Spark MLlib) |
| Ensemble weight | 90% heuristic / 10% ML | Prevents false positive saturation observed at 60/40 |
| JSON sampling | Top 500 + 2% random | Preserves tier proportions while keeping payload manageable |
| Timestamp parsing | try_to_timestamp | Required for ANSI SQL enforcement on Serverless |
| Dark property logic | Rule based (4 types) | Explainability is paramount in government operations |
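The "Top 500 + 2% random" export strategy from the table above could be sketched as follows; the score field name, function name, and fixed seed are assumptions for illustration.

```python
import random

def sample_for_export(rows, top_n=500, frac=0.02, seed=42):
    """Keep the top-N riskiest rows plus a random fraction of the remainder,
    so the payload stays small while tier proportions are roughly preserved."""
    ranked = sorted(rows, key=lambda r: r["ensemble_risk_score"], reverse=True)
    top, rest = ranked[:top_n], ranked[top_n:]
    rng = random.Random(seed)                # fixed seed for reproducible exports
    return top + rng.sample(rest, int(len(rest) * frac))
```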
Team QTπs | HackBricks 2026
This project was developed as part of the HackBricks 2026 hackathon. All data sourced from the San Francisco Open Data Portal under open data license terms.