Full-stack ML system for intelligence analysis: Automatically extracts entities from documents, builds knowledge graphs, computes network analytics, and visualizes relationships through interactive dashboards.
Demonstrated with an analysis of the European defense tech ecosystem to showcase capabilities relevant to intelligence operations, investigative research, and strategic analysis.
Intelligence analysts, investigators, and researchers process hundreds of documents daily (reports, news articles, briefings) with critical connections hidden across thousands of pages. Finding "who knows who," "which organizations are linked," or "what patterns exist" traditionally requires weeks of manual work.
This system automates the entire pipeline:
Documents → Entity Extraction → Relationship Detection → Graph Storage → Analytics → Visualization
From weeks of manual analysis → instant pattern discovery.
- Automated Entity Extraction: Named Entity Recognition with spaCy (en_core_web_lg) identifies 76 unique entities across people, organizations, and locations
- Intelligent Relationship Detection: 7 relationship types (WORKS_FOR, LOCATED_IN, AFFILIATED_WITH, COLLABORATED_WITH, etc.) with confidence scoring
- Advanced Graph Analytics:
  - PageRank centrality (identifies key influencers)
  - Louvain community detection (discovers clusters with 0.329 modularity)
  - Betweenness centrality (finds bridge entities)
  - Network metrics (density, clustering coefficients)
- Production REST API: 7 FastAPI endpoints with <100ms response time, full Swagger documentation
- Interactive Dashboard: React 18 + Material-UI with 4 analysis tabs, force-directed graph visualization (vis-network)
- Scalable Storage: Neo4j graph database with optimized Cypher queries (currently 76 nodes, 64 edges)
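To make the "bridge entity" idea concrete, here is a minimal, standard-library sketch of Brandes' algorithm for betweenness centrality on an unweighted, undirected graph. It is an illustration only: the project lists NetworkX for its analytics, and the toy graph and node names below are hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    adjacency maps each node to the set of its neighbours.
    Returns unnormalized scores (each unordered pair counted once).
    """
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # BFS from source, counting shortest paths (sigma) to every node.
        order = []
        predecessors = {node: [] for node in adjacency}
        sigma = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        sigma[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbour in adjacency[current]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    sigma[neighbour] += sigma[current]
                    predecessors[neighbour].append(current)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += sigma[pred] / sigma[node] * (1 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Every undirected pair was visited from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in scores.items()}


# Hypothetical toy graph: two triangles joined through a single "bridge" node.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "bridge"},
    "bridge": {"c", "d"},
    "d": {"bridge", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
scores = betweenness_centrality(toy)
print(max(scores, key=scores.get))  # the node sitting on the most shortest paths
```

In this toy graph every shortest path between the two triangles runs through `bridge`, which is why it receives the top score even though it has only two connections.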
┌─────────────────────────────────────────────────────────────┐
│                         INPUT LAYER                         │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐             │
│  │    PDFs    │  │    News    │  │ Structured │             │
│  │ Documents  │  │  Articles  │  │    Data    │             │
│  └──────┬─────┘  └──────┬─────┘  └──────┬─────┘             │
└─────────┼───────────────┼───────────────┼───────────────────┘
          │               │               │
          ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────┐
│          DOCUMENT PROCESSING (PyPDF2, pdfplumber)           │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                    NLP PIPELINE (spaCy)                     │
│ 1. Named Entity Recognition → Person, Org, Location         │
│ 2. Entity Normalization     → Deduplicate & standardize     │
│ 3. Relationship Extraction  → Pattern matching + dependency │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                    NEO4J GRAPH DATABASE                     │
│  Nodes: Person, Organization, Location, Event               │
│  Edges: WORKS_FOR, LOCATED_IN, AFFILIATED_WITH              │
│  Properties: confidence_score, source, timestamp            │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│        ANALYTICS ENGINE (NetworkX + python-louvain)         │
│  • PageRank Centrality     • Community Detection            │
│  • Betweenness             • Network Metrics                │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│               FASTAPI REST API (7 endpoints)                │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│         REACT DASHBOARD (Material-UI + vis-network)         │
│  4 Tabs: Network Graph | PageRank | Communities | Metrics   │
└─────────────────────────────────────────────────────────────┘
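The normalization and relationship-extraction stages of the NLP pipeline can be sketched without spaCy to show the general idea. This is a simplified illustration under stated assumptions, not the repository's code: the trigger phrases, confidence values, suffix list, and the `normalize` / `extract_relations` helpers are all hypothetical, the entities are assumed to have been recognized already, and the dependency-based part of the real pipeline is not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger phrases and confidence values, for illustration only.
TRIGGERS = {
    "WORKS_FOR": (("works for", "employed by"), 0.9),
    "LOCATED_IN": (("based in", "headquartered in", "located in"), 0.85),
    "COLLABORATED_WITH": (("partnered with", "collaborated with"), 0.8),
}


def normalize(name):
    """Build a deduplication key: collapse whitespace, drop a legal suffix, case-fold."""
    cleaned = re.sub(r"\s+", " ", name).strip()
    cleaned = re.sub(r",?\s+(GmbH|Inc\.?|Ltd\.?|AG)$", "", cleaned, flags=re.IGNORECASE)
    return cleaned.casefold()


@dataclass(frozen=True)
class Relation:
    head: str
    rel_type: str
    tail: str
    confidence: float


def extract_relations(sentence, entities):
    """Return relations whose trigger phrase sits between two adjacent known entities."""
    found = []
    positions = sorted((sentence.find(e), e) for e in entities if e in sentence)
    for (start_a, a), (start_b, b) in zip(positions, positions[1:]):
        between = sentence[start_a + len(a):start_b].lower()
        for rel_type, (phrases, confidence) in TRIGGERS.items():
            if any(phrase in between for phrase in phrases):
                found.append(Relation(normalize(a), rel_type, normalize(b), confidence))
    return found


# Hypothetical sentence and entity names.
relations = extract_relations(
    "Acme Drones GmbH is headquartered in Springfield.",
    ["Acme Drones GmbH", "Springfield"],
)
print(relations)
```

Looking only at the text between adjacent entities is a deliberate simplification. It misattributes relations in longer sentences, which is one reason a dependency parse is useful alongside surface patterns.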
- Python 3.11+
- Node.js 18+
- Docker (for Neo4j)
1. Clone Repository
git clone https://github.com/Fredbcx/intelligence-knowledge-graph.git
cd intelligence-knowledge-graph
2. Start Neo4j Database
docker run -d --name neo4j \
-p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password123 \
neo4j:5.15.0-community
3. Backend Setup
cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download spaCy model
python -m spacy download en_core_web_lg
# Create .env file
cp .env.example .env
# Edit .env with your Neo4j credentials
# Run API server
python analytics_app.py
4. Frontend Setup
cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
5. Access Applications
- Dashboard: http://localhost:3000
- API Documentation: http://localhost:8000/docs
- Neo4j Browser: http://localhost:7474 (user: neo4j, password: password123)
The system is demonstrated with analysis of the European defense technology landscape, showcasing capabilities relevant to intelligence operations and strategic research.
Top Connected Entities (PageRank Centrality):
| Rank | Entity | Type | PageRank Score | Connections |
|---|---|---|---|---|
| #1 | ARX Robotics | Organization | 0.0814 | 12 |
| #2 | Norway | Location | 0.0692 | 4 |
| #3 | General Catalyst | Organization | 0.0519 | 12 |
| #4 | European Innovation Council | Organization | 0.0519 | 12 |
| #5 | Hummingbird Ventures | Organization | 0.0519 | 12 |
| #6 | Lakestar, Vsquared Ventures | Organization | 0.0519 | 12 |
| #7 | Lightspeed Venture Partners | Organization | 0.0519 | 12 |
| #8 | Helsing | Organization | 0.0270 | 2 |
Key Insights:
- ARX Robotics emerges as the central hub with the highest PageRank (0.0814) and 12 direct connections, bridging multiple communities in the ecosystem
- Norway ranks #2 as a critical geographic node connecting Nordic defense entities
- Top-tier VC investors (General Catalyst, Hummingbird Ventures, Lakestar) show equal centrality (0.0519), indicating their distributed influence across the ecosystem
- Helsing, while prominent in defense AI, has fewer direct connections (2) but high strategic importance
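For readers unfamiliar with how PageRank scores like those in the table arise, the following is a minimal power-iteration sketch in plain Python. It assumes an undirected graph and a damping factor of 0.85 (a common default); the project's scores come from its own analytics stack, and the star-shaped toy graph here is hypothetical.

```python
def pagerank(adjacency, damping=0.85, tolerance=1e-10, max_iterations=200):
    """Power-iteration PageRank over an undirected adjacency mapping."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not adjacency[node])
        updated = {}
        for node in nodes:
            # Each neighbour passes on an equal share of its current rank.
            incoming = sum(rank[nb] / len(adjacency[nb]) for nb in adjacency[node])
            updated[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        change = sum(abs(updated[node] - rank[node]) for node in nodes)
        rank = updated
        if change < tolerance:
            break
    return rank


# Hypothetical star graph: one hub connected to three leaves.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
ranks = pagerank(star)
for node, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {score:.4f}")
```

Because a node's score depends on the scores of its neighbours and not only on how many it has, an entity with few connections can still outrank better-connected ones, as Norway does in the table above.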
Communities Detected (Louvain Algorithm, Modularity: 0.329):
The Louvain algorithm identified 3 distinct communities with a modularity score of 0.329, indicating good community structure:
- Community 24 (12 members): Technology and investment ecosystem
  - Key entities: Quantum Systems, Venture Capital Report, General Catalyst, European Innovation Council, Hummingbird Ventures, Lakestar/Vsquared Ventures, Lightspeed Venture Partners
  - Characteristics: VC firms and funding bodies clustered together
  - Internal edges: 45 | External edges: 5
- Community 3 (10 members): Munich defense tech cluster
  - Key entities: ARX Robotics, Munich, DroneVision Technologies, Defense Tech Capital, Primoco UAV, European Investment Bank
  - Characteristics: Geographic concentration around Munich with autonomous systems companies
  - Internal edges: 10 | External edges: 5
- Community 18 (5 members): Nordic military network
  - Key entities: Norway, NATO, Norwegian Armed Forces, Altra, ICEYE
  - Characteristics: Scandinavian military and allied organizations
  - Internal edges: 4 | External edges: 0
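The reported modularity can be cross-checked from the internal and external edge counts listed for each community. The sketch below applies the standard modularity formula, Q = Σ [L_c / m − (d_c / 2m)²], under the assumption that these three communities account for every edge in the graph (the counts do add up to the 64 edges reported elsewhere in this README). The function name is hypothetical.

```python
def modularity_from_counts(communities):
    """Compute (total_edges, modularity) from per-community edge counts.

    communities is a list of (internal_edges, external_edges) pairs. Each
    external edge is reported by both communities it touches, so the
    external total is halved when counting edges.
    """
    total_edges = sum(i for i, _ in communities) + sum(e for _, e in communities) / 2
    score = 0.0
    for internal, external in communities:
        # Sum of node degrees inside the community.
        degree_sum = 2 * internal + external
        score += internal / total_edges - (degree_sum / (2 * total_edges)) ** 2
    return total_edges, score


# (internal, external) edge counts for Communities 24, 3, and 18 as listed above.
reported = [(45, 5), (10, 5), (4, 0)]
edges, q = modularity_from_counts(reported)
print(f"edges={edges:.0f}, modularity={q:.3f}")
```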
Network Metrics:
- Total entities: 76 nodes extracted from documents
- Total relationships: 64 edges identified
- Communities: 3 distinct clusters
- Modularity: 0.329 (good separation between communities)
- Network density: 2.25% (sparse network indicating selective connections)
- Average clustering coefficient: 0.31
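The density figure follows directly from the node and edge counts, assuming an undirected graph: density = 2E / (N(N − 1)). The sketch below reproduces it and also shows how an average clustering coefficient is computed. The toy graph is hypothetical, and the averaging convention (nodes with fewer than two neighbours count as zero) is an assumption that matches NetworkX's default.

```python
from itertools import combinations


def density(node_count, edge_count):
    """Fraction of possible undirected edges that are present."""
    return 2 * edge_count / (node_count * (node_count - 1))


def average_clustering(adjacency):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for neighbours in adjacency.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count pairs of neighbours that are themselves connected.
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adjacency)


print(f"{density(76, 64):.2%}")  # node and edge counts from the metrics above

# Hypothetical toy graph: a triangle (a, b, c) with one pendant node d.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
toy_clustering = average_clustering(toy)
print(round(toy_clustering, 3))
```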
- Python 3.11 - Core language
- FastAPI 0.109 - REST API framework
- Uvicorn 0.27 - ASGI server
- spaCy 3.7 - NLP and Named Entity Recognition
- Neo4j 5.15 - Graph database
- py2neo 2021.2 - Neo4j Python driver
- NetworkX 3.2 - Graph analysis algorithms
- python-louvain 0.16 - Community detection
- Pydantic 2.5 - Data validation
- PyPDF2 3.0 - PDF processing
- React 18.2 - UI framework
- TypeScript 5.0 - Type safety
- Vite - Build tool
- Material-UI 5.14 - Component library
- vis-network 9.1 - Graph visualization
- Axios - HTTP client
- Neo4j Community 5.15 - Graph database
GET /api/v1/health - Health check
GET /api/v1/pagerank - PageRank centrality rankings
GET /api/v1/communities - Community detection results
GET /api/v1/betweenness - Betweenness centrality
GET /api/v1/graph-metrics - Overall network statistics
POST /api/v1/shortest-path - Find shortest path between entities
POST /api/v1/neighbors - Get N-hop neighbors of entity
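The shortest-path and N-hop neighbor endpoints correspond to two classic breadth-first searches. The sketch below shows the idea on an in-memory adjacency mapping. It is an assumption-laden illustration: the README describes the backend as using Neo4j and Cypher queries for graph lookups, and the function names and the four-node chain are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbour in adjacency.get(current, ()):
            if neighbour not in parents:
                parents[neighbour] = current
                queue.append(neighbour)
    return None


def neighbors_within(adjacency, start, hops):
    """Map every node reachable within `hops` steps to its distance from start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[current] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


# Hypothetical chain A - B - C - D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
path = shortest_path(chain, "A", "D")
two_hop = neighbors_within(chain, "A", 2)
print(path, two_hop)
```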
Get PageRank Rankings:
curl http://localhost:8000/api/v1/pagerank | jq
Response:
{
  "rankings": [
    {
      "entity": "ARX Robotics",
      "pagerank": 0.0814,
      "type": "Organization",
      "connections": 12
    },
    {
      "entity": "Norway",
      "pagerank": 0.0692,
      "type": "Location",
      "connections": 4
    }
  ]
}
Find Shortest Path:
curl -X POST http://localhost:8000/api/v1/shortest-path \
  -H "Content-Type: application/json" \
  -d '{
    "source": "ARX Robotics",
    "target": "Helsing"
  }' | jq
Get Communities:
curl http://localhost:8000/api/v1/communities | jq
Get Network Metrics:
curl http://localhost:8000/api/v1/graph-metrics | jq
Full interactive documentation: http://localhost:8000/docs
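The same calls can be made from Python with only the standard library. This is a small sketch that assumes the API is running locally at the URL used in the curl examples and that the PageRank response has the shape shown above. The helper names are hypothetical, and `fetch_json` is defined but not invoked here because it needs a live server.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000/api/v1"  # taken from the curl examples


def build_shortest_path_request(source, target, base_url=BASE_URL):
    """Prepare the POST request for the shortest-path endpoint."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return request.Request(
        f"{base_url}/shortest-path",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_entities(pagerank_payload, limit=5):
    """Return (entity, score) pairs from a PageRank response, highest score first."""
    rankings = sorted(
        pagerank_payload["rankings"], key=lambda row: row["pagerank"], reverse=True
    )
    return [(row["entity"], row["pagerank"]) for row in rankings[:limit]]


def fetch_json(url_or_request, timeout=10):
    """Send the request and decode the JSON body (requires the API to be running)."""
    with request.urlopen(url_or_request, timeout=timeout) as response:
        return json.load(response)


# Sample payload copied from the documented PageRank response.
sample = {
    "rankings": [
        {"entity": "Norway", "pagerank": 0.0692, "type": "Location", "connections": 4},
        {"entity": "ARX Robotics", "pagerank": 0.0814, "type": "Organization", "connections": 12},
    ]
}
top = top_entities(sample)
req = build_shortest_path_request("ARX Robotics", "Helsing")
print(top, req.get_method(), req.full_url)
```

With the server up, `fetch_json(f"{BASE_URL}/pagerank")` or `fetch_json(req)` would perform the actual calls.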
- Map relationships between entities in security contexts
- Identify key influencers and information hubs
- Discover hidden connections across multiple data sources
- Track entity networks over time
- Uncover relationships in complex investigations
- Connect people, organizations, and events from documents
- Visualize information flows and collaboration networks
- Analyze competitive landscapes and partnership networks
- Identify market influencers and strategic positions
- Map investment flows and corporate relationships
- Process multi-source data (news, reports, social media)
- Build comprehensive entity relationship maps
- Automate pattern discovery in large document collections
cd backend
pytest tests/ -v
# Expected output:
# 8/8 tests passed
# All endpoints respond <100ms
# Graph analytics compute <2s
cd frontend
npm run build
# Verifies production build works
# Bundle size optimized
- Entity Extraction: ~30 seconds per document (varies by length)
- Graph Analytics: <2 seconds for 76 nodes (scales to 10,000+ nodes)
- API Response Time: <100ms for all endpoints
- Graph Query: <500ms for path finding (depth 5)
- Dashboard Load: <1 second initial render
- Community Detection: <1 second for Louvain algorithm on current graph
intelligence-knowledge-graph/
├── README.md
├── LICENSE
├── SETUP.md
├── .gitignore
├── analytics_app.py                  # Main API entry point
├── requirements.txt
├── backend/
│   ├── core/
│   │   ├── document_processor.py     # PDF/text extraction
│   │   ├── nlp_processor.py          # spaCy NER pipeline
│   │   └── graph_analytics.py        # NetworkX algorithms
│   ├── database/
│   │   ├── neo4j_manager.py          # Neo4j connection
│   │   └── graph_query_engine.py     # Cypher queries
│   ├── api/
│   │   ├── analytics_api.py          # FastAPI routes
│   │   └── analytics_models.py       # Pydantic models
│   ├── requirements.txt
│   ├── .env.example
│   └── tests/
│       ├── test_api_endpoints.py
│       ├── test_nlp_extraction.py
│       └── test_processor.py
├── frontend/
│   ├── src/
│   │   ├── components/               # React components
│   │   ├── services/                 # API client
│   │   └── App.tsx                   # Main app
│   ├── package.json
│   └── vite.config.js
├── data/
│   ├── raw/                          # Original documents
│   ├── processed/                    # Extracted entities
│   └── sample/                       # Demo dataset
└── docs/
    └── screenshots/                  # Documentation images
- Real-time Processing: Kafka integration for streaming document ingestion
- Advanced NER: Fine-tune BERT models for domain-specific entities
- Temporal Analysis: Track graph evolution over time with temporal queries
- Multi-language: Support German, French for EU defense documents
- Docker Compose: One-command deployment with docker-compose up
- CI/CD: GitHub Actions for automated testing and deployment
- Entity Disambiguation: ML-based coreference resolution for complex entities
- Export Capabilities: PDF reports, graph visualizations, CSV/JSON exports
- Authentication: JWT-based API authentication for production
- Incremental Updates: Efficient graph updates without full rebuilds
This is a portfolio project, but suggestions and feedback are welcome! Feel free to:
- Open issues for bugs or feature requests
- Fork and experiment with your own data
- Share how you've adapted the system for other domains
MIT License - See LICENSE file for details.
- spaCy - Industrial-strength NLP library
- Neo4j - Graph database platform
- NetworkX - Python graph analysis library
- FastAPI - Modern Python web framework
- vis-network - Interactive graph visualization
- Material-UI - React component library
Built with: Python • FastAPI • React • Neo4j • spaCy • NetworkX
Target Applications: Intelligence Analysis • Defense AI • Investigative Systems • Business Intelligence