Winner (1st Place) — NASA International Space Apps Challenge 2025 (Ghaziabad Local Hackathon) 🏆
ASTREA is an AI-powered research engine designed to help scientists and researchers explore large-scale space biology literature efficiently. It combines semantic search, knowledge graph reasoning, and LLM-powered synthesis to uncover hidden relationships across NASA bioscience studies.
Built by Team Andromeda during the NASA Space Apps Challenge 2025.
NASA’s space biology research produces vast amounts of unstructured and semi-structured data—research papers, experimental reports, tables, and images. However:
- Relevant insights are scattered across thousands of documents
- Cross-paper relationships are difficult to identify
- Traditional keyword search fails to capture semantic meaning
As a result, valuable connections remain hidden, slowing scientific discovery.
ASTREA introduces a hybrid retrieval and reasoning system that:
- Scrapes and ingests bioscience documents
- Extracts structured knowledge (entities & relationships)
- Stores information in both:
  - a Vector Database (semantic similarity)
  - a Knowledge Graph (explicit relationships)
- Answers user queries by reasoning over both representations
This allows ASTREA to go beyond simple document retrieval and generate context-aware, insight-driven responses.
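To make the two representations concrete, the sketch below keeps both in memory: a few embedded chunks ranked by cosine similarity, and a few triples looked up by entity name. The vectors, chunk texts, triples, and function names (`cosineSimilarity`, `topK`, `neighbors`) are illustrative assumptions, not ASTREA's actual code; in the real system Pinecone and Neo4j fill these roles.

```javascript
// Toy in-memory stand-ins for the vector database and the knowledge graph.
// All data below is made up for illustration.
const chunks = [
  { id: "c1", text: "Microgravity alters gene expression in mouse liver.", vector: [0.9, 0.1, 0.0] },
  { id: "c2", text: "Radiation exposure damages plant root cells.", vector: [0.1, 0.9, 0.2] },
  { id: "c3", text: "Spaceflight reduces bone density in rodents.", vector: [0.7, 0.2, 0.4] },
];

const triples = [
  { subject: "microgravity", relation: "AFFECTS", object: "gene expression" },
  { subject: "microgravity", relation: "REDUCES", object: "bone density" },
  { subject: "radiation", relation: "DAMAGES", object: "root cells" },
];

// Cosine similarity between two equal-length, non-zero vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Semantic side: rank chunks by similarity to the query vector.
function topK(queryVector, k) {
  return chunks
    .map((c) => ({ id: c.id, text: c.text, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Relational side: every triple that mentions the entity, in either position.
function neighbors(entity) {
  return triples.filter((t) => t.subject === entity || t.object === entity);
}
```

The similarity ranking finds text that is close in meaning to the query, while the triple lookup returns stated relationships regardless of how the source sentence was worded.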
- Document Ingestion
  - Web scraping (HTML pages)
  - PDF document loading
- Chunking & Preprocessing
  - Intelligent text chunking with overlap
  - Metadata enrichment (source, timestamp, type)
- Dual Storage Pipeline
  - Vector DB (Pinecone): stores embeddings for semantic retrieval
  - Knowledge Graph (Neo4j): stores extracted entities and relationships as graph triples
- Query Processing
  - The user query is embedded and matched against the vector DB
  - Related entities and edges are fetched from Neo4j
- LLM Reasoning Layer
  - Retrieved chunks and graph context are combined
  - The LLM generates a grounded, reasoned response
This hybrid approach enables both semantic similarity and relational reasoning.
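The chunking step with overlap and metadata enrichment can be sketched as follows. This is a minimal character-based version under stated assumptions: the function name `chunkText`, the default sizes, and the extra `chunkIndex`, `start`, and `end` fields are hypothetical, and ASTREA's own chunker may split on sentences or sections instead.

```javascript
// Split text into fixed-size chunks where consecutive chunks share `overlap` characters.
// Names and defaults are illustrative, not taken from the ASTREA code base.
function chunkText(text, { chunkSize = 1000, overlap = 200, source = "unknown", type = "html" } = {}) {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) throw new Error("overlap must be in [0, chunkSize)");

  const step = chunkSize - overlap; // how far the window advances each time
  const timestamp = new Date().toISOString();
  const chunks = [];

  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({
      text: text.slice(start, end),
      // Metadata enrichment: source, timestamp, and type come from the pipeline description.
      metadata: { source, type, timestamp, chunkIndex: chunks.length, start, end },
    });
    if (end === text.length) break; // the final window already reached the end of the text
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.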
- 🔍 Semantic Search across 600+ NASA bioscience studies
- 🧠 Hybrid RAG Architecture (Vector + Knowledge Graph)
- 🕸️ Entity & Relationship Extraction using LLMs
- 📊 Neo4j Knowledge Graph for relational exploration
- 📦 Pinecone Vector Store for scalable similarity search
- 🧩 Intelligent Chunking for high-quality retrieval
- 📄 PDF + Web Content Support
- 🚀 Multiple Interfaces: REST API, CLI, and programmatic usage
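Entity and relationship extraction with an LLM usually comes down to two parts: a prompt that asks for triples in a fixed shape, and a parser that tolerates imperfect model output. The sketch below shows one possible version; the prompt wording, the JSON shape, and the names `buildExtractionPrompt` and `parseTriples` are assumptions, not ASTREA's actual prompt or schema.

```javascript
// Build a prompt that asks the model for triples as a JSON array.
// The wording and the JSON shape are hypothetical.
function buildExtractionPrompt(chunkText) {
  return [
    "Extract entities and relationships from the text below.",
    'Respond with only a JSON array of objects shaped like {"subject": "...", "relation": "...", "object": "..."}.',
    "",
    "Text:",
    chunkText,
  ].join("\n");
}

// Parse the model's reply defensively and keep only well-formed triples.
function parseTriples(llmOutput) {
  // Models often wrap JSON in extra prose, so take the outermost array.
  const start = llmOutput.indexOf("[");
  const end = llmOutput.lastIndexOf("]");
  if (start === -1 || end <= start) return [];

  let parsed;
  try {
    parsed = JSON.parse(llmOutput.slice(start, end + 1));
  } catch {
    return []; // unparseable output yields no triples rather than an exception
  }
  if (!Array.isArray(parsed)) return [];

  const isText = (v) => typeof v === "string" && v.trim().length > 0;
  return parsed
    .filter((t) => t && isText(t.subject) && isText(t.relation) && isText(t.object))
    .map((t) => ({ subject: t.subject.trim(), relation: t.relation.trim(), object: t.object.trim() }));
}
```

Dropping malformed entries instead of failing the whole chunk keeps ingestion moving when the model returns a partially valid answer.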
- Backend: Node.js, Express.js
- Web Scraping: Axios, Cheerio
- LLMs & Embeddings: OpenAI / Gemini (configurable)
- Vector Database: Pinecone
- Knowledge Graph: Neo4j
- Document Processing: Custom chunking pipeline
- APIs: RESTful endpoints
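For the Neo4j part of the stack, a triple can be written with Cypher `MERGE` so that repeated ingestion does not duplicate nodes or edges. The sketch below only builds the query and its parameters; the `Entity` label, the `name` property, and the function names are assumptions about the graph schema, which this README does not specify.

```javascript
// Relationship types cannot be passed as Cypher parameters, so the type is
// normalised to a safe identifier and interpolated; entity names travel as parameters.
function toRelationshipType(relation) {
  const cleaned = relation
    .trim()
    .toUpperCase()
    .replace(/[^A-Z0-9]+/g, "_") // collapse anything unsafe into underscores
    .replace(/^_+|_+$/g, ""); // strip leading and trailing underscores
  if (!cleaned) throw new Error("relation has no usable characters");
  return /^[0-9]/.test(cleaned) ? `REL_${cleaned}` : cleaned;
}

// Turn one triple into a Cypher statement plus parameters.
// The Entity label and name property are hypothetical schema choices.
function tripleToCypher({ subject, relation, object }) {
  const type = toRelationshipType(relation);
  return {
    query:
      "MERGE (s:Entity {name: $subject}) " +
      "MERGE (o:Entity {name: $object}) " +
      `MERGE (s)-[:${type}]->(o)`,
    params: { subject, object },
  };
}
```

With the official `neo4j-driver` package, the result could be passed to `session.run(query, params)`.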
Create a .env file in the root directory:
GEMINI_API_KEY=your_api_key
OPENAI_API_KEY=your_api_key
PINECONE_INDEX_NAME=your_index_name
NEO4J_URI=your_neo4j_uri
NEO4J_USERNAME=your_username
NEO4J_PASSWORD=your_password
PORT=8080

npm install
npm start
# or
npm run dev

POST /scrape/url
{
  "url": "https://example.com/research-paper"
}

POST /scrape/urls
{
  "urls": ["https://site1.com", "https://site2.com"]
}

POST /chat
{
  "question": "How does microgravity affect gene expression in mice?"
}

npm run scraper-cli

The CLI supports:
- Single / multiple URL ingestion
- Progress tracking
- Error diagnostics
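The REST endpoints listed above can also be called from a script. The sketch below assumes the server runs locally on the `PORT=8080` value from the `.env` example and that responses are JSON; the response fields are not documented here, so the helper returns whatever the server sends. The helper names `buildRequest` and `callAstrea` are hypothetical.

```javascript
// Build the HTTP request for one of the endpoints without sending it.
function buildRequest(baseUrl, path, payload) {
  return {
    url: new URL(path, baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Send the request with the global fetch available in Node.js 18+.
async function callAstrea(baseUrl, path, payload) {
  const { url, options } = buildRequest(baseUrl, path, payload);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`${path} failed with HTTP ${response.status}`);
  return response.json(); // assumed to be JSON; the shape is not specified in this README
}
```

For example, `callAstrea("http://localhost:8080", "/scrape/url", { url: "https://example.com/research-paper" })` would ingest a page, and a later call to `/chat` with a `question` field would query it.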
A traditional RAG pipeline: Query → Retrieve Documents → LLM → Answer
ASTREA's hybrid pipeline:
Query
↓
Vector Retrieval (Pinecone)
+
Graph Traversal (Neo4j)
↓
Reasoning Layer
↓
LLM
↓
Insightful Answer
This allows ASTREA to:
- Infer relationships
- Connect experiments across papers
- Provide more grounded, explainable answers
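One way to picture the reasoning layer is as a prompt builder that merges the retrieved passages with the graph facts before the LLM call. The sketch below is an assumption about how that merge might look; the prompt wording, the input shapes, and the name `buildGroundedPrompt` are illustrative only.

```javascript
// Combine vector-retrieved passages and graph triples into one grounded prompt.
// Input shapes are hypothetical: chunks carry { source, text }, triples carry
// { subject, relation, object }.
function buildGroundedPrompt(question, retrievedChunks, graphTriples) {
  const passages = retrievedChunks
    .map((c, i) => `[${i + 1}] (${c.source}) ${c.text}`)
    .join("\n");
  const facts = graphTriples
    .map((t) => `- ${t.subject} -[${t.relation}]-> ${t.object}`)
    .join("\n");

  return [
    "Answer the question using only the context below. Cite passages by number.",
    "",
    "Passages:",
    passages || "(none)",
    "",
    "Graph facts:",
    facts || "(none)",
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the passages lets the model cite its sources, which supports the explainability goal, and listing triples separately gives it explicit links between experiments that no single passage may state.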
- 🥇 Winner (1st Place) — NASA International Space Apps Challenge 2025 Ghaziabad Local Hackathon
- Built by Team Andromeda
Team Members:
- Anchita Jain
- Ankit Gupta
- Ashutosh Kumar Singh
- Ananya Patel
- Akhand Pratap Singh
- Lakshya Pratap Singh
- 📽️ Demo Video: https://drive.google.com/file/d/1sU4GM5Dd47xY-x1pKavkodyNyzcAPI7h/view
- 📄 Pitch Deck & Architecture: Included in repository
This project is licensed under the ISC License.
Special thanks to:
- NASA Space Apps Challenge
- Innogeeks & KIET Group of Institutions
- Judges and mentors for valuable feedback and guidance