A small web app that lets a user save notes or URLs and ask questions over that content using a simple RAG (Retrieval Augmented Generation) setup.
Currently: single user, no auth, runs locally.
- Save plain text notes
- Save URLs (content is fetched server-side)
- Chunk and embed saved content
- Ask questions over saved data
- Get an answer along with source snippets
- FastAPI
- SQLite
- sentence-transformers (local embeddings)
- requests + BeautifulSoup (URL content extraction)
- Vite
- React
- TypeScript
- Tailwind CSS
- Python 3.11+
cd back
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload
Backend runs at: http://127.0.0.1:8000
- Node.js 16+
- npm or yarn
cd front
npm install
npm run dev
Frontend runs at: http://localhost:5173
- Modern UI/UX: Clean, responsive design with Tailwind CSS
- Mode-Independent: Consistent appearance regardless of system theme
- Single Page Application: All functionality on one page with smooth navigation
- Real-time Updates: Instant feedback when adding content or querying
- Component-Based: Modular React components for maintainability
- React 18 with TypeScript for type safety
- Vite for fast development and building
- Tailwind CSS for utility-first styling
- React Icons for consistent iconography
- Axios for API communication
- Frontend: http://localhost:5173
- Backend API: http://127.0.0.1:8000
- API Docs: http://127.0.0.1:8000/docs
POST /ingest
Note:
{
"type": "note",
"content": "FastAPI is a modern Python web framework"
}
URL:
{
"type": "url",
"source_url": "https://fastapi.tiangolo.com/"
}
GET /items
POST /query
{
"question": "What is FastAPI?",
"top_k": 3
}
Returns:
- answer text
- source chunks with similarity scores
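A response might look like this (field names are illustrative; the actual schema is visible at /docs):

```json
{
  "answer": "FastAPI is a modern Python web framework...",
  "sources": [
    {"content": "FastAPI is a modern Python web framework", "score": 0.91}
  ]
}
```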
- Content is split into chunks
- Each chunk is embedded using a local model
- Embeddings are stored in SQLite as JSON
- Queries are embedded the same way
- Cosine similarity is used to find relevant chunks
- Retrieved chunks are passed as context to the LLM
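The retrieval steps above can be sketched in pure Python. This is a minimal illustration, not the app's actual code: the real pipeline embeds chunks with all-MiniLM-L6-v2, while the chunking parameters and plain-list vectors here are placeholders:

```python
import math

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Split text into overlapping character windows (parameters are illustrative).
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 3):
    # stored: (chunk_text, embedding) pairs; full scan, then sort by similarity.
    scored = [(cosine(query_vec, emb), text) for text, emb in stored]
    return sorted(scored, reverse=True)[:k]
```

The returned chunks would then be concatenated into the prompt passed to the LLM.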
- Uses a local sentence-transformers model: all-MiniLM-L6-v2
- No external API calls by default
- Model is downloaded once and cached locally
The project supports using a real LLM for answer generation via Google Gemini (free tier).
By default, the app runs with a stubbed LLM so it works without any external API keys. To enable Gemini locally:
- Create a Gemini API key:
  - https://ai.google.dev/
  - Go to Google AI Studio → Get API key
- Create a .env file in the back/ directory:
USE_GEMINI=true
GEMINI_API_KEY=your_api_key_here
- URLs are fetched server-side using requests and BeautifulSoup
- HTML content is extracted, cleaned, and processed
- Scripts and styles are removed during text extraction
- Extracted text follows the same chunk → embed → store pipeline
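The project does this step with requests and BeautifulSoup; as a rough stdlib-only sketch of the same idea (skip script/style contents, keep visible text), using Python's built-in html.parser:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

BeautifulSoup handles malformed HTML far more robustly, which is why the project uses it instead.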
- SQLite chosen for simplicity and zero setup requirements
- Raw SQL used instead of ORM for explicit behavior control
- Similarity search uses full scan (suitable for current data scale)
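A minimal sketch of this storage scheme, assuming a simple chunks table (the actual schema may differ):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # the app uses a file-backed database
conn.execute("""
    CREATE TABLE chunks (
        id INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        embedding TEXT NOT NULL  -- JSON-encoded list of floats
    )
""")

def save_chunk(content: str, embedding: list[float]) -> None:
    # Serialize the embedding to JSON so SQLite can store it as plain text.
    conn.execute(
        "INSERT INTO chunks (content, embedding) VALUES (?, ?)",
        (content, json.dumps(embedding)),
    )

def load_chunks() -> list[tuple[str, list[float]]]:
    # Full scan: every embedding is decoded on each query (fine at small scale).
    rows = conn.execute("SELECT content, embedding FROM chunks").fetchall()
    return [(content, json.loads(emb)) for content, emb in rows]
```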
For larger scale:
- ANN indexes or a vector database would be needed
- Background ingestion and better concurrency handling
- FastAPI: https://fastapi.tiangolo.com/
- SQLite: https://sqlite.org/
- Python sqlite3: https://docs.python.org/3/library/sqlite3.html
- Sentence-Transformers: https://www.sbert.net/
- RAG overview: https://www.pinecone.io/learn/retrieval-augmented-generation/