A powerful, completely local RAG (Retrieval-Augmented Generation) system with advanced technical document processing that answers questions without requiring any external APIs or internet connection. Perfect for handling sensitive technical documents while maintaining privacy and data security.

## Repository Structure
```
LocalRAG/
├── README.md                # Complete documentation (this file)
├── LICENSE                  # MIT License
├── doc-qa-local.py          # Main application script
├── enhanced_processor.py    # Technical document processing module
├── local-requirements.txt   # Python dependencies
├── setup.sh                 # Automated setup script
├── local_chroma_db/         # Vector database storage (created after processing)
├── venv/                    # Python virtual environment (created during setup)
├── test_document.txt        # Sample test document
└── .gitignore               # Git ignore patterns
```
`doc-qa-local.py` - Main application with CLI interface
- Document processing with enhanced chunking
- Multiple answer generation modes
- Interactive Q&A sessions
- Extensive command-line options

`enhanced_processor.py` - Advanced document processing
- Technical document pattern recognition
- Smart chunking with section awareness
- Enhanced vector search with relevance scoring
- Comprehensive answer generation

`local-requirements.txt` - Python dependencies
- sentence-transformers (embeddings)
- chromadb (vector database)
- transformers (Q&A models)
- torch (PyTorch backend)
- PyMuPDF (PDF processing)
- pytesseract (OCR)
- Additional supporting libraries
| Feature | LocalRAG | Cloud RAG |
|---|---|---|
| Privacy | ✅ 100% Local | ❌ Data sent to cloud |
| Cost | ✅ Free after setup | 💰 Per-query costs |
| Internet | ✅ Works offline | ❌ Requires internet |
| Setup | ⚡ Quick local setup | 🔧 API keys needed |
| Speed | ⚡ Fast local inference | 🐌 Network latency |
| Customization | ✅ Full control | ❌ Limited options |
| Data Security | ✅ Your machine only | ❌ Third-party servers |
```bash
# Process a large technical manual
python doc-qa-local.py process \
  --document large_manual.pdf \
  --chunk-size 1500 \
  --chunk-overlap 300 \
  --cpu

# Query with comprehensive analysis
python doc-qa-local.py query --cpu --comprehensive
```

```bash
# Process multiple documents into the same database
python doc-qa-local.py process --document doc1.pdf --db-path shared_db --cpu
python doc-qa-local.py process --document doc2.pdf --db-path shared_db --cpu
python doc-qa-local.py query --db-path shared_db --cpu
```

```bash
# Use a different embedding model for better semantic understanding
python doc-qa-local.py process \
  --document document.pdf \
  --embedding-model "sentence-transformers/all-mpnet-base-v2" \
  --cpu
```

- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes and test thoroughly
- Ensure all dependencies are in `local-requirements.txt`
- Update documentation if needed
- Submit a pull request with detailed description
MIT License - feel free to use in your projects!
- ✅ Fully Tested Setup: Complete installation and usage validation
- ✅ Enhanced Technical Processing: Specialized handling for technical documents
- ✅ Multiple Answer Modes: Standard, comprehensive, and simple modes
- ✅ Improved Documentation: Comprehensive setup and troubleshooting guides
- ✅ macOS Compatibility: Verified working on Apple Silicon with proper CPU flags
- ✅ Robust Error Handling: Better handling of GPU/CUDA issues and memory constraints
Ready to get started? The system is fully tested and ready to use! Simply follow the installation steps above and start processing your documents.
Need help? Check the troubleshooting section above or open an issue for support.
The LocalRAG system has been fully tested and validated with:
- ✅ Complete Environment Setup: Virtual environment with all dependencies installed
- ✅ Document Processing Verified: Successfully processed test documents with enhanced chunking
- ✅ Q&A System Operational: Interactive query system running with transformer models loaded
- ✅ Technical Document Support: Enhanced processing for automotive/technical manuals
- ✅ Multiple Answer Modes: Tested standard, comprehensive, and transformer-based responses
- **100% Local Processing**: No external APIs, no data leaves your machine
- **Advanced OCR Support**: Handles rotated and scanned PDFs with automatic text extraction
- **Multiple AI Models**: Choose from transformer models, Ollama, comprehensive analysis, or simple extraction
- **Enhanced Smart Chunking**: Technical document-aware splitting with section preservation
- **Fast Vector Search**: ChromaDB with enhanced relevance scoring for technical content
- **Interactive Q&A**: Terminal-based question-answering interface with multiple response modes
- **Source Attribution**: Shows which document sections support each answer
- **Confidence Scoring**: Get confidence levels and model information for answers
- **Technical Document Processing**: Specialized handling for technical manuals, specifications, and engineering documents
- **Comprehensive Answer Generation**: Detailed, structured responses with signal lists and technical analysis
- **Multiple Answer Modes**: Standard, comprehensive, and simple modes for different use cases
- **Context-Aware Search**: Enhanced vector search that understands technical document structure
- **Command-Line Flexibility**: Extensive options for customizing processing and answer generation
1. Clone and navigate to the repository:

   ```bash
   git clone https://github.com/shreyasren/LocalRAG.git
   cd LocalRAG
   ```

2. Run the automated setup script:

   ```bash
   chmod +x setup.sh
   ./setup.sh
   ```

3. Create a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate   # On macOS/Linux
   # venv\Scripts\activate    # On Windows
   ```

4. Install dependencies:

   ```bash
   pip install --upgrade pip
   pip install -r local-requirements.txt
   ```

5. Install system dependencies:

   macOS:

   ```bash
   brew install tesseract poppler
   ```

   Ubuntu/Debian:

   ```bash
   sudo apt-get install tesseract-ocr poppler-utils
   ```

   Windows:
   - Install Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
   - Install Poppler: http://blog.alivate.com.au/poppler-windows/

- Use the `--cpu` flag: required on macOS to avoid CUDA errors
- PyMuPDF installation: pre-compiled wheels work best; avoid compiling from source
- Memory requirements: 4-8GB RAM recommended for processing large documents
First, vectorize your document (one-time setup per document):
Basic Processing (Standard Chunking):
```bash
python doc-qa-local.py process --document your_document.pdf --cpu --no-enhanced
```

Enhanced Processing for Technical Documents (Recommended):

```bash
python doc-qa-local.py process --document technical_manual.pdf --cpu
```

Processing Options:
- `--cpu`: Required on macOS to avoid GPU/CUDA issues
- `--chunk-size 1000`: Text chunk size in characters (default: 1000)
- `--chunk-overlap 200`: Overlap between chunks (default: 200)
- `--db-path ./local_chroma_db`: Database location (default: `./local_chroma_db`)
- `--embedding-model all-MiniLM-L6-v2`: Embedding model (default: `all-MiniLM-L6-v2`)
- `--no-ocr`: Disable OCR for faster processing (do not use if the document is rotated or scanned)
- `--no-enhanced`: Disable enhanced technical document processing (use basic chunking)
**Key Differences:**
- Enhanced Processing (Default): Uses smart chunking optimized for technical documents, better signal/component recognition
- Basic Processing (--no-enhanced): Uses standard text chunking, faster but less optimized for technical content
- Enhanced processing is recommended for technical manuals, specifications, and complex documents
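For intuition, the `--chunk-size`/`--chunk-overlap` options of basic processing behave roughly like a sliding-window split. The `chunk_text` helper below is a minimal sketch under that assumption, not the repo's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Stop before a final chunk that would be pure overlap of the previous one.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 2,200-character document with the defaults yields three overlapping chunks.
print([len(c) for c in chunk_text("x" * 2200)])  # → [1000, 1000, 600]
```

Larger overlaps keep more shared context between neighboring chunks, which helps answers that span a chunk boundary at the cost of a bigger database.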
Example (Successfully Tested):

```bash
python doc-qa-local.py process --document test_document.txt --cpu

# Output shows:
# ✅ Document processed successfully!
# You can now run queries with: python doc-qa-local.py query --cpu
```

After processing, start the interactive Q&A session.

Standard Mode (Transformer Models - Tested Working):

```bash
python doc-qa-local.py query --cpu
```

Comprehensive Mode (Detailed Technical Responses):

```bash
python doc-qa-local.py query --cpu --comprehensive
```

Basic Mode (Simple Extraction):

```bash
python doc-qa-local.py query --llm simple --cpu
```

Example Query Session:
```
============================================================
Local Document Q&A System
============================================================
Type your questions below. Type 'exit' or 'quit' to end.

Your Question: What is machine learning?

ANSWER:
Machine learning (ML) is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence (AI) based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

[High confidence answer from RoBERTa]
```
```bash
python doc-qa-local.py query --cpu
```

- Method: Transformer models (DistilBERT, RoBERTa) with auto-enhancement
- Quality: High-quality extractive answers with automatic supplementation for short responses
- Speed: Fast inference on CPU/GPU
- Best For: General questions and quick technical lookups
```bash
python doc-qa-local.py query --cpu --comprehensive
```

- Method: Advanced multi-chunk analysis with structured response generation
- Quality: Detailed, structured responses with signal lists, technical components, and section-based analysis
- Features:
  - Extracts specific signal names and technical terms
  - Groups information by document sections
  - Creates bullet-point lists for complex queries
  - Provides comprehensive technical analysis
- Best For: Technical documents, signal analysis, complex engineering questions
```bash
# First install Ollama
brew install ollama
ollama pull llama3.2:3b   # or pull a larger model for better quality

# Then use
python doc-qa-local.py query --llm ollama --cpu
```

- Quality: Excellent, human-like responses
- Models: Llama 3.2, Mistral, Phi-3, and more
- Requires: Ollama installation and model download

```bash
python doc-qa-local.py query --llm simple --cpu
```

- Speed: Fastest option
- Quality: Basic keyword matching
- Use Case: Quick searches or resource-constrained systems
The LocalRAG system includes specialized enhancements for technical documents like manuals, specifications, and engineering documentation:
- Section-Aware Splitting: Preserves technical sections and subsections
- Header Recognition: Identifies and maintains document structure
- Technical Pattern Detection: Recognizes signal names, commands, and technical terminology
- Context Preservation: Keeps section titles with content for better understanding
- Context Expansion: Groups related technical information from the same section
- Technical Relevance Scoring: Prioritizes chunks with technical indicators over headers/metadata
- Multi-Chunk Analysis: Combines information from multiple relevant document sections
- Header Penalty: Reduces scoring for header-heavy content to focus on technical details
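The scoring ideas above can be sketched as follows. The pattern list, weights, and the `technical_relevance` name are illustrative assumptions, not the values `enhanced_processor.py` actually uses:

```python
import re

# Illustrative technical-term pattern; the real module recognizes far more.
TECH_PATTERN = re.compile(r"\b(signal|voltage|CAN|register|pin|command)\b", re.IGNORECASE)

def technical_relevance(chunk: str, base_score: float) -> float:
    """Boost chunks dense in technical terms; penalize header-heavy chunks.

    `base_score` is the raw vector-similarity score for the chunk.
    """
    lines = [ln for ln in chunk.splitlines() if ln.strip()]
    if not lines:
        return base_score
    tech_hits = len(TECH_PATTERN.findall(chunk))
    # Treat short ALL-CAPS or numbered lines as headers/metadata.
    header_lines = sum(
        1 for ln in lines
        if len(ln.strip()) < 40
        and (ln.strip().isupper() or re.match(r"^\d+(\.\d+)*\s", ln.strip()))
    )
    header_ratio = header_lines / len(lines)
    return base_score * (1 + 0.05 * tech_hits) * (1 - 0.5 * header_ratio)

dense = "The CAN signal BCM_WAKE drives pin 12; voltage threshold is 3.3 V."
headers = "1 INTRODUCTION\n2 SCOPE\n3 OVERVIEW"
print(technical_relevance(dense, 0.8) > technical_relevance(headers, 0.8))  # → True
```

The effect is that a table-of-contents chunk and a body chunk with the same raw similarity no longer tie: the body chunk wins.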
1. Standard Mode (Default - Tested Working)
- Method: Transformer models (DistilBERT, RoBERTa) with auto-enhancement
- Quality: High-quality extractive answers with automatic supplementation for short responses
- Speed: Fast inference on CPU/GPU
- Best For: General questions and quick technical lookups
2. Comprehensive Mode (Technical Documents)
- Method: Advanced multi-chunk analysis with structured response generation
- Features:
  - Extracts specific signal names and technical terms
  - Groups information by document sections
  - Creates bullet-point lists for complex queries
  - Provides comprehensive technical analysis
- Best For: Technical documents, signal analysis, complex engineering questions
3. Ollama Integration (Optional)
- Quality: Excellent, human-like responses
- Models: Llama 3.2, Mistral, Phi-3, and more
- Requires: Ollama installation and model download
4. Simple Extraction (Fastest)
- Speed: Fastest option
- Quality: Basic keyword matching
- Use Case: Quick searches or resource-constrained systems
```
Document Input (PDF/TXT/DOCX)
        ↓
OCR Processing (if needed)
        ↓
Enhanced/Basic Chunking
        ↓
Embedding Generation (Sentence Transformers)
        ↓
Vector Database Storage (ChromaDB)
```

Query Processing Pipeline:

```
User Question → Question Embedding → Enhanced Vector Search → Context Creation → Answer Generation → Response with Sources
```
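The query side of the pipeline can be sketched end-to-end in a few lines. Here a toy bag-of-words `embed` stands in for the sentence-transformers model and a plain list stands in for the ChromaDB collection; all names are illustrative:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def embed(text: str) -> list[float]:
    """Stand-in for the embedding model: bag-of-words counts over a tiny vocabulary."""
    vocab = ["bcm", "signal", "lighting", "power", "ocr"]
    words = [w.strip("?,.!").lower() for w in text.split()]
    return [float(words.count(v)) for v in vocab]

# Stand-in for the ChromaDB collection: (chunk, embedding) pairs.
store = [(c, embed(c)) for c in [
    "BCM lighting signal control",
    "OCR power settings",
    "power signal routing",
]]

def retrieve_context(question: str, k: int = 2) -> list[str]:
    """Question -> embedding -> vector search -> top-k context chunks."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve_context("which signal does the BCM use for lighting?"))
```

In the real system the retrieved chunks are then concatenated into a context window and handed to the answer generator (transformer QA model, Ollama, or simple extraction).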
```bash
# 1. Process a document with OCR support
python doc-qa-local.py process --document bcm.pdf --cpu

# 2. Start Q&A with standard transformer models
python doc-qa-local.py query --cpu
```

Example interaction:

```
Your Question: What are the main components of BCM?

ANSWER:
Body Control Module (BCM) includes lighting control, power management, and vehicle communication systems.

[High confidence answer from RoBERTa]
```

```bash
# 1. Process with enhanced technical features
python doc-qa-local.py process --document technical_spec.pdf --cpu

# 2. Use comprehensive mode for detailed analysis
python doc-qa-local.py query --cpu --comprehensive
```

Example interaction:

```
Your Question: list all control signals for the system

ANSWER:
**System Control Signals:**
• [Detailed signal list with technical context]
• [Component relationships and dependencies]
• [Functional descriptions and usage patterns]
```

For better semantic search, try different embedding models:
```bash
python doc-qa-local.py process \
  --document document.pdf \
  --embedding-model "sentence-transformers/all-mpnet-base-v2"
```

If you have a CUDA-compatible GPU:

```bash
python doc-qa-local.py query   # Remove --cpu flag for GPU acceleration
```

For simpler documents or faster processing:

```bash
python doc-qa-local.py process --document simple_doc.pdf --no-enhanced --cpu
python doc-qa-local.py query --no-enhanced --cpu
```

```bash
# Process multiple documents into the same database
python doc-qa-local.py process --document doc1.pdf --db-path shared_db
python doc-qa-local.py process --document doc2.pdf --db-path shared_db
python doc-qa-local.py query --db-path shared_db --llm transformers
```

The system provides detailed feedback on answer quality:
- High Confidence (>0.6): `[High confidence answer from RoBERTa]`
- Moderate Confidence (0.3-0.6): `[Answer from DistilBERT with moderate confidence: 0.45]`
- Low Confidence (<0.3): `[Note: Low confidence answer. Consider asking a more specific question.]`
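The three confidence bands map to the feedback strings roughly as follows. This is a sketch using the thresholds and example strings listed above; the hypothetical `confidence_note` helper is not the repo's actual function:

```python
def confidence_note(score: float, model: str) -> str:
    """Map a QA model's confidence score to the feedback string shown after an answer."""
    if score > 0.6:
        return f"[High confidence answer from {model}]"
    if score >= 0.3:
        return f"[Answer from {model} with moderate confidence: {score:.2f}]"
    return "[Note: Low confidence answer. Consider asking a more specific question.]"

print(confidence_note(0.45, "DistilBERT"))
# → [Answer from DistilBERT with moderate confidence: 0.45]
```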
```
Vectorization Pipeline/
├── README.md                # This file - comprehensive documentation
├── doc-qa-local.py          # Main application - process & query documents
├── local-requirements.txt   # Python dependencies
├── setup.sh                 # Automated setup script (cross-platform)
├── local-setup-guide.md     # Detailed manual setup instructions
├── local_chroma_db/         # Vector database (created after processing)
├── .venv/                   # Python virtual environment
└── bcm.pdf                  # Example document (Body Control Module spec)
```
- Chunk Size: Use 1500-2000 characters for better context
- Processing Time: 30-60 minutes for initial vectorization
- Memory: Requires 4-8GB RAM for large documents
- Storage: ~1-3GB database size for very large documents
```bash
# Disable OCR for faster processing (if document isn't rotated)
python doc-qa-local.py process --document doc.pdf --no-ocr

# Use simple extraction for fastest queries
python doc-qa-local.py query --llm simple
```

- No Internet Required: All processing happens locally - successfully tested offline
- No Data Transmission: Documents never leave your machine
- No API Keys: No external service dependencies
- Secure Storage: Local database with no external access
- Portable: Can run on air-gapped systems
**Problem**: `AssertionError: Torch not compiled with CUDA enabled`
**Solution**: Always use the `--cpu` flag on macOS:

```bash
python doc-qa-local.py process --document file.pdf --cpu
python doc-qa-local.py query --cpu
```

**Problem**: Compilation errors with the `climits` header
**Solution**: Use pre-compiled wheels:

```bash
pip install PyMuPDF --no-deps   # Uses pre-compiled wheel
```

**Problem**: System runs out of RAM during processing
**Solutions**:

```bash
# Use smaller chunks
python doc-qa-local.py process --document doc.pdf --chunk-size 500 --cpu

# Force CPU usage
python doc-qa-local.py query --cpu

# Disable enhanced processing for large documents
python doc-qa-local.py process --document doc.pdf --no-enhanced --cpu
```

**Problem**: Tesseract not found
**Solutions**:

```bash
# macOS
brew install tesseract poppler

# Ubuntu/Debian
sudo apt-get install tesseract-ocr poppler-utils

# Or disable OCR
python doc-qa-local.py process --document doc.pdf --no-ocr --cpu
```

**Problem**: Getting empty or low-confidence answers
**Solutions**:

```bash
# Use comprehensive mode for technical documents
python doc-qa-local.py query --cpu --comprehensive

# Try different chunk settings
python doc-qa-local.py process --document doc.pdf --chunk-size 1500 --chunk-overlap 400 --cpu

# Ensure enhanced processing is enabled (default)
python doc-qa-local.py process --document doc.pdf --cpu
```

**Problem**: EOF errors or input issues
**Solution**: Run query mode directly without piping input:

```bash
# Don't use: echo "question" | python doc-qa-local.py query
# Instead use:
python doc-qa-local.py query --cpu
```

- Small documents (1-10 pages): 30 seconds - 2 minutes
- Medium documents (100-500 pages): 5-15 minutes
- Large documents (1000+ pages): 30-60 minutes
- Memory usage: 2-8GB RAM depending on document size
- Search time: 1-3 seconds for vector search
- Answer generation: 2-10 seconds depending on mode
- Memory: 2-4GB RAM during queries
- Python: 3.13.5
- Processing: CPU-only with the `--cpu` flag
- Models: DistilBERT + RoBERTa for Q&A, all-MiniLM-L6-v2 for embeddings
- Status: ✅ Fully functional and tested
| Feature | This System | Cloud RAG |
|---|---|---|
| Privacy | ✅ 100% Local | ❌ Data sent to cloud |
| Cost | ✅ Free after setup | 💰 Per-query costs |
| Internet | ✅ Works offline | ❌ Requires internet |
| Setup | ⚡ Quick local setup | 🔧 API keys needed |
| Speed | ⚡ Fast local inference | 🐌 Network latency |
- Fork the repository
- Create a feature branch
- Make your changes
- Test with different document types
- Submit a pull request
MIT License - feel free to use in your projects!
Need help? Check the local-setup-guide.md for detailed setup instructions or open an issue for support.