A question-answering chatbot specialized in Indian law, built on a Retrieval-Augmented Generation (RAG) architecture with LangChain and Streamlit.
- Comprehensive Legal Knowledge: Works with any Indian law documents you provide (PDFs)
- RAG Architecture: Combines retrieval-based search with generative AI so that responses are grounded in passages retrieved from your documents
- Interactive Web Interface: User-friendly Streamlit-based chat interface
- Vector Database: Efficient document retrieval using FAISS vector store
- Legal-Domain Prompting: Custom prompt engineering tailored to legal questions
- Flexible Document Support: Add your own legal PDFs to customize the knowledge base
The chatbot uses a Retrieval-Augmented Generation (RAG) approach to answer questions based on legal documents you provide.
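The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function names, the prompt wording, and the placeholder chunks are all hypothetical, and plain word overlap stands in for the embedding similarity search that the vector store performs.

```python
import re
from typing import List


def tokenize(text: str) -> set:
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks that share the most words with the question.

    Word overlap is a stand-in (assumption) for embedding similarity.
    """
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Combine the retrieved chunks and the question into one prompt string."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only this context:\n"
        f"{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical placeholder chunks, as if extracted from PDFs you supplied.
chunks = [
    "Placeholder chunk about fundamental rights in the constitution.",
    "Placeholder chunk about company registration procedure.",
    "Placeholder chunk about the right to equality as a fundamental right.",
]
question = "What are the fundamental rights?"

top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
print(prompt)  # this string is what would be sent to the language model
```

Only the two best-matching chunks reach the prompt, so the model answers from a small, relevant slice of the documents rather than from the whole library.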
This project does not include pre-loaded legal documents due to copyright restrictions. You need to add your own legal PDFs to the law-library/dataset/ directory. Documents you might add include:
- The Constitution of India
- Judicial review and constitutional amendments
- Administrative law and governance documents
- The Indian Penal Code (IPC)
- Criminal Procedure Code (CrPC)
- Indian Evidence Act, 1872
- Companies Act
- Contract Act
- Partnership and business regulations
- Industrial Disputes Act
- Minimum Wages Act
- Employee State Insurance Act
- Shops and Establishments Act
- Information Technology Act (Cyber Laws)
- Banking Regulation Act
- Consumer Protection Act
- Intellectual Property laws
You can obtain legal documents from these free and legal sources:
- Government Websites:
  - India Code - Official government portal for all Indian laws
  - Ministry of Law and Justice - Official legal resources
  - Law Commission of India - Reports and recommendations
- Public Domain Sources:
  - Bare Acts from India Code
  - State government legal websites
  - Public legal databases and repositories
- Academic Institutions:
  - Law school libraries (open access materials)
  - University legal research repositories
IMPORTANT:
- Only use documents you have the legal right to use
- Respect copyright and intellectual property laws
- Use only public domain or openly licensed legal texts
- For copyrighted materials, obtain proper permissions
💡 Note: Place all your legal PDF files in the law-library/dataset/ directory before running the ingestion script.
- Language Model: Google FLAN-T5-Base (free, open-access model)
- Framework: LangChain for RAG pipeline
- Vector Store: FAISS for efficient similarity search
- Embeddings: HuggingFace Sentence Transformers
- Frontend: Streamlit for web interface
- Document Processing: PyPDF processing for legal documents
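To make the vector store entry above more concrete, here is a minimal pure-Python sketch of nearest-neighbour search over embedding vectors. It assumes Euclidean (L2) distance and uses made-up three-dimensional vectors; the metric the project's index actually uses is not stated here, and real sentence embeddings have hundreds of dimensions. The `k` argument plays the same role as the `k` value in the retriever configuration.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(
    query: Sequence[float], vectors: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k stored vectors nearest the query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical "embeddings" of four document chunks.
stored = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.2, 0.0]  # hypothetical embedding of the user's question

nearest = top_k(query, stored, k=2)
print([index for index, _ in nearest])  # prints [2, 0]
```

This brute-force scan compares the query against every stored vector; a library such as FAISS exists to do the same kind of search efficiently over large collections.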
law-library/
├── README.md
├── requirements.txt
└── law-library/
    ├── app.py          # Streamlit web application
    ├── ingest.py       # Document processing and vectorization
    ├── utils.py        # Core RAG pipeline and utilities
    ├── dataset/        # Legal PDF documents (you add these)
    └── vectorstore/    # FAISS vector database
        ├── index.faiss
        └── index.pkl
Before you begin, ensure you have the following:
- ✅ Python 3.8+ installed on your system
- ✅ pip (Python package manager)
- ✅ 8GB+ RAM available
- ✅ 10GB+ free disk space for models and documents
- ✅ (Optional) CUDA-compatible GPU for faster inference
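Two of the prerequisites above, the Python version and free disk space, can be checked with the standard library alone. The following is a small sketch; the function names and the choice of checking the current directory's drive are assumptions, and RAM and GPU checks are left out because they need third-party packages.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)   # from the prerequisites list
MIN_FREE_GB = 10      # from the prerequisites list


def python_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter's (major, minor) version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def free_disk_gb(path: str = ".") -> float:
    """Free space, in GiB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


version = ".".join(str(part) for part in sys.version_info[:3])
print(f"Python {version}: {'OK' if python_ok() else 'too old'}")

free = free_disk_gb(".")
print(f"Free disk: {free:.1f} GiB: {'OK' if free >= MIN_FREE_GB else 'below 10 GiB'}")
```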
git clone https://github.com/bPavan16/law-Rag.git
cd law-Rag

# Create a virtual environment
python -m venv .venv
# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install all required packages
pip install -r requirements.txt
# Install additional LangChain packages (if needed)
pip install -U langchain-huggingface langchain-text-splitters accelerate

IMPORTANT: This project does not include legal documents due to copyright restrictions.
- Create the dataset directory (if it doesn't exist):
  mkdir -p law-library/dataset
- Download legal documents from authorized sources:
  - Visit India Code for official Bare Acts
  - Download PDFs of laws you want to query (e.g., IPC, Constitution, IT Act)
  - Ensure documents are in PDF format
- Place PDF files in the dataset directory:
  # Copy your downloaded PDFs to the dataset folder
  cp /path/to/your/legal-pdfs/*.pdf law-library/dataset/
- Verify your documents:
  cd law-library
  ls -la dataset/ # On Linux/macOS
  dir dataset\ # On Windows
You should see your PDF files listed. The system supports any number of PDF files.
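As a cross-platform alternative to `ls` and `dir`, the verification step can be done from Python. This is a hedged sketch: `list_pdfs` is a hypothetical helper, not part of the project, and matching the `.pdf` extension case-insensitively is an assumption about how the ingestion script discovers files.

```python
from pathlib import Path
from typing import List


def list_pdfs(dataset_dir: str = "dataset") -> List[Path]:
    """Return the PDF files directly inside dataset_dir, sorted by path.

    Returns an empty list if the directory does not exist.
    """
    folder = Path(dataset_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Run from inside the law-library directory.
pdfs = list_pdfs("dataset")
print(f"Found {len(pdfs)} PDF files in dataset/")
for number, pdf in enumerate(pdfs, start=1):
    print(f"  {number}. {pdf.name}")
```

If the count is zero, check that you are in the inner law-library directory and that the files really end in `.pdf`.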
Process all legal documents and create the FAISS vector store:
# Make sure you're in the law-library directory
cd law-library # if not already there
# Run the ingestion script
python ingest.py

Expected output:
Starting document embedding process...
================================================================================
Found X PDF files in dataset/:
1. your-legal-document-1.pdf
2. your-legal-document-2.pdf
... (and more)
================================================================================
Loading PDFs from dataset/...
✓ Loaded XXXX document pages
Splitting documents into chunks...
✓ Created XXXX chunks
Loading embedding model (sentence-transformers/all-MiniLM-L6-v2)...
✓ Embedding model loaded
Creating FAISS vector store (this may take several minutes)...
✓ FAISS vector store created
Saving vector store to vectorstore/...
================================================================================
✅ Vector store created and saved successfully!
- Total documents processed: X
- Total pages: XXXX
- Total chunks: XXXX
- Saved to: vectorstore/
================================================================================
⏱️ Note: Processing time varies based on the number and size of documents (typically 5-15 minutes).
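The "Splitting documents into chunks" step in the output above can be pictured with a simplified fixed-window splitter. This sketch is not what `RecursiveCharacterTextSplitter` does internally (that splitter also prefers to break on separators such as paragraph and line breaks); it only illustrates how a chunk size of 800 and an overlap of 200, the values this project configures in ingest.py, make consecutive chunks share text. The function name is hypothetical.

```python
from typing import List


def split_with_overlap(
    text: str, chunk_size: int = 800, chunk_overlap: int = 200
) -> List[str]:
    """Cut text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


# A stand-in for one page of extracted text (2000 characters).
page = "".join(str(i % 10) for i in range(2000))
chunks = split_with_overlap(page)

print(len(chunks))                       # prints 3
print(chunks[0][-200:] == chunks[1][:200])  # prints True
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.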
# Start the Streamlit web interface
streamlit run app.py

Open your web browser and navigate to:
http://localhost:8501
🎉 Success! You can now start asking questions about Indian law.
# Complete setup in one go
git clone https://github.com/bPavan16/law-Rag.git
cd law-Rag
python -m venv .venv
source .venv/bin/activate # On Linux/macOS
pip install -r requirements.txt
cd law-library
python ingest.py
streamlit run app.py

Ask questions about Indian law such as:
- "What are the fundamental rights guaranteed by the Indian Constitution?"
- "Explain the provisions of Section 420 of the Indian Penal Code"
- "What are the key features of the Companies Act?"
- "What are the cyber crime laws in India?"
- "Explain the concept of natural justice in Indian law"
The default configuration uses Google FLAN-T5-Base, a free and open-access model. You can modify the model in utils.py:
repo_id = 'google/flan-t5-base' # Change model here

Other recommended free models:
- google/flan-t5-large - Larger version for better performance
- google/flan-t5-small - Smaller, faster version
- mistralai/Mistral-7B-v0.1 - Open-source alternative (a much larger, decoder-only model, unlike FLAN-T5, so the pipeline setup in utils.py may need adjusting)

Adjust retrieval parameters in utils.py:
retriever=db.as_retriever(search_kwargs={'k': 2}) # Number of documents to retrieve

Modify chunk parameters in ingest.py:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=800, # Chunk size
    chunk_overlap=200 # Overlap between chunks
)

- RAM: 8GB
- Storage: 10GB free space
- GPU: 4GB VRAM (optional but recommended)
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
For support, please open an issue on GitHub or contact the development team.
- Google for the FLAN-T5 model
- LangChain team for the RAG framework
- Indian legal system for comprehensive documentation
- Open source community for various tools and libraries
Disclaimer: This chatbot is for educational and informational purposes only. It should not be considered legal advice. Always consult qualified legal professionals for legal matters.