law-library: Indian Law Q&A Chatbot

A Question-Answering chatbot specialized in Indian law, built using RAG (Retrieval-Augmented Generation) architecture with LangChain and Streamlit.

🚀 Features

  • Comprehensive Legal Knowledge: Works with any Indian law documents you provide (PDFs)
  • RAG Architecture: Combines retrieval-based search with generative AI for accurate, context-aware responses
  • Interactive Web Interface: User-friendly Streamlit-based chat interface
  • Vector Database: Efficient document retrieval using FAISS vector store
  • Legal-Domain Prompting: Custom prompt engineering tailored to legal questions (the model itself is not fine-tuned)
  • Flexible Document Support: Add your own legal PDFs to customize the knowledge base

📚 Legal Document Coverage

The chatbot uses a Retrieval-Augmented Generation (RAG) approach to answer questions based on legal documents you provide.
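To make the "retrieve, then generate" idea concrete, here is a minimal sketch of how retrieved passages could be combined with a user question before being sent to the language model. The template text and the names LEGAL_PROMPT and build_prompt are illustrative assumptions; the prompt actually used in utils.py may look different.

```python
from typing import List

# Hypothetical template; the prompt actually used in utils.py may differ.
LEGAL_PROMPT = (
    "You are an assistant for questions about Indian law.\n"
    "Answer using only the context below. If the context does not contain "
    "the answer, say that you do not know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Join the retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return LEGAL_PROMPT.format(context=context, question=question.strip())


if __name__ == "__main__":
    # Placeholder chunk standing in for text retrieved from your own PDFs.
    chunks = ["Article 21 protects life and personal liberty."]
    print(build_prompt("What does Article 21 protect?", chunks))
```

Restricting the model to the supplied context is a common way to reduce unsupported answers, which matters for a legal assistant.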

How to Add Your Legal Documents

This project does not include pre-loaded legal documents due to copyright restrictions. You need to add your own legal PDFs to the law-library/dataset/ directory.

Recommended Document Types:

📜 Constitutional & Administrative Law

  • The Constitution of India
  • Judicial review and constitutional amendments
  • Administrative law and governance documents

⚖️ Criminal & Evidence Law

  • The Indian Penal Code (IPC)
  • Criminal Procedure Code (CrPC)
  • Indian Evidence Act, 1872

💼 Corporate & Business Law

  • Companies Act
  • Contract Act
  • Partnership and business regulations

👨‍💼 Labour & Employment Law

  • Industrial Disputes Act
  • Minimum Wages Act
  • Employees' State Insurance Act
  • Shops and Establishments Act

🌐 Specialized Legal Areas

  • Information Technology Act (Cyber Laws)
  • Banking Regulation Act
  • Consumer Protection Act
  • Intellectual Property laws

Where to Find Legal Documents

You can obtain legal documents from these free and legal sources:

  1. Government Websites:

    • India Code (official Bare Acts)

  2. Public Domain Sources:

  3. Academic Institutions:

    • Law school libraries (open access materials)
    • University legal research repositories

⚖️ Copyright Notice

IMPORTANT:

  • Only use documents you have the legal right to use
  • Respect copyright and intellectual property laws
  • Use only public domain or openly licensed legal texts
  • For copyrighted materials, obtain proper permissions

💡 Note: Place all your legal PDF files in the law-library/dataset/ directory before running the ingestion script.

🛠️ Technology Stack

  • Language Model: Google FLAN-T5-Base (free, open-access model)
  • Framework: LangChain for RAG pipeline
  • Vector Store: FAISS for efficient similarity search
  • Embeddings: HuggingFace Sentence Transformers
  • Frontend: Streamlit for web interface
  • Document Processing: PyPDF for loading and parsing legal PDFs

📁 Project Structure

law-library/
├── README.md
├── requirements.txt
└── law-library/
    ├── app.py              # Streamlit web application
    ├── ingest.py           # Document processing and vectorization
    ├── utils.py            # Core RAG pipeline and utilities
    ├── dataset/            # Legal PDF documents (add your own)
    └── vectorstore/        # FAISS vector database
        ├── index.faiss
        └── index.pkl

🚀 Quick Start

Prerequisites

Before you begin, ensure you have the following:

  • Python 3.8+ installed on your system
  • pip (Python package manager)
  • 8GB+ RAM available
  • 10GB+ free disk space for models and documents
  • (Optional) CUDA-compatible GPU for faster inference
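If you would like to check some of these prerequisites from Python, the following sketch covers the two that the standard library can verify: the interpreter version and free disk space. The function name check_prerequisites and the constants are illustrative assumptions, not part of the project. RAM and GPU checks are left out because they need third-party packages.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)   # from the prerequisites list
MIN_FREE_GB = 10      # from the prerequisites list


def check_prerequisites(path: str = ".") -> dict:
    """Return a result for each prerequisite the standard library can check."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "disk_ok": free_gb >= MIN_FREE_GB,
        "free_gb": round(free_gb, 1),
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```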

Step-by-Step Installation

Step 1: Clone the Repository

git clone https://github.com/bPavan16/law-Rag.git
cd law-Rag

Step 2: Create a Virtual Environment (Recommended)

# Create a virtual environment
python -m venv .venv

# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate

# On Windows:
.venv\Scripts\activate

Step 3: Install Required Dependencies

# Install all required packages
pip install -r requirements.txt

# Install additional LangChain packages (if needed)
pip install -U langchain-huggingface langchain-text-splitters accelerate

Step 4: Add Your Legal Documents

IMPORTANT: This project does not include legal documents due to copyright restrictions.

  1. Create the dataset directory (if it doesn't exist):

    mkdir -p law-library/dataset
  2. Download legal documents from authorized sources:

    • Visit India Code for official Bare Acts
    • Download PDFs of laws you want to query (e.g., IPC, Constitution, IT Act)
    • Ensure documents are in PDF format
  3. Place PDF files in the dataset directory:

    # Copy your downloaded PDFs to the dataset folder
    cp /path/to/your/legal-pdfs/*.pdf law-library/dataset/
  4. Verify your documents:

    cd law-library
    ls -la dataset/  # On Linux/macOS
    dir dataset\     # On Windows

    You should see your PDF files listed. The system supports any number of PDF files.
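The same check can be scripted so that it behaves identically on every platform. This is a small sketch; list_pdfs is a hypothetical helper, and it assumes you run it from inside the law-library directory so that dataset/ resolves correctly.

```python
from pathlib import Path
from typing import List


def list_pdfs(dataset_dir: str = "dataset") -> List[Path]:
    """Return the PDF files directly inside dataset_dir, sorted by path."""
    folder = Path(dataset_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    pdfs = list_pdfs()
    print(f"Found {len(pdfs)} PDF file(s) in dataset/")
    for i, pdf in enumerate(pdfs, start=1):
        print(f"  {i}. {pdf.name}")
```

An empty result means either the directory is missing or it contains no files with a .pdf extension.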

Step 5: Verify Dataset

Confirm that every PDF you want to index is in the law-library/dataset/ directory:

cd law-library   # if not already there
ls -la dataset/  # On Linux/macOS
dir dataset\     # On Windows

You should see each of your PDF files listed (see the Legal Document Coverage section for recommended document types).

Step 6: Build the Vector Database

Process all legal documents and create the FAISS vector store:

# Make sure you're in the law-library directory
cd law-library  # if not already there

# Run the ingestion script
python ingest.py

Expected output:

Starting document embedding process...
================================================================================
Found X PDF files in dataset/:
  1. your-legal-document-1.pdf
  2. your-legal-document-2.pdf
  ... (and more)
================================================================================

Loading PDFs from dataset/...
✓ Loaded XXXX document pages

Splitting documents into chunks...
✓ Created XXXX chunks

Loading embedding model (sentence-transformers/all-MiniLM-L6-v2)...
✓ Embedding model loaded

Creating FAISS vector store (this may take several minutes)...
✓ FAISS vector store created

Saving vector store to vectorstore/...
================================================================================
✅ Vector store created and saved successfully!
   - Total documents processed: X
   - Total pages: XXXX
   - Total chunks: XXXX
   - Saved to: vectorstore/
================================================================================

⏱️ Note: Processing time varies based on the number and size of documents (typically 5-15 minutes).
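For orientation, the stages in the log above (load, split, embed, save) could be implemented roughly as follows. This is a sketch under assumptions, not the repository's ingest.py: the function name build_vector_store is hypothetical, and it assumes the langchain-community, langchain-huggingface, langchain-text-splitters, pypdf, and faiss-cpu packages are installed. The chunk parameters and embedding model name are the ones mentioned elsewhere in this README.

```python
from pathlib import Path


def build_vector_store(dataset_dir: str = "dataset", out_dir: str = "vectorstore") -> int:
    """Load PDFs, split them into chunks, embed them, and save a FAISS index.

    Returns the number of chunks written. A sketch only; the real ingest.py may differ.
    """
    folder = Path(dataset_dir)
    if not folder.is_dir() or not any(folder.glob("*.pdf")):
        raise FileNotFoundError(f"No PDF files found in {dataset_dir}/")

    # Imported here so the heavy dependencies are only needed when ingestion runs.
    from langchain_community.document_loaders import PyPDFDirectoryLoader
    from langchain_community.vectorstores import FAISS
    from langchain_huggingface import HuggingFaceEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    pages = PyPDFDirectoryLoader(dataset_dir).load()  # one Document per PDF page
    splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
    chunks = splitter.split_documents(pages)

    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    db = FAISS.from_documents(chunks, embeddings)
    db.save_local(out_dir)  # writes index.faiss and index.pkl
    return len(chunks)
```

Calling build_vector_store() from the law-library directory would produce the two files shown under vectorstore/ in the project structure.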

Step 7: Launch the Application

# Start the Streamlit web interface
streamlit run app.py

Step 8: Access the Chatbot

Open your web browser and navigate to:

http://localhost:8501

🎉 Success! You can now start asking questions about Indian law.

Quick Commands Summary

# Complete setup in one go (Linux/macOS)
git clone https://github.com/bPavan16/law-Rag.git
cd law-Rag
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cd law-library
mkdir -p dataset
cp /path/to/your/legal-pdfs/*.pdf dataset/  # Add your own PDFs before ingesting
python ingest.py
streamlit run app.py

💡 Usage Examples

Ask questions about Indian law such as:

  • "What are the fundamental rights guaranteed by the Indian Constitution?"
  • "Explain the provisions of Section 420 of the Indian Penal Code"
  • "What are the key features of the Companies Act?"
  • "What are the cyber crime laws in India?"
  • "Explain the concept of natural justice in Indian law"

⚙️ Configuration

Model Configuration

The default configuration uses Google FLAN-T5-Base, a free and open-access model. You can modify the model in utils.py:

repo_id = 'google/flan-t5-base'  # Change model here

Other recommended free models:

  • google/flan-t5-large - Larger version for better performance
  • google/flan-t5-small - Smaller, faster version
  • mistralai/Mistral-7B-v0.1 - Open-source alternative (a decoder-only model that needs considerably more memory than the FLAN-T5 variants)

Retrieval Configuration

Adjust retrieval parameters in utils.py:

retriever=db.as_retriever(search_kwargs={'k': 2})  # Number of documents to retrieve
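To show what k controls, here is a toy top-k search over hand-written 2-D vectors. It is an illustration only: the functions cosine and top_k are hypothetical, real embeddings have hundreds of dimensions, and the FAISS store may rank by a different distance metric than the cosine similarity used here.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # Made-up vectors standing in for chunk embeddings.
    store = [
        ("theft", [1.0, 0.0]),
        ("contract", [0.0, 1.0]),
        ("cheating", [0.9, 0.1]),
    ]
    print(top_k([1.0, 0.0], store, k=2))  # → ['theft', 'cheating']
```

A larger k gives the model more context to draw on but also a longer prompt, which matters for small models such as FLAN-T5-Base with a limited input length.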

Text Splitting

Modify chunk parameters in ingest.py:

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,    # Chunk size
    chunk_overlap=200  # Overlap between chunks
)
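The interaction between chunk_size and chunk_overlap can be seen with a simplified fixed-window splitter. This is not how RecursiveCharacterTextSplitter works internally (it prefers to break on separators such as paragraphs and lines, so real chunks vary in length); sliding_chunks is a hypothetical helper that only illustrates the overlap arithmetic.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 800, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
    # → ['abcd', 'cdef', 'efgh', 'ghij']
```

With the defaults of 800 and 200, each window advances by 600 characters, so a 2,000-character text yields three chunks. A larger overlap reduces the chance that a legal provision is cut in half at a chunk boundary, at the cost of more chunks to embed.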

🔧 System Requirements

Minimum Requirements

  • RAM: 8GB
  • Storage: 10GB free space
  • GPU: 4GB VRAM (optional but recommended)

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Contributing Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📞 Support

For support, please open an issue on GitHub or contact the development team.

✨ Acknowledgments

  • Google for the FLAN-T5 model
  • LangChain team for the RAG framework
  • Indian legal system for comprehensive documentation
  • Open source community for various tools and libraries

Disclaimer: This chatbot is for educational and informational purposes only. It should not be considered as legal advice. Always consult with qualified legal professionals for legal matters.
