This repository contains the backend code for an AI-powered document question-answering system. The system allows users to upload PDF documents and ask questions about their content. The backend processes the documents, extracts text, and uses natural language processing to provide answers to the users' questions.
- PDF Upload: Users can upload PDF documents to the system.
- Text Extraction: Extract text content from uploaded PDF documents.
- Question Answering: Use NLP to answer questions based on the content of the uploaded documents.
- Database Integration: Store and retrieve document information and text content.
ai-planet-be/
├── alembic/
├── app/
│ ├── routers/
│ │ ├── __init__.py
│ │ ├── documents.py
│ │ └── questions.py
│ ├── __init__.py
│ ├── crud.py
│ ├── database.py
│ ├── main.py
│ ├── models.py
│ ├── nlp_processor.py
│ ├── pdf_processing.py
│ └── schemas.py
├── env/
├── migrations/
├── .env
├── alembic.ini
└── LICENSE
- Python 3.8+
- FastAPI
- SQLAlchemy
- PyMuPDF
- LangChain
- OpenAI API Key
git clone https://github.com/AI-planet-Project/ai-planet-backend.git
cd ai-planet-backend
python -m venv env
source env/bin/activate  # On Windows use `env\Scripts\activate`
pip install -r requirements.txt
Create a `.env` file in the root directory of the project and add the following values:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_API_KEY=your-langchain-api-key
LANGCHAIN_PROJECT=your-langchain-project
OPENAI_API_KEY=your-openai-api-key
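
Before starting the server, it can help to confirm that every value listed above is present. The following is a minimal, dependency-free sketch of such a check. It is not taken from this repository: `REQUIRED_KEYS`, `parse_env_text`, and `missing_keys` are hypothetical names, and the parser only handles plain `KEY=value` lines.

```python
# Hypothetical helper: checks that a .env-style text defines every key
# listed in this README. Not part of the repository's code.
REQUIRED_KEYS = (
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
    "OPENAI_API_KEY",
)


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty, in declaration order."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


example = "OPENAI_API_KEY=your-openai-api-key\nLANGCHAIN_TRACING_V2=true\n"
print(missing_keys(parse_env_text(example)))
# prints ['LANGCHAIN_ENDPOINT', 'LANGCHAIN_API_KEY', 'LANGCHAIN_PROJECT']
```

To check a real file, pass the result of `open(".env").read()` to `parse_env_text`.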
Run the following command to create all database tables:
python -m app.database
Start the FastAPI application with:
uvicorn app.main:app --reload
The application will be available at http://127.0.0.1:8000.
- URL: `/api/documents/`
- Method: `POST`
- Description: Uploads a PDF document, extracts text content, and stores it in the database.

- URL: `/api/questions/{doc_id}`
- Method: `POST`
- Description: Asks a question based on the content of an uploaded document.
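
As a rough illustration of calling the question endpoint, the sketch below builds the request with the standard library only. The request body format is not documented here, so the JSON payload with a `question` field is an assumption, as are the integer `doc_id` and the helper name `build_question_request`. Check `app/schemas.py` for the actual request model.

```python
import json
import urllib.request

# Default uvicorn address given in this README.
BASE_URL = "http://127.0.0.1:8000"


def build_question_request(
    doc_id: int, question: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request for /api/questions/{doc_id}.

    ASSUMPTION: the endpoint accepts a JSON body with a "question" field.
    The real schema may differ.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/questions/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_question_request(1, "What is the document about?")
print(request.get_method(), request.full_url)
# prints POST http://127.0.0.1:8000/api/questions/1
```

With the server running, `urllib.request.urlopen(request)` would send the request. The upload endpoint expects a PDF file, which typically means a multipart form upload and is easier to do with an HTTP client library or `curl`.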
- `app/main.py`: Initializes the FastAPI application and includes routers.
- `app/database.py`: Sets up the database connection and session management, and provides a function to create all tables.
- `app/models.py`: Defines the SQLAlchemy ORM model for the `Document` table.
- `app/schemas.py`: Defines Pydantic models for data validation and serialization.
- `app/crud.py`: Provides CRUD operations for interacting with the database.
- `app/pdf_processing.py`: Contains functions to extract text from PDF documents.
- `app/nlp_processor.py`: Contains functions to process questions and generate answers using LangChain.
- `app/routers/documents.py`: Defines the API endpoint for uploading documents.
- `app/routers/questions.py`: Defines the API endpoint for asking questions based on document content.
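
The sketch below shows one way the pieces described above might fit together: extracted text is stored on upload, looked up by document id when a question arrives, and handed to an answering function. It is an illustration, not the repository's code. To stay runnable without dependencies, it uses the standard-library `sqlite3` module in place of SQLAlchemy and a placeholder callable in place of the LangChain pipeline. All function names, the table name, and the column names are hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def create_tables(conn: sqlite3.Connection) -> None:
    """Stand-in for the table-creation function in database.py."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "filename TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )


def create_document(conn: sqlite3.Connection, filename: str, content: str) -> int:
    """Stand-in for a crud.py create operation; returns the new document id."""
    cursor = conn.execute(
        "INSERT INTO documents (filename, content) VALUES (?, ?)",
        (filename, content),
    )
    conn.commit()
    return cursor.lastrowid


def get_document_text(conn: sqlite3.Connection, doc_id: int) -> Optional[str]:
    """Stand-in for a crud.py read operation; returns None if the id is unknown."""
    row = conn.execute(
        "SELECT content FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


def answer_question(
    conn: sqlite3.Connection,
    doc_id: int,
    question: str,
    answer_fn: Callable[[str, str], str],
) -> str:
    """Look up the stored text and delegate to answer_fn(text, question).

    answer_fn is a placeholder for the LangChain-based logic in nlp_processor.py.
    """
    text = get_document_text(conn, doc_id)
    if text is None:
        raise LookupError(f"document {doc_id} not found")
    return answer_fn(text, question)


conn = sqlite3.connect(":memory:")
create_tables(conn)
doc_id = create_document(conn, "example.pdf", "FastAPI is a Python web framework.")

# Placeholder "model" that simply returns the stored text.
print(answer_question(conn, doc_id, "What is FastAPI?", lambda text, q: text))
# prints FastAPI is a Python web framework.
```

Passing the answering function as a parameter keeps the storage logic testable without an OpenAI API key. In the real application, a router would raise an HTTP 404 response where this sketch raises `LookupError`.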
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License.