AI-powered financial document analysis with intelligent section-based summarization using OpenAI's GPT models.
- Project Overview
- Architecture
- Get Started
- Project Structure
- Usage Guide
- Environment Variables
- Inference Benchmarks
- Model Capabilities
- Technology Stack
- Troubleshooting
- License
FinSights is an intelligent financial document analysis platform that processes financial documents (PDF, DOCX) to generate comprehensive summaries with dynamically generated, document-driven sections and an interactive chat interface for context-aware analysis.
- Document Upload & Processing: Users upload or paste financial documents. The system extracts and caches the raw text.
- Dynamic Section Generation: Based on the document content, the system intelligently generates relevant financial analysis sections tailored to the specific document.
- Section-wise Summarization: Users can then generate summaries for each dynamically detected section, allowing them to explore different aspects of the financial document at their own pace.
- Chat with RAG: Users can interact with an intelligent chat interface that uses Retrieval Augmented Generation (RAG) to answer questions about the uploaded document, providing context-aware responses based on the actual document content.
The platform leverages OpenAI's GPT-4o-mini model for intelligent content analysis and summarization. The backend caches extracted documents, allowing users to explore different sections without re-uploading the same document. The RAG-powered chat system enables conversational analysis of financial documents with high accuracy.
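As a rough illustration of the retrieve-then-generate flow described above, a grounded RAG prompt might be assembled like this (a hypothetical helper for illustration only; the actual chat service's prompt format may differ):

```python
def build_rag_prompt(question, context_chunks):
    """Assemble a grounded prompt: numbered document excerpts first, then the question."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(context_chunks))
    return (
        "Answer the question using only the document excerpts below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string would then be sent to the LLM, which constrains answers to the retrieved document content rather than the model's general knowledge.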
The application follows a modular microservices architecture with specialized components for document processing, dynamic section detection, AI-powered summarization, and RAG-based chat:
```mermaid
graph LR
  %% ====== FRONTEND ======
  subgraph FE[Frontend]
    A[React Web UI<br/>Port 5173]
  end

  %% ====== BACKEND ======
  subgraph BE[Backend - FastAPI<br/>Port 8000]
    B[API Router]
    E[Document Service]
    S[Section Detector]
    D[LLM Service]
    K[PDF Generator]
    CHAT[RAG Chat Service]
    VEC[Vector Store<br/>Embeddings]
    G[In-Memory Cache<br/>TTL 1 hour]
    H[Session Summary History]
  end

  %% ====== EXTERNAL ======
  subgraph EXT[External]
    F[OpenAI API<br/>gpt-4o-mini]
  end

  %% ====== CONNECTIONS (ARCHITECTURE) ======
  A -->|HTTP| B
  B --> E
  B --> S
  B --> D
  B --> K
  B --> CHAT
  E -->|Extracted Text| G
  S -->|Read Cached Text| G
  S -->|Detect Sections| D
  D -->|Read Cached Text| G
  D -->|Generate Summary| H
  K -->|Read History| H
  CHAT -->|Retrieve Context| VEC
  CHAT -->|Generate Response| D
  CHAT -->|Store Embeddings| VEC
  E -->|Index Document| VEC
  D -->|API Call| F
  F -->|Response| D
  B -->|JSON| A
  K -->|PDF File| A

  %% ====== STYLES ======
  style A fill:#e1f5ff
  style B fill:#fff4e1
  style S fill:#ffe1f5
  style D fill:#ffe1f5
  style E fill:#ffe1f5
  style K fill:#ffe1f5
  style CHAT fill:#ffe1f5
  style F fill:#fff3cd
  style G fill:#e8f5e9
  style H fill:#e8f5e9
  style VEC fill:#e8f5e9
```
Frontend (React)
- User-friendly interface for document upload and section exploration
- Real-time display of dynamically detected sections
- Summary viewing and export functionality
- Interactive chat interface for RAG-based document queries
Backend Services
- Document Service: Extracts text from PDF/DOCX files with validation
- Section Detector: Analyzes document content and identifies relevant financial sections
- LLM Service: Generates section-specific summaries using OpenAI API
- PDF Generator: Creates formatted PDF exports of summaries
- RAG Chat Service: Implements Retrieval Augmented Generation (RAG) for context-aware question answering about uploaded documents
- Vector Store: Manages document embeddings for efficient semantic search in RAG operations
- Cache System: In-memory caching of extracted documents (1-hour TTL)
- History System: Maintains session summary records
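The Cache System described above (bounded size, 1-hour TTL) can be sketched with the standard library alone. This is an illustrative sketch, not the project's actual cache implementation; the class and method names are hypothetical, and the clock is injectable so expiry can be tested without waiting:

```python
import time
from collections import OrderedDict


class DocumentCache:
    """Minimal in-memory cache with TTL expiry and oldest-first eviction."""

    def __init__(self, max_docs=25, ttl_seconds=3600, clock=time.monotonic):
        self.max_docs = max_docs
        self.ttl = ttl_seconds
        self.clock = clock            # injectable for deterministic tests
        self._store = OrderedDict()   # doc_id -> (expiry_time, text)

    def put(self, doc_id, text):
        self._store[doc_id] = (self.clock() + self.ttl, text)
        self._store.move_to_end(doc_id)
        while len(self._store) > self.max_docs:
            self._store.popitem(last=False)  # evict the oldest entry

    def get(self, doc_id):
        entry = self._store.get(doc_id)
        if entry is None:
            return None
        expiry, text = entry
        if self.clock() > expiry:
            del self._store[doc_id]  # lazily drop expired entries on access
            return None
        return text
```

The defaults mirror the documented settings (CACHE_MAX_DOCS=25, CACHE_TTL_SECONDS=3600), which is why switching between sections of an already-uploaded document is instant.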
External Integration
- OpenAI API: GPT-4o-mini model for intelligent content analysis, summarization, and RAG-based chat responses
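The Vector Store component can be pictured as a small cosine-similarity index over document embeddings. The following stdlib-only sketch (class and method names are hypothetical, not taken from vector_store.py) shows the two operations the architecture relies on: top-k retrieval for RAG, and a reset that prevents context leakage across documents:

```python
import math


class TinyVectorStore:
    """Illustrative in-memory vector index with cosine-similarity retrieval."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector, text)

    def add(self, doc_id, vector, text):
        self._items.append((doc_id, vector, text))

    def reset(self):
        # Mirrors VECTOR_RESET_ON_UPLOAD / VECTOR_RESET_ON_REFRESH behaviour
        self._items.clear()

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def top_k(self, query_vec, k=5):
        """Return the texts of the k segments most similar to the query vector."""
        scored = [(self._cosine(query_vec, v), text) for _, v, text in self._items]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [text for _, text in scored[:k]]
```

In the real system the vectors would come from an embedding model such as text-embedding-3-small, and `k` would correspond to the RAG_TOP_K setting.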
Before you begin, ensure you have the following installed and configured:
- Docker and Docker Compose (v20.10+)
- OpenAI API Key (for GPT-4o-mini access)
```shell
# Check Docker
docker --version
docker compose version

# Verify Docker is running
docker ps
```

```shell
# If cloning:
git clone git@github.com:cld2labs/FinSights.git
cd FinSights
```

Create backend/.env with your OpenAI credentials:
```shell
cat > backend/.env << EOF
# OpenAI Configuration (REQUIRED)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini

# LLM Configuration
LLM_TEMPERATURE=0.2
LLM_MAX_TOKENS=900

# Caching Configuration
CACHE_MAX_DOCS=25
CACHE_TTL_SECONDS=3600

# Service Configuration
SERVICE_PORT=8000
LOG_LEVEL=INFO

# CORS Settings
CORS_ORIGINS=*
EOF
```

Replace your_openai_api_key_here with your actual OpenAI API key.
Option A: Standard Deployment

```shell
# Build and start all services
docker compose up --build

# Or run in detached mode (background)
docker compose up -d --build
```

Option B: View Logs While Running

```shell
# All services
docker compose up --build

# In another terminal, view specific logs
docker compose logs -f backend
docker compose logs -f frontend
```

Once containers are running, access:
- Frontend UI: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- API Redoc: http://localhost:8000/redoc
```shell
# Check health status
curl http://localhost:8000/health

# View running containers
docker compose ps

# Stop all services
docker compose down
```

```
FinSights/
├── backend/
│   ├── api/
│   │   └── routes.py          # API endpoints (document upload, summaries, sections)
│   ├── services/
│   │   ├── llm_service.py     # OpenAI LLM integration and section summarization
│   │   ├── pdf_service.py     # PDF/DOCX extraction and OCR handling
│   │   ├── rag_service.py     # Document-aware RAG logic (doc_id based)
│   │   └── vector_store.py    # In-memory ephemeral vector store
│   ├── server.py              # FastAPI application entry point
│   ├── config.py              # Environment and app configuration
│   ├── requirements.txt       # Python dependencies
│   └── Dockerfile             # Backend container
├── frontend/
│   ├── src/
│   │   ├── pages/
│   │   │   └── Generate.jsx   # Main document upload and section analysis page
│   │   ├── components/        # Reusable UI components
│   │   ├── services/          # API client utilities
│   │   └── App.jsx            # Application root
│   ├── package.json           # npm dependencies
│   └── Dockerfile             # Frontend container
├── docker-compose.yml         # Service orchestration
└── README.md                  # Project documentation
```
1. Open the Application
   - Navigate to http://localhost:5173
2. Choose Input Method
   - Paste Text Tab: Copy/paste financial document text directly
   - Upload File Tab: Upload PDF or DOCX files (max 50MB)
3. Generate Summary
   - Click the "Summarize" button
   - Wait for AI processing
   - View the comprehensive financial summary
4. Explore Financial Sections
   - Click any dynamically generated section chip to view detailed analysis
   - Sections are created automatically based on the document content and are not predefined
   - For example: Financial Performance, Key Metrics, Risks, Opportunities, Outlook / Guidance, and Other Important Highlights
   - Switching sections is instant (cached document)
5. Chat with Your Document (RAG)
   - Use the chat interface to ask questions about the document
   - The system retrieves relevant sections and provides context-aware answers
   - Ask follow-up questions for deeper insights
   - Examples:
     - "What are the main revenue streams?"
     - "What risks are mentioned in this document?"
     - "What is the projected growth rate?"
6. Export Results
   - Click the "Export as PDF" button
   - Save the formatted summary to your computer
7. View History
   - All previous summaries appear in a chat-like history
   - Scroll through past analyses
   - Re-explore or export any summary
- Large PDFs: For PDFs > 100 pages, only first 100 pages are processed
- Best Results: Clearly formatted financial documents with structured text
- Caching: First analysis processes document, subsequent sections are instant
- Temperature Setting: Default 0.2 ensures consistent, focused summaries
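The upload limits mentioned above (50MB, PDF/DOCX only) amount to a simple validation step. A hypothetical sketch of such a check (the real Document Service may implement it differently):

```python
def validate_upload(filename, size_bytes, max_bytes=50 * 1024 * 1024):
    """Reject uploads that violate the documented limits: 50 MB max, PDF/DOCX only."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ("pdf", "docx"):
        raise ValueError(f"unsupported format: {ext or 'none'}")
    if size_bytes > max_bytes:
        raise ValueError(f"file too large: {size_bytes} bytes (max {max_bytes})")
    return True
```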
Configure the application behavior using environment variables in backend/.env:
| Variable | Description | Default | Type |
|---|---|---|---|
| OPENAI_API_KEY | OpenAI API key for LLM access (REQUIRED) | - | string |
| OPENAI_MODEL | LLM model used for summarization and analysis | gpt-4o-mini | string |
| LLM_TEMPERATURE | Model creativity level (0.0–2.0, lower = more deterministic) | 0.2 | float |
| LLM_MAX_TOKENS | Maximum tokens per response | 900 | integer |
| RAG_ENABLED | Enable document-aware RAG flow | true | boolean |
| RAG_MODE | RAG strategy used (doc_id = cached full-document context) | doc_id | string |
| RAG_TOP_K | Number of top relevant context segments used internally | 5 | integer |
| EMBEDDING_MODEL | Embedding model for internal relevance scoring (if applicable) | text-embedding-3-small | string |
| VECTOR_RESET_ON_UPLOAD | Clear the vector dataset when a new document is uploaded | true | boolean |
| VECTOR_RESET_ON_REFRESH | Clear the vector dataset when the client refreshes the site | true | boolean |
| CACHE_MAX_DOCS | Maximum documents stored in the memory cache | 25 | integer |
| CACHE_TTL_SECONDS | Cache time-to-live in seconds | 3600 | integer |
| SERVICE_PORT | Backend API port | 8000 | integer |
| LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO | string |
| CORS_ORIGINS | Allowed CORS origins (comma-separated, or * for all) | * | string |
| MAX_PDF_PAGES | Maximum PDF pages to process | 100 | integer |
| MAX_PDF_SIZE | Maximum PDF file size in bytes (50 MB) | 52428800 | integer |
Note: This blueprint uses a document-cached RAG approach without static chunking.
- The full extracted document is cached by doc_id for fast section switching.
- When a new document is uploaded or the client is refreshed, the in-memory vector dataset is automatically cleared to prevent context leakage across documents.
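The variables above might be loaded along these lines; this is a hedged sketch with hypothetical function names, not the contents of the project's config.py:

```python
import os


def _get_bool(name, default):
    """Read a boolean env var, accepting common truthy spellings."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes", "on")


def load_settings():
    """Read FinSights-style settings from the environment with typed defaults."""
    return {
        "openai_model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
        "llm_temperature": float(os.getenv("LLM_TEMPERATURE", "0.2")),
        "llm_max_tokens": int(os.getenv("LLM_MAX_TOKENS", "900")),
        "rag_enabled": _get_bool("RAG_ENABLED", True),
        "cache_max_docs": int(os.getenv("CACHE_MAX_DOCS", "25")),
        "cache_ttl_seconds": int(os.getenv("CACHE_TTL_SECONDS", "3600")),
        "max_pdf_size": int(os.getenv("MAX_PDF_SIZE", str(50 * 1024 * 1024))),
    }
```

Coercing each value to its documented type at startup surfaces misconfiguration (e.g. a non-numeric LLM_MAX_TOKENS) immediately rather than mid-request.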
The table below compares inference performance across different providers, deployment modes, and hardware profiles using a standardized FinSights document analysis workload (averaged over 3 runs of the full pipeline: initial summary, overall summary, section summary, RAG indexing, and RAG chat).
| Provider | LLM Model | LLM Context | Embedding Model | Embed Context | Deployment | Avg Input Tokens/Gen | Avg Output Tokens/Gen | Avg Total Tokens/Gen | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/s) | Hardware |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vLLM | Llama-3.2-3B-Instruct | 4,096 | BAAI/bge-base-en-v1.5 | 512 | Local | 441 | 127 | 568 | 15,283 | 59,437 | 0.050 | Apple Silicon (Metal) (MacBook Pro M4) |
| Intel OPEA EI | Llama-3.2-3B-Instruct | 8,192 | BAAI/bge-base-en-v1.5 | 512 | Enterprise (On-Prem) | 444 | 122 | 566 | 4,393 | 23,270 | 0.133 | CPU-only (Xeon) |
| OpenAI (Cloud) | gpt-4o-mini | 128,000 | text-embedding-3-small | 8,191 | API (Cloud) | 411 | 133 | 544 | 2,772 | 11,906 | 0.221 | N/A |
Notes:
- All benchmarks use the same FinSights document analysis pipeline. Token counts may vary slightly per run due to non-deterministic model output.
- vLLM on Apple Silicon uses Metal (MPS) GPU acceleration for the LLM and CPU-based vLLM for the BERT embedding model (BAAI/bge-base-en-v1.5).
- Intel OPEA Enterprise Inference runs on Intel Xeon CPUs without GPU acceleration.
- Llama 3.2 3B natively supports 128K context, but the local vLLM run was benchmarked with --max-model-len 4096 due to Apple Silicon memory constraints; EI is configured with an 8,192-token context.
- Each benchmark run exercises 5 generations: initial summary, overall summary, section summary, RAG indexing (embeddings), and RAG chat.
- Langfuse tracing is used for full observability of each benchmark run.
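P50/P95 figures like those in the table are computed from per-request latency samples. A minimal nearest-rank percentile helper (one common convention for latency reporting; the benchmark harness itself may use a different interpolation method):

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

For example, `percentile(latencies_ms, 95)` over the per-request latencies of a run yields the P95 column.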
A 3-billion-parameter open-weight model from Meta's Llama family, optimized for instruction-following and on-device deployment.
| Attribute | Details |
|---|---|
| Parameters | 3.21B |
| Architecture | Transformer with Grouped Query Attention (GQA) — 28 layers, 24 Q-heads / 8 KV-heads |
| Context Window | 128,000 tokens |
| Instruction Tuning | RLHF + supervised fine-tuning on instruction data |
| Multilingual | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai |
| Quantization Formats | GGUF, AWQ, GPTQ, MLX (4-bit) |
| Inference Runtimes | vLLM, Ollama, llama.cpp, LMStudio, SGLang, TGI |
| License | Llama 3.2 Community License (permissive, with acceptable use policy) |
| Deployment | Local, on-prem, air-gapped, cloud — full data sovereignty |
A 110M-parameter BERT-based embedding model from BAAI, widely used for retrieval and RAG pipelines.
| Attribute | Details |
|---|---|
| Parameters | 109M |
| Architecture | BERT base (12 layers, 768 hidden dim) |
| Embedding Dimensions | 768 |
| Max Sequence Length | 512 tokens |
| MTEB Retrieval Score | 53.25 (competitive with models 3x its size) |
| Inference Runtimes | sentence-transformers, vLLM (CPU), ONNX, TGI |
| License | MIT |
| Deployment | Local, on-prem, air-gapped — lightweight enough for CPU |
OpenAI's compact embedding model, used for RAG indexing and retrieval when running with the OpenAI provider.
| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Embedding Dimensions | 1,536 (default) or 512 (with dimensions parameter) |
| Max Sequence Length | 8,191 tokens |
| MTEB Retrieval Score | 44.0 |
| Pricing | $0.02 / 1M tokens |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service |
OpenAI's cost-efficient multimodal model, accessible exclusively via cloud API.
| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context Window | 128,000 tokens input / 16,384 tokens max output |
| Tool / Function Calling | Supported; parallel function calling |
| Structured Output | JSON mode and strict JSON schema adherence supported |
| Multilingual | Broad multilingual support |
| Pricing | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
| Fine-Tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |
| Capability | Llama 3.2 3B Instruct | GPT-4o-mini |
|---|---|---|
| Financial document analysis | Yes | Yes |
| RAG-based document chat | Yes | Yes |
| On-prem / air-gapped deployment | Yes | No |
| Data sovereignty | Full (weights run locally) | No (data sent to cloud API) |
| Open weights | Yes (Llama Community License) | No (proprietary) |
| Custom fine-tuning | Full fine-tuning + LoRA adapters | Supervised fine-tuning (API only) |
| Multimodal (image input) | No | Yes |
| Native context window | 128K | 128K |
Both models support financial document analysis and RAG-based chat. However, only Llama 3.2 offers open weights, data sovereignty, and local deployment flexibility — making it suitable for air-gapped, regulated, or cost-sensitive environments. GPT-4o-mini offers lower latency and higher throughput via OpenAI's cloud infrastructure, with added multimodal capabilities.
- Framework: FastAPI (Python web framework)
- AI / LLM: OpenAI GPT-4o-mini (document-aware analysis)
- RAG Architecture: In-memory, document-cached RAG using doc_id (no static chunking)
- Embeddings: OpenAI embeddings (used internally for relevance scoring when required)
- Document Processing:
- pypdf (PDF text extraction)
- python-docx (DOCX processing)
- pdf2image + pytesseract (OCR for image-based PDFs)
- State Management:
- In-memory document cache
- Ephemeral vector dataset (cleared on new upload or client refresh)
- Async Server: Uvicorn (ASGI)
- Config Management: python-dotenv for environment variables
- Framework: React 18 with React Router
- Build Tool: Vite (fast bundler)
- Styling: Tailwind CSS + PostCSS
- UI Components: Lucide React icons
- RAG UX:
- Dynamic, document-driven section chips
- Instant section switching using cached context
- Export: jsPDF for PDF generation
- Notifications: react-hot-toast
Encountering issues? Check the following:
Issue: API not responding
```shell
# Check service health
curl http://localhost:8000/health

# View backend logs
docker compose logs backend
```

Issue: OpenAI API errors
- Verify OPENAI_API_KEY is correct and has credits
- Check API key permissions in the OpenAI dashboard
- Ensure the gpt-4o-mini model is available in your account
Issue: PDF upload fails
- Max file size: 50MB
- Max pages: 100 pages
- Supported formats: PDF, DOCX
- Ensure file is not corrupted
Issue: Frontend can't connect to API
- Verify the backend is running: docker compose ps
- Check CORS settings in .env
- Ensure both services are on the same network
Enable debug logging:
```shell
# Update .env
LOG_LEVEL=DEBUG

# Restart services
docker compose restart backend
docker compose logs -f backend
```

This project is licensed under the terms described in the LICENSE file; see LICENSE for details.
FinSights is provided as-is for analysis and informational purposes. While we strive for accuracy:
- Always verify AI-generated summaries against original documents
- Do not rely solely on AI summaries for investment decisions
- Consult financial advisors for investment guidance
- Test thoroughly before using in production environments
For full disclaimer details, see DISCLAIMER.md
