AI-powered financial document analysis with intelligent section-based summarization using OpenAI's GPT models.
- Project Overview
- Architecture
- Get Started
- Project Structure
- Usage Guide
- Environment Variables
- Inference Benchmarks
- Model Capabilities
- Technology Stack
- Troubleshooting
- License
FinSights is an intelligent financial document analysis platform that processes financial documents (PDF, DOCX) to generate comprehensive summaries with dynamically generated, document-driven sections and an interactive chat interface for context-aware analysis.
- Document Upload & Processing: Users upload or paste financial documents. The system extracts and caches the raw text.
- Dynamic Section Generation: Based on the document content, the system intelligently generates relevant financial analysis sections tailored to the specific document.
- Section-wise Summarization: Users can then generate summaries for each dynamically detected section, allowing them to explore different aspects of the financial document at their own pace.
- Chat with RAG: Users can interact with an intelligent chat interface that uses Retrieval Augmented Generation (RAG) to answer questions about the uploaded document, providing context-aware responses based on the actual document content.
The platform leverages OpenAI's GPT-4o-mini model for intelligent content analysis and summarization. The backend caches extracted documents, allowing users to explore different sections without re-uploading the same document. The RAG-powered chat system enables conversational analysis of financial documents with high accuracy.
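As a rough illustration of the retrieve-then-generate flow described above, a grounded RAG prompt might be assembled like this (a hypothetical helper for illustration only; the actual chat service's prompt format may differ):

```python
def build_rag_prompt(question, context_chunks):
    """Assemble a grounded prompt: numbered document excerpts first, then the question."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(context_chunks))
    return (
        "Answer the question using only the document excerpts below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string would then be sent to the LLM, which constrains answers to the retrieved document content rather than the model's general knowledge.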
The application follows a modular microservices architecture with specialized components for document processing, dynamic section detection, AI-powered summarization, and RAG-based chat:
```mermaid
graph LR
  %% ====== FRONTEND ======
  subgraph FE[Frontend]
    A[React Web UI<br/>Port 5173]
  end

  %% ====== BACKEND ======
  subgraph BE[Backend - FastAPI<br/>Port 8000]
    B[API Router]
    E[Document Service]
    S[Section Detector]
    D[LLM Service]
    K[PDF Generator]
    CHAT[RAG Chat Service]
    VEC[Vector Store<br/>Embeddings]
    G[In-Memory Cache<br/>TTL 1 hour]
    H[Session Summary History]
  end

  %% ====== EXTERNAL ======
  subgraph EXT[External]
    F[OpenAI API<br/>gpt-4o-mini]
  end

  %% ====== CONNECTIONS (ARCHITECTURE) ======
  A -->|HTTP| B
  B --> E
  B --> S
  B --> D
  B --> K
  B --> CHAT
  E -->|Extracted Text| G
  S -->|Read Cached Text| G
  S -->|Detect Sections| D
  D -->|Read Cached Text| G
  D -->|Generate Summary| H
  K -->|Read History| H
  CHAT -->|Retrieve Context| VEC
  CHAT -->|Generate Response| D
  CHAT -->|Store Embeddings| VEC
  E -->|Index Document| VEC
  D -->|API Call| F
  F -->|Response| D
  B -->|JSON| A
  K -->|PDF File| A

  %% ====== STYLES ======
  style A fill:#e1f5ff
  style B fill:#fff4e1
  style S fill:#ffe1f5
  style D fill:#ffe1f5
  style E fill:#ffe1f5
  style K fill:#ffe1f5
  style CHAT fill:#ffe1f5
  style F fill:#fff3cd
  style G fill:#e8f5e9
  style H fill:#e8f5e9
  style VEC fill:#e8f5e9
```
Frontend (React)
- User-friendly interface for document upload and section exploration
- Real-time display of dynamically detected sections
- Summary viewing and export functionality
- Interactive chat interface for RAG-based document queries
Backend Services
- Document Service: Extracts text from PDF/DOCX files with validation
- Section Detector: Analyzes document content and identifies relevant financial sections
- LLM Service: Generates section-specific summaries using OpenAI API
- PDF Generator: Creates formatted PDF exports of summaries
- RAG Chat Service: Implements Retrieval Augmented Generation (RAG) for context-aware question answering about uploaded documents
- Vector Store: Manages document embeddings for efficient semantic search in RAG operations
- Cache System: In-memory caching of extracted documents (1-hour TTL)
- History System: Maintains session summary records
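The Cache System described above (bounded size, 1-hour TTL) can be sketched with the standard library alone. This is an illustrative sketch, not the project's actual cache implementation; the class and method names are hypothetical, and the clock is injectable so expiry can be tested without waiting:

```python
import time
from collections import OrderedDict


class DocumentCache:
    """Minimal in-memory cache with TTL expiry and oldest-first eviction."""

    def __init__(self, max_docs=25, ttl_seconds=3600, clock=time.monotonic):
        self.max_docs = max_docs
        self.ttl = ttl_seconds
        self.clock = clock            # injectable for deterministic tests
        self._store = OrderedDict()   # doc_id -> (expiry_time, text)

    def put(self, doc_id, text):
        self._store[doc_id] = (self.clock() + self.ttl, text)
        self._store.move_to_end(doc_id)
        while len(self._store) > self.max_docs:
            self._store.popitem(last=False)  # evict the oldest entry

    def get(self, doc_id):
        entry = self._store.get(doc_id)
        if entry is None:
            return None
        expiry, text = entry
        if self.clock() > expiry:
            del self._store[doc_id]  # lazily drop expired entries on access
            return None
        return text
```

The defaults mirror the documented settings (CACHE_MAX_DOCS=25, CACHE_TTL_SECONDS=3600), which is why switching between sections of an already-uploaded document is instant.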
External Integration
- OpenAI API: GPT-4o-mini model for intelligent content analysis, summarization, and RAG-based chat responses
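The Vector Store component can be pictured as a small cosine-similarity index over document embeddings. The following stdlib-only sketch (class and method names are hypothetical, not taken from vector_store.py) shows the two operations the architecture relies on: top-k retrieval for RAG, and a reset that prevents context leakage across documents:

```python
import math


class TinyVectorStore:
    """Illustrative in-memory vector index with cosine-similarity retrieval."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector, text)

    def add(self, doc_id, vector, text):
        self._items.append((doc_id, vector, text))

    def reset(self):
        # Mirrors VECTOR_RESET_ON_UPLOAD / VECTOR_RESET_ON_REFRESH behaviour
        self._items.clear()

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def top_k(self, query_vec, k=5):
        """Return the texts of the k segments most similar to the query vector."""
        scored = [(self._cosine(query_vec, v), text) for _, v, text in self._items]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [text for _, text in scored[:k]]
```

In the real system the vectors would come from an embedding model such as text-embedding-3-small, and `k` would correspond to the RAG_TOP_K setting.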
Before you begin, ensure you have the following installed and configured:
- Docker and Docker Compose (v20.10+)
- OpenAI API Key (for GPT-4o-mini access)
```shell
# Check Docker
docker --version
docker compose version

# Verify Docker is running
docker ps
```

```shell
# If cloning:
git clone git@github.com:cld2labs/FinSights.git
cd FinSights
```

Create backend/.env with your OpenAI credentials:
```shell
cat > backend/.env << EOF
# OpenAI Configuration (REQUIRED)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini

# LLM Configuration
LLM_TEMPERATURE=0.2
LLM_MAX_TOKENS=900

# Caching Configuration
CACHE_MAX_DOCS=25
CACHE_TTL_SECONDS=3600

# Service Configuration
SERVICE_PORT=8000
LOG_LEVEL=INFO

# CORS Settings
CORS_ORIGINS=*
EOF
```

Replace your_openai_api_key_here with your actual OpenAI API key.
Option A: Standard Deployment

```shell
# Build and start all services
docker compose up --build

# Or run in detached mode (background)
docker compose up -d --build
```

Option B: View Logs While Running

```shell
# All services
docker compose up --build

# In another terminal, view specific logs
docker compose logs -f backend
docker compose logs -f frontend
```

Once containers are running, access:
- Frontend UI: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- API Redoc: http://localhost:8000/redoc
```shell
# Check health status
curl http://localhost:8000/health

# View running containers
docker compose ps

# Stop all services
docker compose down
```

```
FinSights/
├── backend/
│   ├── api/
│   │   └── routes.py          # API endpoints (document upload, summaries, sections)
│   ├── services/
│   │   ├── llm_service.py     # OpenAI LLM integration and section summarization
│   │   ├── pdf_service.py     # PDF/DOCX extraction and OCR handling
│   │   ├── rag_service.py     # Document-aware RAG logic (doc_id based)
│   │   └── vector_store.py    # In-memory ephemeral vector store
│   ├── server.py              # FastAPI application entry point
│   ├── config.py              # Environment and app configuration
│   ├── requirements.txt       # Python dependencies
│   └── Dockerfile             # Backend container
├── frontend/
│   ├── src/
│   │   ├── pages/
│   │   │   └── Generate.jsx   # Main document upload and section analysis page
│   │   ├── components/        # Reusable UI components
│   │   ├── services/          # API client utilities
│   │   └── App.jsx            # Application root
│   ├── package.json           # npm dependencies
│   └── Dockerfile             # Frontend container
├── docker-compose.yml         # Service orchestration
└── README.md                  # Project documentation
```
1. Open the Application
   - Navigate to http://localhost:5173
2. Choose Input Method
   - Paste Text Tab: Copy/paste financial document text directly
   - Upload File Tab: Upload PDF or DOCX files (max 50MB)
3. Generate Summary
   - Click the "Summarize" button
   - Wait for AI processing
   - View the comprehensive financial summary
4. Explore Financial Sections
   - Click any dynamically generated section chip to view detailed analysis
   - Sections are created automatically based on the document content and are not predefined
   - For example: Financial Performance, Key Metrics, Risks, Opportunities, Outlook / Guidance, and Other Important Highlights
   - Switching sections is instant (cached document)
5. Chat with Your Document (RAG)
   - Use the chat interface to ask questions about the document
   - The system retrieves relevant sections and provides context-aware answers
   - Ask follow-up questions for deeper insights
   - Examples:
     - "What are the main revenue streams?"
     - "What risks are mentioned in this document?"
     - "What is the projected growth rate?"
6. Export Results
   - Click the "Export as PDF" button
   - Save the formatted summary to your computer
7. View History
   - All previous summaries appear in a chat-like history
   - Scroll through past analyses
   - Re-explore or export any summary
- Large PDFs: For PDFs > 100 pages, only first 100 pages are processed
- Best Results: Clearly formatted financial documents with structured text
- Caching: First analysis processes document, subsequent sections are instant
- Temperature Setting: Default 0.2 ensures consistent, focused summaries
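The upload limits mentioned above (50MB, PDF/DOCX only) amount to a simple validation step. A hypothetical sketch of such a check (the real Document Service may implement it differently):

```python
def validate_upload(filename, size_bytes, max_bytes=50 * 1024 * 1024):
    """Reject uploads that violate the documented limits: 50 MB max, PDF/DOCX only."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ("pdf", "docx"):
        raise ValueError(f"unsupported format: {ext or 'none'}")
    if size_bytes > max_bytes:
        raise ValueError(f"file too large: {size_bytes} bytes (max {max_bytes})")
    return True
```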
Configure the application behavior using environment variables in backend/.env:
| Variable | Description | Default | Type |
|---|---|---|---|
| OPENAI_API_KEY | OpenAI API key for LLM access (REQUIRED) | - | string |
| OPENAI_MODEL | LLM model used for summarization and analysis | gpt-4o-mini | string |
| LLM_TEMPERATURE | Model creativity level (0.0–2.0, lower = more deterministic) | 0.2 | float |
| LLM_MAX_TOKENS | Maximum tokens per response | 900 | integer |
| RAG_ENABLED | Enable document-aware RAG flow | true | boolean |
| RAG_MODE | RAG strategy used (doc_id = cached full-document context) | doc_id | string |
| RAG_TOP_K | Number of top relevant context segments used internally | 5 | integer |
| EMBEDDING_MODEL | Embedding model for internal relevance scoring (if applicable) | text-embedding-3-small | string |
| VECTOR_RESET_ON_UPLOAD | Clear the vector dataset when a new document is uploaded | true | boolean |
| VECTOR_RESET_ON_REFRESH | Clear the vector dataset when the client refreshes the site | true | boolean |
| CACHE_MAX_DOCS | Maximum documents stored in the memory cache | 25 | integer |
| CACHE_TTL_SECONDS | Cache time-to-live in seconds | 3600 | integer |
| SERVICE_PORT | Backend API port | 8000 | integer |
| LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO | string |
| CORS_ORIGINS | Allowed CORS origins (comma-separated, or * for all) | * | string |
| MAX_PDF_PAGES | Maximum PDF pages to process | 100 | integer |
| MAX_PDF_SIZE | Maximum PDF file size in bytes (50 MB) | 52428800 | integer |
Note: This blueprint uses a document-cached RAG approach without static chunking.
- The full extracted document is cached by doc_id for fast section switching.
- When a new document is uploaded or the client is refreshed, the in-memory vector dataset is automatically cleared to prevent context leakage across documents.
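The variables above might be loaded along these lines; this is a hedged sketch with hypothetical function names, not the contents of the project's config.py:

```python
import os


def _get_bool(name, default):
    """Read a boolean env var, accepting common truthy spellings."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes", "on")


def load_settings():
    """Read FinSights-style settings from the environment with typed defaults."""
    return {
        "openai_model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
        "llm_temperature": float(os.getenv("LLM_TEMPERATURE", "0.2")),
        "llm_max_tokens": int(os.getenv("LLM_MAX_TOKENS", "900")),
        "rag_enabled": _get_bool("RAG_ENABLED", True),
        "cache_max_docs": int(os.getenv("CACHE_MAX_DOCS", "25")),
        "cache_ttl_seconds": int(os.getenv("CACHE_TTL_SECONDS", "3600")),
        "max_pdf_size": int(os.getenv("MAX_PDF_SIZE", str(50 * 1024 * 1024))),
    }
```

Coercing each value to its documented type at startup surfaces misconfiguration (e.g. a non-numeric LLM_MAX_TOKENS) immediately rather than mid-request.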
The table below compares inference performance across different providers, deployment modes, and hardware profiles using a standardized FinSights document analysis workload (averaged over 3 runs of the full pipeline: initial summary, overall summary, section summary, RAG indexing, and RAG chat).
| Provider | LLM Model | LLM Context | Embedding Model | Embed Context | Deployment | Avg Input Tokens/Gen | Avg Output Tokens/Gen | Avg Total Tokens/Gen | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/s) | Hardware |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vLLM | Llama-3.2-3B-Instruct | 4,096 | BAAI/bge-base-en-v1.5 | 512 | Local | 441 | 127 | 568 | 15,283 | 59,437 | 0.050 | Apple Silicon (Metal) (MacBook Pro M4) |
| Intel OPEA EI | Llama-3.2-3B-Instruct | 8,192 | BAAI/bge-base-en-v1.5 | 512 | Enterprise (On-Prem) | 444 | 122 | 566 | 4,393 | 23,270 | 0.133 | CPU-only (Xeon) |
| OpenAI (Cloud) | gpt-4o-mini | 128,000 | text-embedding-3-small | 8,191 | API (Cloud) | 411 | 133 | 544 | 2,772 | 11,906 | 0.221 | N/A |
Notes:
- All benchmarks use the same FinSights document analysis pipeline. Token counts may vary slightly per run due to non-deterministic model output.
- vLLM on Apple Silicon uses Metal (MPS) GPU acceleration for the LLM and CPU-based vLLM for the BERT embedding model (BAAI/bge-base-en-v1.5).
- Intel OPEA Enterprise Inference runs on Intel Xeon CPUs without GPU acceleration.
- Llama 3.2 3B natively supports 128K context, but the local vLLM run was benchmarked with --max-model-len 4096 due to Apple Silicon memory constraints; EI is configured with an 8,192-token context.
- Each benchmark run exercises 5 generations: initial summary, overall summary, section summary, RAG indexing (embeddings), and RAG chat.
- Langfuse tracing is used for full observability of each benchmark run.
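P50/P95 figures like those in the table are computed from per-request latency samples. A minimal nearest-rank percentile helper (one common convention for latency reporting; the benchmark harness itself may use a different interpolation method):

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

For example, `percentile(latencies_ms, 95)` over the per-request latencies of a run yields the P95 column.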
A 3-billion-parameter open-weight model from Meta's Llama family, optimized for instruction-following and on-device deployment.
| Attribute | Details |
|---|---|
| Parameters | 3.21B |
| Architecture | Transformer with Grouped Query Attention (GQA) — 28 layers, 24 Q-heads / 8 KV-heads |
| Context Window | 128,000 tokens |
| Instruction Tuning | RLHF + supervised fine-tuning on instruction data |
| Multilingual | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai |
| Quantization Formats | GGUF, AWQ, GPTQ, MLX (4-bit) |
| Inference Runtimes | vLLM, Ollama, llama.cpp, LMStudio, SGLang, TGI |
| License | Llama 3.2 Community License (permissive, with acceptable use policy) |
| Deployment | Local, on-prem, air-gapped, cloud — full data sovereignty |
A 110M-parameter BERT-based embedding model from BAAI, widely used for retrieval and RAG pipelines.
| Attribute | Details |
|---|---|
| Parameters | 109M |
| Architecture | BERT base (12 layers, 768 hidden dim) |
| Embedding Dimensions | 768 |
| Max Sequence Length | 512 tokens |
| MTEB Retrieval Score | 53.25 (competitive with models 3x its size) |
| Inference Runtimes | sentence-transformers, vLLM (CPU), ONNX, TGI |
| License | MIT |
| Deployment | Local, on-prem, air-gapped — lightweight enough for CPU |
OpenAI's compact embedding model, used for RAG indexing and retrieval when running with the OpenAI provider.
| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Embedding Dimensions | 1,536 (default) or 512 (with dimensions parameter) |
| Max Sequence Length | 8,191 tokens |
| MTEB Retrieval Score | 44.0 |
| Pricing | $0.02 / 1M tokens |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service |
OpenAI's cost-efficient multimodal model, accessible exclusively via cloud API.
| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context Window | 128,000 tokens input / 16,384 tokens max output |
| Tool / Function Calling | Supported; parallel function calling |
| Structured Output | JSON mode and strict JSON schema adherence supported |
| Multilingual | Broad multilingual support |
| Pricing | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
| Fine-Tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |
| Capability | Llama 3.2 3B Instruct | GPT-4o-mini |
|---|---|---|
| Financial document analysis | Yes | Yes |
| RAG-based document chat | Yes | Yes |
| On-prem / air-gapped deployment | Yes | No |
| Data sovereignty | Full (weights run locally) | No (data sent to cloud API) |
| Open weights | Yes (Llama Community License) | No (proprietary) |
| Custom fine-tuning | Full fine-tuning + LoRA adapters | Supervised fine-tuning (API only) |
| Multimodal (image input) | No | Yes |
| Native context window | 128K | 128K |
Both models support financial document analysis and RAG-based chat. However, only Llama 3.2 offers open weights, data sovereignty, and local deployment flexibility — making it suitable for air-gapped, regulated, or cost-sensitive environments. GPT-4o-mini offers lower latency and higher throughput via OpenAI's cloud infrastructure, with added multimodal capabilities.
- Framework: FastAPI (Python web framework)
- AI / LLM: OpenAI GPT-4o-mini (document-aware analysis)
- RAG Architecture: In-memory, document-cached RAG using doc_id (no static chunking)
- Embeddings: OpenAI embeddings (used internally for relevance scoring when required)
- Document Processing:
- pypdf (PDF text extraction)
- python-docx (DOCX processing)
- pdf2image + pytesseract (OCR for image-based PDFs)
- State Management:
- In-memory document cache
- Ephemeral vector dataset (cleared on new upload or client refresh)
- Async Server: Uvicorn (ASGI)
- Config Management: python-dotenv for environment variables
- Framework: React 18 with React Router
- Build Tool: Vite (fast bundler)
- Styling: Tailwind CSS + PostCSS
- UI Components: Lucide React icons
- RAG UX:
- Dynamic, document-driven section chips
- Instant section switching using cached context
- Export: jsPDF for PDF generation
- Notifications: react-hot-toast
Encountering issues? Check the following:
Issue: API not responding
```shell
# Check service health
curl http://localhost:8000/health

# View backend logs
docker compose logs backend
```

Issue: OpenAI API errors
- Verify OPENAI_API_KEY is correct and has credits
- Check API key permissions in the OpenAI dashboard
- Ensure the gpt-4o-mini model is available in your account
Issue: PDF upload fails
- Max file size: 50MB
- Max pages: 100 pages
- Supported formats: PDF, DOCX
- Ensure file is not corrupted
Issue: Frontend can't connect to API
- Verify the backend is running: docker compose ps
- Check CORS settings in .env
- Ensure both services are on the same network
Enable debug logging:
```shell
# Update .env
LOG_LEVEL=DEBUG

# Restart services
docker compose restart backend
docker compose logs -f backend
```

This project is licensed under the terms described in the LICENSE file; see LICENSE for details.
FinSights is provided as-is for analysis and informational purposes. While we strive for accuracy:
- Always verify AI-generated summaries against original documents
- Do not rely solely on AI summaries for investment decisions
- Consult financial advisors for investment guidance
- Test thoroughly before using in production environments
For full disclaimer details, see DISCLAIMER.md
