📊 FinSights - Financial Document Summarization

AI-powered financial document analysis with intelligent section-based summarization using OpenAI's GPT models.




Project Overview

FinSights is an intelligent financial document analysis platform that processes financial documents (PDF, DOCX) to generate comprehensive summaries with dynamically generated, document-driven sections and an interactive chat interface for context-aware analysis.

How It Works

  1. Document Upload & Processing: Users upload or paste financial documents. The system extracts and caches the raw text.
  2. Dynamic Section Generation: Based on the document content, the system intelligently generates relevant financial analysis sections tailored to the specific document.
  3. Section-wise Summarization: Users can then generate summaries for each dynamically detected section, allowing them to explore different aspects of the financial document at their own pace.
  4. Chat with RAG: Users can interact with an intelligent chat interface that uses Retrieval Augmented Generation (RAG) to answer questions about the uploaded document, providing context-aware responses based on the actual document content.

The platform leverages OpenAI's GPT-4o-mini model for intelligent content analysis and summarization. The backend caches extracted documents, allowing users to explore different sections without re-uploading the same document. The RAG-powered chat system enables conversational analysis of financial documents with high accuracy.
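The caching behavior described above can be sketched as a small TTL-bounded dictionary. This is an illustrative sketch only, with assumed class and method names (`DocumentCache`, `put`, `get`), not FinSights' actual cache implementation:

```python
import time

class DocumentCache:
    """Minimal in-memory document cache with TTL eviction.
    Illustrative only; names and behavior are assumptions, not the project's API."""

    def __init__(self, max_docs=25, ttl_seconds=3600):
        self.max_docs = max_docs
        self.ttl = ttl_seconds
        self._store = {}  # doc_id -> (text, inserted_at)

    def put(self, doc_id, text):
        # Drop expired entries first, then evict the oldest if still at capacity.
        now = time.time()
        self._store = {k: v for k, v in self._store.items() if now - v[1] < self.ttl}
        if len(self._store) >= self.max_docs:
            oldest = min(self._store, key=lambda k: self._store[k][1])
            del self._store[oldest]
        self._store[doc_id] = (text, now)

    def get(self, doc_id):
        # Return cached text, or None if missing or expired.
        entry = self._store.get(doc_id)
        if entry is None or time.time() - entry[1] >= self.ttl:
            self._store.pop(doc_id, None)
            return None
        return entry[0]
```

With defaults matching the README (`CACHE_MAX_DOCS=25`, `CACHE_TTL_SECONDS=3600`), a re-requested section reads the cached text instead of re-extracting the document.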


Architecture

The application follows a modular microservices architecture with specialized components for document processing, dynamic section detection, AI-powered summarization, and RAG-based chat:

```mermaid
graph LR

  %% ====== FRONTEND ======
  subgraph FE[Frontend]
    A[React Web UI<br/>Port 5173]
  end

  %% ====== BACKEND ======
  subgraph BE[Backend - FastAPI<br/>Port 8000]
    B[API Router]
    E[Document Service]
    S[Section Detector]
    D[LLM Service]
    K[PDF Generator]
    CHAT[RAG Chat Service]
    VEC[Vector Store<br/>Embeddings]
    G[In-Memory Cache<br/>TTL 1 hour]
    H[Session Summary History]
  end

  %% ====== EXTERNAL ======
  subgraph EXT[External]
    F[OpenAI API<br/>gpt-4o-mini]
  end

  %% ====== CONNECTIONS (ARCHITECTURE) ======
  A -->|HTTP| B

  B --> E
  B --> S
  B --> D
  B --> K
  B --> CHAT

  E -->|Extracted Text| G
  S -->|Read Cached Text| G
  S -->|Detect Sections| D
  D -->|Read Cached Text| G
  D -->|Generate Summary| H
  K -->|Read History| H

  CHAT -->|Retrieve Context| VEC
  CHAT -->|Generate Response| D
  CHAT -->|Store Embeddings| VEC
  E -->|Index Document| VEC

  D -->|API Call| F
  F -->|Response| D

  B -->|JSON| A
  K -->|PDF File| A

  %% ====== STYLES ======
  style A fill:#e1f5ff
  style B fill:#fff4e1
  style S fill:#ffe1f5
  style D fill:#ffe1f5
  style E fill:#ffe1f5
  style K fill:#ffe1f5
  style CHAT fill:#ffe1f5
  style F fill:#fff3cd
  style G fill:#e8f5e9
  style H fill:#e8f5e9
  style VEC fill:#e8f5e9
```

Architecture Components

Frontend (React)

  • User-friendly interface for document upload and section exploration
  • Real-time display of dynamically detected sections
  • Summary viewing and export functionality
  • Interactive chat interface for RAG-based document queries

Backend Services

  • Document Service: Extracts text from PDF/DOCX files with validation
  • Section Detector: Analyzes document content and identifies relevant financial sections
  • LLM Service: Generates section-specific summaries using OpenAI API
  • PDF Generator: Creates formatted PDF exports of summaries
  • RAG Chat Service: Implements Retrieval Augmented Generation (RAG) for context-aware question answering about uploaded documents
  • Vector Store: Manages document embeddings for efficient semantic search in RAG operations
  • Cache System: In-memory caching of extracted documents (1-hour TTL)
  • History System: Maintains session summary records
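The retrieval step behind the RAG Chat Service can be sketched as a cosine-similarity top-k search over stored embeddings. This is a minimal pure-Python illustration; the function names and the `(text, embedding)` layout are assumptions, not the actual `vector_store.py` API:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query_vec, indexed, k=5):
    """Return the k segment texts most similar to the query embedding.
    indexed: list of (segment_text, embedding) pairs."""
    scored = sorted(indexed, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [text for text, _ in scored[:k]]
```

The default of `k=5` mirrors the `RAG_TOP_K` setting; the retrieved segments are then passed to the LLM Service as context for the chat response.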

External Integration

  • OpenAI API: GPT-4o-mini model for intelligent content analysis, summarization, and RAG-based chat responses

Get Started

Prerequisites

Before you begin, ensure you have the following installed and configured:

  • Docker Engine with the Docker Compose plugin
  • An OpenAI API key with access to gpt-4o-mini

Verify Installation

```bash
# Check Docker
docker --version
docker compose version

# Verify Docker is running
docker ps
```

Quick Start

1. Clone or Navigate to Repository

```bash
# If cloning:
git clone git@github.com:cld2labs/FinSights.git
cd FinSights
```

2. Configure Environment Variables

Create backend/.env with your OpenAI credentials:

```bash
cat > backend/.env << EOF
# OpenAI Configuration (REQUIRED)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini

# LLM Configuration
LLM_TEMPERATURE=0.2
LLM_MAX_TOKENS=900

# Caching Configuration
CACHE_MAX_DOCS=25
CACHE_TTL_SECONDS=3600

# Service Configuration
SERVICE_PORT=8000
LOG_LEVEL=INFO

# CORS Settings
CORS_ORIGINS=*
EOF
```

Replace your_openai_api_key_here with your actual OpenAI API key.
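A minimal sketch of how the backend might read these variables, using only `os.environ` with the defaults shown above (an assumption for illustration; the real `config.py` may rely on python-dotenv or pydantic instead):

```python
import os

def load_settings():
    """Read FinSights settings from the environment, falling back to the
    README defaults. Illustrative sketch; not the project's actual config.py."""
    return {
        "openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
        "openai_model": os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
        "llm_temperature": float(os.environ.get("LLM_TEMPERATURE", "0.2")),
        "llm_max_tokens": int(os.environ.get("LLM_MAX_TOKENS", "900")),
        "cache_max_docs": int(os.environ.get("CACHE_MAX_DOCS", "25")),
        "cache_ttl_seconds": int(os.environ.get("CACHE_TTL_SECONDS", "3600")),
        "service_port": int(os.environ.get("SERVICE_PORT", "8000")),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
        # CORS_ORIGINS is comma-separated or "*"
        "cors_origins": os.environ.get("CORS_ORIGINS", "*").split(","),
    }
```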

3. Launch the Application

Option A: Standard Deployment

```bash
# Build and start all services
docker compose up --build

# Or run in detached mode (background)
docker compose up -d --build
```

Option B: View Logs While Running

```bash
# All services
docker compose up --build

# In another terminal, view specific logs
docker compose logs -f backend
docker compose logs -f frontend
```

4. Access the Application

Once the containers are running, access:

  • Frontend UI: http://localhost:5173
  • Backend API: http://localhost:8000
  • Health check: http://localhost:8000/health

5. Verify Services

```bash
# Check health status
curl http://localhost:8000/health

# View running containers
docker compose ps
```

6. Stop the Application

```bash
docker compose down
```

Project Structure

```text
FinSights/
├── backend/
│   ├── api/
│   │   └── routes.py          # API endpoints (document upload, summaries, sections)
│   ├── services/
│   │   ├── llm_service.py     # OpenAI LLM integration and section summarization
│   │   ├── pdf_service.py     # PDF/DOCX extraction and OCR handling
│   │   ├── rag_service.py     # Document-aware RAG logic (doc_id based)
│   │   └── vector_store.py    # In-memory ephemeral vector store
│   ├── server.py              # FastAPI application entry point
│   ├── config.py              # Environment and app configuration
│   ├── requirements.txt       # Python dependencies
│   └── Dockerfile             # Backend container
├── frontend/
│   ├── src/
│   │   ├── pages/
│   │   │   └── Generate.jsx   # Main document upload and section analysis page
│   │   ├── components/        # Reusable UI components
│   │   ├── services/          # API client utilities
│   │   └── App.jsx            # Application root
│   ├── package.json           # npm dependencies
│   └── Dockerfile             # Frontend container
├── docker-compose.yml         # Service orchestration
└── README.md                  # Project documentation
```

Usage Guide

Using FinSights

  1. Open the Application

    • Navigate to http://localhost:5173
  2. Choose Input Method

    • Paste Text Tab: Copy/paste financial document text directly
    • Upload File Tab: Upload PDF or DOCX files (max 50MB)
  3. Generate Summary

    • Click "Summarize" button
    • Wait for AI processing
    • View comprehensive financial summary
  4. Explore Financial Sections

    • Click any dynamically generated section chip to view detailed analysis
    • Sections are created automatically from the document content and are not predefined; examples include Financial Performance, Key Metrics, Risks, Opportunities, Outlook / Guidance, and Other Important Highlights
    • Switching sections is instant (the document is cached)
  5. Chat with Your Document (RAG)

    • Use the chat interface to ask questions about the document
    • The system retrieves relevant passages and provides context-aware answers
    • Ask follow-up questions for deeper insights
    • Examples:
      • "What are the main revenue streams?"
      • "What risks are mentioned in this document?"
      • "What is the projected growth rate?"
  6. Export Results

    • Click the "Export as PDF" button
    • Save the formatted summary to your computer
  7. View History

    • All previous summaries appear in a chat-like history
    • Scroll through past analyses
    • Re-explore or export any summary

Performance Tips

  • Large PDFs: For PDFs longer than 100 pages, only the first 100 pages are processed
  • Best Results: Clearly formatted financial documents with structured text
  • Caching: First analysis processes document, subsequent sections are instant
  • Temperature Setting: Default 0.2 ensures consistent, focused summaries
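The 100-page cap can be expressed as a simple truncation over the list of extracted page texts. `cap_pages` is a hypothetical helper name for illustration, not the project's actual function:

```python
def cap_pages(pages, max_pages=100):
    """Keep only the first max_pages pages of extracted text and report
    whether truncation happened. Mirrors the documented 100-page limit;
    the helper name is an assumption, not the project's API."""
    truncated = len(pages) > max_pages
    return pages[:max_pages], truncated
```

A caller could log a warning when `truncated` is true so users know their document was clipped before summarization.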

Environment Variables

Configure the application behavior using environment variables in backend/.env:

| Variable | Description | Default | Type |
|----------|-------------|---------|------|
| OPENAI_API_KEY | OpenAI API key for LLM access (REQUIRED) | - | string |
| OPENAI_MODEL | LLM model used for summarization and analysis | gpt-4o-mini | string |
| LLM_TEMPERATURE | Model creativity level (0.0–2.0, lower = deterministic) | 0.2 | float |
| LLM_MAX_TOKENS | Maximum tokens per response | 900 | integer |
| RAG_ENABLED | Enable document-aware RAG flow | true | boolean |
| RAG_MODE | RAG strategy used (doc_id = cached full-document context) | doc_id | string |
| RAG_TOP_K | Number of top relevant context segments used internally | 5 | integer |
| EMBEDDING_MODEL | Embedding model for internal relevance scoring (if applicable) | text-embedding-3-small | string |
| VECTOR_RESET_ON_UPLOAD | Clear vector dataset when a new document is uploaded | true | boolean |
| VECTOR_RESET_ON_REFRESH | Clear vector dataset when the client refreshes the site | true | boolean |
| CACHE_MAX_DOCS | Maximum documents stored in memory cache | 25 | integer |
| CACHE_TTL_SECONDS | Cache time-to-live in seconds | 3600 | integer |
| SERVICE_PORT | Backend API port | 8000 | integer |
| LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO | string |
| CORS_ORIGINS | Allowed CORS origins (comma-separated or *) | * | string |
| MAX_PDF_PAGES | Maximum PDF pages to process | 100 | integer |
| MAX_PDF_SIZE | Maximum PDF file size in bytes | 52428800 | integer |

Note:
This project uses a document-cached RAG approach without static chunking.

  • The full extracted document is cached by doc_id for fast section switching.
  • When a new document is uploaded or the client is refreshed, the in-memory vector dataset is automatically cleared to prevent context leakage across documents.
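The reset-on-upload behavior can be sketched as a vector store keyed by `doc_id` that wipes itself whenever a different document is indexed. Class and method names here are illustrative assumptions, not the real `vector_store.py` interface:

```python
class EphemeralVectorStore:
    """Single-document, in-memory vector store that is cleared whenever a
    new document is indexed, so context never leaks across documents.
    Illustrative sketch; not the project's actual implementation."""

    def __init__(self):
        self.doc_id = None
        self.entries = []  # list of (segment_text, embedding) pairs

    def index_document(self, doc_id, segments):
        # A different doc_id triggers a full reset (VECTOR_RESET_ON_UPLOAD).
        if doc_id != self.doc_id:
            self.entries.clear()
            self.doc_id = doc_id
        self.entries.extend(segments)

    def reset(self):
        # Called on client refresh (VECTOR_RESET_ON_REFRESH).
        self.doc_id = None
        self.entries.clear()
```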

Inference Benchmarks

The table below compares inference performance across different providers, deployment modes, and hardware profiles using a standardized FinSights document analysis workload (averaged over 3 runs of the full pipeline: initial summary, overall summary, section summary, RAG indexing, and RAG chat).

| Provider | LLM Model | LLM Context | Embedding Model | Embed Context | Deployment | Avg Input Tokens/Gen | Avg Output Tokens/Gen | Avg Total Tokens/Gen | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/s) | Hardware |
|----------|-----------|-------------|-----------------|---------------|------------|----------------------|-----------------------|----------------------|------------------|------------------|--------------------|----------|
| vLLM | Llama-3.2-3B-Instruct | 4,096 | BAAI/bge-base-en-v1.5 | 512 | Local | 441 | 127 | 568 | 15,283 | 59,437 | 0.050 | Apple Silicon (Metal), MacBook Pro M4 |
| Intel OPEA EI | Llama-3.2-3B-Instruct | 8,192 | BAAI/bge-base-en-v1.5 | 512 | Enterprise (On-Prem) | 444 | 122 | 566 | 4,393 | 23,270 | 0.133 | CPU-only (Xeon) |
| OpenAI (Cloud) | gpt-4o-mini | 128,000 | text-embedding-3-small | 8,191 | API (Cloud) | 411 | 133 | 544 | 2,772 | 11,906 | 0.221 | N/A |

Notes:

  • All benchmarks use the same FinSights document analysis pipeline. Token counts may vary slightly per run due to non-deterministic model output.
  • vLLM on Apple Silicon uses Metal (MPS) GPU acceleration for the LLM and CPU-based vLLM for the BERT embedding model (BAAI/bge-base-en-v1.5).
  • Intel OPEA Enterprise Inference runs on Intel Xeon CPUs without GPU acceleration.
  • Llama 3.2 3B natively supports 128K context, but vLLM local was benchmarked with --max-model-len 4096 due to Apple Silicon memory constraints. EI is configured with 8,192 token context.
  • Each benchmark run exercises 5 generations: initial summary, overall summary, section summary, RAG indexing (embeddings), and RAG chat.
  • Langfuse tracing is used for full observability of each benchmark run.

Model Capabilities

Meta Llama 3.2 3B Instruct

A 3-billion-parameter open-weight model from Meta's Llama family, optimized for instruction-following and on-device deployment.

| Attribute | Details |
|-----------|---------|
| Parameters | 3.21B |
| Architecture | Transformer with Grouped Query Attention (GQA) — 28 layers, 24 Q-heads / 8 KV-heads |
| Context Window | 128,000 tokens |
| Instruction Tuning | RLHF + supervised fine-tuning on instruction data |
| Multilingual | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai |
| Quantization Formats | GGUF, AWQ, GPTQ, MLX (4-bit) |
| Inference Runtimes | vLLM, Ollama, llama.cpp, LMStudio, SGLang, TGI |
| License | Llama 3.2 Community License (permissive, with acceptable use policy) |
| Deployment | Local, on-prem, air-gapped, cloud — full data sovereignty |

BAAI/bge-base-en-v1.5

A 110M-parameter BERT-based embedding model from BAAI, widely used for retrieval and RAG pipelines.

| Attribute | Details |
|-----------|---------|
| Parameters | 109M |
| Architecture | BERT base (12 layers, 768 hidden dim) |
| Embedding Dimensions | 768 |
| Max Sequence Length | 512 tokens |
| MTEB Retrieval Score | 53.25 (competitive with models 3x its size) |
| Inference Runtimes | sentence-transformers, vLLM (CPU), ONNX, TGI |
| License | MIT |
| Deployment | Local, on-prem, air-gapped — lightweight enough for CPU |

OpenAI text-embedding-3-small

OpenAI's compact embedding model, used for RAG indexing and retrieval when running with the OpenAI provider.

| Attribute | Details |
|-----------|---------|
| Parameters | Not publicly disclosed |
| Embedding Dimensions | 1,536 (default) or 512 (with dimensions parameter) |
| Max Sequence Length | 8,191 tokens |
| MTEB Retrieval Score | 44.0 |
| Pricing | $0.02 / 1M tokens |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service |

GPT-4o-mini

OpenAI's cost-efficient multimodal model, accessible exclusively via cloud API.

| Attribute | Details |
|-----------|---------|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context Window | 128,000 tokens input / 16,384 tokens max output |
| Tool / Function Calling | Supported; parallel function calling |
| Structured Output | JSON mode and strict JSON schema adherence supported |
| Multilingual | Broad multilingual support |
| Pricing | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
| Fine-Tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |

Comparison Summary

| Capability | Llama 3.2 3B Instruct | GPT-4o-mini |
|------------|-----------------------|-------------|
| Financial document analysis | Yes | Yes |
| RAG-based document chat | Yes | Yes |
| On-prem / air-gapped deployment | Yes | No |
| Data sovereignty | Full (weights run locally) | No (data sent to cloud API) |
| Open weights | Yes (Llama Community License) | No (proprietary) |
| Custom fine-tuning | Full fine-tuning + LoRA adapters | Supervised fine-tuning (API only) |
| Multimodal (image input) | No | Yes |
| Native context window | 128K | 128K |
Both models support financial document analysis and RAG-based chat. However, only Llama 3.2 offers open weights, data sovereignty, and local deployment flexibility — making it suitable for air-gapped, regulated, or cost-sensitive environments. GPT-4o-mini offers lower latency and higher throughput via OpenAI's cloud infrastructure, with added multimodal capabilities.


Technology Stack

Backend

  • Framework: FastAPI (Python web framework)
  • AI / LLM: OpenAI GPT-4o-mini (document-aware analysis)
  • RAG Architecture: In-memory, document-cached RAG using doc_id (no static chunking)
  • Embeddings: OpenAI embeddings (used internally for relevance scoring when required)
  • Document Processing:
    • pypdf (PDF text extraction)
    • python-docx (DOCX processing)
    • pdf2image + pytesseract (OCR for image-based PDFs)
  • State Management:
    • In-memory document cache
    • Ephemeral vector dataset (cleared on new upload or client refresh)
  • Async Server: Uvicorn (ASGI)
  • Config Management: python-dotenv for environment variables

Frontend

  • Framework: React 18 with React Router
  • Build Tool: Vite (fast bundler)
  • Styling: Tailwind CSS + PostCSS
  • UI Components: Lucide React icons
  • RAG UX:
    • Dynamic, document-driven section chips
    • Instant section switching using cached context
  • Export: jsPDF for PDF generation
  • Notifications: react-hot-toast

Troubleshooting

Encountering issues? Check the following:

Common Issues

Issue: API not responding

```bash
# Check service health
curl http://localhost:8000/health

# View backend logs
docker compose logs backend
```

Issue: OpenAI API errors

  • Verify OPENAI_API_KEY is correct and has credits
  • Check API key permissions in OpenAI dashboard
  • Ensure model gpt-4o-mini is available in your account

Issue: PDF upload fails

  • Max file size: 50MB
  • Max pages: 100 pages
  • Supported formats: PDF, DOCX
  • Ensure file is not corrupted
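A hedged sketch of the pre-flight checks these limits imply; the helper name and return shape are assumptions, and the backend's actual validation may differ:

```python
MAX_PDF_SIZE = 52_428_800  # 50 MB, matching the MAX_PDF_SIZE default
ALLOWED_EXTENSIONS = {".pdf", ".docx"}

def validate_upload(filename, size_bytes):
    """Return (ok, reason) for an upload candidate. Illustrative only;
    mirrors the documented format and size limits."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported format: {ext or 'none'}"
    if size_bytes > MAX_PDF_SIZE:
        return False, "file exceeds 50MB limit"
    return True, "ok"
```

Running such a check client-side as well as server-side gives users an immediate error instead of a failed upload.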

Issue: Frontend can't connect to API

  • Verify backend is running: docker compose ps
  • Check CORS settings in .env
  • Ensure both services are on same network

Debug Mode

Enable debug logging:

```bash
# Update .env
LOG_LEVEL=DEBUG

# Restart services
docker compose restart backend
docker compose logs -f backend
```

License

This project is licensed under the terms in the LICENSE file. See LICENSE for details.


Disclaimer

FinSights is provided as-is for analysis and informational purposes. While we strive for accuracy:

  • Always verify AI-generated summaries against original documents
  • Do not rely solely on AI summaries for investment decisions
  • Consult financial advisors for investment guidance
  • Test thoroughly before using in production environments

For full disclaimer details, see DISCLAIMER.md

