Skip to content

aryan17032002/sample-md

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

📊 FinSights - Financial Document Summarization AI Blueprint

AI-powered financial document analysis with intelligent section-based summarization using OpenAI's GPT models.


📋 Table of Contents


Project Overview

FinSights is an intelligent financial document analysis platform that processes text and financial documents (PDF, DOCX) to generate comprehensive summaries organized into six key financial sections:

  • Financial Performance - Narrative overview with key financial numbers
  • Key Metrics - Essential KPIs and financial indicators
  • Risks - Identified risks and challenges
  • Opportunities - Growth and business opportunities
  • Outlook / Guidance - Forward-looking statements and guidance
  • Other Important Highlights - Notable items, dividends, balance sheet items, and auditor notes

Users can paste text directly or upload documents, and the system intelligently extracts and summarizes content using OpenAI's GPT-4o-mini model. The backend caches extracted documents, allowing users to explore different sections without re-uploading.


Architecture

The application follows a modular microservices architecture with specialized components for document processing and AI summarization:

graph LR

  %% ====== FRONTEND ======
  subgraph FE[Frontend]
    A[React Web UI<br/>Port 5173]
  end

  %% ====== BACKEND ======
  subgraph BE[Backend - FastAPI<br/>Port 8000]
    B[API Router]
    E[PDF Service]
    D[LLM Service]
    K[PDF Generator]
    G[In-Memory Cache<br/>TTL 1 hour]
    H[Session Summary History]
  end

  %% ====== EXTERNAL ======
  subgraph EXT[External]
    F[OpenAI API<br/>gpt-4o-mini]
  end

  %% ====== CONNECTIONS (ARCHITECTURE) ======
  A -->|HTTP| B

  B --> E
  B --> D
  B --> K

  E -->|Extracted Text| G
  D -->|Read Cached Text| G
  D -->|Write Summary| H
  K -->|Read History| H

  D -->|API Call| F
  F -->|Response| D

  B -->|JSON| A
  K -->|PDF File| A

  %% ====== STYLES ======
  style A fill:#e1f5ff
  style B fill:#fff4e1
  style D fill:#ffe1f5
  style E fill:#ffe1f5
  style K fill:#ffe1f5
  style F fill:#fff3cd
  style G fill:#e8f5e9
  style H fill:#e8f5e9

Loading

Features

Backend

  • Multiple input format support (text, PDF, DOCX)
  • PDF text extraction with OCR support for image-based PDFs using pytesseract
  • DOCX document processing with python-docx
  • AI-powered summarization using OpenAI's GPT-4o-mini model
  • Intelligent section-based summarization with context-aware analysis
  • Smart document caching system (1-hour TTL, up to 25 documents) to avoid reprocessing
  • File validation and size limits (PDF/DOCX: 50 MB)
  • Page limit protection (max 100 pages per PDF) to prevent timeouts
  • Streaming response support for optimal performance
  • CORS enabled for web integration
  • Comprehensive error handling and logging
  • Health check endpoints
  • Modular architecture (routes + services + LLM service + PDF service)

Frontend

  • Clean, intuitive interface with tab-based input selection (Text / File)
  • Drag-and-drop file upload capability
  • Real-time summary display with clickable financial section chips
  • Chat-like history view of all summaries
  • PDF export functionality for generated summaries
  • Mobile-responsive design with Tailwind CSS
  • Built with Vite for fast development and hot module replacement

Project Structure

FinSights/
├── backend/
│   ├── api/
│   │   └── routes.py          # API endpoints
│   ├── services/
│   │   ├── llm_service.py     # OpenAI integration
│   │   └── pdf_service.py     # Document processing
│   ├── server.py              # FastAPI app
│   ├── config.py              # Configuration
│   ├── requirements.txt        # Python dependencies
│   └── Dockerfile             # Backend container
├── frontend/
│   ├── src/
│   │   ├── pages/             # React pages
│   │   ├── components/        # React components
│   │   ├── services/          # API client
│   │   └── App.jsx            # Main app
│   ├── package.json           # npm dependencies
│   └── Dockerfile             # Frontend container
├── docker-compose.yml         # Service orchestration
└── README.md                  # This file

Get Started

Prerequisites

Before you begin, ensure you have the following installed and configured:

Verify Installation

# Check Docker
docker --version
docker compose version

# Verify Docker is running
docker ps

Quick Start

1. Clone or Navigate to Repository

# If cloning:
git clone <your-repo-url>
cd FinSights

2. Configure Environment Variables

Create backend/.env with your OpenAI credentials:

cat > backend/.env << EOF
# OpenAI Configuration (REQUIRED)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini

# LLM Configuration
LLM_TEMPERATURE=0.2
LLM_MAX_TOKENS=900

# Caching Configuration
CACHE_MAX_DOCS=25
CACHE_TTL_SECONDS=3600

# Service Configuration
SERVICE_PORT=8000
LOG_LEVEL=INFO

# CORS Settings
CORS_ORIGINS=*
EOF

Replace your_openai_api_key_here with your actual OpenAI API key.

3. Launch the Application

Option A: Standard Deployment

# Build and start all services
docker compose up --build

# Or run in detached mode (background)
docker compose up -d --build

Option B: View Logs While Running

# All services
docker compose up --build

# In another terminal, view specific logs
docker compose logs -f backend
docker compose logs -f frontend

4. Access the Application

Once containers are running, access:

5. Verify Services

# Check health status
curl http://localhost:8000/health

# View running containers
docker compose ps

6. Stop the Application

docker compose down

Usage Guide

Using FinSights

  1. Open the Application

    • Navigate to http://localhost:5173
  2. Choose Input Method

    • Paste Text Tab: Copy/paste financial document text directly
    • Upload File Tab: Upload PDF or DOCX files (max 50MB)
  3. Generate Summary

    • Click "Summarize" button
    • Wait for AI processing
    • View comprehensive financial summary
  4. Explore Financial Sections

    • Click any section chip to view detailed analysis:
      • Financial Performance
      • Key Metrics
      • Risks
      • Opportunities
      • Outlook / Guidance
      • Other Important Highlights
    • Switching sections is instant (cached document)
  5. Export Results

    • Click "Export as PDF" button
    • Save formatted summary to your computer
  6. View History

    • All previous summaries in chat-like history
    • Scroll through past analyses
    • Re-explore or export any summary

Performance Tips

  • Large PDFs: For PDFs > 100 pages, only first 100 pages are processed
  • Best Results: Clearly formatted financial documents with structured text
  • Caching: First analysis processes document, subsequent sections are instant
  • Temperature Setting: Default 0.2 ensures consistent, focused summaries

Environment Variables

Configure the application behavior using environment variables in backend/.env:

Variable Description Default Type
OPENAI_API_KEY OpenAI API key for GPT access (REQUIRED) - string
OPENAI_MODEL GPT model version to use gpt-4o-mini string
LLM_TEMPERATURE Model creativity level (0.0-2.0, lower = deterministic) 0.2 float
LLM_MAX_TOKENS Maximum tokens per summary response 900 integer
CACHE_MAX_DOCS Maximum documents in memory cache 25 integer
CACHE_TTL_SECONDS Cache time-to-live in seconds 3600 integer
SERVICE_PORT Backend API port 8000 integer
LOG_LEVEL Logging level (DEBUG, INFO, WARNING, ERROR) INFO string
CORS_ORIGINS Allowed CORS origins (comma-separated or *) * string
MAX_PDF_PAGES Maximum PDF pages to process 100 integer
MAX_PDF_SIZE Maximum PDF file size in bytes 52428800 integer

Configuration Examples

Production Setup

OPENAI_API_KEY=sk-your-production-key
OPENAI_MODEL=gpt-4o-mini
LLM_TEMPERATURE=0.1
LOG_LEVEL=WARNING
CACHE_MAX_DOCS=50
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com

Development Setup

OPENAI_API_KEY=sk-your-dev-key
OPENAI_MODEL=gpt-4o-mini
LLM_TEMPERATURE=0.5
LOG_LEVEL=DEBUG
CACHE_MAX_DOCS=10


Technology Stack

Backend

  • Framework: FastAPI (Python web framework)
  • AI/LLM: OpenAI GPT-4o-mini API
  • Document Processing:
    • pypdf (PDF text extraction)
    • python-docx (DOCX processing)
    • pdf2image + pytesseract (OCR for image-based PDFs)
  • Async: Uvicorn ASGI server
  • Config: Python-dotenv for environment management

Frontend

  • Framework: React 18 with React Router
  • Build Tool: Vite (fast bundler)
  • Styling: Tailwind CSS + PostCSS
  • UI Components: Lucide React icons
  • Export: jsPDF for PDF generation
  • Notifications: react-hot-toast

DevOps

  • Containerization: Docker + Docker Compose
  • Architecture: Microservices with isolated containers
  • Networking: Docker bridge network

Troubleshooting

Encountering issues? Check the following:

Common Issues

Issue: API not responding

# Check service health
curl http://localhost:8000/health

# View backend logs
docker compose logs backend

Issue: OpenAI API errors

  • Verify OPENAI_API_KEY is correct and has credits
  • Check API key permissions in OpenAI dashboard
  • Ensure model gpt-4o-mini is available in your account

Issue: PDF upload fails

  • Max file size: 50MB
  • Max pages: 100 pages
  • Supported formats: PDF, DOCX
  • Ensure file is not corrupted

Issue: Frontend can't connect to API

  • Verify backend is running: docker compose ps
  • Check CORS settings in .env
  • Ensure both services are on same network

Debug Mode

Enable debug logging:

# Update .env
LOG_LEVEL=DEBUG

# Restart services
docker compose restart backend
docker compose logs -f backend

License

This project is licensed under the MIT License - see LICENSE file for details.

For third-party licenses, see LICENSE-3rd-party.txt


Disclaimer

FinSights is provided as-is for analysis and informational purposes. While we strive for accuracy:

  • Always verify AI-generated summaries against original documents
  • Do not rely solely on AI summaries for investment decisions
  • Consult financial advisors for investment guidance
  • Test thoroughly before using in production environments

Support & Feedback

Have suggestions or encountered an issue?


Back to Top

About

No description, website, or topics provided.

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors