AWS Doc QA

AI-powered Document Q&A system. Upload PDFs, extract text via AWS Textract, and ask questions powered by Claude (Haiku 4.5) on AWS Bedrock.

Architecture

                         ┌──────────────────┐
                         │   API Gateway    │
                         │ POST /upload     │
                         │ POST /query      │
                         │ GET  /documents  │
                         └────────┬─────────┘
                                  │
              ┌───────────────────┼───────────────────┐
              ▼                   ▼                    ▼
     ┌────────────────┐  ┌───────────────┐  ┌─────────────────┐
     │ upload_handler │  │ query_handler │  │  query_handler  │
     │    Lambda      │  │   Lambda      │  │  (GET status)   │
     └───────┬────────┘  └───────┬───────┘  └────────┬────────┘
             │                   │                    │
             ▼                   ▼                    ▼
     ┌──────────────┐    ┌─────────────┐     ┌────────────────┐
     │  S3 (PDFs)   │    │ AWS Bedrock │     │   DynamoDB     │
     │  presigned   │    │  (Claude)   │     │  (status)      │
     └──────┬───────┘    └─────────────┘     └────────────────┘
            │ S3 Event
            ▼
     ┌────────────────────┐
     │ document_processor │
     │      Lambda        │
     └────────┬───────────┘
              │
              ▼
     ┌────────────────┐
     │  AWS Textract  │
     │  (async job)   │
     └────────┬───────┘
              │ SNS notify
              ▼
     ┌────────────────────┐
     │ textract_callback  │
     │      Lambda        │
     └────────┬───────────┘
              │
              ▼
     ┌────────────────┐
     │   DynamoDB     │◀──── (read by query_handler)
     │  (documents)   │
     └────────────────┘

Flow

Frontend sends POST /upload with filename → receives presigned S3 URL + document ID
Frontend uploads PDF directly to S3 via presigned URL
S3 event triggers document_processor Lambda → starts async Textract job
Textract extracts text and notifies via SNS
textract_callback Lambda stores extracted text in DynamoDB (status: READY)
Frontend polls GET /documents?document_id=... until status is READY
User asks a question via POST /query → query_handler retrieves text from DynamoDB, sends to Claude via Bedrock, returns the answer

API Endpoints

`POST /upload`

Generate a presigned S3 URL for PDF upload.

curl -X POST https://<API_ID>.execute-api.eu-central-1.amazonaws.com/prod/upload \
  -H "Content-Type: application/json" \
  -d '{"filename": "document.pdf"}'

Response:

{"upload_url": "https://s3.amazonaws.com/...", "document_id": "<uuid>_document.pdf"}

Then upload the file using the presigned URL:

curl -X PUT "<upload_url>" \
  -H "Content-Type: application/pdf" \
  --data-binary @document.pdf

`GET /documents`

Check document processing status.

curl "https://<API_ID>.execute-api.eu-central-1.amazonaws.com/prod/documents?document_id=<document_id>"

Response:

{"document_id": "...", "status": "PROCESSING|READY|FAILED"}

`POST /query`

Ask a question about a processed document.

curl -X POST https://<API_ID>.execute-api.eu-central-1.amazonaws.com/prod/query \
  -H "Content-Type: application/json" \
  -d '{"document_id": "<document_id>", "question": "What is this document about?"}'

Response:

{"document_id": "...", "question": "...", "answer": "..."}

Frontend

Single-page application built with vanilla HTML/CSS/JavaScript (no build step required).

Two-panel layout: upload & questions on the left, answers on the right
Drag-and-drop PDF upload with presigned URL flow
Real-time document status polling during Textract processing
Animated "thinking" state during Claude inference
Response time display
Dark/light theme support
Mobile-responsive design

Serve frontend/index.html with any static file server or host on S3/CloudFront.

Prerequisites

AWS CLI configured with credentials
Python 3.12+
Node.js (for AWS CDK CLI)
AWS CDK CLI: npm install -g aws-cdk
Enable Claude model access in AWS Bedrock console (eu-central-1)

Setup

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements-dev.txt

# Build Lambda layer
pip install -r requirements.txt -t lambdas/layer/python/

# Bootstrap CDK (first time only)
cd cdk
cdk bootstrap

# Deploy
cdk deploy

Testing

pytest tests/

Project Structure

aws-doc-qa/
├── frontend/
│   └── index.html              # Single-page web UI
├── lambdas/
│   ├── document_processor/     # S3 trigger → Textract
│   ├── textract_callback/      # Textract result → DynamoDB
│   ├── query_handler/          # API Gateway → Bedrock Claude + status polling
│   ├── upload_handler/         # Presigned S3 URL generation
│   └── layer/                  # Shared Lambda layer
├── cdk/                        # AWS CDK infrastructure
├── tests/                      # Unit tests
├── requirements.txt            # Lambda runtime dependencies
└── requirements-dev.txt        # Dev/CDK dependencies

AWS Services

Service	Purpose
S3	PDF storage (eu-central-1)
Lambda	Python 3.12, 4 functions
Textract	Async document text extraction
Bedrock	Claude Haiku 4.5 for Q&A (cross-region inference)
DynamoDB	Document metadata + extracted text
API Gateway	REST API with CORS
SNS	Textract job completion notifications

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AWS Doc QA

Architecture

Flow

API Endpoints

`POST /upload`

`GET /documents`

`POST /query`

Frontend

Prerequisites

Setup

Testing

Project Structure

AWS Services

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
cdk		cdk
frontend		frontend
lambdas		lambdas
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

AWS Doc QA

Architecture

Flow

API Endpoints

POST /upload

GET /documents

POST /query

Frontend

Prerequisites

Setup

Testing

Project Structure

AWS Services

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /upload`

`GET /documents`

`POST /query`

Packages