AI-powered Document Q&A system. Upload PDFs, extract text via AWS Textract, and ask questions powered by Claude (Haiku 4.5) on AWS Bedrock.
┌──────────────────┐
│ API Gateway │
│ POST /upload │
│ POST /query │
│ GET /documents │
└────────┬─────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌────────────────┐ ┌───────────────┐ ┌─────────────────┐
│ upload_handler │ │ query_handler │ │ query_handler │
│ Lambda │ │ Lambda │ │ (GET status) │
└───────┬────────┘ └───────┬───────┘ └────────┬────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌─────────────┐ ┌────────────────┐
│ S3 (PDFs) │ │ AWS Bedrock │ │ DynamoDB │
│ presigned │ │ (Claude) │ │ (status) │
└──────┬───────┘ └─────────────┘ └────────────────┘
│ S3 Event
▼
┌────────────────────┐
│ document_processor │
│ Lambda │
└────────┬───────────┘
│
▼
┌────────────────┐
│ AWS Textract │
│ (async job) │
└────────┬───────┘
│ SNS notify
▼
┌────────────────────┐
│ textract_callback │
│ Lambda │
└────────┬───────────┘
│
▼
┌────────────────┐
│ DynamoDB │◀──── (read by query_handler)
│ (documents) │
└────────────────┘
- Frontend sends
POST /uploadwith filename → receives presigned S3 URL + document ID - Frontend uploads PDF directly to S3 via presigned URL
- S3 event triggers
document_processorLambda → starts async Textract job - Textract extracts text and notifies via SNS
textract_callbackLambda stores extracted text in DynamoDB (status: READY)- Frontend polls
GET /documents?document_id=...until status is READY - User asks a question via
POST /query→query_handlerretrieves text from DynamoDB, sends to Claude via Bedrock, returns the answer
Generate a presigned S3 URL for PDF upload.
curl -X POST https://<API_ID>.execute-api.eu-central-1.amazonaws.com/prod/upload \
-H "Content-Type: application/json" \
-d '{"filename": "document.pdf"}'Response:
{"upload_url": "https://s3.amazonaws.com/...", "document_id": "<uuid>_document.pdf"}Then upload the file using the presigned URL:
curl -X PUT "<upload_url>" \
-H "Content-Type: application/pdf" \
--data-binary @document.pdfCheck document processing status.
curl "https://<API_ID>.execute-api.eu-central-1.amazonaws.com/prod/documents?document_id=<document_id>"Response:
{"document_id": "...", "status": "PROCESSING|READY|FAILED"}Ask a question about a processed document.
curl -X POST https://<API_ID>.execute-api.eu-central-1.amazonaws.com/prod/query \
-H "Content-Type: application/json" \
-d '{"document_id": "<document_id>", "question": "What is this document about?"}'Response:
{"document_id": "...", "question": "...", "answer": "..."}Single-page application built with vanilla HTML/CSS/JavaScript (no build step required).
- Two-panel layout: upload & questions on the left, answers on the right
- Drag-and-drop PDF upload with presigned URL flow
- Real-time document status polling during Textract processing
- Animated "thinking" state during Claude inference
- Response time display
- Dark/light theme support
- Mobile-responsive design
Serve frontend/index.html with any static file server or host on S3/CloudFront.
- AWS CLI configured with credentials
- Python 3.12+
- Node.js (for AWS CDK CLI)
- AWS CDK CLI:
npm install -g aws-cdk - Enable Claude model access in AWS Bedrock console (eu-central-1)
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements-dev.txt
# Build Lambda layer
pip install -r requirements.txt -t lambdas/layer/python/
# Bootstrap CDK (first time only)
cd cdk
cdk bootstrap
# Deploy
cdk deploypytest tests/aws-doc-qa/
├── frontend/
│ └── index.html # Single-page web UI
├── lambdas/
│ ├── document_processor/ # S3 trigger → Textract
│ ├── textract_callback/ # Textract result → DynamoDB
│ ├── query_handler/ # API Gateway → Bedrock Claude + status polling
│ ├── upload_handler/ # Presigned S3 URL generation
│ └── layer/ # Shared Lambda layer
├── cdk/ # AWS CDK infrastructure
├── tests/ # Unit tests
├── requirements.txt # Lambda runtime dependencies
└── requirements-dev.txt # Dev/CDK dependencies
| Service | Purpose |
|---|---|
| S3 | PDF storage (eu-central-1) |
| Lambda | Python 3.12, 4 functions |
| Textract | Async document text extraction |
| Bedrock | Claude Haiku 4.5 for Q&A (cross-region inference) |
| DynamoDB | Document metadata + extracted text |
| API Gateway | REST API with CORS |
| SNS | Textract job completion notifications |