Neurl-LLC/deepgram-54

Serverless Transcription (S3 ➜ SQS ➜ Lambda ➜ Deepgram)

Accurate, scalable, pay-per-use speech-to-text app. Drop audio into S3, get transcripts back in seconds—no servers to babysit.

This project contains the code for Deepgram's technical guide on building serverless transcription apps: a Lambda function transcribes audio uploaded to S3 using Deepgram and writes JSON and TXT transcripts back to S3.

Overview

The project wires Amazon S3 event notifications to an SQS queue that triggers an AWS Lambda function, which calls Deepgram’s REST API to transcribe the audio. The Lambda writes two outputs to S3:

  • transcripts/NAME.json – full Deepgram response (NAME is the base name of the uploaded file)
  • transcripts/NAME.txt – best-guess transcript (plain text)
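As a rough illustration of how the plain-text output could be derived from the full response, here is a minimal sketch. It assumes the usual shape of a Deepgram pre-recorded response (`results.channels[].alternatives[].transcript`); the function name `best_transcript` is hypothetical and the repo's handler may do this differently.

```python
def best_transcript(dg_response: dict) -> str:
    """Return the first (top-ranked) alternative of each channel, one per line."""
    channels = dg_response.get("results", {}).get("channels", [])
    lines = []
    for channel in channels:
        alternatives = channel.get("alternatives") or []
        if alternatives:
            # Alternatives are ordered best-first, so index 0 is the best guess.
            lines.append(alternatives[0].get("transcript", ""))
    return "\n".join(lines)
```

Returning an empty string for a response with no channels keeps the .txt write path simple: the handler can always write a file, even for silent audio.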

By default, Lambda submits a presigned S3 URL to Deepgram so your audio never leaves your bucket except via a signed, time-limited link.
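The presigned-URL flow can be sketched roughly as follows. This is not the repo's handler: the helper names (`presign_get`, `build_deepgram_request`) and the 900-second expiry are assumptions for illustration. It uses boto3's `generate_presigned_url` and Deepgram's `/v1/listen` endpoint with a JSON body containing the audio URL.

```python
import json
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def presign_get(bucket, key, expires=900):
    """Create a time-limited GET URL for one object (expiry value is an assumption)."""
    import boto3  # imported lazily; available in the Lambda Python runtime

    return boto3.client("s3").generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=expires
    )


def build_deepgram_request(audio_url, api_key, model=None, language=None):
    """Build (but do not send) the POST that asks Deepgram to fetch audio_url."""
    params = {k: v for k, v in (("model", model), ("language", language)) if v}
    endpoint = DEEPGRAM_URL
    if params:
        endpoint += "?" + urllib.parse.urlencode(params)
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )
```

In a handler, the request would be sent with `urllib.request.urlopen(req, timeout=...)` or the equivalent `requests.post` call; only the signed link, never the audio bytes, passes through Lambda.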

Architecture

Serverless transcription architecture

What You Get

  • Handler code (Python 3.11) ready for Lambda
  • Reference IAM policies for least-privilege access
  • Docs for console setup, optional SQS buffering, testing, logs, and alarms
  • Troubleshooting playbook for the common gotchas (403s, timeouts, etc.)

Prerequisites

  • AWS account with permissions for S3, Lambda, IAM (and SQS if using the buffer)
  • Deepgram account and API key (comes with 200 USD in credits to get started): https://deepgram.com/product/speech-to-text
  • A small test audio file (.mp3, .wav, or .m4a)
  • Lambda with internet egress (keep it out of a private VPC or add NAT)

Repo Structure

.
├─ lambda/
│  └─ lambda_function.py        # Lambda handler
├─ policies/                    # Reference IAM policies for the Lambda execution role
│  ├─ lambda-transcriber-s3.json   # S3 read/write for Lambda
│  └─ lambda-sqs-consumer.json    # SQS consumer perms
└─ README.md

Why s3:GetObject in the IAM policy on input? Needed for presigned URL signing (S3 checks the signer’s IAM). Why s3:GetObject on transcripts? For the idempotency HeadObject check.
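To make the permission split concrete, here is a sketch that generates a policy document matching the description above. It is an illustration only: the shipped policies/lambda-transcriber-s3.json may differ, and the function name and `Sid` values are hypothetical.

```python
import json


def build_s3_policy(bucket, input_prefix="audio-incoming/", transcripts_prefix="transcripts/"):
    """Least-privilege S3 statements for the transcriber's execution role."""
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Needed so presigned GET URLs signed by this role are honored.
                "Sid": "ReadInputAudio",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{arn}/{input_prefix}*",
            },
            {
                # GetObject covers the HeadObject idempotency check; PutObject writes results.
                "Sid": "ReadWriteTranscripts",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{arn}/{transcripts_prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_s3_policy("YOUR_BUCKET"), indent=2))
```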

Environment variables

Set in Lambda → Configuration → Environment variables:

  • DEEPGRAM_API_KEY or DEEPGRAM_SECRET_NAME
  • INPUT_PREFIX (default: audio-incoming/)
  • TRANSCRIPTS_PREFIX (default: transcripts/)
  • Optional: DG_MODEL (e.g., nova-3), DG_LANGUAGE (e.g., en)
  • Optional (debug): SKIP_HEAD_CHECK=true while IAM is being finalized
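A handler might read these variables along the following lines. This is a sketch, not the repo's code: `load_config` and the returned dictionary keys are hypothetical, and how DEEPGRAM_SECRET_NAME is resolved (presumably via a secrets store) is left out.

```python
import os


def load_config(env=None):
    """Collect settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("DEEPGRAM_API_KEY")
    secret_name = env.get("DEEPGRAM_SECRET_NAME")
    if not api_key and not secret_name:
        raise ValueError("Set DEEPGRAM_API_KEY or DEEPGRAM_SECRET_NAME")
    return {
        "api_key": api_key,
        "secret_name": secret_name,
        "input_prefix": env.get("INPUT_PREFIX", "audio-incoming/"),
        "transcripts_prefix": env.get("TRANSCRIPTS_PREFIX", "transcripts/"),
        "model": env.get("DG_MODEL"),        # optional, e.g. nova-3
        "language": env.get("DG_LANGUAGE"),  # optional, e.g. en
        # Only the literal string "true" (any case) enables the debug bypass.
        "skip_head_check": env.get("SKIP_HEAD_CHECK", "").strip().lower() == "true",
    }
```

Passing a plain dictionary as `env` makes the function easy to exercise locally without touching the real environment.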

Steps to deploy (console)

  1. Create an S3 bucket with prefixes: audio-incoming/, transcripts/.
  2. Create a Lambda function (Python 3.11, arm64). Add a requests layer.
  3. Set the environment variables above.
  4. Attach the policy in policies/lambda-transcriber-s3.json to the Lambda execution role.
  5. Add S3 Event Notification: ObjectCreated on prefix audio-incoming/ ➜ Lambda.
    • Or wire S3 ➜ SQS ➜ Lambda using policies/lambda-sqs-consumer.json.
  6. Upload a small .mp3/.wav to audio-incoming/ and check CloudWatch logs.
  7. Verify outputs under transcripts/ (.json + .txt).

Test the Function

A) End-to-end (recommended)

aws s3 cp ./sample.mp3 s3://YOUR_BUCKET/audio-incoming/sample.mp3

Then check:

CloudWatch Logs → /aws/lambda/YOUR_FUNCTION → latest stream

S3 → transcripts/ should contain sample.json and sample.txt
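The mapping from an input key to its two output keys can be sketched as below. It assumes outputs are named after the input's base name, as in the sample.mp3 example; `transcript_keys` is a hypothetical helper, and the actual handler may preserve subfolders or name files differently.

```python
import posixpath


def transcript_keys(input_key, transcripts_prefix="transcripts/"):
    """Map an uploaded audio key to its (.json, .txt) transcript keys."""
    # S3 keys always use forward slashes, so posixpath is the right tool here.
    stem = posixpath.splitext(posixpath.basename(input_key))[0]
    return (f"{transcripts_prefix}{stem}.json", f"{transcripts_prefix}{stem}.txt")
```

With the upload above, `transcript_keys("audio-incoming/sample.mp3")` yields `transcripts/sample.json` and `transcripts/sample.txt`.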

B) Lambda “Test” button payloads

  • Direct S3 event
{ "Records":[{ "s3": { "bucket": { "name": "YOUR_BUCKET" }, "object": { "key": "audio-incoming/hello.mp3" } } }] }
  • SQS carrying S3 event
{ "Records":[{ "messageId":"1", "body":"{\"Records\":[{\"s3\":{\"bucket\":{\"name\":\"YOUR_BUCKET\"},\"object\":{\"key\":\"audio-incoming/hello.mp3\"}}}]}" }] }
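Both payload shapes can be handled by one parser. The sketch below is an assumption about how a handler might do it (the name `extract_s3_objects` is hypothetical): a record with an `s3` field is a direct S3 event, while a record with a `body` field is an SQS message whose body is a JSON-encoded S3 event.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from a direct S3 event or an SQS-wrapped one."""
    objects = []
    for record in event.get("Records", []):
        if "s3" in record:
            s3 = record["s3"]
            # Keys arrive URL-encoded in S3 notifications (spaces become '+').
            objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
        elif "body" in record:
            # SQS record: the body is itself a JSON-encoded S3 event.
            objects.extend(extract_s3_objects(json.loads(record["body"])))
    return objects
```

A message body without a `Records` list, such as the test event S3 sends when a notification is first configured, simply yields no objects.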

Monitor & Operate

Key metrics

  • Lambda: Invocations, Errors, Throttles, Duration (p95)
  • Lambda (SQS): IteratorAge
  • SQS: ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage
  • DLQ: Visible messages

Log Insights (latency)

fields @timestamp, @message
| parse @message /"dg_request_ms":\s*(?<dg_ms>\d+)/
| filter ispresent(dg_ms)
| stats count() as requests, avg(dg_ms) as avg_ms, pct(dg_ms,95) as p95_ms by bin(5m)
| sort @timestamp desc
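For the query to find anything, the function has to log a `"dg_request_ms": <number>` field. One way to emit it is sketched below; `timed_call` and the `event` field are hypothetical, and the repo's handler may log in a different shape.

```python
import json
import time


def timed_call(fn, *args, **kwargs):
    """Run fn, then print a JSON log line carrying its latency in milliseconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    line = json.dumps({"event": "deepgram_request", "dg_request_ms": elapsed_ms})
    print(line)  # stdout from a Lambda function is captured by CloudWatch Logs
    return result, line
```

Wrapping only the Deepgram call keeps the metric focused on transcription latency rather than total invocation time, which Lambda's Duration metric already covers.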

Suggested alarms

  • Lambda Errors ≥ 1 for 2×5-min
  • Lambda p95 Duration > 5s for 2×15-min
  • (SQS) IteratorAge > 60s for 2×5-min
  • (DLQ) Messages ≥ 1 (immediate)
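As an example, the first alarm in the list could be expressed as arguments for CloudWatch's `put_metric_alarm`. This is a sketch: the alarm name and the `TreatMissingData` choice are assumptions, and the function only builds the arguments rather than creating anything.

```python
def lambda_errors_alarm(function_name):
    """Arguments for an alarm on Errors >= 1 across two 5-minute periods."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,            # seconds, i.e. 5 minutes
        "EvaluationPeriods": 2,   # 2 x 5-min, as suggested above
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Assumption: periods with no invocations should not trip the alarm.
        "TreatMissingData": "notBreaching",
    }
```

To create the alarm, pass the result to boto3: `boto3.client("cloudwatch").put_metric_alarm(**lambda_errors_alarm("YOUR_FUNCTION"))`.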

Troubleshooting

Deepgram 400 REMOTE_CONTENT_ERROR with 403 in message

  • Your presigned URL fetch was denied. Make sure the Lambda execution role has s3:GetObject on the input prefix (audio-incoming/*).
  • Avoid bucket policies that Deny GetObject based on VPC endpoint/IP—these block external fetchers like Deepgram.

HeadObject 403 during idempotency check

  • Add s3:GetObject on the transcripts/ prefix. Temporarily set SKIP_HEAD_CHECK=true to unblock.
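The idempotency check itself might look something like this sketch. `transcript_exists` is a hypothetical name; `s3_client` is expected to be a boto3 S3 client, whose errors carry a `response["Error"]["Code"]` field.

```python
def transcript_exists(s3_client, bucket, key, skip_head_check=False):
    """Return True if the transcript is already in S3, so the file can be skipped."""
    if skip_head_check:
        # Debug bypass: always (re)process while IAM is being finalized.
        return False
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception as exc:
        # botocore's ClientError exposes the error code under .response.
        code = str(getattr(exc, "response", {}).get("Error", {}).get("Code", ""))
        if code in ("404", "NoSuchKey", "NotFound"):
            return False
        raise  # 403 and anything else should surface, not be treated as "missing"
```

One thing to keep in mind: S3 can answer 403 instead of 404 for a missing key when the caller lacks s3:ListBucket on the bucket, so a 403 here does not necessarily mean the object exists.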

ModuleNotFoundError: requests

  • Add a requests layer or vendor the dep in your deployment.

Lambda not firing

  • Check S3 event notification prefix/suffix and that the trigger is attached (or SQS path is wired).

SQS trigger creation error: visibility timeout < function timeout

  • Increase queue visibility (e.g., 120–180 s) or shorten Lambda timeout.
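The rule behind this error can be checked before wiring the trigger. The sketch below assumes the general AWS guidance of setting the visibility timeout to at least six times the function timeout; the function name and return labels are hypothetical.

```python
def visibility_timeout_status(visibility_s, function_timeout_s):
    """Classify a queue visibility timeout against the Lambda function timeout."""
    if visibility_s < function_timeout_s:
        return "too_low"  # trigger creation is rejected in this case
    if visibility_s < 6 * function_timeout_s:
        return "ok_below_recommended"  # works, but retries have little headroom
    return "ok"
```

For example, a 30-second function timeout with a 120-second visibility timeout passes the hard requirement but sits below the six-times guideline, while 180 seconds meets both.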

In a VPC without internet?

  • Add a NAT Gateway or keep the function outside the VPC so it can reach Deepgram.

Cost & Limits

  • Lambda: the submit path typically completes in under a second, so its cost is negligible compared with the transcription itself.
  • Deepgram: billed by audio duration / model—dominant component.
  • SQS (optional): per-request; inexpensive for typical workloads.
  • Use URL fetch (this guide) so Lambda doesn’t read/stream full files. For very long files, consider Deepgram async + webhooks.

Security Notes

  • No bucket policy required for presigned URLs (Block Public Access can remain ON).
  • IAM on the Lambda role is what authorizes presigned access.
  • If you enable SSE-KMS on inputs/outputs, also grant the role KMS permissions (Decrypt/GenerateDataKey for reads, Encrypt/GenerateDataKey for writes).
  • Do not log presigned URLs or secrets.

Roadmap/Extensions

  • Switch to Deepgram async + webhook for multi-minute files or heavy batch jobs.
  • Add S3 lifecycle (transition to IA/Glacier) and date partitioning for transcripts.
  • Push transcripts and metadata into DynamoDB/OpenSearch for search/analytics.
  • Package IaC (SAM/Terraform) and a CI pipeline.
  • Add diarization, timestamps, and downstream summarization.

About

Technical guide on how to create a scalable, cost-efficient serverless speech-to-text transcription pipeline using AWS Lambda and Deepgram’s REST API. Perfect for teams needing automatic transcription of audio uploaded to AWS S3, avoiding server maintenance and idle compute costs. Deepgram’s STT API offers accurate, low-latency transcriptions.
