
Lambda — Serverless Compute

What Is It?

Lambda lets you run code without managing servers. You upload a function, define a trigger, and AWS handles everything: OS patching, scaling, availability. You pay only for the milliseconds your code actually runs.

Real-World: A ride-sharing app uses Lambda to process trip data. Zero requests at 3am = zero cost. 10,000 simultaneous trip completions at rush hour = Lambda automatically runs 10,000 parallel invocations.
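The pricing model can be put into numbers. A minimal sketch, assuming the published x86 on-demand rates at the time of writing ($0.0000166667 per GB-second plus $0.20 per million requests); check current pricing before relying on these:

```python
# Back-of-envelope Lambda cost. Rates are assumptions (x86 on-demand
# pricing at time of writing) - verify against the current price list.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.0000002  # $0.20 per million requests

def invocation_cost(memory_mb: int, duration_ms: int) -> float:
    """Dollar cost of a single invocation at the given memory and duration."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST

# 10,000 rush-hour trip completions at 1024MB x 200ms each
rush_hour = 10_000 * invocation_cost(1024, 200)

# Zero requests at 3am = zero compute cost
overnight = 0 * invocation_cost(1024, 200)
```

The scale-to-zero property falls straight out of the math: cost is strictly per invocation, so no traffic means no bill.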


Key Configuration Numbers (Memorize These)

Property                          Value
Max timeout                       15 minutes
Memory                            128 MB – 10,240 MB (CPU scales with memory)
Ephemeral storage (/tmp)          512 MB – 10 GB
Deployment package (zip)          50 MB zipped, 250 MB unzipped
Container image                   Up to 10 GB
Concurrent executions (default)   1,000 per account per region
Reserved concurrency              Set per function (caps that function)
Cold start duration               ~100 ms – a few seconds (runtime-dependent)
Environment variables             4 KB total

Invocation Types

Synchronous (RequestResponse)

Caller waits for response. If Lambda fails, the caller gets the error immediately.

API Gateway → Lambda → returns response to API Gateway → client

Examples: API Gateway, ALB, Cognito Triggers, SDK/CLI direct invocation.

# Direct synchronous invocation
import json
import boto3

lambda_client = boto3.client('lambda')

response = lambda_client.invoke(
    FunctionName='my-function',
    InvocationType='RequestResponse',
    Payload=json.dumps({'key': 'value'})
)

Asynchronous (Event)

Caller fires and forgets. Lambda queues the event internally and retries on failure.

Retry behavior: 2 automatic retries (3 attempts total), with delays between attempts. After the final failure → Dead Letter Queue (SQS or SNS), if one is configured.

Examples: S3 events, SNS, EventBridge, CloudWatch Events.

# Fire and forget
import json
import boto3

lambda_client = boto3.client('lambda')

lambda_client.invoke(
    FunctionName='my-function',
    InvocationType='Event',  # Async
    Payload=json.dumps({'key': 'value'})
)

Poll-Based (Event Source Mapping)

Lambda polls a stream/queue and pulls batches of records.

Examples: SQS, Kinesis, DynamoDB Streams, MSK, MQ.

Lambda service polls SQS → gets batch of 10 messages → invokes Lambda

Retry: Lambda retries the entire batch until success or expiry.


Event Source Mapping (Poll-Based) Deep Dive

SQS + Lambda

SQS Queue → Lambda polls → batch (1-10,000 messages) → Lambda function
                          ↓ on failure
                    SQS Visibility Timeout expires → message back in queue
                    After maxReceiveCount → DLQ

Key settings:

  • BatchSize: 1-10,000 messages (sizes above 10 require a batching window)
  • MaximumBatchingWindowInSeconds: wait up to N seconds to fill a batch (fewer, cheaper invocations)
  • MaximumConcurrency: cap how many concurrent Lambda instances poll this queue
  • ReportBatchItemFailures: the function returns batchItemFailures so only the failed messages are retried
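In code, the partial-batch pattern looks like this sketch; process_message is a hypothetical stand-in for real business logic:

```python
def process_message(body: str) -> None:
    """Hypothetical business logic; raises on a message it cannot handle."""
    if body == "poison":
        raise ValueError("cannot process message")

def handler(event, context):
    # Collect the messageIds that failed so only those return to the queue.
    # Requires ReportBatchItemFailures on the event source mapping.
    failures = []
    for record in event["Records"]:
        try:
            process_message(record["body"])
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without ReportBatchItemFailures enabled on the mapping, the return value is ignored and the whole batch is retried.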

Kinesis + Lambda

Kinesis Stream → Lambda polls each shard → process in order per shard

  • One invocation per shard at a time by default (raise with ParallelizationFactor, up to 10)
  • Failed batches retry on the same shard (blocking it) until success or record expiry
  • Enable BisectBatchOnFunctionError to split a failing batch and isolate the bad record

DynamoDB Streams + Lambda

Same shard model as Kinesis. Use for change data capture patterns.
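A minimal CDC handler over that event shape might look like this sketch; the eventName / NewImage / OldImage fields follow the DynamoDB Streams record format:

```python
def handler(event, context):
    """Map DynamoDB Stream records to change-data-capture actions."""
    changes = []
    for record in event["Records"]:
        name = record["eventName"]  # INSERT | MODIFY | REMOVE
        if name == "INSERT":
            changes.append(("created", record["dynamodb"]["NewImage"]))
        elif name == "MODIFY":
            changes.append(("updated", record["dynamodb"]["NewImage"]))
        elif name == "REMOVE":
            # REMOVE records carry only the old image
            changes.append(("deleted", record["dynamodb"]["OldImage"]))
    return changes
```

Image values arrive in DynamoDB attribute-value form (e.g. {"id": {"S": "1"}}), so downstream consumers usually deserialize them before use.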


Concurrency Models

Reserved Concurrency

"This function is guaranteed up to N concurrent executions, and can never exceed N."

lambda_client.put_function_concurrency(
    FunctionName='payment-processor',
    ReservedConcurrentExecutions=100
)

Use for: Throttle Lambda to protect downstream services (e.g., RDS). Warning: Setting to 0 = function is disabled.

Provisioned Concurrency

Pre-warm N instances to eliminate cold starts.

lambda_client.put_provisioned_concurrency_config(
    FunctionName='checkout-api',
    Qualifier='prod',   # Must use version or alias
    ProvisionedConcurrentExecutions=50
)

Use for: User-facing APIs where cold start latency is unacceptable. Cost: You pay for provisioned concurrency whether you use it or not.

Concurrency Math

Account limit: 1,000 concurrent
Reserved for function A: 200
Reserved for function B: 300
Remaining unreserved pool: 500 (shared by all other functions)

If function C needs more than 500 → throttling (429 TooManyRequestsException).
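A throttled synchronous caller should back off and retry. This sketch keeps the retry logic separate from boto3 by taking the invoke call as a parameter; in real code you would catch lambda_client.exceptions.TooManyRequestsException around lambda_client.invoke:

```python
import time

def invoke_with_backoff(invoke, max_attempts=5, base_delay=0.1):
    """Retry `invoke` (any zero-arg callable) with exponential backoff.
    Intended for 429 TooManyRequestsException from a throttled function."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
```

In production, prefer the retry configuration built into the AWS SDKs where possible; hand-rolled loops like this are mainly useful when you need custom behavior between attempts.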


Cold Starts — Understanding & Solving

Cold start phases:

  1. Download code package/container
  2. Start execution environment (OS, runtime)
  3. Run initialization code (outside handler)
  4. Run handler

Cold start impact by runtime (approx):

  • Python/Node.js: 100-500ms
  • Java/.NET: 500ms - 5 seconds
  • Container images: seconds

Minimizing Cold Starts

Move expensive init outside handler:

# GOOD - runs once per cold start, cached for warm invocations
import boto3
db_client = boto3.resource('dynamodb')
table = db_client.Table('orders')

def handler(event, context):
    # table is already initialized!
    result = table.get_item(Key={'id': event['id']})
    return result

# BAD - DB client created on every invocation
def handler(event, context):
    db_client = boto3.resource('dynamodb')  # re-created on every invocation, warm or cold
    table = db_client.Table('orders')
    result = table.get_item(Key={'id': event['id']})
    return result

Use Provisioned Concurrency for latency-sensitive APIs.

Use SnapStart (Java): Lambda snapshots the initialized execution environment after init and restores it on invocation instead of re-initializing.


Lambda Layers

Share code/dependencies across multiple functions:

Layer: common-utils.zip (boto3 extensions, shared models)
     ↑ used by ↑
func1   func2   func3   func4

Max 5 layers per function. Layers included in 250MB unzipped limit.
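For Python runtimes, the layer zip must put code under a top-level python/ directory so it lands on the runtime's sys.path; common_utils below is a hypothetical shared package:

```
common-utils.zip
└── python/
    └── common_utils/
        ├── __init__.py
        └── models.py
```

With the layer attached, functions simply `import common_utils` as if it were bundled in their own package.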


Lambda with VPC

By default, Lambda runs in AWS-managed VPC — no access to your private resources (RDS, ElastiCache).

Lambda in VPC:

Lambda → ENI (Elastic Network Interface) in your VPC → RDS private subnet

Trade-offs:

  • Adds ENI creation overhead (a historical cold-start pain point, largely fixed by Hyperplane ENIs)
  • A VPC-attached Lambda has no internet access by default (no public IP, no IGW path)
  • For internet access, place the function in a private subnet that routes through a NAT Gateway

# Accessing RDS from Lambda in VPC
import psycopg2
import os

conn = psycopg2.connect(
    host=os.environ['DB_HOST'],
    database=os.environ['DB_NAME'],
    user=os.environ['DB_USER'],
    password=os.environ['DB_PASSWORD']  # Use Secrets Manager instead!
)

Real Best Practice:

Lambda → Secrets Manager (to get DB password) → RDS Proxy → RDS

RDS Proxy handles connection pooling: 1,000 concurrent Lambda instances share a small pool of database connections instead of each opening its own.


Lambda Destinations

For async invocations — where to send success/failure events:

lambda_client.put_function_event_invoke_config(
    FunctionName='order-processor',
    DestinationConfig={
        'OnSuccess': {
            'Destination': 'arn:aws:sqs:us-east-1:123:success-queue'
        },
        'OnFailure': {
            'Destination': 'arn:aws:sqs:us-east-1:123:failure-queue'
        }
    }
)

Destinations vs DLQ:

  • DLQ: failure only, older mechanism, less metadata
  • Destinations: success AND failure, richer event metadata, recommended
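An OnFailure destination wraps the original event in a JSON envelope. The field names in this sketch follow the async destination event format, but verify them against current docs before relying on them:

```python
import json

def summarize_failure(message_body: str) -> dict:
    """Pull the useful fields out of an OnFailure destination record
    (e.g. a message read from the failure queue)."""
    envelope = json.loads(message_body)
    ctx = envelope["requestContext"]
    return {
        "request_id": ctx["requestId"],
        "condition": ctx["condition"],             # e.g. "RetriesExhausted"
        "attempts": ctx["approximateInvokeCount"],
        "original_event": envelope["requestPayload"],
    }
```

Having the original payload in the envelope is what makes replaying failed events straightforward, and it is the main practical advantage over a bare DLQ message.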

Environment Variables & Secrets

import os
import boto3

# Simple config - fine for non-sensitive
REGION = os.environ['AWS_REGION']
TABLE_NAME = os.environ['TABLE_NAME']

# NEVER hardcode secrets - use Secrets Manager
def get_db_password():
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId='prod/db/password')
    return response['SecretString']

Best practice: Cache secrets in Lambda execution environment memory (call Secrets Manager once in init, not per request).
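One way to sketch that caching pattern; the fetch parameter is an injectable stand-in (not a real Lambda or boto3 API) so the cache logic is testable without AWS:

```python
# Module-level state survives warm invocations in the same execution environment.
_secret_cache: dict = {}

def get_secret(secret_id: str, fetch=None) -> str:
    """Return a secret, hitting Secrets Manager only on the first call
    per execution environment."""
    if secret_id not in _secret_cache:
        if fetch is None:
            import boto3  # deferred so importing this module stays cheap
            client = boto3.client("secretsmanager")
            fetch = lambda sid: client.get_secret_value(SecretId=sid)["SecretString"]
        _secret_cache[secret_id] = fetch(secret_id)
    return _secret_cache[secret_id]
```

Note the trade-off: cached secrets go stale if rotated mid-lifetime, so long-lived environments may want a TTL on the cache entry.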


Lambda Versions & Aliases

Versions: Immutable snapshots of your function code + config.

  • $LATEST = latest unpublished version
  • 1, 2, 3 = published, immutable versions

Aliases: Named pointers to versions.

prod    → version 5 (stable, 90% traffic)
          version 6 (canary, 10% traffic)   ← weighted alias!
staging → version 6 (100%)

Canary Deployment with Aliases:

lambda_client.update_alias(
    FunctionName='checkout',
    Name='prod',
    FunctionVersion='5',
    RoutingConfig={
        'AdditionalVersionWeights': {'6': 0.1}  # 10% to new version
    }
)

Lambda Function URLs

Direct HTTPS endpoint without API Gateway:

https://abc123.lambda-url.us-east-1.on.aws/

  • Auth types: NONE (public) or AWS_IAM
  • Supports CORS configuration
  • Not a replacement for API Gateway (no rate limiting, no WAF, no usage plans)

Lambda Execution Role Patterns

Lambda Role → needs these permissions:
  - cloudwatch:PutMetricData (metrics)
  - logs:CreateLogGroup/CreateLogStream/PutLogEvents (logging)
  - ec2:CreateNetworkInterface (if in VPC)
  - xray:PutTraceSegments (if X-Ray enabled)
  + whatever your function actually does (DynamoDB, S3, etc.)

AWS Managed Policy: AWSLambdaBasicExecutionRole covers the logs part.
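Spelled out as a policy document, the baseline looks roughly like this; the DynamoDB statement and all ARNs are placeholders for whatever your function actually touches:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/orders"
    }
  ]
}
```

In real use, narrow the log resource to the function's own log group rather than the wildcard shown here.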


Good Practices

Practice                                         Reason
Keep functions small and focused                 Easier to test, deploy, debug
Initialize connections outside the handler       Reused across warm invocations
Use Provisioned Concurrency for APIs             Eliminates cold-start latency
Prefer Lambda Destinations over a DLQ            Richer metadata for async flows
Set Reserved Concurrency to protect downstream   Prevents Lambda overwhelming RDS/ElastiCache
Use Lambda Layers for shared dependencies        Keeps deployment packages small
Set the timeout thoughtfully                     Default is 3s, too short for many workloads
Enable X-Ray tracing                             Debug performance issues in production

Bad Practices

  • Opening a DB connection per invocation → connection pool exhaustion on RDS. Fix: RDS Proxy + init outside the handler.
  • Storing state in Lambda globals across invocations → state is NOT shared between instances. Fix: keep state in DynamoDB/ElastiCache/S3.
  • Very high timeout with no DLQ → stuck invocations block event queues. Fix: reasonable timeout + DLQ/Destinations.
  • Recursive Lambda invocation without a circuit breaker → the function calls itself endlessly, huge bill. Fix: SQS loop detection or reserved concurrency = 0.
  • Deploying giant zip files → slow deployments, slow cold starts. Fix: use Layers, trim dependencies.
  • Not monitoring concurrent executions → silent throttling. Fix: watch the CloudWatch ConcurrentExecutions metric.

Exam Tips

  1. 15 min max timeout — if task takes longer, use Step Functions or ECS/Fargate.
  2. Lambda@Edge vs CloudFront Functions: Lambda@Edge runs at regional edge caches, supports Node.js/Python, timeout up to 30s (origin triggers) or 5s (viewer triggers). CloudFront Functions run at ALL edge locations, sub-millisecond limit, JavaScript only, much cheaper.
  3. Throttling: 429 TooManyRequestsException. Burst: 500-3,000 initial concurrent instances (region-dependent), then +500 per minute.
  4. Event Source Mapping: Lambda polls Kinesis/SQS/DynamoDB Streams — not the other way.
  5. Async invocation retries: 2 retries with backoff, within the 6-hour max event age. Configure DLQ or Destinations.
  6. Version $LATEST is mutable. Published versions are immutable.
  7. Lambda in VPC needs NAT Gateway for internet access.
  8. RDS Proxy is the solution when Lambda causes too many DB connections.
  9. Ephemeral storage /tmp persists across warm invocations in the same execution environment, but is NOT shared between instances — never rely on it for durable state, though it works as a cache between warm invocations.

Common Exam Scenarios

Q: Lambda fails to connect to RDS in private subnet? → Lambda must be configured to run in the same VPC, with security group allowing port 5432.

Q: Lambda hitting RDS connection limit? → Add RDS Proxy between Lambda and RDS.

Q: Lambda processing SQS messages but some fail and need retry? → Return batchItemFailures in response to retry only failed messages; set up DLQ on SQS for max retries.

Q: Lambda cold starts causing API latency? → Enable Provisioned Concurrency on the Lambda alias used by API Gateway.

Q: Run a task that takes 30 minutes? → Lambda can't do it (15 min max). Use Step Functions or ECS Fargate or AWS Batch.

Q: Deploy new Lambda with zero downtime canary testing? → Use Lambda Aliases with weighted routing (e.g., 90% v1, 10% v2).