Skip to content

[APPSEC-61865] (WIP) Add inferred span and AppSec calls for API Gateway#136

Draft
Strech wants to merge 24 commits intomainfrom
appsec-61865-add-inferred-span-for-api-gateway
Draft

[APPSEC-61865] (WIP) Add inferred span and AppSec calls for API Gateway#136
Strech wants to merge 24 commits intomainfrom
appsec-61865-add-inferred-span-for-api-gateway

Conversation

@Strech
Copy link
Copy Markdown
Member

@Strech Strech commented Mar 25, 2026

What does this PR do?

Compute inferred span for AWS Lambda that suppose to be a service-entry span. In addition - initialize AppSec component (if possible), push events to it and flush captured security events.

Motivation

Part of Endpoint Discovery & Correlation from Inferred Spans RFC

Testing Guidelines

Additional Notes

I would need some assistance to get more context over testing.

Types of changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Strech and others added 15 commits March 18, 2026 17:44
Remove all AppSec lifecycle management (context creation,
Event.record, export_metrics, Context.deactivate) — now
owned by dd-trace-rb watcher. Listener becomes a dumb
data provider: just pushes raw event/response hashes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lambda::AppSec owns full context lifecycle (create, activate,
record, export, deactivate). Lambda::Trace::InferredSpan owns
span creation and finishing. Listener becomes thin orchestrator.

AppSec context now targets the inferred span (service-entry),
so tags land there directly — tag propagation hack removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove [DEBUG:inject_appsec] debug logging from inject_appsec_data
- Restore removed comments in send_end_invocation_request
- Revert unnecessary ::Datadog:: namespace prefixes in lambda.rb

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…vices check

F29: Move `instrument(:aws_lambda)` from per-invocation `AppSec.on_start`
to `configure_apm` after yield block. The `instrument` method is a no-op
when appsec is disabled (checks `enabled` internally), so no guard needed.
Matches Python's pattern of one-time initialization.

F30: Reuse `Datadog::Lambda.trace_managed_services?` in InferredSpan
instead of duplicating the ENV reading logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
F31: 44 new examples covering:
- AppSec.on_start: enabled/disabled, context creation, gateway push,
  fallback to active trace/span, missing security engine, error handling
- AppSec.on_finish: no active context, gateway push, Event.record,
  metrics/telemetry export, context deactivation, deactivation on error
- InferredSpan.create: managed services disabled, non-hash events,
  missing requestContext/stage/httpMethod, v1 span name/tags/resource_key,
  v2 span name/tags/resource_key, trace_digest continuation, empty domain,
  error handling
- InferredSpan.finish: nil span, statusCode tagging, non-hash response,
  nil response, error handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
F25: `Datadog::Tracing.trace()` returns a SpanOperation, not a
TraceOperation. Rename `@trace` to `@span` in Listener to reflect
the actual type. Remove stale class-level `@trace = nil`.

AppSec.on_start now always uses `Datadog::Tracing.active_trace` for
the TraceOperation (matching Rack's RequestMiddleware pattern) and
receives only the target span (inferred span) as an argument, with
fallback to `active_span`.

F26: Pass `@inferred_span || @span` to `inject_appsec_data` so the
extension reads tags from the span where AppSec actually sets them.

F32: Variable renamed from `@trace` to `@span`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Rewrite specs to test behavior instead of internals
* Remove obsolete appsec files
Split the god method into ApiGatewayV1/V2 parser classes with a uniform
interface, and rewire InferredSpan.create as a PARSERS dispatcher +
build_span builder. Existing behavior tests pass unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
http.status_code on the inferred span is not in the RFC and not checked
by system tests. Without it, finish() just delegates to span.finish —
the listener now calls @inferred_span&.finish directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
def domain = @request_context.fetch('domainName', '')
def api_id = @request_context.fetch('apiId', '')
def stage = @request_context.fetch('stage', '')
def request_time_ms = @request_context['requestTimeEpoch']
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No fallback & coersion?

Comment on lines +11 to +18
def self.match?(payload)
api_gateway?(payload) && payload.key?('httpMethod')
end

private_class_method def self.api_gateway?(payload)
payload.is_a?(Hash) &&
payload.key?('requestContext') && payload['requestContext'].key?('stage')
end
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to do class << self style

Comment on lines +26 to +27
@span = nil
@inferred_span = nil
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed within instance?

Comment on lines +62 to +64
Datadog::Utils.send_end_invocation_request(
response:, span_id: @span.id, request_context:
)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single-line is better, and we can reorder empty kwargs to be last

Comment on lines +30 to +33
def parser_for(event)
klass = PARSERS.find { |parser| parser.match?(event) }
klass&.new(event)
end
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Irresponsible method with no benefits

}

arn = request_context.invoked_function_arn.to_s
region = arn.split(':')[3] if arn.include?(':')
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No limit on split and magic number 3

private

def parser_for(event)
klass = PARSERS.find { |parser| parser.match?(event) }
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call it parsers here, but in fact it's value object style, and it's better to word it that way in code and docs as they provide object oriented access to the AWS event payload

Comment on lines +77 to +79
def managed_services_enabled?
Datadog::Lambda.trace_managed_services?
end
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed at all?

Strech and others added 3 commits March 26, 2026 17:30
Removes hidden dependency on Datadog::Tracing.active_trace and
active_span — the listener now passes both explicitly. Span selection
policy (prefer inferred span over lambda span) moves to the caller.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DD_TRACE_MANAGED_SERVICES controls outbound AWS SDK call tracing,
not inbound inferred spans. The guard was added by mistake during
extraction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
def span_name = 'aws.httpapi'
def method = @http['method']
def path = @payload.fetch('rawPath', '/')
def resource_path = @payload['routeKey']&.sub(/^[A-Z]+ /, '') || path
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Does it makes sense to use \A instead of ^ (security)
  2. What this sub here is for?


# Remove Parent ID if it is the same as the Span ID
request.delete(DD_PARENT_ID_HEADER) if request[DD_PARENT_ID_HEADER] == span_id.to_s

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it needed?

return unless context

Datadog::AppSec::Instrumentation.gateway.push('aws_lambda.response.start', response)
Datadog::AppSec::Event.record(context, request: context.state[:request])
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Who creates this request in the code?
  2. Seems tests are wrong doing here and over-stubbing

klass&.new(event)
end

def build_span(parser, request_context, trace_digest)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method is massive now, and signature could use some help of kwargs for secondary arguments

tags: tags,
}
options[:continue_from] = trace_digest if trace_digest
options[:start_time] = Time.at(parser.request_time_ms / 1000.0) if parser.request_time_ms
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like small method and a constant


yield(c) if block_given?

c.appsec.instrument(:aws_lambda)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to add comment about instrumentation guard on enabled from AppSec settings class and no auto instrumentation

Comment on lines +51 to +54
{
'httpMethod' => 'POST',
'requestContext' => {'stage' => 'dev'},
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 line?

Comment on lines +65 to +68
{
'routeKey' => 'POST /data',
'requestContext' => {'stage' => 'dev'},
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 line?

parent_id: '797643193680388254'
}
end
let(:ctx) { LambdaContextVersion.new }
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is needed from this test? Can we write a clean separate test with good standards?

Comment on lines +40 to +44
allow(Datadog::AppSec).to receive(:enabled?).and_return(true)
allow(Datadog::AppSec).to receive(:security_engine).and_return(security_engine)
allow(Datadog::AppSec::Context).to receive(:new).and_return(appsec_context)
allow(Datadog::AppSec::Context).to receive(:activate)
allow(Datadog::AppSec::Context).to receive(:active).and_return(appsec_context)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So many stubs, are all of them needed?


before { allow(Datadog::AppSec::Context).to receive(:active).and_return(nil) }

it 'does not activate or push' do
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what doesn't activate, trace?

it 'does not activate or push' do
on_start

aggregate_failures do
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is description?

end

context 'when context has a request in state' do
before { appsec_context.state[:request] = request_data }
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stubs stuff, but we miss actual push on the gateway because request data is never given in actual code

describe '.create' do
subject(:created_span) { described_class.create(event, request_context, trace_digest) }

after { created_span&.finish unless created_span&.finished? }
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do that here, we are not coding, we controlling environment, we should not have to do safe-navigation in tests at all


describe Datadog::Lambda::InferredSpan do
let(:request_context) { LambdaContextVersion.new }
let(:trace_digest) { nil }
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

violates rspec guide

end

context 'when event has no httpMethod or routeKey' do
let(:event) { {'requestContext' => {'stage' => 'prod'}} }
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use let/subject instead, hard to read test

Strech and others added 5 commits March 26, 2026 18:52
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nstants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…in continue_from test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Strech Strech force-pushed the appsec-61865-add-inferred-span-for-api-gateway branch from 80f3ad5 to eee4562 Compare March 27, 2026 14:52
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant