Feature request: filter "In-Progress X-Ray Segment" error spans from async-triggered Lambda traces (AWS-side quirk, ~95% false-positive ratio)

## Summary

Datadog APM error spans tagged `@error.type:"In-Progress X-Ray Segment"` are dominantly false positives for SQS- (and other async-) triggered Lambda functions. They originate from an AWS-side X-Ray quirk where the `AWS::Lambda` wrapper segment fails to be closed by AWS Lambda's control plane, despite the function itself completing successfully. Filtering these out at the extension/integration layer would significantly improve signal-to-noise for any team monitoring async Lambdas through Datadog APM.

## Root cause (AWS-side, not Datadog-side)

For each Lambda invocation with `tracing_config.mode = Active`, X-Ray emits two segments:

1. `AWS::Lambda::Function` — function execution; closed by the Lambda runtime when the handler returns.
2. `AWS::Lambda` — service-level wrapper segment, emitted by Lambda's control plane.

For SQS / SNS / EventBridge / S3-triggered Lambdas, AWS's internal pipeline frequently fails to close the wrapper segment, leaving it `in_progress: true` indefinitely. The function segment closes normally.

Verification (via `aws xray batch-get-traces` on real production trace IDs):

```
AWS::Lambda::Function   in_progress: false   has_end_time: true   duration: 1.59s
AWS::Lambda             in_progress: true    has_end_time: false  duration: null
```

The function's CloudWatch `REPORT` log also confirms the invocation completed normally (`START` → `REPORT` → `END` trio with billed duration well under the configured timeout).

## A note on existing public references

I haven't found an AWS-side GitHub issue that specifically describes the **wrapper-segment-fails-to-close** symptom. The publicly-discussed adjacent issues are about trace context propagation across async boundaries — e.g. [`aws-xray-sdk-node#208`](https://github.com/aws/aws-xray-sdk-node/issues/208), [`aws-xray-sdk-go#218`](https://github.com/aws/aws-xray-sdk-go/issues/218) (the latter resolved in 2023 with SNS/SQS active tracing) — which are related but mechanically different (continuity vs. closure). AWS's [X-Ray troubleshooting docs](https://docs.aws.amazon.com/xray/latest/devguide/xray-troubleshooting.html) acknowledge the broader category of "incomplete traces" but don't single out this specific symptom. The trace-data evidence and span-attribute comparison below are therefore the strongest signal.

## Why Datadog can filter this

These error spans reach Datadog via the AWS X-Ray integration (out-of-process polling), not via the Datadog Lambda Extension. They have distinctive tags that make them easy to identify:

```yaml
service: aws.lambda.<function-name>
@error.type: "In-Progress X-Ray Segment"
@error.stack: "This X-Ray segment is still in-progress... likely terminated due
              to a timeout or out-of-memory error..."
ingestion_reason: xray
trace_type: xray
```

Compare to Datadog-Extension-emitted spans which have `ingestion_reason: lambda` and rich Lambda metadata (`runtime-id`, `dd_extension_version`, `init_type`, `language`, `go_path`). The two pipelines are cleanly distinguishable, so a filter that targets `ingestion_reason: xray` will never accidentally drop dd-trace-go spans.

## Production impact (single team's measurement)

Over a 15-day window on one SQS-triggered production Lambda:

- **336** Datadog error spans with `@error.type:"In-Progress X-Ray Segment"`
- **19** real Lambda errors per CloudWatch `aws.lambda.errors`
- **~95% false-positive ratio** in the APM error tracker

This corrupts error tracking dashboards, monitors, and on-call paging. Multiple engineers across the org have been burned chasing these as real Lambda timeouts (the canned "timeout/OOM" stack text is misleading — function actually completed cleanly in 1-2 s).

## Requested fix

In rough order of preference:

**A. Default-off Retention Filter / ingestion drop rule** shipped with the Datadog AWS Lambda integration, matching:

```
service:aws.lambda.* @error.type:"In-Progress X-Ray Segment"
```

With clear documentation that customers can opt in.

**B. Ingestion-time correlation drop**: when ingesting an `in_progress` wrapper segment (`origin: AWS::Lambda`), check if the corresponding `AWS::Lambda::Function` child segment has closed normally. If yes, drop the wrapper from APM ingestion as a known-AWS-artefact.

**C. Documentation** in [Datadog Lambda Troubleshooting](https://docs.datadoghq.com/serverless/aws_lambda/troubleshooting/) explicitly addressing this error type and recommending the manual exclusion filter, with the rationale that it's an AWS-side artefact, not a real failure.

Happy to provide additional trace IDs, X-Ray segment JSON dumps, and CloudWatch correlation data if useful.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: filter "In-Progress X-Ray Segment" error spans from async-triggered Lambda traces (AWS-side quirk, ~95% false-positive ratio) #1229

Summary

Root cause (AWS-side, not Datadog-side)

A note on existing public references

Why Datadog can filter this

Production impact (single team's measurement)

Requested fix

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Feature request: filter "In-Progress X-Ray Segment" error spans from async-triggered Lambda traces (AWS-side quirk, ~95% false-positive ratio) #1229

Description

Summary

Root cause (AWS-side, not Datadog-side)

A note on existing public references

Why Datadog can filter this

Production impact (single team's measurement)

Requested fix

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions