Implement ApplicationSignals Logs integ tests for dynamic placeholders for otlp headers, and add Metrics tests for routing & OTLP endpoint#690

Open
jj22ee wants to merge 18 commits into
aws:mainfrom
jj22ee:awscwlogsprovisioner-tests

Conversation

@jj22ee jj22ee commented May 11, 2026

Member

Description of the issue and changes

Add tests for changes in: aws/amazon-cloudwatch-agent#2111

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

  • TestAppSignalsLogsDynamicRouting — Multiple services route to separate log groups with correct stream names; verifies unknown_service:java truncation and internal attribute cleanup
  • TestAppSignalsLogsNoisyNeighbor — Invalid service name (colons) doesn't block healthy services; validates provisioner failure is logged
  • TestAppSignalsLogsDefaultPlaceholder — Missing service.name defaults to unknown_service; missing attrs default to unknown; no invalid log group creation attempted
  • TestAppSignalsLogsRouting — Routing connector splits logs between batch and no-batch pipelines based on event.name == "aws.service_events.aggregate_profile"
  • TestAppSignalsMetricsRouting — Routing connector splits metrics between EMF (Latency/Error/Fault) and OTLP monitoring endpoint (ServiceEvents); validates via PromQL query
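For reviewers, here is a small Go sketch of the log routing and placeholder-defaulting rules these tests assert. It is not the agent's implementation, because the real split is done by the routing connector's configuration. `routeLog`, `resolvePlaceholder`, and the `batch`/`no-batch` labels are hypothetical names. Only the `event.name` value and the `unknown_service`/`unknown` defaults come from the test descriptions above.

```go
package main

// Hypothetical sketch of the expectations in the tests listed above; this is
// not the agent's routing connector or log-group resolution code.

// aggregateProfileEvent is the event.name value that TestAppSignalsLogsRouting
// expects to bypass batching.
const aggregateProfileEvent = "aws.service_events.aggregate_profile"

// routeLog returns an illustrative pipeline label for a log record,
// based only on its event.name attribute.
func routeLog(attrs map[string]string) string {
	if attrs["event.name"] == aggregateProfileEvent {
		return "no-batch"
	}
	return "batch"
}

// resolvePlaceholder returns the value substituted for a log group/stream
// placeholder, applying the defaults TestAppSignalsLogsDefaultPlaceholder
// checks: a missing service.name becomes "unknown_service", and any other
// missing attribute becomes "unknown". Treating an empty value as missing
// is an assumption of this sketch.
func resolvePlaceholder(attrs map[string]string, key string) string {
	if v, ok := attrs[key]; ok && v != "" {
		return v
	}
	if key == "service.name" {
		return "unknown_service"
	}
	return "unknown"
}
```

Writing the expectations as pure functions like this makes them easy to table-test without starting the agent.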

@jj22ee jj22ee requested a review from a team as a code owner May 11, 2026 11:05
@jj22ee jj22ee changed the title Implement ApplicationSignals Logs integ tests for dynamic placeholders for otlp headers Implement ApplicationSignals Logs integ tests for dynamic placeholders for otlp headers, and add Metrics tests for routing & OTLP endpoint May 22, 2026
@jj22ee jj22ee force-pushed the awscwlogsprovisioner-tests branch from d6ef3ab to 39d9b61 May 22, 2026 11:11
@jj22ee jj22ee force-pushed the awscwlogsprovisioner-tests branch from 144d480 to 5a5afad May 22, 2026 18:16
Comment thread test/app_signals/service_events_test.go Outdated
Comment on lines +144 to +164
	// Write a valid credentials file sourced from the instance role (via IMDSv2)
	// BEFORE blocking IMDS, so the agent has credentials from the file alone.
	writeCredsScript := `
set -e
mkdir -p ` + onPremCredsDir + `
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 600")
ROLE=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/)
CREDS=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/$ROLE)
AKID=$(echo "$CREDS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["AccessKeyId"])')
SAK=$(echo "$CREDS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["SecretAccessKey"])')
TOK=$(echo "$CREDS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["Token"])')
printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\naws_session_token=%s\n' "$AKID" "$SAK" "$TOK" | sudo tee ` + onPremCredsFile + `>/dev/null
# Region for onPrem mode is read from a config file next to the credentials file.
printf '[default]\nregion = us-east-1\n' | sudo tee ` + onPremCredsDir + `/config>/dev/null`
	require.NoError(t, common.RunCommands([]string{writeCredsScript}), "Failed to write credentials file")

	// common-config.toml points the agent at the credentials file.
	require.NoError(t, common.RunCommands([]string{
		"printf '[credentials]\\n shared_credential_profile = \"default\"\\n shared_credential_file = \"" +
			onPremCredsFile + "\"\\n' | sudo tee " + commonConfigOutput + ">/dev/null",
	}), "Failed to write common-config.toml")

Contributor

Member Author

Addressed in 52be990
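The script in this thread scrapes IMDS with curl and extracts the fields with python3. The follow-up commit switched to the SDK default chain and the template-based SetupSharedCredentialsFile helper. The sketch below is not that helper. It only illustrates the same idea in plain Go: decode the IMDS credentials JSON (`AccessKeyId`, `SecretAccessKey`, `Token`) with encoding/json and render the `[default]` profile with text/template. `imdsCredentials` and `renderCredentialsFile` are hypothetical names, and the region config file the script also wrote is omitted.

```go
package main

import (
	"encoding/json"
	"strings"
	"text/template"
)

// imdsCredentials mirrors the three fields the shell script extracted with python3.
// Other fields in the IMDS document (e.g. Expiration) are ignored by json.Unmarshal.
type imdsCredentials struct {
	AccessKeyId     string
	SecretAccessKey string
	Token           string
}

// credentialsTemplate renders the same [default] profile the printf produced.
var credentialsTemplate = template.Must(template.New("credentials").Parse(
	"[default]\naws_access_key_id={{.AccessKeyId}}\naws_secret_access_key={{.SecretAccessKey}}\naws_session_token={{.Token}}\n"))

// renderCredentialsFile decodes an IMDS security-credentials document and
// returns the contents of a shared credentials file.
func renderCredentialsFile(imdsJSON []byte) (string, error) {
	var creds imdsCredentials
	if err := json.Unmarshal(imdsJSON, &creds); err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := credentialsTemplate.Execute(&sb, creds); err != nil {
		return "", err
	}
	return sb.String(), nil
}
```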


// TestAppSignalsLogsRouting verifies that the routing connector correctly
// splits logs between the batch and no-batch pipelines:
// - 4 regular logs → batch pipeline (same ingestion time)

Contributor

nit: This seems like it could be flaky. Is there something else we can look at instead of ingestion time?

Member Author

I couldn't find a better indicator than ingestion time. Because these logs are sent in quick succession, the chance of flakiness (e.g., the logs landing in different batches) is very low. At least, I haven't run into it yet.
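A minimal sketch of the ingestion-time check under discussion, assuming the test has already fetched the events. CloudWatch Logs reports `IngestionTime` in epoch milliseconds. `logEvent` and `sameIngestionTime` are hypothetical stand-ins, not types from the test suite.

```go
package main

// logEvent is a hypothetical stand-in for a fetched CloudWatch Logs event.
// IngestionTime is in epoch milliseconds, as CloudWatch Logs reports it.
type logEvent struct {
	Message       string
	IngestionTime int64
}

// sameIngestionTime reports whether every event shares one ingestion
// timestamp, used as a proxy for "these logs arrived in the same batch".
// An empty slice returns false so a missing result never passes the check.
func sameIngestionTime(events []logEvent) bool {
	if len(events) == 0 {
		return false
	}
	first := events[0].IngestionTime
	for _, e := range events[1:] {
		if e.IngestionTime != first {
			return false
		}
	}
	return true
}
```

Exact equality is the strictest form of this check, and it is also what makes it sensitive to a batch being split, which is the flakiness risk raised above.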

Comment thread test/app_signals/agent_configs/config_logs_dynamic_placeholders.json Outdated
Comment thread test/app_signals/service_events_test.go Outdated
Comment on lines +124 to +136
// blockIMDS adds an iptables rule rejecting traffic to the IMDS endpoint so the
// AWS SDK default credential chain cannot resolve credentials from IMDS.
func blockIMDS(t *testing.T) {
	t.Helper()
	_, err := common.RunCommand(fmt.Sprintf("sudo iptables -A OUTPUT -d %s -j REJECT", imdsEndpoint))
	require.NoError(t, err, "Failed to block IMDS")
}

// unblockIMDS removes the iptables rule added by blockIMDS.
func unblockIMDS(t *testing.T) {
	t.Helper()
	_, _ = common.RunCommand(fmt.Sprintf("sudo iptables -D OUTPUT -d %s -j REJECT", imdsEndpoint))
}

Contributor

Instead of blocking with iptables, consider using the existing SetupSystemdOverride with AWS_EC2_METADATA_DISABLED set to true.

func SetupSystemdOverride(overrideContent string) error {
	// Create override directory
	if err := common.MkdirAll(SystemdOverrideDir); err != nil {
		return fmt.Errorf("failed to create systemd override directory: %w", err)
	}
	if err := common.WriteFile(SystemdOverridePath, overrideContent); err != nil {
		return fmt.Errorf("failed to write systemd override content: %w", err)
	}
	return nil
}

That may not help with the initial translation (during the fetch), but it will apply to agent startup and runtime.

Member Author

Updated to use SetupSystemdOverride, and replaced blockIMDS/unblockIMDS with a single disableIMDS that returns a cleanup function.

Comment thread test/app_signals/service_events_test.go Outdated
defer common.StopAgent()
defer common.RunCommand("sudo " + agentCtl + " -a remove-config -c all")

fetchConfig(t, "ec2")

Contributor

Any reason not to use common.StartAgent?

Member Author

No, updated to use common.StartAgent.

Comment thread test/app_signals/service_events_test.go Outdated
common.CopyFile(logsConfigPath, common.ConfigOutputPath)

defer common.StopAgent()
defer common.RunCommand("sudo " + agentCtl + " -a remove-config -c all")

Contributor

nit: I don't think this is necessary. It also doesn't clean up common-config.toml, so it can create a false sense of a reset.

Comment thread test/app_signals/service_events_test.go Outdated
// (Scoped to fatal startup errors — a non-fatal RootCAs warning from other
// components, e.g. ec2tagger, does not crash the agent.)
agentLog := common.ReadAgentLogfile(common.AgentLogFile)
assert.NotContains(t, agentLog, "failed to create CW Logs client",

Contributor

This checks for a very specific, point-in-time failure. The NotContains assertion passes even if the CA bundle is ignored. The test's "custom" CA is the same as the system CA, so unless we can see the path it's reading from, it's hard to know whether it's using the custom CA. Is there a stronger signal we can use to verify it works? Take a look at how the CA bundle tests do their assertions.

Member Author

Attempted to address in 67b9b50.

jj22ee added 8 commits June 17, 2026 12:41
Use credential_chain util's SetupSystemdOverride with AWS_EC2_METADATA_DISABLED
instead of an iptables REJECT rule. This only affects the agent's environment
(not the test process), so the e2e validation's own AWS SDK calls keep working
and no IMDS restore step is needed.

…remove-config

- Write the credentials file and common-config.toml via credential_chain util's
  SetupSharedCredentialsFile / SetupCommonConfig (template-based) instead of
  inline bash printf blocks.
- Resolve instance credentials through the SDK default chain instead of curling
  IMDS + parsing with python.
- Replace the 'remove-config' cleanup defer with ResetCommonConfig so the
  common-config.toml is actually cleaned up between tests.

A copy of the system CA bundle is indistinguishable from the default trust
store, so the NotContains check passed even if the bundle were ignored. Add a
second subtest using a standalone self-signed bundle that does not trust the
real AWS CAs: the agent still starts (the provisioner builds its client), but
outbound TLS fails with an x509 error, which only happens if the agent actually
loaded our custom bundle. This proves AWS_CA_BUNDLE is honored, not silently ignored.

Add ./test/app_signals to the ec2_linux test matrix (al2023/amd64) so the
ServiceEvents logs + metrics integration tests run in CI.

The new ServiceEvents logs/metrics-routing and startup tests share the
test/app_signals directory with the older test_runner-based metrics/traces
suite that was de-registered from CI in aws#409 (moved to e2e). Registering the
whole app_signals package would resurrect that retired suite. Move the new
tests into test/app_signals/serviceevents (with their own resources/agent_configs)
and point the integration matrix at the sub-package so only these tests run.

Register environment metadata flags in serviceevents package

The init() that registers -computeType and other env flags lived in the old
app_signals package (app_signals_test.go, left behind). The new serviceevents
package needs its own, or test binaries fail with 'flag provided but not
defined: -computeType'.

…s dir

Match the repo convention of flat, sibling test directories under test/
(e.g. emf, emf_concurrent) instead of nesting under app_signals/. Keeps the
package separate from the old app_signals test_runner suite so CI runs only
these tests.
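The `flag provided but not defined: -computeType` failure described in the flag-registration commit is standard Go `flag` behavior: a flag must be registered before the arguments are parsed. Below is a minimal reproduction using a private `flag.FlagSet`. `registerEnvFlags` and `parseTestArgs` are hypothetical, and only `-computeType` is shown out of the package's environment flags.

```go
package main

import (
	"flag"
	"io"
)

// registerEnvFlags is a hypothetical stand-in for the package's
// environment-metadata flag registration; only -computeType is shown.
func registerEnvFlags(fs *flag.FlagSet) *string {
	return fs.String("computeType", "", "compute platform under test, e.g. EC2")
}

// parseTestArgs parses args on a fresh FlagSet, optionally registering the
// environment flags first, and returns the parse error (if any).
func parseTestArgs(args []string, register bool) error {
	fs := flag.NewFlagSet("test", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the usage text printed on errors
	if register {
		registerEnvFlags(fs)
	}
	return fs.Parse(args)
}
```

In a test package, the registration goes in `init()` (or a package-level `var`) so it runs before the test binary parses its command line. This is why each package that receives these flags needs its own copy.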