Implement ApplicationSignals Logs integ tests for dynamic placeholders for otlp headers, and add Metrics tests for routing & OTLP endpoint #690
…ts for missing attributes
// Write a valid credentials file sourced from the instance role (via IMDSv2)
// BEFORE blocking IMDS, so the agent has credentials from the file alone.
writeCredsScript := `
set -e
mkdir -p ` + onPremCredsDir + `
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 600")
ROLE=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/)
CREDS=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/$ROLE)
AKID=$(echo "$CREDS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["AccessKeyId"])')
SAK=$(echo "$CREDS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["SecretAccessKey"])')
TOK=$(echo "$CREDS" | python3 -c 'import sys,json; print(json.load(sys.stdin)["Token"])')
printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\naws_session_token=%s\n' "$AKID" "$SAK" "$TOK" | sudo tee ` + onPremCredsFile + `>/dev/null
# Region for onPrem mode is read from a config file next to the credentials file.
printf '[default]\nregion = us-east-1\n' | sudo tee ` + onPremCredsDir + `/config>/dev/null`
require.NoError(t, common.RunCommands([]string{writeCredsScript}), "Failed to write credentials file")

// common-config.toml points the agent at the credentials file.
require.NoError(t, common.RunCommands([]string{
"printf '[credentials]\\n shared_credential_profile = \"default\"\\n shared_credential_file = \"" +
onPremCredsFile + "\"\\n' | sudo tee " + commonConfigOutput + ">/dev/null",
}), "Failed to write common-config.toml")
There was a problem hiding this comment.
Take a look at https://github.com/aws/amazon-cloudwatch-agent-test/blob/main/test/credential_chain/util/setup_util.go. It does a lot of what this is trying to do.
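For illustration, the template-based approach the reviewer points at could look roughly like the sketch below. It is not the code in `setup_util.go`: the `creds` type, both templates, and the `render` helper are hypothetical names made up here, and the file contents simply mirror what the inline `printf` calls in the diff produce.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// creds holds the three values a temporary (session) credential needs.
// Hypothetical type, used only for this sketch.
type creds struct {
	AccessKeyID     string
	SecretAccessKey string
	SessionToken    string
}

// credsTmpl mirrors the shared credentials file written by the printf in the diff.
var credsTmpl = template.Must(template.New("creds").Parse(
	"[default]\n" +
		"aws_access_key_id={{.AccessKeyID}}\n" +
		"aws_secret_access_key={{.SecretAccessKey}}\n" +
		"aws_session_token={{.SessionToken}}\n"))

// commonConfigTmpl mirrors the common-config.toml written by the diff;
// the template data is the path of the credentials file.
var commonConfigTmpl = template.Must(template.New("common").Parse(
	"[credentials]\n" +
		"  shared_credential_profile = \"default\"\n" +
		"  shared_credential_file = \"{{.}}\"\n"))

// render executes a template into a string.
func render(t *template.Template, data interface{}) (string, error) {
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Placeholder values; a real test would resolve them from the instance role.
	c, err := render(credsTmpl, creds{
		AccessKeyID:     "AKIDEXAMPLE",
		SecretAccessKey: "example-secret",
		SessionToken:    "example-token",
	})
	if err != nil {
		panic(err)
	}
	cfg, err := render(commonConfigTmpl, "/tmp/onprem/credentials")
	if err != nil {
		panic(err)
	}
	fmt.Print(c)
	fmt.Print(cfg)
}
```

Rendering the files in Go keeps the quoting in one place and avoids the nested shell escaping in the inline script.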
|
|
||
// TestAppSignalsLogsRouting verifies that the routing connector correctly
// splits logs between the batch and no-batch pipelines:
// - 4 regular logs → batch pipeline (same ingestion time)
There was a problem hiding this comment.
nit: This seems like it could be flaky. Is there something else we can look at instead of ingestion time?
There was a problem hiding this comment.
Couldn't find a better indicator than ingestion time. Since these logs are sent in quick succession, there is a very low chance of the flakiness issue occurring (e.g. logs landing in different batches). At least I haven't run into it yet...
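To make the ingestion-time check concrete, here is a minimal sketch of the idea under discussion: events that went through the batch pipeline are expected to share one ingestion timestamp. The `logEvent` type and `distinctIngestionTimes` helper are hypothetical stand-ins, not the test's actual code.

```go
package main

import "fmt"

// logEvent is a trimmed-down, hypothetical stand-in for a CloudWatch Logs
// event; IngestionTime is in milliseconds since the epoch.
type logEvent struct {
	Message       string
	IngestionTime int64
}

// distinctIngestionTimes counts how many different ingestion timestamps
// appear across the given events.
func distinctIngestionTimes(events []logEvent) int {
	seen := make(map[int64]struct{})
	for _, e := range events {
		seen[e.IngestionTime] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Four events delivered in one batch share a single ingestion time.
	batched := []logEvent{
		{"a", 1700000000000}, {"b", 1700000000000},
		{"c", 1700000000000}, {"d", 1700000000000},
	}
	// Events delivered one at a time usually get different ingestion times.
	unbatched := []logEvent{
		{"e", 1700000000100}, {"f", 1700000000250},
	}
	fmt.Println(distinctIngestionTimes(batched))   // 1
	fmt.Println(distinctIngestionTimes(unbatched)) // 2
}
```

The flakiness concern is visible here: if the four regular logs straddle a batch boundary, the count becomes 2 and a strict equality assertion fails even though routing worked.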
// blockIMDS adds an iptables rule rejecting traffic to the IMDS endpoint so the
// AWS SDK default credential chain cannot resolve credentials from IMDS.
func blockIMDS(t *testing.T) {
t.Helper()
_, err := common.RunCommand(fmt.Sprintf("sudo iptables -A OUTPUT -d %s -j REJECT", imdsEndpoint))
require.NoError(t, err, "Failed to block IMDS")
}

// unblockIMDS removes the iptables rule added by blockIMDS.
func unblockIMDS(t *testing.T) {
t.Helper()
_, _ = common.RunCommand(fmt.Sprintf("sudo iptables -D OUTPUT -d %s -j REJECT", imdsEndpoint))
}
There was a problem hiding this comment.
Instead of blocking with IP tables, consider using the existing SetupSystemdOverride with AWS_EC2_METADATA_DISABLED set to true.
amazon-cloudwatch-agent-test/test/credential_chain/util/setup_util.go
Lines 100 to 111 in c3bb641
That may not help with the initial translation (during the fetch), but it'll impact the agent startup and runtime.
There was a problem hiding this comment.
Updated to use SetupSystemdOverride, and replaced block/unblock with just disableIMDS that returns a cleanup function.
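The "setup returns a cleanup function" shape mentioned here can be sketched as below. This is an illustration only: it does not call the real `SetupSystemdOverride` (whose signature is not shown in this thread), the body of `disableIMDS` is assumed, and the drop-in directory name is illustrative. The example writes into a temp directory so it runs without root; a real override would live under `/etc/systemd/system/<unit>.d/` and need a `systemctl daemon-reload` plus an agent restart.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// overrideContent is a systemd drop-in that sets AWS_EC2_METADATA_DISABLED
// in the environment of the unit it is attached to.
const overrideContent = "[Service]\nEnvironment=\"AWS_EC2_METADATA_DISABLED=true\"\n"

// disableIMDS writes the drop-in under dir and returns a cleanup function
// that removes it again. Hypothetical body; only the name comes from the PR.
func disableIMDS(dir string) (func() error, error) {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return nil, err
	}
	path := filepath.Join(dir, "override.conf")
	if err := os.WriteFile(path, []byte(overrideContent), 0o644); err != nil {
		return nil, err
	}
	return func() error { return os.Remove(path) }, nil
}

// demo runs disableIMDS against a temp directory and reports whether the
// drop-in existed before and after the cleanup function ran.
func demo() (bool, bool, error) {
	root, err := os.MkdirTemp("", "dropin-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	dir := filepath.Join(root, "amazon-cloudwatch-agent.service.d")
	cleanup, err := disableIMDS(dir)
	if err != nil {
		return false, false, err
	}
	path := filepath.Join(dir, "override.conf")
	_, statErr := os.Stat(path)
	before := statErr == nil

	if err := cleanup(); err != nil {
		return before, false, err
	}
	_, statErr = os.Stat(path)
	after := statErr == nil
	return before, after, nil
}

func main() {
	before, after, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("exists before cleanup:", before)
	fmt.Println("exists after cleanup:", after)
}
```

In a test, the returned function pairs naturally with `defer` or `t.Cleanup`, which is what removes the need for a separate unblock helper.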
defer common.StopAgent()
defer common.RunCommand("sudo " + agentCtl + " -a remove-config -c all")

fetchConfig(t, "ec2")
There was a problem hiding this comment.
Any reason not to use common.StartAgent?
There was a problem hiding this comment.
No, updated to use common.StartAgent.
common.CopyFile(logsConfigPath, common.ConfigOutputPath)

defer common.StopAgent()
defer common.RunCommand("sudo " + agentCtl + " -a remove-config -c all")
There was a problem hiding this comment.
nit: Don't think this is necessary. It also doesn't clean up the common-config.toml, so it can create a false sense of a reset.
// (Scoped to fatal startup errors — a non-fatal RootCAs warning from other
// components, e.g. ec2tagger, does not crash the agent.)
agentLog := common.ReadAgentLogfile(common.AgentLogFile)
assert.NotContains(t, agentLog, "failed to create CW Logs client",
There was a problem hiding this comment.
This is a very specific point-in-time failure. The NotContains test passes even if the CA bundle is ignored. The test's "custom" CA is the same as the system CA, so unless we can see the path that it's reading from, it's hard to know if it's using the custom CA. Is there a stronger signal we can use to verify it works? Take a look at how the CA bundle tests do their assertions.
Use credential_chain util's SetupSystemdOverride with AWS_EC2_METADATA_DISABLED instead of an iptables REJECT rule. This only affects the agent's environment (not the test process), so the e2e validation's own AWS SDK calls keep working and no IMDS restore step is needed.
…remove-config
- Write the credentials file and common-config.toml via credential_chain util's SetupSharedCredentialsFile / SetupCommonConfig (template-based) instead of inline bash printf blocks.
- Resolve instance credentials through the SDK default chain instead of curling IMDS + parsing with python.
- Replace the 'remove-config' cleanup defer with ResetCommonConfig so the common-config.toml is actually cleaned up between tests.
A copy of the system CA bundle is indistinguishable from the default trust store, so the NotContains check passed even if the bundle were ignored. Add a second subtest using a standalone self-signed bundle that does not trust the real AWS CAs: the agent still starts (provisioner builds its client), but outbound TLS fails with x509 — which only happens if the agent actually loaded our custom bundle. This proves AWS_CA_BUNDLE is honored, not silently ignored.
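The reasoning behind the self-signed bundle can be sketched with the standard library alone. The example below is not the test's code; `newSelfSignedCA`, `parsePEMCert`, and `trusts` are hypothetical helpers. It shows that a trust pool built from one standalone bundle rejects certificates rooted anywhere else, which is why an x509 failure is evidence that the custom bundle was actually loaded.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// newSelfSignedCA returns a PEM-encoded, self-signed CA certificate.
func newSelfSignedCA(commonName string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
	}
	// Template and parent are the same certificate, so it is self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// parsePEMCert decodes the first certificate in a PEM bundle.
func parsePEMCert(pemBytes []byte) (*x509.Certificate, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return nil, fmt.Errorf("no PEM block found")
	}
	return x509.ParseCertificate(block.Bytes)
}

// trusts reports whether cert verifies against a pool built only from bundlePEM.
func trusts(bundlePEM []byte, cert *x509.Certificate) bool {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(bundlePEM) {
		return false
	}
	_, err := cert.Verify(x509.VerifyOptions{Roots: pool})
	return err == nil
}

func main() {
	bundleA, err := newSelfSignedCA("custom-bundle")
	if err != nil {
		panic(err)
	}
	bundleB, err := newSelfSignedCA("some-other-ca")
	if err != nil {
		panic(err)
	}
	certB, err := parsePEMCert(bundleB)
	if err != nil {
		panic(err)
	}
	// A pool built from bundle A does not trust anything rooted in B.
	fmt.Println("A trusts B:", trusts(bundleA, certB))
	fmt.Println("B trusts B:", trusts(bundleB, certB))
}
```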
Add ./test/app_signals to the ec2_linux test matrix (al2023/amd64) so the ServiceEvents logs + metrics integration tests run in CI.
The new ServiceEvents logs/metrics-routing and startup tests share the test/app_signals directory with the older test_runner-based metrics/traces suite that was de-registered from CI in aws#409 (moved to e2e). Registering the whole app_signals package would resurrect that retired suite. Move the new tests into test/app_signals/serviceevents (with their own resources/agent_configs) and point the integration matrix at the sub-package so only these tests run.
Register environment metadata flags in serviceevents package
The init() that registers -computeType and other env flags lived in the old app_signals package (app_signals_test.go, left behind). The new serviceevents package needs its own, or test binaries fail with 'flag provided but not defined: -computeType'.
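The flag failure described in this commit can be reproduced with the standard `flag` package. This is a sketch: `parseArgs` is a hypothetical helper that uses a private `FlagSet` in place of the package-level `init()`, and the flag's help text is made up.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseArgs parses args with a fresh FlagSet. When register is true the
// computeType flag is defined first, mimicking what a package-level init()
// does for the test binary's global flag set.
func parseArgs(register bool, args []string) (string, error) {
	fs := flag.NewFlagSet("serviceevents.test", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output out of the demo
	var computeType string
	if register {
		fs.StringVar(&computeType, "computeType", "", "compute type under test (illustrative help text)")
	}
	err := fs.Parse(args)
	return computeType, err
}

func main() {
	args := []string{"-computeType=EC2"}

	// Without registration, parsing fails the same way the test binary did.
	_, err := parseArgs(false, args)
	fmt.Println("unregistered:", err)

	// With registration, the value is parsed normally.
	v, err := parseArgs(true, args)
	fmt.Println("registered:", v, err)
}
```

Because flags are registered per test binary, each test package that receives `-computeType` on its command line needs the registration compiled into it.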
…s dir
Match the repo convention of flat, sibling test directories under test/ (e.g. emf, emf_concurrent) instead of nesting under app_signals/. Keeps the package separate from the old app_signals test_runner suite so CI runs only these tests.
Description of the issue and changes
Add tests for changes in: aws/amazon-cloudwatch-agent#2111
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
- TestAppSignalsLogsDynamicRouting: Multiple services route to separate log groups with correct stream names; verifies unknown_service:java truncation and internal attribute cleanup
- TestAppSignalsLogsNoisyNeighbor: Invalid service name (colons) doesn't block healthy services; validates provisioner failure is logged
- TestAppSignalsLogsDefaultPlaceholder: Missing service.name defaults to unknown_service; missing attrs default to unknown; no invalid log group creation attempted
- TestAppSignalsLogsRouting: Routing connector splits logs between batch and no-batch pipelines based on event.name == "aws.service_events.aggregate_profile"
- TestAppSignalsMetricsRouting: Routing connector splits metrics between EMF (Latency/Error/Fault) and OTLP monitoring endpoint (ServiceEvents); validates via PromQL query
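As a rough illustration of the log routing rule, the predicate can be written as a plain function. This is a sketch, not the routing connector's configuration: `pipelineFor` is a hypothetical name, and the direction of the split (matching events go to the no-batch pipeline) is an assumption inferred from the test comment that regular logs land in the batch pipeline.

```go
package main

import "fmt"

// aggregateProfileEvent is the event.name value the routing rule keys on.
const aggregateProfileEvent = "aws.service_events.aggregate_profile"

// pipelineFor returns the pipeline a log record would be routed to.
// Assumption: aggregate-profile events bypass batching, everything else is batched.
func pipelineFor(attrs map[string]string) string {
	if attrs["event.name"] == aggregateProfileEvent {
		return "no-batch"
	}
	return "batch"
}

func main() {
	fmt.Println(pipelineFor(map[string]string{"event.name": aggregateProfileEvent}))
	fmt.Println(pipelineFor(map[string]string{"event.name": "something.else"}))
	fmt.Println(pipelineFor(map[string]string{})) // no event.name attribute at all
}
```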