Skip to content

feat(log-ingestor)!: Support the default AWS SDK credential provider chain (resolves #1939); Add serviceAccount.annotations to the Helm chart for IRSA.#1962

Merged
junhaoliao merged 18 commits intoy-scope:mainfrom
junhaoliao:iam-role-log-ingestor
Mar 12, 2026

Conversation

@junhaoliao
Copy link
Copy Markdown
Member

@junhaoliao junhaoliao commented Feb 9, 2026

  • Rename AwsAuthType.ec2 to AwsAuthType.default.
  • Rename Helm ServiceAccount from job-watcher to service-account; add serviceAccountName to compression-worker and query-worker deployments.

Description

Add support for the AWS SDK's default credential provider chain (type: "default") in CLP's Rust
and Python components, enabling deployment on platforms where credentials are provided
automatically (e.g., EKS with IRSA, ECS, EC2 instance roles).

Previously, S3/SQS access required explicit type: "credentials" with an access key pair. The new
type: "default" option omits explicit credentials and lets the AWS SDK discover them from the
environment (env vars, config files, IRSA web identity tokens, container credentials, EC2 metadata).

Key changes:

  • Rust (clp-rust-utils): Add AwsAuthentication::Default variant with
    #[serde(rename = "default")]. Add credentials_pair() method returning Option<(&str, &str)>
    (None for Default, Some(...) for Credentials). Update S3 and SQS client creation to accept
    optional credentials -- when None, the default AWS SDK provider chain is used.
  • Rust (log-ingestor, api-server): Thread AwsAuthentication through instead of raw
    key/secret pairs. Update aws_client_manager, compression_job_submitter,
    ingestion_job_manager, and api-server/client to use credentials_pair().
  • Python (clp-py-utils): Rename AwsAuthType.ec2 to AwsAuthType.default. Update
    validation logic and error messages in clp_config.py and s3_utils.py.
  • Helm chart: Rename ServiceAccount from job-watcher to service-account (more generic name
    since it now serves both RBAC job-watching and IRSA roles). Add serviceAccountName to
    compression-worker and query-worker deployments (previously missing, which prevented IRSA
    token injection). Add serviceAccount.annotations to values.yaml so the IRSA role ARN can be
    set declaratively at install time. Bump chart version to 0.1.4-dev.3.
  • Docs: Add new eks-irsa-setup.md guide for configuring IRSA on Amazon EKS. Update
    clp-config.md with the default auth type and links to official AWS SDK credential chain docs
    (Boto3, Rust SDK). Update index.md to list the
    default credential provider chain sources.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title.
    • type: "ec2" renamed to type: "default" in aws_authentication config.
    • Helm ServiceAccount renamed from job-watcher to service-account.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Deployed CLP to an Amazon EKS cluster (EKS Auto Mode, us-east-2) with type: "default"
authentication and IRSA, and validated the full S3 ingestion and compression pipeline end-to-end.

1. Create an EKS cluster

  1. Go to EKS > Create cluster.
  2. Set a cluster name (e.g. clp-test) and region (e.g. us-east-2).
  3. Select EKS Auto Mode (manages node groups automatically).
  4. Create the recommended roles for the cluster and the nodes.
  5. Use defaults for the rest and create the cluster. Wait for status Active.

2. Create an S3 bucket with test data

  1. Go to S3 > Create bucket in the same region.
  2. Upload a test log file (e.g. postgresql.jsonl).

3. Create an IAM policy for S3 access

  1. Go to IAM > Policies > Create policy.
  2. Use the JSON editor:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
          "Resource": "arn:aws:s3:::<bucket-name>/*"
        },
        {
          "Effect": "Allow",
          "Action": "s3:ListBucket",
          "Resource": "arn:aws:s3:::<bucket-name>"
        }
      ]
    }

4. Configure IRSA

  1. Go to EKS > cluster > Overview tab. Copy the OpenID Connect provider URL.
  2. Go to IAM > Identity providers > Add provider.
    • Type: OpenID Connect
    • Provider URL: paste the OIDC URL from above
    • Audience: sts.amazonaws.com
  3. Go to IAM > Roles > Create role.
    • Trusted entity type: Web identity
    • Identity provider: select the OIDC provider from step 2
    • Audience: sts.amazonaws.com
    • Attach the S3 policy from step 3.
  4. After creation, edit the role's Trust policy to add a StringLike condition for the
    ServiceAccount. The default policy only has the :aud condition — add the :sub entry to
    the existing Condition block:
    "Condition": {
      "StringEquals": {
        "oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:aud": "sts.amazonaws.com"
      },
      "StringLike": {
        "oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:<namespace>:*"
      }
    }

5. Build a custom CLP package image

Push this branch to your fork's main branch to trigger the CI image build:

git push <fork-remote> HEAD:main

The built image will be available at ghcr.io/<owner>/clp/clp-package:main.

6. Prepare hostPath directories on the worker node

Third-party containers (MariaDB, RabbitMQ, etc.) run as non-root (uid 1000). On EKS, hostPath
directories are created as root by default, so they need to be pre-created with the correct
ownership.

for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  kubectl debug "node/${node}" -it --image=busybox -- sh -c '
    mkdir -p /host/tmp/clp/var/data/database \
             /host/tmp/clp/var/data/queue \
             /host/tmp/clp/var/data/redis \
             /host/tmp/clp/var/data/results_cache \
             /host/tmp/clp/var/data/archives \
             /host/tmp/clp/var/data/streams \
             /host/tmp/clp/var/data/staged-archives \
             /host/tmp/clp/var/data/staged-streams \
             /host/tmp/clp/var/log/database \
             /host/tmp/clp/var/log/queue \
             /host/tmp/clp/var/log/redis \
             /host/tmp/clp/var/log/results_cache \
             /host/tmp/clp/var/log/compression_scheduler \
             /host/tmp/clp/var/log/compression_worker \
             /host/tmp/clp/var/log/user \
             /host/tmp/clp/var/log/query_scheduler \
             /host/tmp/clp/var/log/query_worker \
             /host/tmp/clp/var/log/reducer \
             /host/tmp/clp/var/log/api_server \
             /host/tmp/clp/var/log/garbage_collector \
             /host/tmp/clp/var/log/log_ingestor \
             /host/tmp/clp/var/log/mcp_server \
             /host/tmp/clp/var/tmp &&
    chown -R 999:999 /host/tmp/clp/var/data/database \
                     /host/tmp/clp/var/log/database \
                     /host/tmp/clp/var/data/queue \
                     /host/tmp/clp/var/log/queue \
                     /host/tmp/clp/var/data/redis \
                     /host/tmp/clp/var/log/redis \
                     /host/tmp/clp/var/data/results_cache \
                     /host/tmp/clp/var/log/results_cache &&
    chown -R 0:0 /host/tmp/clp/var/data/archives \
                       /host/tmp/clp/var/data/streams \
                       /host/tmp/clp/var/data/staged-archives \
                       /host/tmp/clp/var/data/staged-streams \
                       /host/tmp/clp/var/log/compression_scheduler \
                       /host/tmp/clp/var/log/compression_worker \
                       /host/tmp/clp/var/log/user \
                       /host/tmp/clp/var/log/query_scheduler \
                       /host/tmp/clp/var/log/query_worker \
                       /host/tmp/clp/var/log/reducer \
                       /host/tmp/clp/var/log/api_server \
                       /host/tmp/clp/var/log/garbage_collector \
                       /host/tmp/clp/var/log/log_ingestor \
                       /host/tmp/clp/var/log/mcp_server \
                       /host/tmp/clp/var/tmp
  '
done
kubectl get pods --field-selector=status.phase==Succeeded -o name | xargs -r kubectl delete

Note: busybox sh does not support brace expansion ({a,b}), so each directory must be listed
explicitly.

7. Deploy CLP via Helm

aws eks update-kubeconfig --name <cluster-name> --region <region>

cd tools/deployment/package-helm
helm install test . -f eks-values.yaml

Example eks-values.yaml:

securityContext:
  firstParty:
    uid: 0
    gid: 0
  thirdParty:
    uid: 999
    gid: 999

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/<web-identity-role-name>"

image:
  clpPackage:
    repository: "ghcr.io/<owner>/clp/clp-package"
    tag: "main"

distributedDeployment: true

clpConfig:
  logs_input:
    type: "s3"
    aws_authentication:
      type: "default"

  archive_output:
    storage:
      type: "s3"
      staging_directory: "/tmp/clp/var/data/staged-archives"
      s3_config:
        region_code: "<region>"
        bucket: "<bucket-name>"
        key_prefix: "archives/"
        aws_authentication:
          type: "default"

  stream_output:
    storage:
      type: "s3"
      staging_directory: "/tmp/clp/var/data/staged-streams"
      s3_config:
        region_code: "<region>"
        bucket: "<bucket-name>"
        key_prefix: "streams/"
        aws_authentication:
          type: "default"

Verify all pods reach Running/Completed:

kubectl get pods

8. Submit an S3 scanner job and verify compression

kubectl port-forward svc/test-clp-log-ingestor 3002:3002 &

curl -X POST http://127.0.0.1:3002/s3_scanner \
  -H "Content-Type: application/json" \
  -d '{"bucket_name": "<bucket>", "key_prefix": "<prefix>", "region": "<region>", "dataset": "default"}'

Verify the log-ingestor scanned the object:

kubectl logs -l app.kubernetes.io/component=log-ingestor --tail=10
# Should show: "Scanned new object metadata on S3."

Wait for the buffer flush (default 300s timeout), then verify compression:

kubectl logs -l app.kubernetes.io/component=compression-scheduler --tail=20
kubectl logs -l app.kubernetes.io/component=compression-worker --tail=20
[2026-02-19  23:42:51]	2026-02-19 15:42:51,919 compression_scheduler [INFO] Starting compression_scheduler
[2026-02-20  00:11:10]	2026-02-19 16:11:10,514 compression_scheduler [INFO] Dispatched job 1 with 1 tasks (0 remaining).
[2026-02-20  00:11:11]	2026-02-19 16:11:11,263 compression_scheduler [INFO] Compression task job-1-task-1 completed in 0.726594 second(s).
[2026-02-20  00:11:11]	2026-02-19 16:11:11,263 compression_scheduler [INFO] Job 1 succeeded (1 tasks completed).

The archive should appear in the S3 bucket under the configured key_prefix.

Summary by CodeRabbit

  • New Features

    • Default AWS authentication provider added (renamed from EC2); clients now accept optional inline credentials and otherwise use the SDK default credential chain.
  • Documentation

    • AWS S3 guide updated to describe the default credential provider chain and examples.
    • Added EKS IRSA (IAM Roles for Service Accounts) setup guide.
  • Tests

    • Added/designed tests covering deserialization and default-auth behaviour.
  • Chores

    • Consolidated Helm service-account usage, added service-account template and bumped chart version.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a "default" AWS authentication mode, makes AWS credentials optional for S3/SQS clients (credentials passed as Option<(access_key, secret)>), centralizes AwsAuthentication usage across Rust/Python, updates Helm to a single service account with IRSA-ready annotations, and adds EKS IRSA docs and tests.

Changes

Cohort / File(s) Summary
AWS auth enum & helpers
components/clp-rust-utils/src/clp_config/s3_config.rs, components/clp-py-utils/clp_py_utils/clp_config.py
Replace ec2 with default auth variant; add AwsAuthentication::Default and credentials_pair() returning optional credentials; update Python enum/validation to accept default.
Rust S3/SQS clients
components/clp-rust-utils/src/s3/client.rs, components/clp-rust-utils/src/sqs/client.rs
Change create_new_client() signatures to accept region and credentials: Option<(&str,&str)>; conditionally attach explicit credentials or rely on the AWS SDK default provider chain.
Call sites & auth propagation
components/log-ingestor/src/aws_client_manager.rs, components/log-ingestor/src/compression/compression_job_submitter.rs, components/log-ingestor/src/ingestion_job_manager.rs, components/api-server/src/client.rs, components/log-ingestor/tests/test_ingestion_job.rs
Refactor call sites to pass AwsAuthentication / use credentials_pair() and derived region string; update tests to supply credentials as Some((access_key_id, secret_access_key)) where required.
Python S3 utils
components/clp-py-utils/clp_py_utils/s3_utils.py
Treat default like SDK default provider chain instead of EC2 metadata; update credential retrieval and error messaging accordingly.
Config & tests
components/clp-rust-utils/src/clp_config/package/config.rs, components/log-ingestor/tests/test_ingestion_job.rs
Add unit test for deserializing default auth; adjust existing tests to account for Default variant.
API server client
components/api-server/src/client.rs
Adjust client creation to derive region string and pass credentials_pair() instead of separate access key/secret args.
Helm: service accounts & templates
tools/deployment/package-helm/templates/service-account.yaml, tools/deployment/package-helm/templates/service-account-role.yaml, tools/deployment/package-helm/templates/service-account-rolebinding.yaml, tools/deployment/package-helm/templates/*-deployment.yaml, tools/deployment/package-helm/templates/*-job.yaml, tools/deployment/package-helm/templates/job-watcher-serviceaccount.yaml
Consolidate to a single -service-account; remove job-watcher SA template; add ServiceAccount template; update Role/RoleBinding and pod templates to use -service-account.
Helm values & chart
tools/deployment/package-helm/values.yaml, tools/deployment/package-helm/Chart.yaml
Add serviceAccount.annotations (IRSA support) and bump chart version to 0.1.4-dev.3.
Docs: AWS S3 auth & IRSA
docs/src/user-docs/guides-using-object-storage/aws-s3/clp-config.md, docs/src/user-docs/guides-using-object-storage/aws-s3/index.md, docs/src/user-docs/guides-using-object-storage/aws-s3/eks-irsa-setup.md
Replace EC2-specific guidance with AWS SDK default credential provider chain documentation; add EKS IRSA setup guide and examples.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Pod as Pod (CLP)
participant Kube as Kubernetes API / ServiceAccount
participant SDK as AWS SDK (default provider chain)
participant AWS as AWS STS/S3/SQS
Pod->>Kube: Pod runs with serviceAccount annotated for IRSA
Pod->>SDK: SDK requests credentials via default provider chain
SDK->>Kube: Read web identity token / container env / metadata
SDK->>AWS: Exchange token for temporary credentials (STS) or use env creds
SDK->>AWS: Call S3/SQS with obtained credentials
AWS-->>SDK: Return response
SDK-->>Pod: Return data

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main changes: introducing default AWS SDK credential provider chain support and adding IRSA support to Helm chart.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@junhaoliao junhaoliao linked an issue Feb 9, 2026 that may be closed by this pull request
@junhaoliao junhaoliao added this to the Mid-February 2026 milestone Feb 9, 2026
@junhaoliao junhaoliao changed the title feat(log-ingestor): Support EC2 IAM role authentication using the default AWS SDK credential provider chain (resolves #1939). feat(log-ingestor)!: Support the default AWS SDK credential provider chain (resolves #1939); Add serviceAccount.annotations to the Helm chart for IRSA. Feb 19, 2026
@junhaoliao junhaoliao marked this pull request as ready for review February 19, 2026 05:51
@junhaoliao junhaoliao requested a review from a team as a code owner February 19, 2026 05:51
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/clp-py-utils/clp_py_utils/s3_utils.py`:
- Line 96: The long literal passed to ValueError at the raise statement in
s3_utils.py should be shortened or replaced with a dedicated exception class;
create a specific exception (e.g., S3AuthenticationError or AuthenticationError)
in this module, give it a concise message (or no message) at the raise site, and
move any detailed explanation into the exception class docstring or its
instantiation where needed; update the raise ValueError(...) to raise
S3AuthenticationError(...) (or raise S3AuthenticationError without a long
message) so the linter warning (TRY003) is resolved while preserving detailed
context in the exception definition.

In `@components/clp-rust-utils/src/clp_config/s3_config.rs`:
- Around line 31-38: The method credentials_pair is unnecessarily declared as
const fn, imposing a newer Rust version constraint due to calling
String::as_str() in const contexts; change its signature from pub const fn
credentials_pair(&self) -> Option<(&str, &str)> to a normal pub fn
credentials_pair(&self) -> Option<(&str, &str)> and keep the match on
Self::{Credentials, Default} and the returned tuple using
credentials.access_key_id.as_str() and credentials.secret_access_key.as_str();
update nothing else in the call sites (aws_client_manager and api-server) since
they are runtime-only.

In `@docs/src/user-docs/guides-using-object-storage/aws-s3/clp-config.md`:
- Line 197: Update the broken cross-document anchor in the markdown line
containing "EC2 instance metadata / [IAM roles
attached](index.md#ec2-instance-iam-roles)" by replacing the old anchor
"#ec2-instance-iam-roles" with the new "#default-credential-provider-chain" so
the link becomes "EC2 instance metadata / [IAM roles
attached](index.md#default-credential-provider-chain)"; ensure the visible link
text remains unchanged and verify the MyST build passes.

In `@docs/src/user-docs/guides-using-object-storage/aws-s3/eks-irsa-setup.md`:
- Around line 87-91: Replace the helm command that always runs a fresh install
with an idempotent upgrade-or-install invocation: change the usage of "helm
install <release> clp/clp -f values.yaml" to use "helm upgrade --install
<release> clp/clp -f values.yaml" so that deploying or upgrading the CLP Helm
release (referenced as <release> and chart clp/clp) works whether the release
already exists or not.
- Around line 42-66: The trust-policy JSON references a fourth placeholder
`<region>` (in the OIDC provider ARN and the two condition keys) but the lead-in
only lists `<ACCOUNT_ID>`, `<OIDC_ID>`, and `<namespace>`; update the
instructions to include `<region>` as a required replacement and explicitly
state that users must replace all occurrences of `<region>` in the trust policy
(the ARN
"arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>"
and the condition keys "oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:aud" and
"...:sub"), so the generated policy is valid.

In `@tools/deployment/package-helm/templates/service-account-role.yaml`:
- Line 4: Rename the Role resource to make its purpose explicit by changing the
name template from 'name: {{ include "clp.fullname" . }}-service-account' to
include a '-role' suffix (e.g. '{{ include "clp.fullname" .
}}-service-account-role'); then update the corresponding RoleBinding resource
reference in service-account-rolebinding.yaml to point to the new Role name so
the RoleBinding still binds the ServiceAccount to '{{ include "clp.fullname" .
}}-service-account-role'.

In `@tools/deployment/package-helm/templates/service-account-rolebinding.yaml`:
- Around line 4-13: The Helm template files containing constructs like {{
include "clp.fullname" . }} are causing YAMLlint false positives; add a
project-level .yamllint configuration to either ignore the Helm templates
directory (e.g., ignore patterns matching templates/) or add rule overrides to
ignore/template braces so YAMLlint won't parse Go template syntax, and update
the CI lint job to use that .yamllint so files under templates/ (including the
service-account-rolebinding template) are skipped or exempted from those
specific rules.

In `@tools/deployment/package-helm/templates/service-account.yaml`:
- Around line 1-10: The chart currently creates a single ServiceAccount via the
ServiceAccount template (name: {{ include "clp.fullname" . }}-service-account)
which consolidates IAM/RBAC for all components; split this into per-component
ServiceAccounts to preserve least-privilege. Add separate templates (e.g.,
service-account-api, service-account-worker, service-account-scheduler,
service-account-jobwatcher) that use distinct names and annotations (keep the
existing use of .Values.serviceAccount.annotations but expose per-component keys
in values.yaml), update Deployments/StatefulSets/Jobs to reference the
appropriate serviceAccountName for each component, and make creation conditional
via new values (e.g., .Values.api.serviceAccount.create,
.Values.worker.serviceAccount.create) so you can assign different IRSA roles and
RBAC bindings per component.

aws_credentials = _get_session_credentials()
if aws_credentials is None:
raise ValueError("Failed to authenticate with EC2 metadata.")
raise ValueError("Failed to authenticate with the default credential provider chain.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Ruff TRY003: long error message should be kept concise or moved to an exception class.

The Ruff linter flags line 96 (TRY003) because lengthy messages passed directly to ValueError are discouraged; the message should either be shortened or encapsulated in a dedicated exception subclass.

🧰 Tools
🪛 Ruff (0.15.1)

[warning] 96-96: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/clp-py-utils/clp_py_utils/s3_utils.py` at line 96, The long
literal passed to ValueError at the raise statement in s3_utils.py should be
shortened or replaced with a dedicated exception class; create a specific
exception (e.g., S3AuthenticationError or AuthenticationError) in this module,
give it a concise message (or no message) at the raise site, and move any
detailed explanation into the exception class docstring or its instantiation
where needed; update the raise ValueError(...) to raise
S3AuthenticationError(...) (or raise S3AuthenticationError without a long
message) so the linter warning (TRY003) is resolved while preserving detailed
context in the exception definition.

Comment on lines +31 to +38
pub const fn credentials_pair(&self) -> Option<(&str, &str)> {
match self {
Self::Credentials { credentials } => Some((
credentials.access_key_id.as_str(),
credentials.secret_access_key.as_str(),
)),
Self::Default => None,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find rust-toolchain or rust-toolchain.toml files
echo "=== Searching for Rust toolchain files ==="
find . -type f \( -name "rust-toolchain" -o -name "rust-toolchain.toml" \) 2>/dev/null

# Search for rust-version (MSRV) declarations in Cargo.toml files
echo ""
echo "=== Searching for rust-version (MSRV) in Cargo.toml files ==="
rg -n 'rust-version\s*=' --type toml 2>/dev/null | head -20

# Also check for any MSRV mentions in root Cargo.toml
echo ""
echo "=== Checking root Cargo.toml ==="
if [ -f "Cargo.toml" ]; then
    cat Cargo.toml | head -50
fi

Repository: y-scope/clp

Length of output: 327


🏁 Script executed:

#!/bin/bash
# Find all call sites of credentials_pair() in the codebase
echo "=== Finding all call sites of credentials_pair() ==="
rg -n 'credentials_pair\(' --type rust

Repository: y-scope/clp

Length of output: 577


🌐 Web query:

String::as_str() stabilized as const fn Rust version release notes

💡 Result:

String::as_str() became usable in const contexts (i.e., effectively “stabilized as const”) in Rust 1.87.0, released 2025-05-15. This is listed under “These previously stable APIs are now stable in const contexts”, including String::as_str. [1][2]

Sources (release notes / announcement):
[1] Rust 1.87.0 release notes entry in the official release notes index (lists String::as_str under “stable in const contexts”). (doc.rust-lang.org)
[2] “Announcing Rust 1.87.0…” blog post (same list, includes String::as_str). (blog.rust-lang.org)

Citations:


const fn on credentials_pair adds unnecessary Rust version constraint without benefit.

String::as_str() became stable in const contexts in Rust 1.87.0 (May 2025). All three call sites of credentials_pair() in this PR are runtime-only (lines 57 and 91 in log-ingestor/src/aws_client_manager.rs, and line 389 in api-server/src/client.rs), so the const annotation provides no compile-time advantage. The project has no declared MSRV, but removing const would ensure the code compiles on any Rust version without this constraint.

♻️ Proposed fix
-    pub const fn credentials_pair(&self) -> Option<(&str, &str)> {
+    pub fn credentials_pair(&self) -> Option<(&str, &str)> {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub const fn credentials_pair(&self) -> Option<(&str, &str)> {
match self {
Self::Credentials { credentials } => Some((
credentials.access_key_id.as_str(),
credentials.secret_access_key.as_str(),
)),
Self::Default => None,
}
pub fn credentials_pair(&self) -> Option<(&str, &str)> {
match self {
Self::Credentials { credentials } => Some((
credentials.access_key_id.as_str(),
credentials.secret_access_key.as_str(),
)),
Self::Default => None,
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/clp-rust-utils/src/clp_config/s3_config.rs` around lines 31 - 38,
The method credentials_pair is unnecessarily declared as const fn, imposing a
newer Rust version constraint due to calling String::as_str() in const contexts;
change its signature from pub const fn credentials_pair(&self) -> Option<(&str,
&str)> to a normal pub fn credentials_pair(&self) -> Option<(&str, &str)> and
keep the match on Self::{Credentials, Default} and the returned tuple using
credentials.access_key_id.as_str() and credentials.secret_access_key.as_str();
update nothing else in the call sites (aws_client_manager and api-server) since
they are runtime-only.

Comment thread docs/src/user-docs/guides-using-object-storage/aws-s3/clp-config.md Outdated
Comment on lines +42 to +66
Create a role with a trust policy that allows CLP's ServiceAccount to assume it. Replace
`<ACCOUNT_ID>`, `<OIDC_ID>`, and `<namespace>` with the appropriate values:

```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:aud": "sts.amazonaws.com"
},
"StringLike": {
"oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:<namespace>:*"
}
}
}
]
}
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

<region> placeholder missing from the replacement instructions

Lines 52, 57, and 60 all contain <region> — three separate occurrences inside the trust-policy JSON — but the lead-in on line 43 only lists <ACCOUNT_ID>, <OIDC_ID>, and <namespace>. A user who follows the instructions literally will produce an invalid policy with literal <region> strings.

📝 Suggested fix
-Create a role with a trust policy that allows CLP's ServiceAccount to assume it. Replace
-`<ACCOUNT_ID>`, `<OIDC_ID>`, and `<namespace>` with the appropriate values:
+Create a role with a trust policy that allows CLP's ServiceAccount to assume it. Replace
+`<ACCOUNT_ID>`, `<OIDC_ID>`, `<region>`, and `<namespace>` with the appropriate values:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/user-docs/guides-using-object-storage/aws-s3/eks-irsa-setup.md`
around lines 42 - 66, The trust-policy JSON references a fourth placeholder
`<region>` (in the OIDC provider ARN and the two condition keys) but the lead-in
only lists `<ACCOUNT_ID>`, `<OIDC_ID>`, and `<namespace>`; update the
instructions to include `<region>` as a required replacement and explicitly
state that users must replace all occurrences of `<region>` in the trust policy
(the ARN
"arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>"
and the condition keys "oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:aud" and
"...:sub"), so the generated policy is valid.

Comment on lines +87 to +91
Then install (or upgrade) the Helm chart:

```bash
helm install <release> clp/clp -f values.yaml
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use helm upgrade --install instead of helm install

The prerequisites state that CLP is already "deployed via Helm," so helm install would fail for an existing release. helm upgrade --install handles both the first-install and upgrade cases safely.

📝 Suggested fix
-helm install <release> clp/clp -f values.yaml
+helm upgrade --install <release> clp/clp -f values.yaml
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Then install (or upgrade) the Helm chart:
```bash
helm install <release> clp/clp -f values.yaml
```
Then install (or upgrade) the Helm chart:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/user-docs/guides-using-object-storage/aws-s3/eks-irsa-setup.md`
around lines 87 - 91, Replace the helm command that always runs a fresh install
with an idempotent upgrade-or-install invocation: change the usage of "helm
install <release> clp/clp -f values.yaml" to use "helm upgrade --install
<release> clp/clp -f values.yaml" so that deploying or upgrading the CLP Helm
release (referenced as <release> and chart clp/clp) works whether the release
already exists or not.

kind: "Role"
metadata:
name: {{ include "clp.fullname" . }}-job-watcher
name: {{ include "clp.fullname" . }}-service-account
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Optional: disambiguate the Role name from the ServiceAccount name.

The Role is now named {fullname}-service-account — identical to the ServiceAccount in service-account.yaml. Running kubectl get role,serviceaccount in the namespace will list two resources sharing the same base name but different kinds, which can be confusing during operations/debugging.

Consider adding a -role suffix (e.g., {fullname}-service-account-role) to make the resource type self-evident from its name.

♻️ Proposed rename
-  name: {{ include "clp.fullname" . }}-service-account
+  name: {{ include "clp.fullname" . }}-service-account-role

Remember to update service-account-rolebinding.yaml accordingly if this change is applied.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
name: {{ include "clp.fullname" . }}-service-account
name: {{ include "clp.fullname" . }}-service-account-role
🧰 Tools
🪛 YAMLlint (1.38.0)

[error] 4-4: too many spaces inside braces

(braces)


[error] 4-4: too many spaces inside braces

(braces)


[error] 4-4: syntax error: expected , but found ''

(syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package-helm/templates/service-account-role.yaml` at line 4,
Rename the Role resource to make its purpose explicit by changing the name
template from 'name: {{ include "clp.fullname" . }}-service-account' to include
a '-role' suffix (e.g. '{{ include "clp.fullname" . }}-service-account-role');
then update the corresponding RoleBinding resource reference in
service-account-rolebinding.yaml to point to the new Role name so the
RoleBinding still binds the ServiceAccount to '{{ include "clp.fullname" .
}}-service-account-role'.

Comment on lines +4 to +13
name: {{ include "clp.fullname" . }}-service-account
labels:
{{- include "clp.labels" . | nindent 4 }}
subjects:
- kind: "ServiceAccount"
name: {{ include "clp.fullname" . }}-job-watcher
name: {{ include "clp.fullname" . }}-service-account
namespace: {{ .Release.Namespace }}
roleRef:
kind: "Role"
name: {{ include "clp.fullname" . }}-job-watcher
name: {{ include "clp.fullname" . }}-service-account
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider suppressing YAMLlint false positives for Helm templates.

All YAMLlint errors reported by static analysis (braces spacing, syntax error) are false positives caused by Helm's Go template {{ }} syntax, which YAMLlint cannot parse. If YAMLlint runs in CI on this path, it will always produce noise for every template file. A project-level .yamllint configuration can suppress these:

🔧 Suggested .yamllint config
+rules:
+  braces:
+    forbid: false
+    min-spaces-inside: 0
+    max-spaces-inside: -1  # ignore — Helm templates use {{ expr }}
+  syntax: disable          # Go template directives are not valid YAML

Alternatively, exclude the templates/ directory from YAMLlint scans entirely:

ignore: |
  tools/deployment/package-helm/templates/
🧰 Tools
🪛 YAMLlint (1.38.0)

[error] 4-4: too many spaces inside braces

(braces)


[error] 4-4: too many spaces inside braces

(braces)


[error] 4-4: syntax error: expected , but found ''

(syntax)


[error] 9-9: too many spaces inside braces

(braces)


[error] 9-9: too many spaces inside braces

(braces)


[error] 10-10: too many spaces inside braces

(braces)


[error] 10-10: too many spaces inside braces

(braces)


[error] 13-13: too many spaces inside braces

(braces)


[error] 13-13: too many spaces inside braces

(braces)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package-helm/templates/service-account-rolebinding.yaml`
around lines 4 - 13, The Helm template files containing constructs like {{
include "clp.fullname" . }} are causing YAMLlint false positives; add a
project-level .yamllint configuration to either ignore the Helm templates
directory (e.g., ignore patterns matching templates/) or add rule overrides to
ignore/template braces so YAMLlint won't parse Go template syntax, and update
the CI lint job to use that .yamllint so files under templates/ (including the
service-account-rolebinding template) are skipped or exempted from those
specific rules.

Comment on lines +1 to +10
apiVersion: "v1"
kind: "ServiceAccount"
metadata:
name: {{ include "clp.fullname" . }}-service-account
labels:
{{- include "clp.labels" . | nindent 4 }}
{{- with .Values.serviceAccount.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Operational note: all components now share one IAM role and one Kubernetes RBAC role.

Consolidating every pod (api-server, schedulers, workers, jobs) under a single ServiceAccount simplifies IRSA configuration, but it means:

  1. AWS permissions: any pod can assume the IRSA role. If a worker is compromised, it has the same S3/SQS access as every other component.
  2. Kubernetes RBAC: the associated Role (get/list/watch on batch/jobs and pods) was previously scoped only to the job-watcher. All components now inherit those permissions.

Both are read-oriented and low blast radius, but if stricter least-privilege is required in future (e.g., separating S3-needing components from non-S3 ones), separate service accounts per component tier would be the path forward.

🧰 Tools
🪛 YAMLlint (1.38.0)

[error] 4-4: too many spaces inside braces

(braces)


[error] 4-4: too many spaces inside braces

(braces)


[error] 4-4: syntax error: expected , but found ''

(syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package-helm/templates/service-account.yaml` around lines 1
- 10, The chart currently creates a single ServiceAccount via the ServiceAccount
template (name: {{ include "clp.fullname" . }}-service-account) which
consolidates IAM/RBAC for all components; split this into per-component
ServiceAccounts to preserve least-privilege. Add separate templates (e.g.,
service-account-api, service-account-worker, service-account-scheduler,
service-account-jobwatcher) that use distinct names and annotations (keep the
existing use of .Values.serviceAccount.annotations but expose per-component keys
in values.yaml), update Deployments/StatefulSets/Jobs to reference the
appropriate serviceAccountName for each component, and make creation conditional
via new values (e.g., .Values.api.serviceAccount.create,
.Values.worker.serviceAccount.create) so you can assign different IRSA roles and
RBAC bindings per component.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/src/user-docs/guides-using-object-storage/aws-s3/clp-config.md (1)

163-173: 🧹 Nitpick | 🔵 Trivial

Consider adding a disambiguation note for env_vars vs. default.

Both env_vars and default pick up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment, which can be confusing to users choosing between them. A one-liner clarifying that env_vars restricts resolution exclusively to those environment variables (and fails if they are absent), whereas default cascades through multiple sources with environment variables as the highest priority, would remove that ambiguity.

📝 Suggested addition to the `env_vars` section
 aws_authentication:
   type: "env_vars"

The environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY should be used to specify
a set of long-term IAM user credentials.
+
+:::{note}
+Unlike default, env_vars does not fall back to other credential sources if the environment
+variables are absent.
+:::

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/src/user-docs/guides-using-object-storage/aws-s3/clp-config.md around
lines 163 - 173, Add a short disambiguation note under the env_vars section
clarifying the difference between env_vars and default: state that env_vars uses
AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY exclusively and will fail if those
environment variables are absent, whereas default will cascade through multiple
credential sources with environment variables taking highest priority; reference
the config type names env_vars and default and insert the note immediately after
the paragraph that mentions the environment variables.


</details>

</blockquote></details>

</blockquote></details>
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@docs/src/user-docs/guides-using-object-storage/aws-s3/clp-config.md`:
- Around line 163-173: Add a short disambiguation note under the env_vars
section clarifying the difference between env_vars and default: state that
env_vars uses AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY exclusively and will fail
if those environment variables are absent, whereas default will cascade through
multiple credential sources with environment variables taking highest priority;
reference the config type names env_vars and default and insert the note
immediately after the paragraph that mentions the environment variables.

Copy link
Copy Markdown
Contributor

@hoophalab hoophalab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments before merging and resolving conflicts

Comment thread components/clp-rust-utils/src/clp_config/package/config.rs Outdated
secret_access_key: &str,
region_id: &str,
endpoint: Option<&NonEmptyString>,
credentials: Option<(&str, &str)>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zhihao suggested that instead of Option<(&str, &str)> it might be best to take an AwsAuthentication as argument in both create_new_client

/// credentials, or `None` for authentication methods that rely on the default credential
/// provider chain.
#[must_use]
pub const fn credentials_pair(&self) -> Option<(&str, &str)> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be removed if the suggestion for create_new_client is accepted.

junhaoliao and others added 2 commits March 11, 2026 02:52
Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>
…tency and adjust `credentials_pair` match arm accordingly
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/clp-py-utils/clp_py_utils/clp_config.py`:
- Around line 167-171: validate_aws_config_dir currently rejects
aws_config_directory unless AwsAuthType.profile is used; update
ClpConfig.validate_aws_config_dir to permit aws_config_directory when
self.aws_auth_type is either AwsAuthType.profile or AwsAuthType.default (i.e.,
allow boto3's default chain to use mounted ~/.aws), and only raise the rejection
error for other AwsAuthType values; reference AwsAuthType (enum) and
ClpConfig.validate_aws_config_dir to locate and change the conditional check
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8994c154-a3bb-4852-acf2-f9c5e0ed04a6

📥 Commits

Reviewing files that changed from the base of the PR and between 00b69b9 and 95116bb.

📒 Files selected for processing (2)
  • components/clp-py-utils/clp_py_utils/clp_config.py
  • components/clp-rust-utils/src/clp_config/s3_config.rs

Comment thread components/clp-py-utils/clp_py_utils/clp_config.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

969-983: ⚠️ Potential issue | 🟡 Minor

Reject non-directory aws_config_directory values.

This block only checks exists(). An existing file will pass validation here, then fail later when startup tries to use it as an AWS config directory. Catch that misconfiguration here with an is_dir() check as well.

Suggested fix
         if self.aws_config_directory is not None:
             if not config_dir_allowed:
                 raise ValueError(
                     "aws_config_directory is only supported with 'profile' or 'default'"
                     " authentication"
                 )
             resolved_aws_config_dir = (
                 resolve_host_path_in_container(self.aws_config_directory)
                 if use_host_mount
                 else self.aws_config_directory
             )
             if not resolved_aws_config_dir.exists():
                 raise ValueError(
                     f"aws_config_directory does not exist: '{self.aws_config_directory}'"
                 )
+            if not resolved_aws_config_dir.is_dir():
+                raise ValueError(
+                    f"aws_config_directory is not a directory: '{self.aws_config_directory}'"
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/clp-py-utils/clp_py_utils/clp_config.py` around lines 969 - 983,
The validation for aws_config_directory (in the branch that checks
config_dir_allowed) only verifies existence but not that the path is a
directory; update the check after resolving resolved_aws_config_dir (which may
be transformed by resolve_host_path_in_container when use_host_mount is true) to
raise ValueError if not resolved_aws_config_dir.is_dir(), ensuring
aws_config_directory is an existing directory rather than a file; keep the
existing error message pattern referencing the original aws_config_directory
value and perform this is_dir() check alongside the current exists() check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@components/clp-py-utils/clp_py_utils/clp_config.py`:
- Around line 969-983: The validation for aws_config_directory (in the branch
that checks config_dir_allowed) only verifies existence but not that the path is
a directory; update the check after resolving resolved_aws_config_dir (which may
be transformed by resolve_host_path_in_container when use_host_mount is true) to
raise ValueError if not resolved_aws_config_dir.is_dir(), ensuring
aws_config_directory is an existing directory rather than a file; keep the
existing error message pattern referencing the original aws_config_directory
value and perform this is_dir() check alongside the current exists() check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 99303d68-e1b0-4cd8-b742-374b363286a6

📥 Commits

Reviewing files that changed from the base of the PR and between 95116bb and 9574f9d.

📒 Files selected for processing (1)
  • components/clp-py-utils/clp_py_utils/clp_config.py

# Conflicts:
#	components/log-ingestor/src/compression/compression_job_submitter.rs
#	components/log-ingestor/src/ingestion_job_manager.rs
#	tools/deployment/package-helm/Chart.yaml
Copy link
Copy Markdown
Contributor

@hoophalab hoophalab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Validations: Successfully ingested logs and stored archives to S3 on EKS using IRSA

@junhaoliao junhaoliao merged commit 7b03667 into y-scope:main Mar 12, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support IAM roles for AWS authentication in Helm deployments

2 participants