feat(log-ingestor)!: Support the default AWS SDK credential provider chain (resolves #1939); Add serviceAccount.annotations to the Helm chart for IRSA.#1962
Conversation
…ault AWS SDK credential provider chain (resolves y-scope#1939).
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a "default" AWS authentication mode, makes AWS credentials optional for S3/SQS clients (credentials passed as Option<(access_key, secret)>), centralizes AwsAuthentication usage across Rust/Python, updates Helm to a single service account with IRSA-ready annotations, and adds EKS IRSA docs and tests. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…atting for better readability
… authentication type to default
serviceAccount.annotations to the Helm chart for IRSA.
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/clp-py-utils/clp_py_utils/s3_utils.py`:
- Line 96: The long literal passed to ValueError at the raise statement in
s3_utils.py should be shortened or replaced with a dedicated exception class;
create a specific exception (e.g., S3AuthenticationError or AuthenticationError)
in this module, give it a concise message (or no message) at the raise site, and
move any detailed explanation into the exception class docstring or its
instantiation where needed; update the raise ValueError(...) to raise
S3AuthenticationError(...) (or raise S3AuthenticationError without a long
message) so the linter warning (TRY003) is resolved while preserving detailed
context in the exception definition.
In `@components/clp-rust-utils/src/clp_config/s3_config.rs`:
- Around line 31-38: The method credentials_pair is unnecessarily declared as
const fn, imposing a newer Rust version constraint due to calling
String::as_str() in const contexts; change its signature from pub const fn
credentials_pair(&self) -> Option<(&str, &str)> to a normal pub fn
credentials_pair(&self) -> Option<(&str, &str)> and keep the match on
Self::{Credentials, Default} and the returned tuple using
credentials.access_key_id.as_str() and credentials.secret_access_key.as_str();
update nothing else in the call sites (aws_client_manager and api-server) since
they are runtime-only.
In `@docs/src/user-docs/guides-using-object-storage/aws-s3/clp-config.md`:
- Line 197: Update the broken cross-document anchor in the markdown line
containing "EC2 instance metadata / [IAM roles
attached](index.md#ec2-instance-iam-roles)" by replacing the old anchor
"#ec2-instance-iam-roles" with the new "#default-credential-provider-chain" so
the link becomes "EC2 instance metadata / [IAM roles
attached](index.md#default-credential-provider-chain)"; ensure the visible link
text remains unchanged and verify the MyST build passes.
In `@docs/src/user-docs/guides-using-object-storage/aws-s3/eks-irsa-setup.md`:
- Around line 87-91: Replace the helm command that always runs a fresh install
with an idempotent upgrade-or-install invocation: change the usage of "helm
install <release> clp/clp -f values.yaml" to use "helm upgrade --install
<release> clp/clp -f values.yaml" so that deploying or upgrading the CLP Helm
release (referenced as <release> and chart clp/clp) works whether the release
already exists or not.
- Around line 42-66: The trust-policy JSON references a fourth placeholder
`<region>` (in the OIDC provider ARN and the two condition keys) but the lead-in
only lists `<ACCOUNT_ID>`, `<OIDC_ID>`, and `<namespace>`; update the
instructions to include `<region>` as a required replacement and explicitly
state that users must replace all occurrences of `<region>` in the trust policy
(the ARN
"arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>"
and the condition keys "oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:aud" and
"...:sub"), so the generated policy is valid.
In `@tools/deployment/package-helm/templates/service-account-role.yaml`:
- Line 4: Rename the Role resource to make its purpose explicit by changing the
name template from 'name: {{ include "clp.fullname" . }}-service-account' to
include a '-role' suffix (e.g. '{{ include "clp.fullname" .
}}-service-account-role'); then update the corresponding RoleBinding resource
reference in service-account-rolebinding.yaml to point to the new Role name so
the RoleBinding still binds the ServiceAccount to '{{ include "clp.fullname" .
}}-service-account-role'.
In `@tools/deployment/package-helm/templates/service-account-rolebinding.yaml`:
- Around line 4-13: The Helm template files containing constructs like {{
include "clp.fullname" . }} are causing YAMLlint false positives; add a
project-level .yamllint configuration to either ignore the Helm templates
directory (e.g., ignore patterns matching templates/) or add rule overrides to
ignore/template braces so YAMLlint won't parse Go template syntax, and update
the CI lint job to use that .yamllint so files under templates/ (including the
service-account-rolebinding template) are skipped or exempted from those
specific rules.
In `@tools/deployment/package-helm/templates/service-account.yaml`:
- Around line 1-10: The chart currently creates a single ServiceAccount via the
ServiceAccount template (name: {{ include "clp.fullname" . }}-service-account)
which consolidates IAM/RBAC for all components; split this into per-component
ServiceAccounts to preserve least-privilege. Add separate templates (e.g.,
service-account-api, service-account-worker, service-account-scheduler,
service-account-jobwatcher) that use distinct names and annotations (keep the
existing use of .Values.serviceAccount.annotations but expose per-component keys
in values.yaml), update Deployments/StatefulSets/Jobs to reference the
appropriate serviceAccountName for each component, and make creation conditional
via new values (e.g., .Values.api.serviceAccount.create,
.Values.worker.serviceAccount.create) so you can assign different IRSA roles and
RBAC bindings per component.
| aws_credentials = _get_session_credentials() | ||
| if aws_credentials is None: | ||
| raise ValueError("Failed to authenticate with EC2 metadata.") | ||
| raise ValueError("Failed to authenticate with the default credential provider chain.") |
There was a problem hiding this comment.
Ruff TRY003: long error message should be kept concise or moved to an exception class.
The Ruff linter flags line 96 (TRY003) because lengthy messages passed directly to ValueError are discouraged; the message should either be shortened or encapsulated in a dedicated exception subclass.
🧰 Tools
🪛 Ruff (0.15.1)
[warning] 96-96: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/clp-py-utils/clp_py_utils/s3_utils.py` at line 96, The long
literal passed to ValueError at the raise statement in s3_utils.py should be
shortened or replaced with a dedicated exception class; create a specific
exception (e.g., S3AuthenticationError or AuthenticationError) in this module,
give it a concise message (or no message) at the raise site, and move any
detailed explanation into the exception class docstring or its instantiation
where needed; update the raise ValueError(...) to raise
S3AuthenticationError(...) (or raise S3AuthenticationError without a long
message) so the linter warning (TRY003) is resolved while preserving detailed
context in the exception definition.
| pub const fn credentials_pair(&self) -> Option<(&str, &str)> { | ||
| match self { | ||
| Self::Credentials { credentials } => Some(( | ||
| credentials.access_key_id.as_str(), | ||
| credentials.secret_access_key.as_str(), | ||
| )), | ||
| Self::Default => None, | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find rust-toolchain or rust-toolchain.toml files
echo "=== Searching for Rust toolchain files ==="
find . -type f \( -name "rust-toolchain" -o -name "rust-toolchain.toml" \) 2>/dev/null
# Search for rust-version (MSRV) declarations in Cargo.toml files
echo ""
echo "=== Searching for rust-version (MSRV) in Cargo.toml files ==="
rg -n 'rust-version\s*=' --type toml 2>/dev/null | head -20
# Also check for any MSRV mentions in root Cargo.toml
echo ""
echo "=== Checking root Cargo.toml ==="
if [ -f "Cargo.toml" ]; then
cat Cargo.toml | head -50
fiRepository: y-scope/clp
Length of output: 327
🏁 Script executed:
#!/bin/bash
# Find all call sites of credentials_pair() in the codebase
echo "=== Finding all call sites of credentials_pair() ==="
rg -n 'credentials_pair\(' --type rustRepository: y-scope/clp
Length of output: 577
🌐 Web query:
String::as_str() stabilized as const fn Rust version release notes
💡 Result:
String::as_str() became usable in const contexts (i.e., effectively “stabilized as const”) in Rust 1.87.0, released 2025-05-15. This is listed under “These previously stable APIs are now stable in const contexts”, including String::as_str. [1][2]
Sources (release notes / announcement):
[1] Rust 1.87.0 release notes entry in the official release notes index (lists String::as_str under “stable in const contexts”). (doc.rust-lang.org)
[2] “Announcing Rust 1.87.0…” blog post (same list, includes String::as_str). (blog.rust-lang.org)
Citations:
- 1: https://doc.rust-lang.org/stable/releases.html
- 2: https://blog.rust-lang.org/2025/05/15/Rust-1.87.0/?utm_source=openai
const fn on credentials_pair adds unnecessary Rust version constraint without benefit.
String::as_str() became stable in const contexts in Rust 1.87.0 (May 2025). All three call sites of credentials_pair() in this PR are runtime-only (lines 57 and 91 in log-ingestor/src/aws_client_manager.rs, and line 389 in api-server/src/client.rs), so the const annotation provides no compile-time advantage. The project has no declared MSRV, but removing const would ensure the code compiles on any Rust version without this constraint.
♻️ Proposed fix
- pub const fn credentials_pair(&self) -> Option<(&str, &str)> {
+ pub fn credentials_pair(&self) -> Option<(&str, &str)> {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pub const fn credentials_pair(&self) -> Option<(&str, &str)> { | |
| match self { | |
| Self::Credentials { credentials } => Some(( | |
| credentials.access_key_id.as_str(), | |
| credentials.secret_access_key.as_str(), | |
| )), | |
| Self::Default => None, | |
| } | |
| pub fn credentials_pair(&self) -> Option<(&str, &str)> { | |
| match self { | |
| Self::Credentials { credentials } => Some(( | |
| credentials.access_key_id.as_str(), | |
| credentials.secret_access_key.as_str(), | |
| )), | |
| Self::Default => None, | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/clp-rust-utils/src/clp_config/s3_config.rs` around lines 31 - 38,
The method credentials_pair is unnecessarily declared as const fn, imposing a
newer Rust version constraint due to calling String::as_str() in const contexts;
change its signature from pub const fn credentials_pair(&self) -> Option<(&str,
&str)> to a normal pub fn credentials_pair(&self) -> Option<(&str, &str)> and
keep the match on Self::{Credentials, Default} and the returned tuple using
credentials.access_key_id.as_str() and credentials.secret_access_key.as_str();
update nothing else in the call sites (aws_client_manager and api-server) since
they are runtime-only.
| Create a role with a trust policy that allows CLP's ServiceAccount to assume it. Replace | ||
| `<ACCOUNT_ID>`, `<OIDC_ID>`, and `<namespace>` with the appropriate values: | ||
|
|
||
| ```json | ||
| { | ||
| "Version": "2012-10-17", | ||
| "Statement": [ | ||
| { | ||
| "Effect": "Allow", | ||
| "Principal": { | ||
| "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>" | ||
| }, | ||
| "Action": "sts:AssumeRoleWithWebIdentity", | ||
| "Condition": { | ||
| "StringEquals": { | ||
| "oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:aud": "sts.amazonaws.com" | ||
| }, | ||
| "StringLike": { | ||
| "oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:<namespace>:*" | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| ``` |
There was a problem hiding this comment.
<region> placeholder missing from the replacement instructions
Lines 52, 57, and 60 all contain <region> — three separate occurrences inside the trust-policy JSON — but the lead-in on line 43 only lists <ACCOUNT_ID>, <OIDC_ID>, and <namespace>. A user who follows the instructions literally will produce an invalid policy with literal <region> strings.
📝 Suggested fix
-Create a role with a trust policy that allows CLP's ServiceAccount to assume it. Replace
-`<ACCOUNT_ID>`, `<OIDC_ID>`, and `<namespace>` with the appropriate values:
+Create a role with a trust policy that allows CLP's ServiceAccount to assume it. Replace
+`<ACCOUNT_ID>`, `<OIDC_ID>`, `<region>`, and `<namespace>` with the appropriate values:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/src/user-docs/guides-using-object-storage/aws-s3/eks-irsa-setup.md`
around lines 42 - 66, The trust-policy JSON references a fourth placeholder
`<region>` (in the OIDC provider ARN and the two condition keys) but the lead-in
only lists `<ACCOUNT_ID>`, `<OIDC_ID>`, and `<namespace>`; update the
instructions to include `<region>` as a required replacement and explicitly
state that users must replace all occurrences of `<region>` in the trust policy
(the ARN
"arn:aws:iam::<ACCOUNT_ID>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>"
and the condition keys "oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:aud" and
"...:sub"), so the generated policy is valid.
| Then install (or upgrade) the Helm chart: | ||
|
|
||
| ```bash | ||
| helm install <release> clp/clp -f values.yaml | ||
| ``` |
There was a problem hiding this comment.
Use helm upgrade --install instead of helm install
The prerequisites state that CLP is already "deployed via Helm," so helm install would fail for an existing release. helm upgrade --install handles both the first-install and upgrade cases safely.
📝 Suggested fix
-helm install <release> clp/clp -f values.yaml
+helm upgrade --install <release> clp/clp -f values.yaml📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Then install (or upgrade) the Helm chart: | |
| ```bash | |
| helm install <release> clp/clp -f values.yaml | |
| ``` | |
| Then install (or upgrade) the Helm chart: | |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/src/user-docs/guides-using-object-storage/aws-s3/eks-irsa-setup.md`
around lines 87 - 91, Replace the helm command that always runs a fresh install
with an idempotent upgrade-or-install invocation: change the usage of "helm
install <release> clp/clp -f values.yaml" to use "helm upgrade --install
<release> clp/clp -f values.yaml" so that deploying or upgrading the CLP Helm
release (referenced as <release> and chart clp/clp) works whether the release
already exists or not.
| kind: "Role" | ||
| metadata: | ||
| name: {{ include "clp.fullname" . }}-job-watcher | ||
| name: {{ include "clp.fullname" . }}-service-account |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Optional: disambiguate the Role name from the ServiceAccount name.
The Role is now named {fullname}-service-account — identical to the ServiceAccount in service-account.yaml. Running kubectl get role,serviceaccount in the namespace will list two resources sharing the same base name but different kinds, which can be confusing during operations/debugging.
Consider adding a -role suffix (e.g., {fullname}-service-account-role) to make the resource type self-evident from its name.
♻️ Proposed rename
- name: {{ include "clp.fullname" . }}-service-account
+ name: {{ include "clp.fullname" . }}-service-account-roleRemember to update service-account-rolebinding.yaml accordingly if this change is applied.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| name: {{ include "clp.fullname" . }}-service-account | |
| name: {{ include "clp.fullname" . }}-service-account-role |
🧰 Tools
🪛 YAMLlint (1.38.0)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/templates/service-account-role.yaml` at line 4,
Rename the Role resource to make its purpose explicit by changing the name
template from 'name: {{ include "clp.fullname" . }}-service-account' to include
a '-role' suffix (e.g. '{{ include "clp.fullname" . }}-service-account-role');
then update the corresponding RoleBinding resource reference in
service-account-rolebinding.yaml to point to the new Role name so the
RoleBinding still binds the ServiceAccount to '{{ include "clp.fullname" .
}}-service-account-role'.
| name: {{ include "clp.fullname" . }}-service-account | ||
| labels: | ||
| {{- include "clp.labels" . | nindent 4 }} | ||
| subjects: | ||
| - kind: "ServiceAccount" | ||
| name: {{ include "clp.fullname" . }}-job-watcher | ||
| name: {{ include "clp.fullname" . }}-service-account | ||
| namespace: {{ .Release.Namespace }} | ||
| roleRef: | ||
| kind: "Role" | ||
| name: {{ include "clp.fullname" . }}-job-watcher | ||
| name: {{ include "clp.fullname" . }}-service-account |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider suppressing YAMLlint false positives for Helm templates.
All YAMLlint errors reported by static analysis (braces spacing, syntax error) are false positives caused by Helm's Go template {{ }} syntax, which YAMLlint cannot parse. If YAMLlint runs in CI on this path, it will always produce noise for every template file. A project-level .yamllint configuration can suppress these:
🔧 Suggested .yamllint config
+rules:
+ braces:
+ forbid: false
+ min-spaces-inside: 0
+ max-spaces-inside: -1 # ignore — Helm templates use {{ expr }}
+ syntax: disable # Go template directives are not valid YAMLAlternatively, exclude the templates/ directory from YAMLlint scans entirely:
ignore: |
tools/deployment/package-helm/templates/🧰 Tools
🪛 YAMLlint (1.38.0)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
[error] 9-9: too many spaces inside braces
(braces)
[error] 9-9: too many spaces inside braces
(braces)
[error] 10-10: too many spaces inside braces
(braces)
[error] 10-10: too many spaces inside braces
(braces)
[error] 13-13: too many spaces inside braces
(braces)
[error] 13-13: too many spaces inside braces
(braces)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/templates/service-account-rolebinding.yaml`
around lines 4 - 13, The Helm template files containing constructs like {{
include "clp.fullname" . }} are causing YAMLlint false positives; add a
project-level .yamllint configuration to either ignore the Helm templates
directory (e.g., ignore patterns matching templates/) or add rule overrides to
ignore/template braces so YAMLlint won't parse Go template syntax, and update
the CI lint job to use that .yamllint so files under templates/ (including the
service-account-rolebinding template) are skipped or exempted from those
specific rules.
| apiVersion: "v1" | ||
| kind: "ServiceAccount" | ||
| metadata: | ||
| name: {{ include "clp.fullname" . }}-service-account | ||
| labels: | ||
| {{- include "clp.labels" . | nindent 4 }} | ||
| {{- with .Values.serviceAccount.annotations }} | ||
| annotations: | ||
| {{- toYaml . | nindent 4 }} | ||
| {{- end }} |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Operational note: all components now share one IAM role and one Kubernetes RBAC role.
Consolidating every pod (api-server, schedulers, workers, jobs) under a single ServiceAccount simplifies IRSA configuration, but it means:
- AWS permissions: any pod can assume the IRSA role. If a worker is compromised, it has the same S3/SQS access as every other component.
- Kubernetes RBAC: the associated Role (
get/list/watchonbatch/jobsandpods) was previously scoped only to the job-watcher. All components now inherit those permissions.
Both are read-oriented and low blast radius, but if stricter least-privilege is required in future (e.g., separating S3-needing components from non-S3 ones), separate service accounts per component tier would be the path forward.
🧰 Tools
🪛 YAMLlint (1.38.0)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: too many spaces inside braces
(braces)
[error] 4-4: syntax error: expected , but found ''
(syntax)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/templates/service-account.yaml` around lines 1
- 10, The chart currently creates a single ServiceAccount via the ServiceAccount
template (name: {{ include "clp.fullname" . }}-service-account) which
consolidates IAM/RBAC for all components; split this into per-component
ServiceAccounts to preserve least-privilege. Add separate templates (e.g.,
service-account-api, service-account-worker, service-account-scheduler,
service-account-jobwatcher) that use distinct names and annotations (keep the
existing use of .Values.serviceAccount.annotations but expose per-component keys
in values.yaml), update Deployments/StatefulSets/Jobs to reference the
appropriate serviceAccountName for each component, and make creation conditional
via new values (e.g., .Values.api.serviceAccount.create,
.Values.worker.serviceAccount.create) so you can assign different IRSA roles and
RBAC bindings per component.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/src/user-docs/guides-using-object-storage/aws-s3/clp-config.md (1)
163-173: 🧹 Nitpick | 🔵 TrivialConsider adding a disambiguation note for
env_varsvs.default.Both
env_varsanddefaultpick upAWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEYfrom the environment, which can be confusing to users choosing between them. A one-liner clarifying thatenv_varsrestricts resolution exclusively to those environment variables (and fails if they are absent), whereasdefaultcascades through multiple sources with environment variables as the highest priority, would remove that ambiguity.📝 Suggested addition to the `env_vars` section
aws_authentication: type: "env_vars"The environment variables
AWS_ACCESS_KEY_IDandAWS_SECRET_ACCESS_KEYshould be used to specify
a set of long-term IAM user credentials.
+
+:::{note}
+Unlikedefault,env_varsdoes not fall back to other credential sources if the environment
+variables are absent.
+:::</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@docs/src/user-docs/guides-using-object-storage/aws-s3/clp-config.mdaround
lines 163 - 173, Add a short disambiguation note under the env_vars section
clarifying the difference between env_vars and default: state that env_vars uses
AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY exclusively and will fail if those
environment variables are absent, whereas default will cascade through multiple
credential sources with environment variables taking highest priority; reference
the config type names env_vars and default and insert the note immediately after
the paragraph that mentions the environment variables.</details> </blockquote></details> </blockquote></details>🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed. Outside diff comments: In `@docs/src/user-docs/guides-using-object-storage/aws-s3/clp-config.md`: - Around line 163-173: Add a short disambiguation note under the env_vars section clarifying the difference between env_vars and default: state that env_vars uses AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY exclusively and will fail if those environment variables are absent, whereas default will cascade through multiple credential sources with environment variables taking highest priority; reference the config type names env_vars and default and insert the note immediately after the paragraph that mentions the environment variables.
hoophalab
left a comment
There was a problem hiding this comment.
Some comments before merging and resolving conflicts
| secret_access_key: &str, | ||
| region_id: &str, | ||
| endpoint: Option<&NonEmptyString>, | ||
| credentials: Option<(&str, &str)>, |
There was a problem hiding this comment.
Zhihao suggested that instead of Option<(&str, &str)> it might be best to take an AwsAuthentication as argument in both create_new_client
| /// credentials, or `None` for authentication methods that rely on the default credential | ||
| /// provider chain. | ||
| #[must_use] | ||
| pub const fn credentials_pair(&self) -> Option<(&str, &str)> { |
There was a problem hiding this comment.
this can be removed if the suggestion for create_new_client is accepted.
Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>
…tency and adjust `credentials_pair` match arm accordingly
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/clp-py-utils/clp_py_utils/clp_config.py`:
- Around line 167-171: validate_aws_config_dir currently rejects
aws_config_directory unless AwsAuthType.profile is used; update
ClpConfig.validate_aws_config_dir to permit aws_config_directory when
self.aws_auth_type is either AwsAuthType.profile or AwsAuthType.default (i.e.,
allow boto3's default chain to use mounted ~/.aws), and only raise the rejection
error for other AwsAuthType values; reference AwsAuthType (enum) and
ClpConfig.validate_aws_config_dir to locate and change the conditional check
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8994c154-a3bb-4852-acf2-f9c5e0ed04a6
📒 Files selected for processing (2)
components/clp-py-utils/clp_py_utils/clp_config.pycomponents/clp-rust-utils/src/clp_config/s3_config.rs
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
969-983:⚠️ Potential issue | 🟡 MinorReject non-directory
aws_config_directoryvalues.This block only checks
exists(). An existing file will pass validation here, then fail later when startup tries to use it as an AWS config directory. Catch that misconfiguration here with anis_dir()check as well.Suggested fix
if self.aws_config_directory is not None: if not config_dir_allowed: raise ValueError( "aws_config_directory is only supported with 'profile' or 'default'" " authentication" ) resolved_aws_config_dir = ( resolve_host_path_in_container(self.aws_config_directory) if use_host_mount else self.aws_config_directory ) if not resolved_aws_config_dir.exists(): raise ValueError( f"aws_config_directory does not exist: '{self.aws_config_directory}'" ) + if not resolved_aws_config_dir.is_dir(): + raise ValueError( + f"aws_config_directory is not a directory: '{self.aws_config_directory}'" + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/clp-py-utils/clp_py_utils/clp_config.py` around lines 969 - 983, The validation for aws_config_directory (in the branch that checks config_dir_allowed) only verifies existence but not that the path is a directory; update the check after resolving resolved_aws_config_dir (which may be transformed by resolve_host_path_in_container when use_host_mount is true) to raise ValueError if not resolved_aws_config_dir.is_dir(), ensuring aws_config_directory is an existing directory rather than a file; keep the existing error message pattern referencing the original aws_config_directory value and perform this is_dir() check alongside the current exists() check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@components/clp-py-utils/clp_py_utils/clp_config.py`:
- Around line 969-983: The validation for aws_config_directory (in the branch
that checks config_dir_allowed) only verifies existence but not that the path is
a directory; update the check after resolving resolved_aws_config_dir (which may
be transformed by resolve_host_path_in_container when use_host_mount is true) to
raise ValueError if not resolved_aws_config_dir.is_dir(), ensuring
aws_config_directory is an existing directory rather than a file; keep the
existing error message pattern referencing the original aws_config_directory
value and perform this is_dir() check alongside the current exists() check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 99303d68-e1b0-4cd8-b742-374b363286a6
📒 Files selected for processing (1)
components/clp-py-utils/clp_py_utils/clp_config.py
# Conflicts: # components/log-ingestor/src/compression/compression_job_submitter.rs # components/log-ingestor/src/ingestion_job_manager.rs # tools/deployment/package-helm/Chart.yaml
…ion` in CLP ingestion connector
hoophalab
left a comment
There was a problem hiding this comment.
LGTM
Validations: Successfully ingested logs and stored archives to S3 on EKS using IRSA
AwsAuthType.ec2toAwsAuthType.default.job-watchertoservice-account; addserviceAccountNametocompression-workerandquery-workerdeployments.Description
Add support for the AWS SDK's default credential provider chain (
type: "default") in CLP's Rustand Python components, enabling deployment on platforms where credentials are provided
automatically (e.g., EKS with IRSA, ECS, EC2 instance roles).
Previously, S3/SQS access required explicit
type: "credentials"with an access key pair. The newtype: "default"option omits explicit credentials and lets the AWS SDK discover them from theenvironment (env vars, config files, IRSA web identity tokens, container credentials, EC2 metadata).
Key changes:
clp-rust-utils): AddAwsAuthentication::Defaultvariant with#[serde(rename = "default")]. Addcredentials_pair()method returningOption<(&str, &str)>(
Nonefor Default,Some(...)for Credentials). Update S3 and SQS client creation to acceptoptional credentials -- when
None, the default AWS SDK provider chain is used.log-ingestor,api-server): ThreadAwsAuthenticationthrough instead of rawkey/secret pairs. Update
aws_client_manager,compression_job_submitter,ingestion_job_manager, andapi-server/clientto usecredentials_pair().clp-py-utils): RenameAwsAuthType.ec2toAwsAuthType.default. Updatevalidation logic and error messages in
clp_config.pyands3_utils.py.job-watchertoservice-account(more generic namesince it now serves both RBAC job-watching and IRSA roles). Add
serviceAccountNametocompression-workerandquery-workerdeployments (previously missing, which prevented IRSAtoken injection). Add
serviceAccount.annotationstovalues.yamlso the IRSA role ARN can beset declaratively at install time. Bump chart version to
0.1.4-dev.3.eks-irsa-setup.mdguide for configuring IRSA on Amazon EKS. Updateclp-config.mdwith thedefaultauth type and links to official AWS SDK credential chain docs(Boto3, Rust SDK). Update
index.mdto list thedefault credential provider chain sources.
Checklist
type: "ec2"renamed totype: "default"inaws_authenticationconfig.job-watchertoservice-account.Validation performed
Deployed CLP to an Amazon EKS cluster (EKS Auto Mode, us-east-2) with
type: "default"authentication and IRSA, and validated the full S3 ingestion and compression pipeline end-to-end.
1. Create an EKS cluster
clp-test) and region (e.g.us-east-2).2. Create an S3 bucket with test data
postgresql.jsonl).3. Create an IAM policy for S3 access
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"], "Resource": "arn:aws:s3:::<bucket-name>/*" }, { "Effect": "Allow", "Action": "s3:ListBucket", "Resource": "arn:aws:s3:::<bucket-name>" } ] }4. Configure IRSA
sts.amazonaws.comsts.amazonaws.comStringLikecondition for theServiceAccount. The default policy only has the
:audcondition — add the:subentry tothe existing
Conditionblock:5. Build a custom CLP package image
Push this branch to your fork's
mainbranch to trigger the CI image build:The built image will be available at
ghcr.io/<owner>/clp/clp-package:main.6. Prepare hostPath directories on the worker node
Third-party containers (MariaDB, RabbitMQ, etc.) run as non-root (uid 1000). On EKS, hostPath
directories are created as root by default, so they need to be pre-created with the correct
ownership.
Note: busybox
shdoes not support brace expansion ({a,b}), so each directory must be listedexplicitly.
7. Deploy CLP via Helm
Example
eks-values.yaml:Verify all pods reach Running/Completed:
8. Submit an S3 scanner job and verify compression
Verify the log-ingestor scanned the object:
kubectl logs -l app.kubernetes.io/component=log-ingestor --tail=10 # Should show: "Scanned new object metadata on S3."Wait for the buffer flush (default 300s timeout), then verify compression:
The archive should appear in the S3 bucket under the configured
key_prefix.Summary by CodeRabbit
New Features
Documentation
Tests
Chores