
Add semantic validator for policy template datastream categories #1095

Merged
jsoriano merged 15 commits into elastic:main from JDKurma:add-datastream-categories-validato
May 18, 2026

Conversation

@JDKurma
Contributor

@JDKurma JDKurma commented Feb 23, 2026

What does this PR do?

Adds semantic validators that verify that the categories declared in data stream manifests stay in sync with the categories of the policy templates that reference them, when present, and that the package-level categories include the corresponding parent categories.

Why is it important?

It prevents drift between data stream-specific categories and the categories declared at the policy template and package levels.

Checklist

Related issues

N/A

@JDKurma JDKurma self-assigned this Feb 23, 2026
@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch 2 times, most recently from a796064 to 4d739c2 Compare February 24, 2026 23:36
@JDKurma
Contributor Author

JDKurma commented Feb 25, 2026

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#17560

@JDKurma JDKurma added the enhancement New feature or request label Feb 27, 2026
@JDKurma JDKurma marked this pull request as ready for review February 27, 2026 15:20
@JDKurma JDKurma requested a review from a team as a code owner February 27, 2026 15:20
@coderabbitai

coderabbitai Bot commented Feb 27, 2026

Walkthrough

Adds two semantic validators for datastream category consistency: one validates that policy template categories match categories declared in referenced data_stream manifests; the other validates that package top-level categories include any registry-defined parent categories referenced by datastreams. Both validators are registered in the spec rule set, parse package and data_stream manifest.yml files, fetch registry categories as needed, and return structured spec validation errors. Includes unit tests and test-package fixtures demonstrating matching and failing scenarios.
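
The first check in the walkthrough can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `policyTemplate` struct, `missingCategories`, and `checkPolicyTemplate` are hypothetical names, the manifests are assumed to be already parsed, and "match" is assumed to mean that every category declared by a referenced data stream must also appear on the policy template.

```go
package main

import (
	"fmt"
	"sort"
)

// policyTemplate is a hypothetical, simplified stand-in for one entry of
// policy_templates in a package manifest.yml; the real structs may differ.
type policyTemplate struct {
	Name        string
	DataStreams []string // names of the data streams the template references
	Categories  []string
}

// missingCategories returns, sorted and without duplicates, the categories
// in want that are absent from have.
func missingCategories(have, want []string) []string {
	seen := make(map[string]bool, len(have))
	for _, c := range have {
		seen[c] = true
	}
	var missing []string
	for _, c := range want {
		if !seen[c] {
			missing = append(missing, c)
			seen[c] = true // report each missing category once
		}
	}
	sort.Strings(missing)
	return missing
}

// checkPolicyTemplate returns one error per referenced data stream whose
// categories are not all declared by the policy template (assumed rule).
func checkPolicyTemplate(pt policyTemplate, dsCategories map[string][]string) []error {
	var errs []error
	for _, ds := range pt.DataStreams {
		if missing := missingCategories(pt.Categories, dsCategories[ds]); len(missing) > 0 {
			errs = append(errs, fmt.Errorf("policy template %q: categories %v declared by data stream %q are missing", pt.Name, missing, ds))
		}
	}
	return errs
}

func main() {
	pt := policyTemplate{Name: "sample", DataStreams: []string{"mylogs"}, Categories: []string{"security"}}
	dsCategories := map[string][]string{"mylogs": {"security", "network"}}
	for _, err := range checkPolicyTemplate(pt, dsCategories) {
		fmt.Println(err)
	}
}
```

Returning a slice of errors rather than stopping at the first mismatch mirrors the "structured spec validation errors" mentioned above, so a package author would see every offending data stream in one run.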


@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch from 4d739c2 to b57aaed Compare February 27, 2026 15:24
@trisch-me
Contributor

Do we propagate categories from the policy template to the package? Also, the package-level categories should not be the fine-grained categories but their parents, according to the category tree.
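
The parent-category rule raised in this comment could look roughly like the sketch below. It is only an illustration: `missingParents` is a hypothetical helper, and the entries in the `parentOf` map are made-up examples rather than values read from the registry's categories file.

```go
package main

import (
	"fmt"
	"sort"
)

// missingParents returns the sorted parent categories that the data stream
// categories require but that the package-level categories do not declare.
// parentOf maps a fine-grained category to its parent in the category tree.
func missingParents(pkgCategories, dsCategories []string, parentOf map[string]string) []string {
	declared := make(map[string]bool, len(pkgCategories))
	for _, c := range pkgCategories {
		declared[c] = true
	}
	missingSet := make(map[string]bool)
	for _, c := range dsCategories {
		parent, hasParent := parentOf[c]
		if hasParent && !declared[parent] {
			missingSet[parent] = true
		}
	}
	missing := make([]string, 0, len(missingSet))
	for p := range missingSet {
		missing = append(missing, p)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Illustrative tree only; not taken from the registry.
	parentOf := map[string]string{"firewall_security": "security"}
	fmt.Println(missingParents([]string{"network"}, []string{"firewall_security"}, parentOf))
}
```

Categories without an entry in the map are treated as top-level and ignored here; whether the real validator should flag them is a separate decision.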

@teresaromero
Contributor

Can you adjust the PR description to the template provided?

teresaromero previously approved these changes Mar 3, 2026
@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch from e7cc89f to 3c49b8d Compare March 5, 2026 00:27
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@code/go/internal/validator/semantic/validate_datastream_package_categories.go`:
- Around line 68-126: The validator ValidateDatastreamPackageCategories
currently reads and parses manifest.yml using fs.ReadFile and yaml.Unmarshal;
replace that logic with the repo-standard pkgpath.Files() + file.Values()
pattern and move the file-reading/parsing into a new helper (e.g.,
parsePackageManifest or loadPackageManifest) to follow guidelines; update
ValidateDatastreamPackageCategories to call the helper to obtain a
packageManifestWithPackageCategories and keep the rest of the logic intact, and
ensure any other YAML reads (e.g., readDataStreamManifestCategories) follow the
same pkgpath.Files()/file.Values() approach if applicable.
- Around line 23-57: In fetchRegistryParentCategories ensure you check HTTP
response status and validate parsed categories: after
client.Get(packageRegistryCategoriesURL) verify resp.StatusCode == http.StatusOK
and return a descriptive error including resp.Status and resp.StatusCode if not
200; perform status check before reading the body to avoid treating error pages
as valid YAML; after yaml.Unmarshal ensure rc.Categories is non-nil and
non-empty and return an error if empty so validation isn't silently skipped;
keep the existing defer resp.Body.Close and include the
packageRegistryCategoriesURL or HTTP status in error messages for easier
debugging.

📥 Commits

Reviewing files that changed from the base of the PR and between e7cc89f and 3c49b8d.

📒 Files selected for processing (15)
  • code/go/internal/validator/semantic/validate_datastream_package_categories.go
  • code/go/internal/validator/semantic/validate_datastream_package_categories_test.go
  • code/go/internal/validator/semantic/validate_policy_template_datastream_categories_test.go
  • code/go/internal/validator/spec.go
  • code/go/pkg/validator/validator_test.go
  • test/packages/bad_datastream_package_categories/changelog.yml
  • test/packages/bad_datastream_package_categories/data_stream/mylogs/agent/stream/stream.yml.hbs
  • test/packages/bad_datastream_package_categories/data_stream/mylogs/fields/fields.yml
  • test/packages/bad_datastream_package_categories/data_stream/mylogs/manifest.yml
  • test/packages/bad_datastream_package_categories/manifest.yml
  • test/packages/good_datastream_package_categories/changelog.yml
  • test/packages/good_datastream_package_categories/data_stream/mylogs/agent/stream/stream.yml.hbs
  • test/packages/good_datastream_package_categories/data_stream/mylogs/fields/fields.yml
  • test/packages/good_datastream_package_categories/data_stream/mylogs/manifest.yml
  • test/packages/good_datastream_package_categories/manifest.yml
✅ Files skipped from review due to trivial changes (4)
  • test/packages/bad_datastream_package_categories/data_stream/mylogs/manifest.yml
  • test/packages/good_datastream_package_categories/data_stream/mylogs/agent/stream/stream.yml.hbs
  • code/go/pkg/validator/validator_test.go
  • test/packages/good_datastream_package_categories/changelog.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • code/go/internal/validator/spec.go

Comment thread code/go/internal/validator/semantic/validate_datastream_package_categories.go Outdated
@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch from d7b1451 to 3d214a1 Compare March 5, 2026 01:24
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@code/go/internal/validator/semantic/validate_policy_template_datastream_categories.go`:
- Around line 43-46: The code silently skips validation when manifest.type is
not a string by returning ("", nil, nil); in the function in
validate_policy_template_datastream_categories.go replace that silent return
with an explicit error return (e.g., return "", nil, fmt.Errorf(...)) so callers
receive a validation failure; detect the failed type assertion on typeVal ->
pkgType and return a descriptive error mentioning the invalid manifest.type
value/type and the function name (or "manifest.type") so the validation pipeline
can surface the issue.
- Around line 33-57: The function readPackageManifestPolicyTemplates currently
reads/parses the manifest via raw fs/yaml; replace this with the package
manifest helpers: use pkgpath.Files() (with the package manifest glob) to load
the manifest file(s), then call file.Values("$.type") to get pkgType and
file.Values(...) JSONPath queries to extract PolicyTemplates and their
categories into packageManifestWithCategories instead of yaml.Unmarshal; update
readPackageManifestPolicyTemplates to return pkgType and pkg.PolicyTemplates
from the file.Values results and remove direct fs/yaml usage; apply the same
replacement for the similar parsing block later (the other function using
fs.ReadFile/yaml.Unmarshal) so both use pkgpath.Files() and file.Values()
JSONPath helpers.
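
The first finding above (return an error instead of silently skipping when `manifest.type` is not a string) amounts to a checked type assertion. A minimal sketch, assuming the manifest has been decoded into a generic map; `packageType` is a hypothetical helper and not the function from the PR:

```go
package main

import "fmt"

// packageType extracts manifest.type from a decoded manifest and returns an
// explicit error when the field is missing or is not a string, rather than
// returning empty values that would skip validation.
func packageType(manifest map[string]any) (string, error) {
	typeVal, ok := manifest["type"]
	if !ok {
		return "", fmt.Errorf("manifest.type is missing")
	}
	pkgType, ok := typeVal.(string)
	if !ok {
		return "", fmt.Errorf("manifest.type must be a string, got %T (%v)", typeVal, typeVal)
	}
	return pkgType, nil
}

func main() {
	if _, err := packageType(map[string]any{"type": 42}); err != nil {
		fmt.Println("validation error:", err)
	}
	pkgType, _ := packageType(map[string]any{"type": "integration"})
	fmt.Println("package type:", pkgType)
}
```

Whether a missing `type` field should also be an error here, or be left to schema validation, is a design choice this sketch does not settle.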

📥 Commits

Reviewing files that changed from the base of the PR and between 3c49b8d and d7b1451.

📒 Files selected for processing (7)
  • code/go/internal/validator/semantic/validate_datastream_package_categories.go
  • code/go/internal/validator/semantic/validate_policy_template_datastream_categories.go
  • code/go/internal/validator/spec.go
  • code/go/pkg/validator/validator_test.go
  • spec/changelog.yml
  • test/packages/bad_datastream_categories_mismatch/manifest.yml
  • test/packages/good_datastream_categories_match/manifest.yml
🚧 Files skipped from review as they are similar to previous changes (5)
  • code/go/pkg/validator/validator_test.go
  • test/packages/bad_datastream_categories_mismatch/manifest.yml
  • test/packages/good_datastream_categories_match/manifest.yml
  • code/go/internal/validator/spec.go
  • code/go/internal/validator/semantic/validate_datastream_package_categories.go

@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch from 85b399b to 5db2607 Compare March 5, 2026 02:25
@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch 2 times, most recently from 768ed0a to 7f03b1f Compare March 5, 2026 04:20
@JDKurma JDKurma force-pushed the add-datastream-categories-validato branch 3 times, most recently from b8218ee to 96ed6b4 Compare April 13, 2026 05:28
Comment thread code/go/internal/validator/semantic/validate_datastream_package_categories.go Outdated
trisch-me previously approved these changes Apr 22, 2026
@JDKurma
Contributor Author

JDKurma commented Apr 27, 2026

@teresaromero Could I get a re-review, thanks!

@teresaromero
Contributor

I would like some extra 👀 from @elastic/ecosystem

Comment thread code/go/internal/validator/semantic/validate_datastream_package_categories.go Outdated
Comment thread code/go/internal/validator/semantic/validate_datastream_package_categories.go Outdated
@jsoriano
Member

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#18958

…e_categories.go

Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
@elasticmachine

💚 Build Succeeded

History

cc @JDKurma

@JDKurma
Contributor Author

JDKurma commented May 12, 2026

test integrations

@elastic-vault-github-plugin-prod

Created or updated PR in integrations repository to test this version. Check elastic/integrations#18972

)

- const packageRegistryCategoriesURL = "https://raw.githubusercontent.com/elastic/package-registry/main/categories/categories.yml"
+ const packageRegistryCategoriesURL = "https://raw.githubusercontent.com/elastic/package-registry/v1.38.0/categories/categories.yml"
Member

Could we add an updatecli configuration to ensure this stays updated? It could be similar to this one: https://github.com/elastic/elastic-package/blob/09286eb1b16dea302739292ce353a8a1876be1b7/.github/workflows/updatecli/updatecli.d/bump-package-registry-version.yml

If not done in this PR please do it in a follow up or open an issue so we don't lose track of this.

Contributor Author

Cool, I'll add it in an immediate follow-up PR.

BTW, merging is blocked for me, any chance you could merge it or grant me permission to do so? Thanks!

Member

Merged, thanks!

Contributor Author

@jsoriano #1171 for the updatecli configuration

@jsoriano jsoriano merged commit 5a8b241 into elastic:main May 18, 2026
5 checks passed

Labels

enhancement New feature or request

5 participants