Skip to content

Fail fast on invalid config regexes and enabled config#461

Open
prk-Jr wants to merge 13 commits intomainfrom
fix/config-regex-hardening
Open

Fail fast on invalid config regexes and enabled config#461
prk-Jr wants to merge 13 commits intomainfrom
fix/config-regex-hardening

Conversation

@prk-Jr
Copy link
Collaborator

@prk-Jr prk-Jr commented Mar 8, 2026

Summary

  • Fail startup instead of panicking or silently disabling when config-derived regexes or enabled integration/provider configs are invalid.
  • Prepare handler and Next.js rewrite artifacts before request handling so invalid config returns descriptive TrustedServerError responses.
  • Add regression coverage for handler overrides, disabled-config bypasses, Next.js RSC/__NEXT_DATA__ rewrites, and publisher fallback encoding behavior.

Changes

File Change
.claude/agents/pr-creator.md Require PRs touching config-derived regex or pattern compilation to document startup hardening and validation coverage.
.claude/agents/pr-reviewer.md Treat panic-prone config compilation and silent invalid-enabled-config disablement as blocking review findings.
crates/common/src/auction/mod.rs Make auction provider construction fallible and propagate startup configuration errors.
crates/common/src/auth.rs Make basic-auth evaluation fallible so invalid handler regexes return config errors instead of panicking.
crates/common/src/integrations/adserver_mock.rs Return startup errors for invalid enabled provider config instead of logging and skipping it.
crates/common/src/integrations/aps.rs Return startup errors for invalid enabled provider config instead of logging and skipping it.
crates/common/src/integrations/datadome.rs Make enabled config registration fail fast with descriptive validation errors.
crates/common/src/integrations/didomi.rs Make enabled config registration fail fast with descriptive validation errors.
crates/common/src/integrations/google_tag_manager.rs Make enabled config registration fail fast with descriptive validation errors.
crates/common/src/integrations/gpt.rs Make enabled config registration fail fast with descriptive validation errors.
crates/common/src/integrations/lockr.rs Make enabled config registration fail fast with descriptive validation errors.
crates/common/src/integrations/mod.rs Update shared integration builder signatures and helpers for fallible startup registration.
crates/common/src/integrations/nextjs/html_post_process.rs Pass request host and scheme through the post-processor RSC rewrite path.
crates/common/src/integrations/nextjs/mod.rs Build Next.js rewriters during registration and add fixture-style regressions for RSC and __NEXT_DATA__ coverage.
crates/common/src/integrations/nextjs/rsc.rs Reuse the new RSC rewrite helpers while preserving payload sizing and chunk handling.
crates/common/src/integrations/nextjs/script_rewriter.rs Replace config-derived expect() regex compilation with fallible rewriters and add hostname, port, whitespace, and metacharacter regressions.
crates/common/src/integrations/nextjs/shared.rs Replace origin-specific regex construction with static generic patterns and safe hostname-boundary helpers.
crates/common/src/integrations/permutive.rs Make enabled config registration fail fast with descriptive validation errors.
crates/common/src/integrations/prebid.rs Fail startup on invalid enabled config, including empty server_url, and return fallible provider builders.
crates/common/src/integrations/registry.rs Propagate integration registration errors during startup instead of silently dropping invalid enabled integrations.
crates/common/src/integrations/testlight.rs Make enabled config registration fail fast with descriptive validation errors.
crates/common/src/publisher.rs Restrict publisher fallback Accept-Encoding to codecs the rewrite pipeline supports and add regression coverage.
crates/common/src/settings.rs Add runtime preparation for handler regexes, short-circuit raw enabled = false configs, and add startup hardening regressions.
crates/common/src/settings_data.rs Run runtime preparation when loading baked settings data so invalid handler regexes fail during startup.
crates/fastly/src/main.rs Surface orchestrator and auth configuration errors through the existing Fastly error-response path.

Hardening note

Invalid handlers[].path regexes now fail during startup preparation and still return descriptive configuration errors if request-time code encounters unprepared settings. Enabled integrations and auction providers now fail registry/orchestrator startup instead of logging and disabling themselves, while raw enabled = false configs still short-circuit before validation. Regression coverage includes invalid handler TOML and env overrides, disabled-invalid config bypasses, enabled-invalid integration/provider startup failures, empty prebid.server_url, Next.js __NEXT_DATA__ and RSC hostname/port/metacharacter rewrites, and the publisher fallback encoding path.

Closes

Closes #403

Test plan

  • cargo test --workspace
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • WASM build: cargo build --bin trusted-server-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve
  • Other: npx vitest run was skipped per explicit user instruction because of the unrelated repo-wide ERR_REQUIRE_ESM failure in html-encoding-sniffer.

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect(\"should ...\")
  • Uses tracing macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@prk-Jr prk-Jr self-assigned this Mar 8, 2026
Copy link
Collaborator

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: "Fail fast on invalid config regexes and enabled config"

Verdict: Looks good — no blockers.


Findings

P2 — Medium (Informational)

  1. RscUrlRewriter now uses a generic regex (crates/common/src/integrations/nextjs/shared.rs:22-27) — The static regex matches all URLs instead of per-origin URLs, so every URL in an RSC payload goes through the closure. The existing early-exit check (input.contains(&self.origin_host) at line 139) short-circuits the common case, so performance impact is negligible in practice. No action needed.

  2. Prebid server_url validation behavioral change (crates/common/src/integrations/prebid.rs:210-219) — Empty server_url now causes a hard error instead of silently disabling. This is the stated intent of the PR and is covered by the new empty_prebid_server_url_fails_orchestrator_startup test. Acceptable.


Highlights

  • is_explicitly_disabled short-circuit — Inspects raw JSON before deserialization so enabled: false configs with otherwise invalid fields don't cause spurious startup failures. Smart design, well-tested.
  • Comprehensive test coverage — Invalid handler regexes (TOML + env var), disabled+invalid config short-circuits for integrations/providers/auction providers, enabled+invalid startup failures, __NEXT_DATA__ rewrite fixtures with ports/metacharacters/whitespace/hostname boundaries.
  • Consistent fallible builder pattern — Every integration's build() uniformly returns Result<Option<T>, Report<TrustedServerError>>, making the codebase very predictable.
  • Lockr regex moved to Lazy<Regex> static (crates/common/src/integrations/lockr.rs:37-42) — Avoids recompiling on every request.
  • strip_origin_host_with_optional_port boundary helper (crates/common/src/integrations/nextjs/shared.rs:67-100) — Properly prevents origin.example.com.evil from matching while allowing origin.example.com:8443/path. Edge cases handled correctly and tested.

All CI checks pass. High-quality hardening PR with thorough, consistent changes across all 25 files.

Copy link
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Solid defensive hardening that converts config-derived regex compilation and integration registration from panic/log-and-swallow to Result-based propagation with fail-fast at startup. The is_explicitly_disabled short-circuit is a well-targeted addition. Consistent application across all 10+ integrations with thorough test coverage.

Non-blocking

🤔 thinking

  • Inconsistent parameter storage across rewriters: RscUrlRewriter stores origin_host, UrlRewriter is stateless — sibling types with different conventions (shared.rs:118, script_rewriter.rs:113)
  • restrict_accept_encoding unconditionally overwrites client preferences: prevents unsupported encodings from origin but does not honor clients that only accept identity or a subset (publisher.rs:19)

♻️ refactor

  • rewrite_nextjs_values_with_rewriter is a pass-through wrapper: delegates to rewriter.rewrite_embedded() with no added logic — can be inlined (script_rewriter.rs:74)
  • validate_path compiles regex redundantly: from_toml now calls prepare_runtime() which compiles via OnceLock, making the validation-time compile redundant (settings.rs:473)
  • Redundant prepare_runtime() call in get_settings: from_toml already calls it internally (settings_data.rs:37)

🌱 seedling

  • once_cell::sync::Lazystd::sync::LazyLock: project uses Rust 2024 edition where LazyLock is stable — opportunity to drop the once_cell dep in a follow-up
  • strip_origin_host_with_optional_port does not handle IPv6: RSC_URL_PATTERN captures bracketed IPv6 but this helper cannot match them — worth a doc comment (shared.rs:67)

👍 praise

  • is_explicitly_disabled short-circuit: checking raw JSON before deserialization means disabled integrations with placeholder config do not blow up startup (settings.rs:123)
  • Excellent hardening test matrix: disabled-invalid-skips, enabled-invalid-fails, provider configs, registry/orchestrator startup, handler regex, env var overrides, and a real-world-shaped __NEXT_DATA__ fixture
  • Consistent Result-based registration: uniform pattern across all integrations with no stragglers

CI Status

  • cargo fmt: PASS
  • cargo test: PASS
  • vitest: PASS
  • CodeQL/Analyze: PASS
  • format-docs: PASS
  • format-typescript: PASS

Copy link
Collaborator

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Approve

Systematically converts all integration/provider registration from "log-and-skip" to "fail-fast" error propagation, eagerly validates config-derived regexes at startup, replaces once_cell with std::sync::LazyLock, and adds comprehensive regression tests. The design is sound: invalid enabled config fails startup, disabled invalid config is silently skipped via is_explicitly_disabled, and handler regexes are eagerly prepared.

No blocking issues.

Findings

P2 — RscUrlRewriter regex now matches all URL-like patterns, not just origin
shared.rs:23-28 — The old code compiled a per-instance regex with the origin host baked in (escape(origin_host) in the pattern), so the regex engine only ever fired the callback for origin URLs. The new static RSC_URL_PATTERN matches any URL-like pattern and the callback checks strip_origin_host_with_optional_port to decide whether to rewrite. Rewriting output is identical, but non-origin URLs now pay a callback invocation + .to_string() allocation before being returned unchanged. Mitigated by the !input.contains(origin_host) fast-path at line 137 and the fact that per-instance regex compilation was the larger cost. Net performance win for the common case — just noting the tradeoff.

P2 — Cell<bool> in script rewriter
script_rewriter.rs:166 — Tracking mutation inside replace_all via Cell<bool>. Could be simplified to checking matches!(result, Cow::Borrowed(_)) since replace_all returns Cow::Borrowed when the callback always returns the original text. Current approach is correct for single-threaded wasm32-wasip1 — minor style nit.

Highlights

  • Excellent regression test coverage: disabled invalid config bypass, enabled invalid config failure, handler regex validation, __NEXT_DATA__ fixture rewrites, explicit port URL handling, publisher encoding fallback
  • is_explicitly_disabled early exit before deserialization/validation is a thoughtful design
  • Consistent Ok(None) / Ok(Some(...)) / Err(...) pattern across all integrations
  • RscUrlRewriter simplification from per-instance compiled regex to zero-sized type with shared LazyLock<Regex> eliminates per-request regex compilation
  • restrict_accept_encoding prevents the origin from responding with encodings the rewrite pipeline can't handle

prk-Jr added 3 commits March 18, 2026 15:23
restrict_accept_encoding was unconditionally setting Accept-Encoding:
gzip, deflate, br on every proxied request, even when the client sent
no preference. This caused compression-capable origins (Next.js) to
respond with gzip, which the pipeline preserved in Content-Encoding.
Test clients without a decompression feature then received raw gzip
bytes instead of HTML, breaking the HtmlInjection integration test.

Only restrict the header when the client already sent one.
After restrict_accept_encoding was fixed to early-return when no
Accept-Encoding header is present, SUPPORTED_ENCODINGS became dead code
since it was only used in the None fallback path. Inline the literal in
the unit test and drop the constant.
Copy link
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This branch hardens config-derived regex compilation and integration startup error handling. The core change converts panic-prone expect() on config-derived regex into fallible Result returns, and converts every integration register/register_providers function from silently swallowing errors (log::error! + return None) to propagating them. It also migrates once_cell::sync::Lazy to std::sync::LazyLock, adds Accept-Encoding filtering for the publisher proxy, and refactors the Next.js UrlRewriter/RscUrlRewriter to separate config-time construction from request-time parameters.

Well-structured change overall with excellent test coverage of the enabled/disabled x valid/invalid config matrix.

Blocking

🔧 wrench

  • uncovered_admin_endpoints silently swallows regex errors: .unwrap_or(false) treats invalid regex as "no match" rather than surfacing the real error. Currently safe because prepare_runtime() catches it first, but fragile. (crates/common/src/settings.rs:537)
  • Performance regression in RSC URL rewriting: Global RSC_URL_PATTERN now matches every URL in the payload (not just origin URLs), then filters in the closure. For large RSC payloads with many non-origin URLs, this is slower than the previous per-request origin-specific regex. (crates/common/src/integrations/nextjs/shared.rs:23)

❓ question

  • lol_html version downgrade: 2.7.2 -> 2.7.1 in workspace Cargo.toml — intentional or lockfile artifact? (Cargo.toml:72)

Non-blocking

🤔 thinking

  • RscUrlRewriter is now a zero-sized struct: All state was moved to function parameters. Free functions would be simpler unless config-time state is planned. (crates/common/src/integrations/nextjs/shared.rs:123)
  • is_explicitly_disabled edge case: Only fires for literal JSON false, not string "false". This is correct behavior but worth noting. (crates/common/src/settings.rs:139)

♻️ refactor

  • accept_encoding_qvalue returns last match, not first: Most HTTP implementations use first-match semantics for duplicate tokens. (crates/common/src/publisher.rs:89)
  • select_supported_accept_encoding(None) is dead code: The only caller returns early when header is absent. (crates/common/src/publisher.rs:37)

🌱 seedling

  • Permissive host capture in RSC_URL_PATTERN: The character class is broad. Safe today due to strip_origin_host_with_optional_port, but watch for evolution. (crates/common/src/integrations/nextjs/shared.rs:25)

!self
.handlers
.iter()
.any(|h| h.matches_path(path).unwrap_or(false))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrenchunwrap_or(false) silently swallows regex errors

uncovered_admin_endpoints uses .unwrap_or(false) which treats a bad regex as "this handler doesn't match" — so an invalid handler regex causes validate_admin_coverage to report the admin endpoint as uncovered rather than reporting the real error (invalid regex).

In the current call graph this is safe because prepare_runtime() runs first and would catch the bad regex. But this is fragile — a future caller of from_toml without prepare_runtime() would get a misleading error.

Fix: Consider making uncovered_admin_endpoints return Result, or at minimum add a // SAFETY: comment explaining the prepare_runtime() invariant.

});

/// Generic URL pattern for RSC payload rewriting.
pub(crate) static RSC_URL_PATTERN: LazyLock<Regex> = LazyLock::new(|| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Performance regression: global RSC_URL_PATTERN matches all URLs, not just origin

The old code built a per-request regex that embedded the escaped origin_host in the pattern, so only origin URLs were matched. The new static RSC_URL_PATTERN matches every URL in the payload, then filters via strip_origin_host_with_optional_port inside the closure.

The contains(origin_host) fast-path helps, but once triggered, every URL gets the regex+closure overhead. For large RSC payloads with many non-origin URLs this is a regression.

Suggestion: At minimum document the tradeoff. Consider benchmarking with a realistic payload (100+ URLs). If perf matters, a per-origin compiled regex cached via OnceLock or small LRU would restore the previous selectivity.

log-fastly = "0.11.12"
lol_html = "2.7.2"
once_cell = "1.21"
lol_html = "2.7.1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question — Is the lol_html downgrade from 2.7.2 to 2.7.1 intentional?

This seems unrelated to config regex hardening. Was this a lockfile resolution artifact, or intentional?

pattern: Regex,
}
#[derive(Clone, Default)]
pub(crate) struct RscUrlRewriter;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingRscUrlRewriter is now a zero-sized struct with no state

RscUrlRewriter derives Clone, Default, holds no fields, and every method now takes origin_host, request_host, request_scheme as arguments. The struct is purely ceremonial — free functions would be simpler.

I can see the argument for keeping it as a future extension point, but if no config-time state is planned, module-level functions would reduce indirection.

}
}

fn is_explicitly_disabled(raw: &JsonValue) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingis_explicitly_disabled edge case with non-bool enabled

If someone writes "enabled": "false" (string instead of bool), as_bool() returns None, so the integration proceeds to deserialization and may fail validation. This is correct behavior (malformed config should fail), but worth noting that the bypass only fires for the literal JSON boolean false.

Also worth noting: integrations that default enabled to true with no enabled field in the config will correctly proceed to validation, which is the desired behavior.

}
}

matched_qvalue = Some(qvalue);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactoraccept_encoding_qvalue returns the last match, not the first

If Accept-Encoding contains a duplicate token (e.g., gzip;q=1, gzip;q=0), this returns the last match's qvalue. Most HTTP implementations use first-match semantics.

Suggestion:

matched_qvalue = Some(qvalue);
break; // first match wins per RFC convention

In practice duplicate tokens are rare and unlikely to cause issues, but first-match is more conventional.

);
}

fn select_supported_accept_encoding(client_accept_encoding: Option<&str>) -> String {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactorselect_supported_accept_encoding(None) is dead code

The only caller restrict_accept_encoding returns early when Accept-Encoding is absent, so the None branch (returning "identity") is unreachable. You could simplify the parameter to &str if this function isn't intended for other callers.

/// Generic URL pattern for RSC payload rewriting.
pub(crate) static RSC_URL_PATTERN: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
r#"(https?)?(:)?(\\\\\\\\\\\\\\\\//|\\\\\\\\//|\\/\\/|//)(?P<host>\[[^\]]+\](?::\d+)?|[^/"'\\\s?#<>,)]+)"#,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 seedling — Host capture group [^/"'\\\s?#<>,)]+ is very permissive

This character class matches anything not in the exclusion set, which could theoretically match non-hostname content in unusual payloads. Currently safe because strip_origin_host_with_optional_port filters properly, but worth keeping an eye on if RSC payload formats evolve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Panicking .expect() on regex compilation from user configuration

3 participants