Fail fast on invalid config regexes and enabled config#461
Fail fast on invalid config regexes and enabled config#461
Conversation
ChristianPavilonis
left a comment
There was a problem hiding this comment.
PR Review: "Fail fast on invalid config regexes and enabled config"
Verdict: Looks good — no blockers.
Findings
P2 — Medium (Informational)
-
RscUrlRewriternow uses a generic regex (crates/common/src/integrations/nextjs/shared.rs:22-27) — The static regex matches all URLs instead of per-origin URLs, so every URL in an RSC payload goes through the closure. The existing early-exit check (input.contains(&self.origin_host)at line 139) short-circuits the common case, so performance impact is negligible in practice. No action needed. -
Prebid
server_urlvalidation behavioral change (crates/common/src/integrations/prebid.rs:210-219) — Emptyserver_urlnow causes a hard error instead of silently disabling. This is the stated intent of the PR and is covered by the newempty_prebid_server_url_fails_orchestrator_startuptest. Acceptable.
Highlights
is_explicitly_disabledshort-circuit — Inspects raw JSON before deserialization soenabled: falseconfigs with otherwise invalid fields don't cause spurious startup failures. Smart design, well-tested.- Comprehensive test coverage — Invalid handler regexes (TOML + env var), disabled+invalid config short-circuits for integrations/providers/auction providers, enabled+invalid startup failures,
__NEXT_DATA__rewrite fixtures with ports/metacharacters/whitespace/hostname boundaries. - Consistent fallible builder pattern — Every integration's
build()uniformly returnsResult<Option<T>, Report<TrustedServerError>>, making the codebase very predictable. - Lockr regex moved to
Lazy<Regex>static (crates/common/src/integrations/lockr.rs:37-42) — Avoids recompiling on every request. strip_origin_host_with_optional_portboundary helper (crates/common/src/integrations/nextjs/shared.rs:67-100) — Properly preventsorigin.example.com.evilfrom matching while allowingorigin.example.com:8443/path. Edge cases handled correctly and tested.
All CI checks pass. High-quality hardening PR with thorough, consistent changes across all 25 files.
aram356
left a comment
There was a problem hiding this comment.
Summary
Solid defensive hardening that converts config-derived regex compilation and integration registration from panic/log-and-swallow to Result-based propagation with fail-fast at startup. The is_explicitly_disabled short-circuit is a well-targeted addition. Consistent application across all 10+ integrations with thorough test coverage.
Non-blocking
🤔 thinking
- Inconsistent parameter storage across rewriters:
RscUrlRewriterstoresorigin_host,UrlRewriteris stateless — sibling types with different conventions (shared.rs:118,script_rewriter.rs:113) restrict_accept_encodingunconditionally overwrites client preferences: prevents unsupported encodings from origin but does not honor clients that only acceptidentityor a subset (publisher.rs:19)
♻️ refactor
rewrite_nextjs_values_with_rewriteris a pass-through wrapper: delegates torewriter.rewrite_embedded()with no added logic — can be inlined (script_rewriter.rs:74)validate_pathcompiles regex redundantly:from_tomlnow callsprepare_runtime()which compiles viaOnceLock, making the validation-time compile redundant (settings.rs:473)- Redundant
prepare_runtime()call inget_settings:from_tomlalready calls it internally (settings_data.rs:37)
🌱 seedling
once_cell::sync::Lazy→std::sync::LazyLock: project uses Rust 2024 edition whereLazyLockis stable — opportunity to drop theonce_celldep in a follow-upstrip_origin_host_with_optional_portdoes not handle IPv6:RSC_URL_PATTERNcaptures bracketed IPv6 but this helper cannot match them — worth a doc comment (shared.rs:67)
👍 praise
is_explicitly_disabledshort-circuit: checking raw JSON before deserialization means disabled integrations with placeholder config do not blow up startup (settings.rs:123)- Excellent hardening test matrix: disabled-invalid-skips, enabled-invalid-fails, provider configs, registry/orchestrator startup, handler regex, env var overrides, and a real-world-shaped
__NEXT_DATA__fixture - Consistent
Result-based registration: uniform pattern across all integrations with no stragglers
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- CodeQL/Analyze: PASS
- format-docs: PASS
- format-typescript: PASS
There was a problem hiding this comment.
Review: Approve
Systematically converts all integration/provider registration from "log-and-skip" to "fail-fast" error propagation, eagerly validates config-derived regexes at startup, replaces once_cell with std::sync::LazyLock, and adds comprehensive regression tests. The design is sound: invalid enabled config fails startup, disabled invalid config is silently skipped via is_explicitly_disabled, and handler regexes are eagerly prepared.
No blocking issues.
Findings
P2 — RscUrlRewriter regex now matches all URL-like patterns, not just origin
shared.rs:23-28 — The old code compiled a per-instance regex with the origin host baked in (escape(origin_host) in the pattern), so the regex engine only ever fired the callback for origin URLs. The new static RSC_URL_PATTERN matches any URL-like pattern and the callback checks strip_origin_host_with_optional_port to decide whether to rewrite. Rewriting output is identical, but non-origin URLs now pay a callback invocation + .to_string() allocation before being returned unchanged. Mitigated by the !input.contains(origin_host) fast-path at line 137 and the fact that per-instance regex compilation was the larger cost. Net performance win for the common case — just noting the tradeoff.
P2 — Cell<bool> in script rewriter
script_rewriter.rs:166 — Tracking mutation inside replace_all via Cell<bool>. Could be simplified to checking matches!(result, Cow::Borrowed(_)) since replace_all returns Cow::Borrowed when the callback always returns the original text. Current approach is correct for single-threaded wasm32-wasip1 — minor style nit.
Highlights
- Excellent regression test coverage: disabled invalid config bypass, enabled invalid config failure, handler regex validation,
__NEXT_DATA__fixture rewrites, explicit port URL handling, publisher encoding fallback is_explicitly_disabledearly exit before deserialization/validation is a thoughtful design- Consistent
Ok(None)/Ok(Some(...))/Err(...)pattern across all integrations RscUrlRewritersimplification from per-instance compiled regex to zero-sized type with sharedLazyLock<Regex>eliminates per-request regex compilationrestrict_accept_encodingprevents the origin from responding with encodings the rewrite pipeline can't handle
restrict_accept_encoding was unconditionally setting Accept-Encoding: gzip, deflate, br on every proxied request, even when the client sent no preference. This caused compression-capable origins (Next.js) to respond with gzip, which the pipeline preserved in Content-Encoding. Test clients without a decompression feature then received raw gzip bytes instead of HTML, breaking the HtmlInjection integration test. Only restrict the header when the client already sent one.
After restrict_accept_encoding was fixed to early-return when no Accept-Encoding header is present, SUPPORTED_ENCODINGS became dead code since it was only used in the None fallback path. Inline the literal in the unit test and drop the constant.
aram356
left a comment
There was a problem hiding this comment.
Summary
This branch hardens config-derived regex compilation and integration startup error handling. The core change converts panic-prone expect() on config-derived regex into fallible Result returns, and converts every integration register/register_providers function from silently swallowing errors (log::error! + return None) to propagating them. It also migrates once_cell::sync::Lazy to std::sync::LazyLock, adds Accept-Encoding filtering for the publisher proxy, and refactors the Next.js UrlRewriter/RscUrlRewriter to separate config-time construction from request-time parameters.
Well-structured change overall with excellent test coverage of the enabled/disabled x valid/invalid config matrix.
Blocking
🔧 wrench
uncovered_admin_endpointssilently swallows regex errors:.unwrap_or(false)treats invalid regex as "no match" rather than surfacing the real error. Currently safe becauseprepare_runtime()catches it first, but fragile. (crates/common/src/settings.rs:537)- Performance regression in RSC URL rewriting: Global
RSC_URL_PATTERNnow matches every URL in the payload (not just origin URLs), then filters in the closure. For large RSC payloads with many non-origin URLs, this is slower than the previous per-request origin-specific regex. (crates/common/src/integrations/nextjs/shared.rs:23)
❓ question
lol_htmlversion downgrade: 2.7.2 -> 2.7.1 in workspaceCargo.toml— intentional or lockfile artifact? (Cargo.toml:72)
Non-blocking
🤔 thinking
RscUrlRewriteris now a zero-sized struct: All state was moved to function parameters. Free functions would be simpler unless config-time state is planned. (crates/common/src/integrations/nextjs/shared.rs:123)is_explicitly_disablededge case: Only fires for literal JSONfalse, not string"false". This is correct behavior but worth noting. (crates/common/src/settings.rs:139)
♻️ refactor
accept_encoding_qvaluereturns last match, not first: Most HTTP implementations use first-match semantics for duplicate tokens. (crates/common/src/publisher.rs:89)select_supported_accept_encoding(None)is dead code: The only caller returns early when header is absent. (crates/common/src/publisher.rs:37)
🌱 seedling
- Permissive host capture in
RSC_URL_PATTERN: The character class is broad. Safe today due tostrip_origin_host_with_optional_port, but watch for evolution. (crates/common/src/integrations/nextjs/shared.rs:25)
| !self | ||
| .handlers | ||
| .iter() | ||
| .any(|h| h.matches_path(path).unwrap_or(false)) |
There was a problem hiding this comment.
🔧 wrench — unwrap_or(false) silently swallows regex errors
uncovered_admin_endpoints uses .unwrap_or(false) which treats a bad regex as "this handler doesn't match" — so an invalid handler regex causes validate_admin_coverage to report the admin endpoint as uncovered rather than reporting the real error (invalid regex).
In the current call graph this is safe because prepare_runtime() runs first and would catch the bad regex. But this is fragile — a future caller of from_toml without prepare_runtime() would get a misleading error.
Fix: Consider making uncovered_admin_endpoints return Result, or at minimum add a // SAFETY: comment explaining the prepare_runtime() invariant.
| }); | ||
|
|
||
| /// Generic URL pattern for RSC payload rewriting. | ||
| pub(crate) static RSC_URL_PATTERN: LazyLock<Regex> = LazyLock::new(|| { |
There was a problem hiding this comment.
🔧 wrench — Performance regression: global RSC_URL_PATTERN matches all URLs, not just origin
The old code built a per-request regex that embedded the escaped origin_host in the pattern, so only origin URLs were matched. The new static RSC_URL_PATTERN matches every URL in the payload, then filters via strip_origin_host_with_optional_port inside the closure.
The contains(origin_host) fast-path helps, but once triggered, every URL gets the regex+closure overhead. For large RSC payloads with many non-origin URLs this is a regression.
Suggestion: At minimum document the tradeoff. Consider benchmarking with a realistic payload (100+ URLs). If perf matters, a per-origin compiled regex cached via OnceLock or small LRU would restore the previous selectivity.
| log-fastly = "0.11.12" | ||
| lol_html = "2.7.2" | ||
| once_cell = "1.21" | ||
| lol_html = "2.7.1" |
There was a problem hiding this comment.
❓ question — Is the lol_html downgrade from 2.7.2 to 2.7.1 intentional?
This seems unrelated to config regex hardening. Was this a lockfile resolution artifact, or intentional?
| pattern: Regex, | ||
| } | ||
| #[derive(Clone, Default)] | ||
| pub(crate) struct RscUrlRewriter; |
There was a problem hiding this comment.
🤔 thinking — RscUrlRewriter is now a zero-sized struct with no state
RscUrlRewriter derives Clone, Default, holds no fields, and every method now takes origin_host, request_host, request_scheme as arguments. The struct is purely ceremonial — free functions would be simpler.
I can see the argument for keeping it as a future extension point, but if no config-time state is planned, module-level functions would reduce indirection.
| } | ||
| } | ||
|
|
||
| fn is_explicitly_disabled(raw: &JsonValue) -> bool { |
There was a problem hiding this comment.
🤔 thinking — is_explicitly_disabled edge case with non-bool enabled
If someone writes "enabled": "false" (string instead of bool), as_bool() returns None, so the integration proceeds to deserialization and may fail validation. This is correct behavior (malformed config should fail), but worth noting that the bypass only fires for the literal JSON boolean false.
Also worth noting: integrations that default enabled to true with no enabled field in the config will correctly proceed to validation, which is the desired behavior.
| } | ||
| } | ||
|
|
||
| matched_qvalue = Some(qvalue); |
There was a problem hiding this comment.
♻️ refactor — accept_encoding_qvalue returns the last match, not the first
If Accept-Encoding contains a duplicate token (e.g., gzip;q=1, gzip;q=0), this returns the last match's qvalue. Most HTTP implementations use first-match semantics.
Suggestion:
matched_qvalue = Some(qvalue);
break; // first match wins per RFC conventionIn practice duplicate tokens are rare and unlikely to cause issues, but first-match is more conventional.
| ); | ||
| } | ||
|
|
||
| fn select_supported_accept_encoding(client_accept_encoding: Option<&str>) -> String { |
There was a problem hiding this comment.
♻️ refactor — select_supported_accept_encoding(None) is dead code
The only caller restrict_accept_encoding returns early when Accept-Encoding is absent, so the None branch (returning "identity") is unreachable. You could simplify the parameter to &str if this function isn't intended for other callers.
| /// Generic URL pattern for RSC payload rewriting. | ||
| pub(crate) static RSC_URL_PATTERN: LazyLock<Regex> = LazyLock::new(|| { | ||
| Regex::new( | ||
| r#"(https?)?(:)?(\\\\\\\\\\\\\\\\//|\\\\\\\\//|\\/\\/|//)(?P<host>\[[^\]]+\](?::\d+)?|[^/"'\\\s?#<>,)]+)"#, |
There was a problem hiding this comment.
🌱 seedling — Host capture group [^/"'\\\s?#<>,)]+ is very permissive
This character class matches anything not in the exclusion set, which could theoretically match non-hostname content in unusual payloads. Currently safe because strip_origin_host_with_optional_port filters properly, but worth keeping an eye on if RSC payload formats evolve.
Summary
TrustedServerErrorresponses.__NEXT_DATA__rewrites, and publisher fallback encoding behavior.Changes
.claude/agents/pr-creator.md.claude/agents/pr-reviewer.mdcrates/common/src/auction/mod.rscrates/common/src/auth.rscrates/common/src/integrations/adserver_mock.rscrates/common/src/integrations/aps.rscrates/common/src/integrations/datadome.rscrates/common/src/integrations/didomi.rscrates/common/src/integrations/google_tag_manager.rscrates/common/src/integrations/gpt.rscrates/common/src/integrations/lockr.rscrates/common/src/integrations/mod.rscrates/common/src/integrations/nextjs/html_post_process.rscrates/common/src/integrations/nextjs/mod.rs__NEXT_DATA__coverage.crates/common/src/integrations/nextjs/rsc.rscrates/common/src/integrations/nextjs/script_rewriter.rsexpect()regex compilation with fallible rewriters and add hostname, port, whitespace, and metacharacter regressions.crates/common/src/integrations/nextjs/shared.rscrates/common/src/integrations/permutive.rscrates/common/src/integrations/prebid.rsserver_url, and return fallible provider builders.crates/common/src/integrations/registry.rscrates/common/src/integrations/testlight.rscrates/common/src/publisher.rsAccept-Encodingto codecs the rewrite pipeline supports and add regression coverage.crates/common/src/settings.rsenabled = falseconfigs, and add startup hardening regressions.crates/common/src/settings_data.rscrates/fastly/src/main.rsHardening note
Invalid
handlers[].pathregexes now fail during startup preparation and still return descriptive configuration errors if request-time code encounters unprepared settings. Enabled integrations and auction providers now fail registry/orchestrator startup instead of logging and disabling themselves, while rawenabled = falseconfigs still short-circuit before validation. Regression coverage includes invalid handler TOML and env overrides, disabled-invalid config bypasses, enabled-invalid integration/provider startup failures, emptyprebid.server_url, Next.js__NEXT_DATA__and RSC hostname/port/metacharacter rewrites, and the publisher fallback encoding path.Closes
Closes #403
Test plan
cargo test --workspacecargo clippy --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest runcd crates/js/lib && npm run formatcd docs && npm run formatcargo build --bin trusted-server-fastly --release --target wasm32-wasip1fastly compute servenpx vitest runwas skipped per explicit user instruction because of the unrelated repo-wideERR_REQUIRE_ESMfailure inhtml-encoding-sniffer.Checklist
unwrap()in production code — useexpect(\"should ...\")tracingmacros (notprintln!)