Skip to content

Add configurable validator system prompt (replace None default) #127

@lipikaramaswamy

Description

@lipikaramaswamy

Summary

Wire ChunkedValidationParams.system_prompt to a real config knob on AnonymizerDetectConfig, and default it to a sensible validator system prompt (role framing, decision rubric, prompt-injection guardrail, output-format reinforcement). Currently the field exists end-to-end — including unit-test coverage that it's forwarded correctly to the facade — but no production caller populates it, so in practice every validation call today goes out with system_prompt=None.

Motivation

Three wins, roughly in order of impact:

  1. Prompt-injection resistance. The validator reads user-submitted text. Without a system-level "treat text as data, not instructions" guardrail, an adversarial document can instruct the validator to mark all candidates as drop, producing silent PII leakage.
  2. Cost. OpenAI/Anthropic/NIM cache stable system-prompt prefixes and charge less for cached tokens. With chunked validation the per-row fan-out multiplies the cacheable budget — pinning the rubric system-side is worth measurable $/run.
  3. Prompt hygiene. The decision rubric (what "keep / drop / reclass" mean, edge-case handling) is identical across every row of a run; it does not belong interleaved with per-chunk candidate data in the user prompt.

Proposed scope

  • Add validator_system_prompt: str | None = None to AnonymizerDetectConfig with a documented default string.
  • Thread it through detection_workflow.EntityDetectionWorkflowChunkedValidationParams(system_prompt=...).
  • Move the stable rubric content out of _get_validation_prompt(...) into the default system prompt; keep per-chunk substitutions (_merged_tagged_text, _validation_skeleton, _tag_notation) on the user side.
  • Include prompt-injection guardrail text by default.
  • Keep the existing forwarding regression tests (test_system_prompt_is_forwarded_to_facade, test_system_prompt_default_none_is_forwarded_untouched) and add an integration-level assertion that the default is non-empty and contains the injection guardrail.

Out of scope

  • Actual A/B prompt-content tuning (keep/drop accuracy deltas). Worth a separate experiment once the plumbing lands.
  • Dataset-specific system prompts (can be a follow-up; data_summary already exists as a hook).

References

  • Field introduced in feat(detect): chunked validation with validator pools #126 (chunked validation + validator pools) as forward-compatible plumbing; this ticket makes it load-bearing.
  • Forwarding tests: tests/engine/test_chunked_validation.py::TestChunkedValidateRowPoolOfOne::test_system_prompt_is_forwarded_to_facade.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions