feat: New rail, huggingface lightweight classifiers #1853
Conversation
6398f99 to
930faa6
Compare
004d830 to
9b58539
Compare
863ac8d to
5431798
Compare
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR introduces a pluggable
|
| Filename | Overview |
|---|---|
| nemoguardrails/library/hf_classifier/backends.py | New file: pluggable backends (Local, vLLM, KServe, FMS) for HF classifier inference; addressed previous cache-key and event-loop concerns, but aiohttp sessions are only closed at full shutdown (not on hot-reload). |
| nemoguardrails/library/hf_classifier/actions.py | New file: Colang-callable action functions for input, output, retrieval, tool-input and tool-output classification; tool_message context key is never set by the runtime (flagged in previous review threads), causing tool-input classification to always allow. |
| nemoguardrails/guardrails/actions/hf_classifier_action.py | New file: IORails-path HFClassifierInputAction and HFClassifierOutputAction; threshold/blocked_labels filtering is applied client-side in _parse_classify_response, design looks sound. |
| nemoguardrails/server/api.py | Pre-warm logic now uses max_workers=len(pending) (parallel per-model threads), fixing the previous single-worker sequential issue; close_all_backends() is called on shutdown, but not on config hot-reload. |
| nemoguardrails/guardrails/guardrails.py | Adds hf_classifier flows to IORAILS_INPUT/OUTPUT_FLOWS whitelists; exposes events_history_cache property (no-op for IORails); guards against empty-flows configs routing to IORails. |
| nemoguardrails/rails/llm/config.py | Adds LocalHFClassifierConfig, RemoteHFClassifierConfig, and discriminated-union HFClassifierConfig; validation is thorough (threshold bounds, endpoint scheme, aggregation_strategy guard). |
| nemoguardrails/guardrails/rail_action.py | Adds _param_prefix class attribute so subclasses can customise the error message when a required parameter is missing; change is minimal and correct. |
| nemoguardrails/guardrails/rails_manager.py | Registers HFClassifierInputAction and HFClassifierOutputAction in the IORails action registry; straightforward, no issues. |
| nemoguardrails/library/hf_classifier/flows.co | Colang 2 flows for input, output, retrieval, tool-input and tool-output HF classifier checks; looks correct. |
| tests/test_hf_classifier.py | Comprehensive backend and action unit tests with cache-clearing fixtures; coverage is extensive. |
Sequence Diagram
sequenceDiagram
participant Client
participant API as server/api.py
participant GR as Guardrails
participant IOR as IORails
participant RA as HFClassifierInputAction
participant BE as ClassifierBackend
Note over API: startup: _prewarm_local_hf_classifiers()
API->>BE: _get_or_create_pipeline() [thread pool]
BE-->>API: pipeline cached
Client->>API: POST /v1/chat/completions
API->>GR: _get_rails(config_ids)
GR->>IOR: startup() [if IORails config]
API->>GR: generate_async(messages)
GR->>IOR: generate(messages)
IOR->>RA: "run(flow='hf classifier check input $classifier=toxicity')"
RA->>BE: get_backend(cfg, name)
RA->>BE: classify(text)
BE-->>RA: [ClassificationResult, ...]
RA-->>IOR: "RailResult(is_safe=True/False)"
alt "is_safe=False"
IOR-->>GR: blocked response
else "is_safe=True"
IOR-->>GR: pass-through / LLM call
end
GR-->>API: response
API-->>Client: chat completion
Note over API: shutdown: close_all_backends()
API->>BE: close() all sessions
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
nemoguardrails/library/hf_classifier/backends.py:710-730
**Backend sessions leaked on config hot-reload**
`close_all_backends()` is only called at full app shutdown. When auto-reload fires and removes a stale config from `llm_rails_instances`, its backends stay in the process-global `_backend_instances` dict with open `aiohttp.ClientSession` objects. If the reloaded config changes `endpoint`, `threshold`, or any other field, `get_backend` will create a fresh backend under a new key while the old session's TCP connections remain open indefinitely, leaking file descriptors until the Python GC eventually raises `ResourceWarning: Unclosed client session`.
Reviews (8): Last reviewed commit: ":construction: support hf classifiers as..." | Re-trigger Greptile
📝 WalkthroughWalkthroughAdds a HuggingFace classifier subsystem integrated into Guardrails/IORails: new configs, pluggable backends (local/vLLM/KServe/FMS), action helpers and IORails actions, flow definitions, server prewarm/streaming updates, rails wiring, and comprehensive tests. ChangesHF Classifier Guardrails System
Sequence DiagramssequenceDiagram
actor User
participant ServerAPI as Server API
participant Guardrails
participant IORails as IORails Flow
participant HFBackend as HF Classifier Backend
participant LLM
User->>ServerAPI: POST /chat/completions (message)
ServerAPI->>Guardrails: generate_async()
Guardrails->>IORails: execute input flow
IORails->>HFBackend: classify(user_input)
HFBackend-->>IORails: labels & scores
IORails-->>Guardrails: allowed / blocked
alt Allowed
Guardrails->>LLM: forward message
LLM-->>Guardrails: bot_output
Guardrails->>IORails: execute output flow
IORails->>HFBackend: classify(bot_output)
HFBackend-->>IORails: labels & scores
IORails-->>Guardrails: allowed / blocked
Guardrails-->>ServerAPI: bot response (if allowed)
else Blocked
Guardrails-->>ServerAPI: refusal response
end
sequenceDiagram
participant Startup
participant APIConfig as API Config
participant ConfigLoader as RailsConfig Loader
participant PipelineCache as Local HF Pipeline Cache
participant Transformers as transformers
Startup->>APIConfig: lifespan startup
APIConfig->>ConfigLoader: load configs
ConfigLoader-->>APIConfig: {hf_classifier: configs}
loop per local classifier
APIConfig->>PipelineCache: _get_or_create_pipeline(model_id, task)
alt cached
PipelineCache-->>APIConfig: cached pipeline
else
PipelineCache->>Transformers: pipeline(model_id, task)
Transformers-->>PipelineCache: loaded pipeline
PipelineCache->>PipelineCache: cache and return
end
end
APIConfig-->>Startup: prewarm complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
| def __init__(self, config: RemoteHFClassifierConfig) -> None: | ||
| self._url = config.endpoint.rstrip("/") + "/api/v1/text/contents" | ||
| self._threshold = config.threshold | ||
| self._headers = _build_headers(config) | ||
| self._timeout = _get_timeout(config) | ||
| self._ssl = _build_ssl_context(config) |
There was a problem hiding this comment.
FMS backend threshold baked into instance, not reflected in cache key
FMSBackend.__init__ stores self._threshold = config.threshold and embeds it in every request payload. However, get_backend caches the instance by "{name}:{backend}:{model_name}:{endpoint}" — which does not include the threshold. If two deployed configs share the same classifier name, backend="fms", model_name, and endpoint but specify different threshold values, the second config's requests will silently use the first config's threshold. This is a present correctness defect in multi-config server deployments: the FMS server will filter at the wrong sensitivity level without any warning.
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemoguardrails/library/hf_classifier/backends.py
Line: 363-368
Comment:
**FMS backend threshold baked into instance, not reflected in cache key**
`FMSBackend.__init__` stores `self._threshold = config.threshold` and embeds it in every request payload. However, `get_backend` caches the instance by `"{name}:{backend}:{model_name}:{endpoint}"` — which does not include the threshold. If two deployed configs share the same classifier `name`, `backend="fms"`, `model_name`, and `endpoint` but specify different `threshold` values, the second config's requests will silently use the first config's threshold. This is a present correctness defect in multi-config server deployments: the FMS server will filter at the wrong sensitivity level without any warning.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
nemoguardrails/guardrails/rail_action.py (1)
65-65: ⚡ Quick winKeep parameter parsing aligned with
_param_prefix.Right now
_param_prefixonly affects the Line 92 error text. Line 113 still parses via_get_flow_model(flow), so the base class remains hard-wired to$model=. That makes this hook easy to misuse: a subclass can override_param_prefixand still silently parse the wrong flow argument.Also applies to: 90-113
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nemoguardrails/guardrails/rail_action.py` at line 65, The class currently hard-codes parsing of the flow model to "$model=" via _get_flow_model(flow) while allowing subclasses to override _param_prefix; change the parsing to use the instance _param_prefix everywhere instead of the literal "$model=": update the logic that extracts the model from flow (the call/implementation of _get_flow_model and the flow-parsing block around lines 90–113) to read self._param_prefix when matching and extracting the value, and also update any error messages that reference the prefix to use self._param_prefix so subclasses that override _param_prefix get consistent parsing and errors.tests/test_hf_classifier.py (1)
739-766: 💤 Low valueVerify TimeoutError type in timeout test.
At line 754, the test patches
_get_or_create_pipelineto raiseTimeoutError. However, the actual timeout in_prewarm_local_hf_classifiersis caught viafuture.result(timeout=...), which raisesconcurrent.futures.TimeoutError. The test'sside_effect=TimeoutError("timed out")would raise the builtinTimeoutError, notconcurrent.futures.TimeoutError.Since line 165 in
api.pycatchesTimeoutErrorimported fromconcurrent.futures, and the builtinTimeoutErroris a parent class, this should still work, but the test doesn't accurately simulate the real timeout scenario (where the executor times out waiting for the future, not the function itself raising TimeoutError).Consider clarifying the test or mocking at the executor level for more accurate simulation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_hf_classifier.py` around lines 739 - 766, The test currently uses builtin TimeoutError as side_effect for _get_or_create_pipeline, but _prewarm_local_hf_classifiers actually observes a concurrent.futures.TimeoutError raised from future.result; update the test to accurately simulate that by either setting side_effect to concurrent.futures.TimeoutError("timed out") (importing concurrent.futures) or by patching the Future.result method returned by the executor so it raises concurrent.futures.TimeoutError; keep the assertions and references to _prewarm_local_hf_classifiers, _get_or_create_pipeline, _PipelineLoadError and _pipeline_cache_key unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nemoguardrails/library/hf_classifier/actions.py`:
- Around line 128-140: The current hf_classifier_check_tool_output assumes
tool_calls are JSON-serializable and calls json.dumps(calls) which can raise on
custom objects; update hf_classifier_check_tool_output to safely convert
tool_calls to text before calling _classify_and_check (e.g., try
json.dumps(calls, default=str) or catch TypeError and fall back to joining
repr()/str() of each call) so non-serializable tool-call objects produce a
stable string for classification instead of crashing; keep references to the
existing parameters and call to _classify_and_check unchanged.
In `@nemoguardrails/library/hf_classifier/backends.py`:
- Around line 431-442: get_backend currently builds cache_key from
name/backend/model_name/endpoint which misses config fields like
task/parameters/threshold and leads to stale cached backends; change cache_key
construction in get_backend to include a deterministic serialization of the
entire HFClassifierConfig (for example use dataclasses.asdict(config) or
config.__dict__ then json.dumps(..., sort_keys=True, default=repr)) and append
that serialized string to cache_key so different configs produce distinct keys;
keep using _backend_instances and _BACKENDS and ensure the serialization is
stable and does not include non-serializable transient state.
---
Nitpick comments:
In `@nemoguardrails/guardrails/rail_action.py`:
- Line 65: The class currently hard-codes parsing of the flow model to "$model="
via _get_flow_model(flow) while allowing subclasses to override _param_prefix;
change the parsing to use the instance _param_prefix everywhere instead of the
literal "$model=": update the logic that extracts the model from flow (the
call/implementation of _get_flow_model and the flow-parsing block around lines
90–113) to read self._param_prefix when matching and extracting the value, and
also update any error messages that reference the prefix to use
self._param_prefix so subclasses that override _param_prefix get consistent
parsing and errors.
In `@tests/test_hf_classifier.py`:
- Around line 739-766: The test currently uses builtin TimeoutError as
side_effect for _get_or_create_pipeline, but _prewarm_local_hf_classifiers
actually observes a concurrent.futures.TimeoutError raised from future.result;
update the test to accurately simulate that by either setting side_effect to
concurrent.futures.TimeoutError("timed out") (importing concurrent.futures) or
by patching the Future.result method returned by the executor so it raises
concurrent.futures.TimeoutError; keep the assertions and references to
_prewarm_local_hf_classifiers, _get_or_create_pipeline, _PipelineLoadError and
_pipeline_cache_key unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 062a145f-97ca-4108-8dff-e5b00f23bb9b
⛔ Files ignored due to path filters (1)
poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (14)
nemoguardrails/guardrails/actions/hf_classifier_action.pynemoguardrails/guardrails/guardrails.pynemoguardrails/guardrails/rail_action.pynemoguardrails/guardrails/rails_manager.pynemoguardrails/library/hf_classifier/__init__.pynemoguardrails/library/hf_classifier/actions.pynemoguardrails/library/hf_classifier/backends.pynemoguardrails/library/hf_classifier/flows.conemoguardrails/library/hf_classifier/flows.v1.conemoguardrails/rails/llm/config.pynemoguardrails/server/api.pypyproject.tomltests/test_hf_classifier.pytests/test_hf_classifier_iorails.py
| async def hf_classifier_check_tool_output( | ||
| classifier: str, | ||
| tool_calls: Optional[Any] = None, | ||
| config: Optional[RailsConfig] = None, | ||
| context: Optional[dict] = None, | ||
| **kwargs, | ||
| ) -> bool: | ||
| """Check tool output against a HuggingFace classifier.""" | ||
| if not tool_calls and context: | ||
| tool_calls = context.get("tool_calls") or [] | ||
| calls = tool_calls or [] | ||
| text = json.dumps(calls) if calls else "" | ||
| return await _classify_and_check(classifier, text, config) |
There was a problem hiding this comment.
Don't assume tool calls are directly JSON-serializable.
Line 139 will fail for common tool-call objects that are not plain dict/list primitives. Since this action accepts Any, a non-serializable tool call currently turns the rail into a runtime error instead of a classification decision.
Suggested fix
`@action`(is_system_action=True)
async def hf_classifier_check_tool_output(
classifier: str,
tool_calls: Optional[Any] = None,
config: Optional[RailsConfig] = None,
context: Optional[dict] = None,
**kwargs,
) -> bool:
"""Check tool output against a HuggingFace classifier."""
if not tool_calls and context:
tool_calls = context.get("tool_calls") or []
calls = tool_calls or []
- text = json.dumps(calls) if calls else ""
+ text = (
+ json.dumps(
+ calls,
+ default=lambda value: value.model_dump() if hasattr(value, "model_dump") else str(value),
+ )
+ if calls
+ else ""
+ )
return await _classify_and_check(classifier, text, config)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nemoguardrails/library/hf_classifier/actions.py` around lines 128 - 140, The
current hf_classifier_check_tool_output assumes tool_calls are JSON-serializable
and calls json.dumps(calls) which can raise on custom objects; update
hf_classifier_check_tool_output to safely convert tool_calls to text before
calling _classify_and_check (e.g., try json.dumps(calls, default=str) or catch
TypeError and fall back to joining repr()/str() of each call) so
non-serializable tool-call objects produce a stable string for classification
instead of crashing; keep references to the existing parameters and call to
_classify_and_check unchanged.
| def get_backend(config: HFClassifierConfig, name: str = "") -> ClassifierBackend: | ||
| """Get or create a cached backend instance from classifier config.""" | ||
| endpoint = getattr(config, "endpoint", "local") | ||
| cache_key = f"{name}:{config.backend}:{config.model_name}:{endpoint}" | ||
| cached = _backend_instances.get(cache_key) | ||
| if cached is not None: | ||
| return cached | ||
| cls = _BACKENDS.get(config.backend) | ||
| if cls is None: | ||
| raise ValueError(f"Unknown hf_classifier backend: '{config.backend}'. Supported: {', '.join(_BACKENDS)}") | ||
| _backend_instances[cache_key] = cls(config) | ||
| return _backend_instances[cache_key] |
There was a problem hiding this comment.
Include the full classifier config in the backend cache key.
Line 434 only keys on name/backend/model_name/endpoint, but backend behavior also depends on fields you omit here: e.g. task and parameters for LocalBackend, and threshold for FMSBackend. Two different classifier configs can therefore reuse the same cached instance and run with stale settings.
Suggested fix
+import json
+
def get_backend(config: HFClassifierConfig, name: str = "") -> ClassifierBackend:
"""Get or create a cached backend instance from classifier config."""
- endpoint = getattr(config, "endpoint", "local")
- cache_key = f"{name}:{config.backend}:{config.model_name}:{endpoint}"
+ cache_key = json.dumps(
+ {
+ "name": name,
+ "config": config.model_dump(mode="json"),
+ },
+ sort_keys=True,
+ )
cached = _backend_instances.get(cache_key)
if cached is not None:
return cached🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@nemoguardrails/library/hf_classifier/backends.py` around lines 431 - 442,
get_backend currently builds cache_key from name/backend/model_name/endpoint
which misses config fields like task/parameters/threshold and leads to stale
cached backends; change cache_key construction in get_backend to include a
deterministic serialization of the entire HFClassifierConfig (for example use
dataclasses.asdict(config) or config.__dict__ then json.dumps(...,
sort_keys=True, default=repr)) and append that serialized string to cache_key so
different configs produce distinct keys; keep using _backend_instances and
_BACKENDS and ensure the serialization is stable and does not include
non-serializable transient state.
b53ba6f to
beae802
Compare
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
bc805f3 to
a8e424e
Compare
|
@m-misiura Thanks for the PR, Looks good, I have some high-level comments: I think this PR should keep the server path on LLMRails and leave the experimental Guardrails/IORails routing out of scope. Ideally we shouldn't (need to) touch the server code in library integrations. There are two HF classifier state issues that still need attention even if IORails is removed: remote classifier backends cache aiohttp I know aiohttp is one of our core dependencies but we might drop it, any issues with using httpx? I would not make this a blocker if it turns into a larger refactor. Let's remove the IORails path from this PR, but keep its commit so we can come back to it later if needed. I can do a final review once those changes are in. 👍🏻 |
Description
Motivation
Currently, the
content_safetyrail is a good demonstration of a flexible pattern where users can point to any supported engine (i.e. nim, openai, vllm, etc) and configure (generative) transformer model to carry out detectionsBased on customer feedback, there is a considerable interest in using lightweight (discriminative) transformer models (specifically token and sequence classifiers) to carry out detections. These models cannot be readily configured via the
content_safetyrail since:different inference APIs: a BERT model served on KServe uses the v1/v2 predict API, not the OpenAI chat completions API that NeMo's LangChain providers expect.
gliner,jailbreak_detection) hardcode their inference backend; users cannot swap between e.g. local HuggingFace Transformers, KServe, vLLM, or HuggingFace Inference Endpoints without changing code.Currently, the recommended approach to configure such models is via custom actions, which can be cumbersome.
Proposed feature
This PR adds a pluggable
hf_classifierrail that lets users run HuggingFace-compatible classification models as input/output guardrails. Classifiers are defined in rails.config.hf_classifier and referenced by name in flows via$classifier=.It support the following backends:
Test plan
test_hf_classifier.pytest_hf_classifier_iorails.pyChecklist
Summary by CodeRabbit
New Features
Chores
Tests