Skip to content

feat: New rail, huggingface lightweight classifiers #1853

Open
m-misiura wants to merge 1 commit into
NVIDIA-NeMo:developfrom
m-misiura:hf_classifiers
Open

feat: New rail, huggingface lightweight classifiers #1853
m-misiura wants to merge 1 commit into
NVIDIA-NeMo:developfrom
m-misiura:hf_classifiers

Conversation

@m-misiura
Copy link
Copy Markdown
Contributor

@m-misiura m-misiura commented May 6, 2026

Description

Motivation

Currently, the content_safety rail is a good demonstration of a flexible pattern where users can point to any supported engine (i.e. nim, openai, vllm, etc) and configure (generative) transformer model to carry out detections
Based on customer feedback, there is a considerable interest in using lightweight (discriminative) transformer models (specifically token and sequence classifiers) to carry out detections. These models cannot be readily configured via the content_safety rail since:

  • different output format: classifiers return label/score pairs, not generated text to be parsed
    different inference APIs: a BERT model served on KServe uses the v1/v2 predict API, not the OpenAI chat completions API that NeMo's LangChain providers expect.
  • no backend abstraction: existing HF-based rails (gliner, jailbreak_detection) hardcode their inference backend; users cannot swap between e.g. local HuggingFace Transformers, KServe, vLLM, or HuggingFace Inference Endpoints without changing code.

Currently, the recommended approach to configure such models is via custom actions, which can be cumbersome.

Proposed feature

This PR adds a pluggable hf_classifier rail that lets users run HuggingFace-compatible classification models as input/output guardrails. Classifiers are defined in rails.config.hf_classifier and referenced by name in flows via $classifier=.

It support the following backends:

  • local — HuggingFace Transformers pipeline (runs in-process, supports text-classification and token-classification tasks)
  • vllm — vLLM /classify endpoint
  • kserve — KServe v1 /v1/models/{name}:predict endpoint
  • fms — FMS guardrails-detectors /api/v1/text/contents endpoint

Test plan

  1. Added additional test files:
  • test_hf_classifier.py
  • test_hf_classifier_iorails.py
  1. Created a demo to test against the live server

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

Summary by CodeRabbit

  • New Features

    • HuggingFace classifier integration for moderation across user input, bot output, retrievals, and tool I/O.
    • Support for multiple classifier backends (Local, vLLM, KServe, FMS) and pre-warming for faster responses.
    • Exposed events history cache for improved observability.
    • Improved streaming responses with structured error handling.
  • Chores

    • Added optional HF-classifier dependencies (transformers, torch).
  • Tests

    • Comprehensive tests covering HF classifier behavior and integration.

@Pouyanpi Pouyanpi force-pushed the develop branch 2 times, most recently from 6398f99 to 930faa6 Compare May 6, 2026 15:51
@m-misiura m-misiura force-pushed the hf_classifiers branch 2 times, most recently from 004d830 to 9b58539 Compare May 6, 2026 15:59
@m-misiura m-misiura force-pushed the hf_classifiers branch 2 times, most recently from 863ac8d to 5431798 Compare May 6, 2026 16:04
@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

@m-misiura m-misiura marked this pull request as ready for review May 6, 2026 16:30
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR introduces a pluggable hf_classifier rail that lets users run HuggingFace-compatible classification models (local Transformers pipeline, vLLM, KServe, FMS) as input/output guardrails without requiring a generative LLM. The implementation adds config models, IORails action classes, Colang flows, backend abstractions, startup prewarming, and comprehensive tests.

  • New hf_classifier rail: adds LocalHFClassifierConfig / RemoteHFClassifierConfig (discriminated union), four backend classes (LocalBackend, VLLMBackend, KServeBackend, FMSBackend), and Colang flows for input, output, retrieval, and tool I/O checks.
  • IORails integration: HFClassifierInputAction / HFClassifierOutputAction are registered in the IORails engine registry, and the corresponding flow names are added to IORAILS_INPUT_FLOWS / IORAILS_OUTPUT_FLOWS, enabling fast classifier-only deployments with no LLM dependency.
  • Server changes: _prewarm_local_hf_classifiers eagerly loads local pipelines at startup (parallel threads, one per model), close_all_backends() is wired to app shutdown, and llm_rails_instances now stores Guardrails (the IORails/LLMRails wrapper) instead of bare LLMRails.

Confidence Score: 4/5

Safe to merge with awareness of the open issues flagged in previous review rounds.

The tool-input classifier always receives an empty string because tool_message is never written to the context by the runtime (flagged in prior review), meaning tool-input classification silently passes every request. That functional gap, combined with a smaller session-leak concern on config hot-reload, keeps the score from the top.

nemoguardrails/library/hf_classifier/actions.py (tool_message context key) and nemoguardrails/library/hf_classifier/backends.py (session lifecycle on hot-reload).

Important Files Changed

Filename Overview
nemoguardrails/library/hf_classifier/backends.py New file: pluggable backends (Local, vLLM, KServe, FMS) for HF classifier inference; addressed previous cache-key and event-loop concerns, but aiohttp sessions are only closed at full shutdown (not on hot-reload).
nemoguardrails/library/hf_classifier/actions.py New file: Colang-callable action functions for input, output, retrieval, tool-input and tool-output classification; tool_message context key is never set by the runtime (flagged in previous review threads), causing tool-input classification to always allow.
nemoguardrails/guardrails/actions/hf_classifier_action.py New file: IORails-path HFClassifierInputAction and HFClassifierOutputAction; threshold/blocked_labels filtering is applied client-side in _parse_classify_response, design looks sound.
nemoguardrails/server/api.py Pre-warm logic now uses max_workers=len(pending) (parallel per-model threads), fixing the previous single-worker sequential issue; close_all_backends() is called on shutdown, but not on config hot-reload.
nemoguardrails/guardrails/guardrails.py Adds hf_classifier flows to IORAILS_INPUT/OUTPUT_FLOWS whitelists; exposes events_history_cache property (no-op for IORails); guards against empty-flows configs routing to IORails.
nemoguardrails/rails/llm/config.py Adds LocalHFClassifierConfig, RemoteHFClassifierConfig, and discriminated-union HFClassifierConfig; validation is thorough (threshold bounds, endpoint scheme, aggregation_strategy guard).
nemoguardrails/guardrails/rail_action.py Adds _param_prefix class attribute so subclasses can customise the error message when a required parameter is missing; change is minimal and correct.
nemoguardrails/guardrails/rails_manager.py Registers HFClassifierInputAction and HFClassifierOutputAction in the IORails action registry; straightforward, no issues.
nemoguardrails/library/hf_classifier/flows.co Colang 2 flows for input, output, retrieval, tool-input and tool-output HF classifier checks; looks correct.
tests/test_hf_classifier.py Comprehensive backend and action unit tests with cache-clearing fixtures; coverage is extensive.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as server/api.py
    participant GR as Guardrails
    participant IOR as IORails
    participant RA as HFClassifierInputAction
    participant BE as ClassifierBackend

    Note over API: startup: _prewarm_local_hf_classifiers()
    API->>BE: _get_or_create_pipeline() [thread pool]
    BE-->>API: pipeline cached

    Client->>API: POST /v1/chat/completions
    API->>GR: _get_rails(config_ids)
    GR->>IOR: startup() [if IORails config]
    API->>GR: generate_async(messages)
    GR->>IOR: generate(messages)
    IOR->>RA: "run(flow='hf classifier check input $classifier=toxicity')"
    RA->>BE: get_backend(cfg, name)
    RA->>BE: classify(text)
    BE-->>RA: [ClassificationResult, ...]
    RA-->>IOR: "RailResult(is_safe=True/False)"
    alt "is_safe=False"
        IOR-->>GR: blocked response
    else "is_safe=True"
        IOR-->>GR: pass-through / LLM call
    end
    GR-->>API: response
    API-->>Client: chat completion

    Note over API: shutdown: close_all_backends()
    API->>BE: close() all sessions
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
nemoguardrails/library/hf_classifier/backends.py:710-730
**Backend sessions leaked on config hot-reload**

`close_all_backends()` is only called at full app shutdown. When auto-reload fires and removes a stale config from `llm_rails_instances`, its backends stay in the process-global `_backend_instances` dict with open `aiohttp.ClientSession` objects. If the reloaded config changes `endpoint`, `threshold`, or any other field, `get_backend` will create a fresh backend under a new key while the old session's TCP connections remain open indefinitely, leaking file descriptors until the Python GC eventually raises `ResourceWarning: Unclosed client session`.

Reviews (8): Last reviewed commit: ":construction: support hf classifiers as..." | Re-trigger Greptile

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

📝 Walkthrough

Walkthrough

Adds a HuggingFace classifier subsystem integrated into Guardrails/IORails: new configs, pluggable backends (local/vLLM/KServe/FMS), action helpers and IORails actions, flow definitions, server prewarm/streaming updates, rails wiring, and comprehensive tests.

Changes

HF Classifier Guardrails System

Layer / File(s) Summary
Configuration & Types
nemoguardrails/rails/llm/config.py, pyproject.toml
Adds HFClassifierConfig discriminated union with LocalHFClassifierConfig and RemoteHFClassifierConfig; adds hf_classifier field to RailsConfigData; declares optional deps transformers and torch and hf-classifier extra.
Data / Cache Utilities
nemoguardrails/library/hf_classifier/__init__.py, nemoguardrails/library/hf_classifier/backends.py
Introduces ClassificationResult TypedDict, backend registry, pipeline cache, SSL/timeout helpers, and sentinel for failed loads.
Backend Implementations
nemoguardrails/library/hf_classifier/backends.py
Implements ClassifierBackend interface and concrete backends: LocalBackend, VLLMBackend, KServeBackend, FMSBackend; includes HTTP/SSL/session handling and get_backend factory.
Library-Level Actions
nemoguardrails/library/hf_classifier/actions.py
Adds _classify_and_check helper and five public actions: hf_classifier_check_input, hf_classifier_check_output, hf_classifier_check_retrieval, hf_classifier_check_tool_input, hf_classifier_check_tool_output that extract text and apply threshold-based blocking.
IORails Action Classes
nemoguardrails/guardrails/actions/hf_classifier_action.py
Adds HFClassifierInputAction and HFClassifierOutputAction that extract classifier from flow params, build prompts, call classifier via task manager, and parse responses into RailResult.
Flows
nemoguardrails/library/hf_classifier/flows.co, nemoguardrails/library/hf_classifier/flows.v1.co
Adds flows for input, output, retrieval, tool-input, tool-output checks; includes denial handling (rail exceptions or canned refusal) and a v1 refusal set.
Framework Integration & Wiring
nemoguardrails/guardrails/guardrails.py, nemoguardrails/guardrails/rail_action.py, nemoguardrails/guardrails/rails_manager.py
Adds IORAILS_INPUT_FLOWS/IORAILS_OUTPUT_FLOWS constants; adds events_history_cache property on Guardrails; makes _param_prefix configurable on RailAction; registers HF classifier actions in _ACTION_CLASSES.
Server Integration & Prewarm
nemoguardrails/server/api.py
Switches server instances to use Guardrails, adds _prewarm_local_hf_classifiers() with timeout/pipeline caching, updates chat generation/streaming to use Guardrails APIs, adds streaming chunk/error models and GuardrailsConfigurationError, and closes hf backends on shutdown.
Tests & Validation
tests/test_hf_classifier.py, tests/test_hf_classifier_iorails.py
Adds comprehensive tests for configs, header/SSL/timeout behavior, backend integrations, prewarm logic, sentinel/cache behavior, action wiring, classifier extraction, threshold blocking, and IORails flow registration.

Sequence Diagrams

sequenceDiagram
    actor User
    participant ServerAPI as Server API
    participant Guardrails
    participant IORails as IORails Flow
    participant HFBackend as HF Classifier Backend
    participant LLM

    User->>ServerAPI: POST /chat/completions (message)
    ServerAPI->>Guardrails: generate_async()
    Guardrails->>IORails: execute input flow
    IORails->>HFBackend: classify(user_input)
    HFBackend-->>IORails: labels & scores
    IORails-->>Guardrails: allowed / blocked
    alt Allowed
        Guardrails->>LLM: forward message
        LLM-->>Guardrails: bot_output
        Guardrails->>IORails: execute output flow
        IORails->>HFBackend: classify(bot_output)
        HFBackend-->>IORails: labels & scores
        IORails-->>Guardrails: allowed / blocked
        Guardrails-->>ServerAPI: bot response (if allowed)
    else Blocked
        Guardrails-->>ServerAPI: refusal response
    end
Loading
sequenceDiagram
    participant Startup
    participant APIConfig as API Config
    participant ConfigLoader as RailsConfig Loader
    participant PipelineCache as Local HF Pipeline Cache
    participant Transformers as transformers

    Startup->>APIConfig: lifespan startup
    APIConfig->>ConfigLoader: load configs
    ConfigLoader-->>APIConfig: {hf_classifier: configs}
    loop per local classifier
        APIConfig->>PipelineCache: _get_or_create_pipeline(model_id, task)
        alt cached
            PipelineCache-->>APIConfig: cached pipeline
        else
            PipelineCache->>Transformers: pipeline(model_id, task)
            Transformers-->>PipelineCache: loaded pipeline
            PipelineCache->>PipelineCache: cache and return
        end
    end
    APIConfig-->>Startup: prewarm complete
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes ⚠️ Warning Major feature PR lacks documented test results. While 87 unit tests exist in the codebase, the PR description provides no test execution results, coverage metrics, or confirmation of test passage. Add test execution summary to PR description showing: test pass count, test execution confirmation (e.g., CI logs), and code coverage metrics. Note that test checklist items are marked incomplete.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: New rail, huggingface lightweight classifiers' accurately describes the main change in the PR—adding a new HuggingFace classifier-based rail system with pluggable backends and configuration support.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Comment on lines +363 to +368
def __init__(self, config: RemoteHFClassifierConfig) -> None:
self._url = config.endpoint.rstrip("/") + "/api/v1/text/contents"
self._threshold = config.threshold
self._headers = _build_headers(config)
self._timeout = _get_timeout(config)
self._ssl = _build_ssl_context(config)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 FMS backend threshold baked into instance, not reflected in cache key

FMSBackend.__init__ stores self._threshold = config.threshold and embeds it in every request payload. However, get_backend caches the instance by "{name}:{backend}:{model_name}:{endpoint}" — which does not include the threshold. If two deployed configs share the same classifier name, backend="fms", model_name, and endpoint but specify different threshold values, the second config's requests will silently use the first config's threshold. This is a present correctness defect in multi-config server deployments: the FMS server will filter at the wrong sensitivity level without any warning.

Prompt To Fix With AI
This is a comment left during a code review.
Path: nemoguardrails/library/hf_classifier/backends.py
Line: 363-368

Comment:
**FMS backend threshold baked into instance, not reflected in cache key**

`FMSBackend.__init__` stores `self._threshold = config.threshold` and embeds it in every request payload. However, `get_backend` caches the instance by `"{name}:{backend}:{model_name}:{endpoint}"` — which does not include the threshold. If two deployed configs share the same classifier `name`, `backend="fms"`, `model_name`, and `endpoint` but specify different `threshold` values, the second config's requests will silently use the first config's threshold. This is a present correctness defect in multi-config server deployments: the FMS server will filter at the wrong sensitivity level without any warning.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
nemoguardrails/guardrails/rail_action.py (1)

65-65: ⚡ Quick win

Keep parameter parsing aligned with _param_prefix.

Right now _param_prefix only affects the Line 92 error text. Line 113 still parses via _get_flow_model(flow), so the base class remains hard-wired to $model=. That makes this hook easy to misuse: a subclass can override _param_prefix and still silently parse the wrong flow argument.

Also applies to: 90-113

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nemoguardrails/guardrails/rail_action.py` at line 65, The class currently
hard-codes parsing of the flow model to "$model=" via _get_flow_model(flow)
while allowing subclasses to override _param_prefix; change the parsing to use
the instance _param_prefix everywhere instead of the literal "$model=": update
the logic that extracts the model from flow (the call/implementation of
_get_flow_model and the flow-parsing block around lines 90–113) to read
self._param_prefix when matching and extracting the value, and also update any
error messages that reference the prefix to use self._param_prefix so subclasses
that override _param_prefix get consistent parsing and errors.
tests/test_hf_classifier.py (1)

739-766: 💤 Low value

Verify TimeoutError type in timeout test.

At line 754, the test patches _get_or_create_pipeline to raise TimeoutError. However, the actual timeout in _prewarm_local_hf_classifiers is caught via future.result(timeout=...), which raises concurrent.futures.TimeoutError. The test's side_effect=TimeoutError("timed out") would raise the builtin TimeoutError, not concurrent.futures.TimeoutError.

Since line 165 in api.py catches TimeoutError imported from concurrent.futures, and the builtin TimeoutError is a parent class, this should still work, but the test doesn't accurately simulate the real timeout scenario (where the executor times out waiting for the future, not the function itself raising TimeoutError).

Consider clarifying the test or mocking at the executor level for more accurate simulation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_hf_classifier.py` around lines 739 - 766, The test currently uses
builtin TimeoutError as side_effect for _get_or_create_pipeline, but
_prewarm_local_hf_classifiers actually observes a
concurrent.futures.TimeoutError raised from future.result; update the test to
accurately simulate that by either setting side_effect to
concurrent.futures.TimeoutError("timed out") (importing concurrent.futures) or
by patching the Future.result method returned by the executor so it raises
concurrent.futures.TimeoutError; keep the assertions and references to
_prewarm_local_hf_classifiers, _get_or_create_pipeline, _PipelineLoadError and
_pipeline_cache_key unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemoguardrails/library/hf_classifier/actions.py`:
- Around line 128-140: The current hf_classifier_check_tool_output assumes
tool_calls are JSON-serializable and calls json.dumps(calls) which can raise on
custom objects; update hf_classifier_check_tool_output to safely convert
tool_calls to text before calling _classify_and_check (e.g., try
json.dumps(calls, default=str) or catch TypeError and fall back to joining
repr()/str() of each call) so non-serializable tool-call objects produce a
stable string for classification instead of crashing; keep references to the
existing parameters and call to _classify_and_check unchanged.

In `@nemoguardrails/library/hf_classifier/backends.py`:
- Around line 431-442: get_backend currently builds cache_key from
name/backend/model_name/endpoint which misses config fields like
task/parameters/threshold and leads to stale cached backends; change cache_key
construction in get_backend to include a deterministic serialization of the
entire HFClassifierConfig (for example use dataclasses.asdict(config) or
config.__dict__ then json.dumps(..., sort_keys=True, default=repr)) and append
that serialized string to cache_key so different configs produce distinct keys;
keep using _backend_instances and _BACKENDS and ensure the serialization is
stable and does not include non-serializable transient state.

---

Nitpick comments:
In `@nemoguardrails/guardrails/rail_action.py`:
- Line 65: The class currently hard-codes parsing of the flow model to "$model="
via _get_flow_model(flow) while allowing subclasses to override _param_prefix;
change the parsing to use the instance _param_prefix everywhere instead of the
literal "$model=": update the logic that extracts the model from flow (the
call/implementation of _get_flow_model and the flow-parsing block around lines
90–113) to read self._param_prefix when matching and extracting the value, and
also update any error messages that reference the prefix to use
self._param_prefix so subclasses that override _param_prefix get consistent
parsing and errors.

In `@tests/test_hf_classifier.py`:
- Around line 739-766: The test currently uses builtin TimeoutError as
side_effect for _get_or_create_pipeline, but _prewarm_local_hf_classifiers
actually observes a concurrent.futures.TimeoutError raised from future.result;
update the test to accurately simulate that by either setting side_effect to
concurrent.futures.TimeoutError("timed out") (importing concurrent.futures) or
by patching the Future.result method returned by the executor so it raises
concurrent.futures.TimeoutError; keep the assertions and references to
_prewarm_local_hf_classifiers, _get_or_create_pipeline, _PipelineLoadError and
_pipeline_cache_key unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 062a145f-97ca-4108-8dff-e5b00f23bb9b

📥 Commits

Reviewing files that changed from the base of the PR and between c69efe5 and a95a49d.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • nemoguardrails/guardrails/actions/hf_classifier_action.py
  • nemoguardrails/guardrails/guardrails.py
  • nemoguardrails/guardrails/rail_action.py
  • nemoguardrails/guardrails/rails_manager.py
  • nemoguardrails/library/hf_classifier/__init__.py
  • nemoguardrails/library/hf_classifier/actions.py
  • nemoguardrails/library/hf_classifier/backends.py
  • nemoguardrails/library/hf_classifier/flows.co
  • nemoguardrails/library/hf_classifier/flows.v1.co
  • nemoguardrails/rails/llm/config.py
  • nemoguardrails/server/api.py
  • pyproject.toml
  • tests/test_hf_classifier.py
  • tests/test_hf_classifier_iorails.py

Comment on lines +128 to +140
async def hf_classifier_check_tool_output(
classifier: str,
tool_calls: Optional[Any] = None,
config: Optional[RailsConfig] = None,
context: Optional[dict] = None,
**kwargs,
) -> bool:
"""Check tool output against a HuggingFace classifier."""
if not tool_calls and context:
tool_calls = context.get("tool_calls") or []
calls = tool_calls or []
text = json.dumps(calls) if calls else ""
return await _classify_and_check(classifier, text, config)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't assume tool calls are directly JSON-serializable.

Line 139 will fail for common tool-call objects that are not plain dict/list primitives. Since this action accepts Any, a non-serializable tool call currently turns the rail into a runtime error instead of a classification decision.

Suggested fix
 `@action`(is_system_action=True)
 async def hf_classifier_check_tool_output(
     classifier: str,
     tool_calls: Optional[Any] = None,
     config: Optional[RailsConfig] = None,
     context: Optional[dict] = None,
     **kwargs,
 ) -> bool:
     """Check tool output against a HuggingFace classifier."""
     if not tool_calls and context:
         tool_calls = context.get("tool_calls") or []
     calls = tool_calls or []
-    text = json.dumps(calls) if calls else ""
+    text = (
+        json.dumps(
+            calls,
+            default=lambda value: value.model_dump() if hasattr(value, "model_dump") else str(value),
+        )
+        if calls
+        else ""
+    )
     return await _classify_and_check(classifier, text, config)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nemoguardrails/library/hf_classifier/actions.py` around lines 128 - 140, The
current hf_classifier_check_tool_output assumes tool_calls are JSON-serializable
and calls json.dumps(calls) which can raise on custom objects; update
hf_classifier_check_tool_output to safely convert tool_calls to text before
calling _classify_and_check (e.g., try json.dumps(calls, default=str) or catch
TypeError and fall back to joining repr()/str() of each call) so
non-serializable tool-call objects produce a stable string for classification
instead of crashing; keep references to the existing parameters and call to
_classify_and_check unchanged.

Comment on lines +431 to +442
def get_backend(config: HFClassifierConfig, name: str = "") -> ClassifierBackend:
"""Get or create a cached backend instance from classifier config."""
endpoint = getattr(config, "endpoint", "local")
cache_key = f"{name}:{config.backend}:{config.model_name}:{endpoint}"
cached = _backend_instances.get(cache_key)
if cached is not None:
return cached
cls = _BACKENDS.get(config.backend)
if cls is None:
raise ValueError(f"Unknown hf_classifier backend: '{config.backend}'. Supported: {', '.join(_BACKENDS)}")
_backend_instances[cache_key] = cls(config)
return _backend_instances[cache_key]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include the full classifier config in the backend cache key.

Line 434 only keys on name/backend/model_name/endpoint, but backend behavior also depends on fields you omit here: e.g. task and parameters for LocalBackend, and threshold for FMSBackend. Two different classifier configs can therefore reuse the same cached instance and run with stale settings.

Suggested fix
+import json
+
 def get_backend(config: HFClassifierConfig, name: str = "") -> ClassifierBackend:
     """Get or create a cached backend instance from classifier config."""
-    endpoint = getattr(config, "endpoint", "local")
-    cache_key = f"{name}:{config.backend}:{config.model_name}:{endpoint}"
+    cache_key = json.dumps(
+        {
+            "name": name,
+            "config": config.model_dump(mode="json"),
+        },
+        sort_keys=True,
+    )
     cached = _backend_instances.get(cache_key)
     if cached is not None:
         return cached
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nemoguardrails/library/hf_classifier/backends.py` around lines 431 - 442,
get_backend currently builds cache_key from name/backend/model_name/endpoint
which misses config fields like task/parameters/threshold and leads to stale
cached backends; change cache_key construction in get_backend to include a
deterministic serialization of the entire HFClassifierConfig (for example use
dataclasses.asdict(config) or config.__dict__ then json.dumps(...,
sort_keys=True, default=repr)) and append that serialized string to cache_key so
different configs produce distinct keys; keep using _backend_instances and
_BACKENDS and ensure the serialization is stable and does not include
non-serializable transient state.

@m-misiura m-misiura force-pushed the hf_classifiers branch 2 times, most recently from b53ba6f to beae802 Compare May 6, 2026 17:23
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 6, 2026

Want your agent to iterate on Greptile's feedback? Try greploops.

@m-misiura m-misiura force-pushed the hf_classifiers branch 2 times, most recently from bc805f3 to a8e424e Compare May 7, 2026 08:24
@Pouyanpi Pouyanpi self-requested a review May 26, 2026 14:07
@Pouyanpi Pouyanpi added the enhancement New feature or request label May 26, 2026
@Pouyanpi Pouyanpi added this to the v0.23.0 milestone May 26, 2026
@Pouyanpi
Copy link
Copy Markdown
Collaborator

@m-misiura Thanks for the PR, Looks good, I have some high-level comments:

I think this PR should keep the server path on LLMRails and leave the experimental Guardrails/IORails routing out of scope. Ideally we shouldn't (need to) touch the server code in library integrations.

There are two HF classifier state issues that still need attention even if IORails is removed: remote classifier backends cache aiohttp ClientSession objects that can be reused after their original event loop is gone, and local classifier prewarm timeouts can leave the model loading thread running long enough to overwrite the timeout sentinel in _pipelines.

I know aiohttp is one of our core dependencies but we might drop it, any issues with using httpx? I would not make this a blocker if it turns into a larger refactor.

Let's remove the IORails path from this PR, but keep its commit so we can come back to it later if needed. I can do a final review once those changes are in. 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants