Skip to content

fix(sdk): abort run loop on repeated Stop-hook denials without progress#3183

Open
VascoSch92 wants to merge 5 commits intomainfrom
vasco/fix-hooks
Open

fix(sdk): abort run loop on repeated Stop-hook denials without progress#3183
VascoSch92 wants to merge 5 commits intomainfrom
vasco/fix-hooks

Conversation

@VascoSch92
Copy link
Copy Markdown
Contributor

@VascoSch92 VascoSch92 commented May 9, 2026

  • A human has tested these changes.

Why

https://openhands-ai.slack.com/archives/C06R25BT5B2/p1778340691700839

Summary

  • Adds max_consecutive_stop_denials (default 5) to LocalConversation. When a Stop hook denies repeatedly with no ActionEvent between denials, the run loop now aborts with a ConversationErrorEvent(code="StopHookLoopDetected") and
    ERROR status instead of looping until max_iteration_per_run (default 500) is exhausted.
  • The streak counter resets whenever the agent emits an ActionEvent between two denials, so legitimate iterative work keeps running.

Surfaced from a trajectory where a misconfigured .openhands/hooks/on_stop.sh (running npm run lint against an env missing dev deps) returned decision: deny on every Stop event. The agent recognized the hook was broken and emitted
text-only "I'm done" turns, but each one re-triggered the Stop hook — 56 cycles before the user manually paused, and nothing tool-actionable for the agent to do. Without this guard the loop would have run to the 500-iteration cap.

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:7f8ae29-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-7f8ae29-python \
  ghcr.io/openhands/agent-server:7f8ae29-python

All tags pushed for this build

ghcr.io/openhands/agent-server:7f8ae29-golang-amd64
ghcr.io/openhands/agent-server:7f8ae299c32960a1371731c11ace0bb9720d7c29-golang-amd64
ghcr.io/openhands/agent-server:vasco-fix-hooks-golang-amd64
ghcr.io/openhands/agent-server:7f8ae29-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:7f8ae29-golang-arm64
ghcr.io/openhands/agent-server:7f8ae299c32960a1371731c11ace0bb9720d7c29-golang-arm64
ghcr.io/openhands/agent-server:vasco-fix-hooks-golang-arm64
ghcr.io/openhands/agent-server:7f8ae29-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:7f8ae29-java-amd64
ghcr.io/openhands/agent-server:7f8ae299c32960a1371731c11ace0bb9720d7c29-java-amd64
ghcr.io/openhands/agent-server:vasco-fix-hooks-java-amd64
ghcr.io/openhands/agent-server:7f8ae29-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:7f8ae29-java-arm64
ghcr.io/openhands/agent-server:7f8ae299c32960a1371731c11ace0bb9720d7c29-java-arm64
ghcr.io/openhands/agent-server:vasco-fix-hooks-java-arm64
ghcr.io/openhands/agent-server:7f8ae29-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:7f8ae29-python-amd64
ghcr.io/openhands/agent-server:7f8ae299c32960a1371731c11ace0bb9720d7c29-python-amd64
ghcr.io/openhands/agent-server:vasco-fix-hooks-python-amd64
ghcr.io/openhands/agent-server:7f8ae29-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:7f8ae29-python-arm64
ghcr.io/openhands/agent-server:7f8ae299c32960a1371731c11ace0bb9720d7c29-python-arm64
ghcr.io/openhands/agent-server:vasco-fix-hooks-python-arm64
ghcr.io/openhands/agent-server:7f8ae29-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:7f8ae29-golang
ghcr.io/openhands/agent-server:7f8ae299c32960a1371731c11ace0bb9720d7c29-golang
ghcr.io/openhands/agent-server:vasco-fix-hooks-golang
ghcr.io/openhands/agent-server:7f8ae29-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:7f8ae29-java
ghcr.io/openhands/agent-server:7f8ae299c32960a1371731c11ace0bb9720d7c29-java
ghcr.io/openhands/agent-server:vasco-fix-hooks-java
ghcr.io/openhands/agent-server:7f8ae29-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:7f8ae29-python
ghcr.io/openhands/agent-server:7f8ae299c32960a1371731c11ace0bb9720d7c29-python
ghcr.io/openhands/agent-server:vasco-fix-hooks-python
ghcr.io/openhands/agent-server:7f8ae29-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 7f8ae29-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 7f8ae29-python-amd64) are also available if needed

@VascoSch92 VascoSch92 requested a review from all-hands-bot May 9, 2026 20:50
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py4632594%303, 308, 440, 486, 523, 539, 604, 860–861, 864, 1024, 1032, 1034, 1038–1039, 1050, 1052–1054, 1079, 1274, 1278, 1348, 1355–1356
TOTAL25993743171% 

all-hands-bot

This comment was marked as outdated.

all-hands-bot

This comment was marked as outdated.

@VascoSch92 VascoSch92 requested a review from all-hands-bot May 9, 2026 21:00
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable - Pragmatic fix for a real production issue. The critical off-by-one bugs from previous reviews have been addressed.

[EVALUATION REQUIRED]

This PR changes agent loop termination logic, which falls into the eval-risk category per repository guidelines. Before approval, this needs:

  1. Eval evidence: Run a lightweight benchmark (e.g., subset of SWE-bench) to verify the change doesn't regress agent performance
  2. Human review: A maintainer should confirm eval results from the monitor

The change is defensive and only triggers in misconfigured scenarios (broken Stop hooks), so normal operation should be unaffected. However, any loop logic change warrants verification.

[POSITIVE NOTES]

Critical fixes verified:

  • Off-by-one error fixed: now uses >= instead of >
  • Test assertion corrected: expects max_denials not max_denials + 1
  • Behavior now matches documentation

Code quality:

  • Concurrency properly handled (state access within with self._state: lock)
  • Clear error messages with context (count, limit, feedback)
  • Comprehensive test coverage (abort case + streak reset case)
  • Clean integration with existing hook system

[RISK ASSESSMENT]

⚠️ Risk Level: 🟡 MEDIUM

Rationale:

  • ➕ Fixes real production issue (infinite loop on misconfigured hooks)
  • ➕ Only affects edge case (repeated hook denials)
  • ➕ Improves user experience (clear error vs. hitting iteration cap)
  • ⚠️ Changes loop termination logic (requires eval validation)
  • ⚠️ New code path in critical run() method

Recommendation: Request lightweight eval run before merging. The code itself is solid, but eval validation is standard practice for loop logic changes per repo policy.

VERDICT:

⏸️ Needs eval validation before approval - Core logic is sound and previous critical issues are resolved. Once eval results confirm no performance regression, this is ready to merge.

@VascoSch92 VascoSch92 marked this pull request as ready for review May 9, 2026 21:04
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable - Pragmatic fix for a real production issue. Previous critical bugs have been addressed.

Previous Critical Issues - RESOLVED
The off-by-one error and test assertion issues from previous reviews have been fixed correctly.

[EVALUATION REQUIRED]

This PR changes agent loop termination logic, which falls into the eval-risk category per repository guidelines. Before approval, this needs:

  1. Eval evidence: Run a lightweight benchmark (e.g., subset of SWE-bench) to verify the change doesn't regress agent performance
  2. Human review: A maintainer should confirm eval results showing no unexpected impact on agent decision-making or task completion rates

The QA report validates the feature works as designed, but eval validation is required for changes affecting agent loop behavior.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

Changes agent execution loop termination conditions. While this addresses a real production issue (Stop hook denial loops), any modification to loop control logic requires eval validation to ensure no unintended impact on benchmark performance. The streak reset logic is sound and preserves legitimate iterative work.

VERDICT:
⏸️ Needs eval validation: Code quality is good, logic is correct, but eval evidence is required per repository policy before approval.

KEY INSIGHT:
The fix properly prevents infinite loops while preserving agent's ability to iterate on work, but needs benchmark validation to confirm no regression in agent effectiveness.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

This PR successfully prevents infinite loops when a stop hook denies repeatedly without progress, while preserving legitimate iterative workflows.

Does this PR achieve its stated goal?

Yes. The PR set out to fix an infinite loop caused by misconfigured stop hooks that deny repeatedly without the agent making progress. I verified that:

  1. The agent now aborts after exactly 5 consecutive stop hook denials (configurable via max_consecutive_stop_denials)
  2. The loop emits a ConversationErrorEvent with code StopHookLoopDetected and sets status to ERROR
  3. The streak counter correctly resets when the agent emits ActionEvents between denials, allowing legitimate iterative work to continue
  4. The parameter is configurable and properly propagated to forked conversations
Phase Result
Environment Setup ✅ Dependencies installed, SDK v1.21.1 ready
CI Status ⚠️ Most checks passing (sdk-tests, agent-server-tests, workspace-tests, tools-tests, pre-commit); 1 unrelated check failing (unresolved-review-threads); 1 pending (python-amd64 build)
Functional Verification ✅ All 4 scenarios verified end-to-end
Functional Verification

Test 1: Always-denying stop hook aborts after 5 denials

Setup: Created a stop hook script that always returns {"decision": "deny", "reason": "always denying for testing"} and configured a LocalConversation with max_consecutive_stop_denials=5.

Step 1 — Baseline (without fix):
Without this PR, such a misconfigured hook would loop 500 times (the max_iteration_per_run limit) before stopping, as described in the PR motivation from the Slack thread.

Step 2 — Apply PR changes:
Checked out branch vasco/fix-hooks (commit a9e46642).

Step 3 — Run with fix in place:
Ran verification script:

# Created a hook that always denies
# Set max_consecutive_stop_denials=5
conversation.send_message("Hello")
conversation.run()

Output:

Final execution status: ConversationExecutionStatus.ERROR
Number of error events: 1
Error code: StopHookLoopDetected
Error detail: Stop hook denied 5 times without progress (limit 5). Last feedback: always denying for testing
Number of stop hook denials: 5

✅ PASSED: Execution status is ERROR
✅ PASSED: Error event has correct code 'StopHookLoopDetected'
✅ PASSED: Exactly 5 stop hook denials occurred before abort

Interpretation: The agent correctly aborts after exactly 5 consecutive denials, emits the expected error event, and transitions to ERROR status instead of looping indefinitely. This confirms the fix works.


Test 2: Streak resets on progress (ActionEvent emission)

Setup: Created a hook that denies 4 times then allows, with an agent that emits an ActionEvent on the 3rd step (between denials 2 and 3). Set max_consecutive_stop_denials=3 to ensure the streak reset is required for the test to pass.

Expected behavior: Without the streak reset, the agent would abort after 3 consecutive denials. With the reset, the ActionEvent on step 3 should reset the counter, allowing all 4 denials to occur before the hook finally allows the stop.

Output:

Final execution status: ConversationExecutionStatus.FINISHED
Total steps: 5
Number of error events: 0
Number of stop hook denials: 4
Number of action events (progress): 1

✅ PASSED: Did not abort (streak reset worked)
✅ PASSED: No error events (no abort)
✅ PASSED: Agent made progress (1 action event(s))
✅ PASSED: Conversation completed successfully

Interpretation: The agent did NOT abort despite encountering 4 denials with a limit of 3, because the ActionEvent emitted on step 3 reset the consecutive denial counter. This confirms that legitimate iterative work (where the agent makes progress between denials) continues uninterrupted.


Test 3: Configurable limit parameter

Setup: Created a conversation with max_consecutive_stop_denials=2 (custom value) and an always-denying hook.

Output:

Custom limit set to: 2
Number of denials before abort: 2
✅ PASSED: Custom limit (2) was respected

Interpretation: The configurable parameter works correctly — the agent aborted after exactly 2 denials instead of the default 5.


Test 4: Parameter propagated to forked conversations

Setup: Created a conversation with max_consecutive_stop_denials=7, then forked it.

Output:

Original conversation max_consecutive_stop_denials: 7
Forked conversation max_consecutive_stop_denials: 7
✅ PASSED: Parameter propagated to forked conversation

Interpretation: The parameter is correctly propagated when forking conversations (line 383 in the diff confirms this).

Issues Found

None.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants