fix(sdk): abort run loop on repeated Stop-hook denials without progress#3183
fix(sdk): abort run loop on repeated Stop-hook denials without progress#3183VascoSch92 wants to merge 5 commits intomainfrom
Conversation
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable - Pragmatic fix for a real production issue. The critical off-by-one bugs from previous reviews have been addressed.
[EVALUATION REQUIRED]
This PR changes agent loop termination logic, which falls into the eval-risk category per repository guidelines. Before approval, this needs:
- Eval evidence: Run a lightweight benchmark (e.g., subset of SWE-bench) to verify the change doesn't regress agent performance
- Human review: A maintainer should confirm eval results from the monitor
The change is defensive and only triggers in misconfigured scenarios (broken Stop hooks), so normal operation should be unaffected. However, any loop logic change warrants verification.
[POSITIVE NOTES]
✅ Critical fixes verified:
- Off-by-one error fixed: now uses
>=instead of> - Test assertion corrected: expects
max_denialsnotmax_denials + 1 - Behavior now matches documentation
✅ Code quality:
- Concurrency properly handled (state access within
with self._state:lock) - Clear error messages with context (count, limit, feedback)
- Comprehensive test coverage (abort case + streak reset case)
- Clean integration with existing hook system
[RISK ASSESSMENT]
Rationale:
- ➕ Fixes real production issue (infinite loop on misconfigured hooks)
- ➕ Only affects edge case (repeated hook denials)
- ➕ Improves user experience (clear error vs. hitting iteration cap)
⚠️ Changes loop termination logic (requires eval validation)⚠️ New code path in criticalrun()method
Recommendation: Request lightweight eval run before merging. The code itself is solid, but eval validation is standard practice for loop logic changes per repo policy.
VERDICT:
⏸️ Needs eval validation before approval - Core logic is sound and previous critical issues are resolved. Once eval results confirm no performance regression, this is ready to merge.
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable - Pragmatic fix for a real production issue. Previous critical bugs have been addressed.
Previous Critical Issues - RESOLVED ✅
The off-by-one error and test assertion issues from previous reviews have been fixed correctly.
[EVALUATION REQUIRED]
This PR changes agent loop termination logic, which falls into the eval-risk category per repository guidelines. Before approval, this needs:
- Eval evidence: Run a lightweight benchmark (e.g., subset of SWE-bench) to verify the change doesn't regress agent performance
- Human review: A maintainer should confirm eval results showing no unexpected impact on agent decision-making or task completion rates
The QA report validates the feature works as designed, but eval validation is required for changes affecting agent loop behavior.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
Changes agent execution loop termination conditions. While this addresses a real production issue (Stop hook denial loops), any modification to loop control logic requires eval validation to ensure no unintended impact on benchmark performance. The streak reset logic is sound and preserves legitimate iterative work.
VERDICT:
⏸️ Needs eval validation: Code quality is good, logic is correct, but eval evidence is required per repository policy before approval.
KEY INSIGHT:
The fix properly prevents infinite loops while preserving agent's ability to iterate on work, but needs benchmark validation to confirm no regression in agent effectiveness.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
This PR successfully prevents infinite loops when a stop hook denies repeatedly without progress, while preserving legitimate iterative workflows.
Does this PR achieve its stated goal?
Yes. The PR set out to fix an infinite loop caused by misconfigured stop hooks that deny repeatedly without the agent making progress. I verified that:
- The agent now aborts after exactly 5 consecutive stop hook denials (configurable via
max_consecutive_stop_denials) - The loop emits a
ConversationErrorEventwith codeStopHookLoopDetectedand sets status toERROR - The streak counter correctly resets when the agent emits
ActionEvents between denials, allowing legitimate iterative work to continue - The parameter is configurable and properly propagated to forked conversations
| Phase | Result |
|---|---|
| Environment Setup | ✅ Dependencies installed, SDK v1.21.1 ready |
| CI Status | |
| Functional Verification | ✅ All 4 scenarios verified end-to-end |
Functional Verification
Test 1: Always-denying stop hook aborts after 5 denials
Setup: Created a stop hook script that always returns {"decision": "deny", "reason": "always denying for testing"} and configured a LocalConversation with max_consecutive_stop_denials=5.
Step 1 — Baseline (without fix):
Without this PR, such a misconfigured hook would loop 500 times (the max_iteration_per_run limit) before stopping, as described in the PR motivation from the Slack thread.
Step 2 — Apply PR changes:
Checked out branch vasco/fix-hooks (commit a9e46642).
Step 3 — Run with fix in place:
Ran verification script:
# Created a hook that always denies
# Set max_consecutive_stop_denials=5
conversation.send_message("Hello")
conversation.run()Output:
Final execution status: ConversationExecutionStatus.ERROR
Number of error events: 1
Error code: StopHookLoopDetected
Error detail: Stop hook denied 5 times without progress (limit 5). Last feedback: always denying for testing
Number of stop hook denials: 5
✅ PASSED: Execution status is ERROR
✅ PASSED: Error event has correct code 'StopHookLoopDetected'
✅ PASSED: Exactly 5 stop hook denials occurred before abort
Interpretation: The agent correctly aborts after exactly 5 consecutive denials, emits the expected error event, and transitions to ERROR status instead of looping indefinitely. This confirms the fix works.
Test 2: Streak resets on progress (ActionEvent emission)
Setup: Created a hook that denies 4 times then allows, with an agent that emits an ActionEvent on the 3rd step (between denials 2 and 3). Set max_consecutive_stop_denials=3 to ensure the streak reset is required for the test to pass.
Expected behavior: Without the streak reset, the agent would abort after 3 consecutive denials. With the reset, the ActionEvent on step 3 should reset the counter, allowing all 4 denials to occur before the hook finally allows the stop.
Output:
Final execution status: ConversationExecutionStatus.FINISHED
Total steps: 5
Number of error events: 0
Number of stop hook denials: 4
Number of action events (progress): 1
✅ PASSED: Did not abort (streak reset worked)
✅ PASSED: No error events (no abort)
✅ PASSED: Agent made progress (1 action event(s))
✅ PASSED: Conversation completed successfully
Interpretation: The agent did NOT abort despite encountering 4 denials with a limit of 3, because the ActionEvent emitted on step 3 reset the consecutive denial counter. This confirms that legitimate iterative work (where the agent makes progress between denials) continues uninterrupted.
Test 3: Configurable limit parameter
Setup: Created a conversation with max_consecutive_stop_denials=2 (custom value) and an always-denying hook.
Output:
Custom limit set to: 2
Number of denials before abort: 2
✅ PASSED: Custom limit (2) was respected
Interpretation: The configurable parameter works correctly — the agent aborted after exactly 2 denials instead of the default 5.
Test 4: Parameter propagated to forked conversations
Setup: Created a conversation with max_consecutive_stop_denials=7, then forked it.
Output:
Original conversation max_consecutive_stop_denials: 7
Forked conversation max_consecutive_stop_denials: 7
✅ PASSED: Parameter propagated to forked conversation
Interpretation: The parameter is correctly propagated when forking conversations (line 383 in the diff confirms this).
Issues Found
None.
Why
https://openhands-ai.slack.com/archives/C06R25BT5B2/p1778340691700839
Summary
ERROR status instead of looping until max_iteration_per_run (default 500) is exhausted.
Surfaced from a trajectory where a misconfigured
.openhands/hooks/on_stop.sh(running npm run lint against an env missing dev deps) returned decision: deny on every Stop event. The agent recognized the hook was broken and emittedtext-only "I'm done" turns, but each one re-triggered the Stop hook — 56 cycles before the user manually paused, and nothing tool-actionable for the agent to do. Without this guard the loop would have run to the 500-iteration cap.
Type
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:7f8ae29-pythonRun
All tags pushed for this build
About Multi-Architecture Support
7f8ae29-python) is a multi-arch manifest supporting both amd64 and arm647f8ae29-python-amd64) are also available if needed