fix(agent-server): run bash cleanup traps on timeout via process-group SIGTERM#3175
Conversation
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
LGTM! ✅
Clean fix for the trap execution bug. The process-group signaling approach is correct and follows established patterns in the codebase.
Risk: 🟢 LOW - Bug fix with existing test coverage, no agent behavior changes.
all-hands-bot
left a comment
There was a problem hiding this comment.
LGTM! 👍
Clean fix for the cleanup trap bug. The process-group signaling approach is correct and follows the established pattern already used in hooks/executor.py and subprocess_terminal.py.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
Bug fix with existing test coverage. The fix makes bash timeout handling behave correctly (allowing cleanup traps to run), and stress tests verify no zombie processes or event loop blocking. No agent decision-making logic affected.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the fix works end-to-end by running actual bash commands with TERM traps that execute on timeout.
Does this PR achieve its stated goal?
Yes. The PR fixes the bug where bash cleanup traps didn't run on timeout. Previously, the agent-server sent SIGKILL before SIGTERM, preventing user-installed traps from executing. The fix spawns commands in a new session and signals the entire process group with SIGTERM first (1s grace period), then escalates to SIGKILL only if needed. Real-world testing confirms that cleanup traps now execute successfully when commands timeout.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv sync --dev completed successfully |
| CI Status | ✅ All critical checks passing (agent-server-tests, sdk-tests, pre-commit, coverage-report) |
| Functional Verification | ✅ TERM traps execute on timeout; no regressions in stress tests |
Functional Verification
Test 1: Verify the bug existed on main
Step 1 — Establish baseline (main branch without fix):
Checked out main branch and inspected the test:
$ git checkout main
$ head -60 tests/agent_server/test_bash_service.py | tail -20The test was marked @pytest.mark.xfail with reason:
@pytest.mark.xfail(
strict=True,
reason=(
"bash_service.py:276 sends SIGKILL before SIGTERM, so cleanup "
"traps never run. Fix: terminate() → wait → kill() (see "
"vscode_service.py:78-84). "
"Tracked in https://github.com/OpenHands/software-agent-sdk/issues/3120."
),
)This confirms the bug was known and the test was expected to fail.
Step 2 — Verify the bug in code:
Inspected bash_service.py on main (lines 274-282):
except TimeoutError:
# Kill the process if it times out
process.kill() # Line 276: SIGKILL sent FIRST
try:
# Give the process a short time to die gracefully
await asyncio.wait_for(process.wait(), timeout=1.0)
except TimeoutError:
# If it still won't die, terminate it more forcefully
process.terminate() # SIGTERM sent as fallback (too late)This shows the bug: process.kill() (SIGKILL) is sent immediately, making traps uncatchable. process.terminate() (SIGTERM) only runs if SIGKILL somehow doesn't work, which is backwards.
Ran the test on main:
$ uv run pytest tests/agent_server/test_bash_service.py::test_bash_timeout_runs_sigterm_trap -v
...
test_bash_service.py::test_bash_timeout_runs_sigterm_trap XFAIL (bash_service.py:276 sends SIGKILL before SIGTERM...)
1 xfailed, 5 warnings in 2.54sThe test confirms expected failure (XFAIL).
Test 2: Verify the fix works on PR branch
Step 3 — Apply the PR's changes:
Checked out PR branch:
$ git checkout 3120-agent-server-bash-cleanup-traps-dont-run-because-sigkill-precedes-sigtermInspected the fixed code in bash_service.py (lines 212-226, 295-309):
Key change 1 — New session for process group signaling:
process = await asyncio.create_subprocess_shell(
command.command,
...
start_new_session=True, # NEW: spawn in own session
)Key change 2 — Helper method for process group signals:
def _signal_process_group(
self,
process: asyncio.subprocess.Process,
sig: signal.Signals,
command: str,
) -> None:
try:
os.killpg(os.getpgid(process.pid), sig) # Signal whole group
except ProcessLookupError:
passKey change 3 — SIGTERM before SIGKILL with grace period:
except TimeoutError:
# Send SIGTERM to the whole process group so user-installed
# cleanup traps can run, then escalate to SIGKILL if needed.
self._signal_process_group(process, signal.SIGTERM, command.command)
try:
await asyncio.wait_for(process.wait(), timeout=1.0)
except TimeoutError:
self._signal_process_group(process, signal.SIGKILL, command.command)This is exactly the fix described in the PR: SIGTERM first, 1s grace period, then SIGKILL.
Step 4 — Verify with unit test:
Ran the previously-failing test:
$ uv run pytest tests/agent_server/test_bash_service.py::test_bash_timeout_runs_sigterm_trap -v
...
tests/agent_server/test_bash_service.py::test_bash_timeout_runs_sigterm_trap PASSED [100%]
1 passed, 5 warnings in 1.37s✅ The test now PASSES (xfail decorator removed, trap executes successfully).
Step 5 — Verify with functional demonstration:
Created and ran a standalone script that:
- Creates a bash command with
trap 'touch /tmp/marker; exit 0' TERM - Sleeps for 30 seconds with a 1-second timeout
- Verifies the marker file is created (proving the trap ran)
$ uv run python /tmp/test_trap_on_timeout.py
Marker file path: /tmp/tmp6c1n35ef/cleanup_ran
Executing command: trap 'touch /tmp/tmp6c1n35ef/cleanup_ran; exit 0' TERM; sleep 30
Timeout: 1s
Waiting for command to timeout and TERM trap to run...
✅ SUCCESS: TERM trap executed successfully!
The marker file cleanup_ran was created by the trap handler.
This proves that SIGTERM was sent to the process group,
allowing the shell's trap to run before the process was killed.✅ Real-world execution confirms the fix works.
Test 3: Verify no regressions in stress tests
Ran the stress tests mentioned in the PR description:
Test 3a — Clean process kill:
$ uv run pytest tests/agent_server/stress/test_long_running_command.py::test_bash_timeout_kills_process_cleanly -v -m stress
...
test_bash_timeout_kills_process_cleanly PASSED [100%]
1 passed, 5 warnings in 1.33sTest 3b — Event loop not blocked:
$ uv run pytest tests/agent_server/stress/test_long_running_command.py::test_long_running_bash_does_not_block_event_loop -v -m stress
...
test_long_running_bash_does_not_block_event_loop PASSED [100%]
1 passed, 5 warnings in 5.32s✅ Both stress tests pass. No regressions in process cleanup or async behavior.
Issues Found
None.
csmith49
left a comment
There was a problem hiding this comment.
Not sure I've ever seen the review bot do a "LGTM" before!
Summary
When a bash command exceeds its timeout, the agent-server sent SIGKILL before SIGTERM, so user-installed traps (trap 'cleanup' EXIT, etc.) never ran — leaving stale PID files, temp dirs, and connections behind.
Just swapping the order isn't enough: asyncio.create_subprocess_shell runs the command under a shell, and process.terminate() only signals the shell. While the shell is blocked waiting on a foreground child (e.g. sleep), it cannot
dispatch its trap until the child exits. Signaling only the shell leaves the trap pending forever, then SIGKILL on the shell still bypasses it.
Fix: spawn the command in its own session (start_new_session=True) and signal the whole process group via os.killpg — SIGTERM first, escalating to SIGKILL after a 1s grace period. The shell's children die, the shell unblocks, and
the trap runs. This matches the pattern already used in openhands-sdk/.../hooks/executor.py and openhands-tools/.../subprocess_terminal.py.
Issue Number
Closes #3120.
How to Test
Type
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:4679f60-pythonRun
All tags pushed for this build
About Multi-Architecture Support
4679f60-python) is a multi-arch manifest supporting both amd64 and arm644679f60-python-amd64) are also available if needed