fix(voice/#662): guard response.create at all unguarded call sites (JS + PY)#669
fix(voice/#662): guard response.create at all unguarded call sites (JS + PY)#669drewdrewthis wants to merge 16 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a Changesresponse.create double-fire guard — JS and Python adapters + tests
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Adapter as OpenAIRealtimeAgentAdapter
participant OpenAI as OpenAI Realtime API
Caller->>Adapter: receiveAudio() or sendText()
alt _responseActive = true
Adapter->>Adapter: _deferredResponseCreate = true (suppress)
else _responseActive = false
Adapter->>OpenAI: response.create
OpenAI-->>Adapter: response.created → _responseActive = true
end
OpenAI-->>Adapter: response.done / response.cancelled
Adapter->>Adapter: _responseActive = false
alt _deferredResponseCreate = true
Adapter->>OpenAI: response.create (deferred)
OpenAI-->>Adapter: response.created → _responseActive = true
end
Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
8a197d5 to
9790b71
Compare
Review verdict: READYReviewed at: No code-level blockers. The three CodeRabbit Major flags are triaged and resolved; the scoped re-review (reset-on-reconnect fix + test-timing change) surfaced no must-fix defects.
CodeRabbit Major flags — disposition
Non-blocking (Decide)
Follow-ups (out of scope for this PR)
Verdict is prose, not a GitHub approval. Drew merges; Rogerio reviews next. |
945255e to
b1b2ead
Compare
7d30b81 to
9538bba
Compare
|
**[proof-reviewer] Stale prove-it marker — SHA `945255e9` is not an ancestor of HEAD**
The Action: Fix — re-run |
|
No description provided. |
7768c7c to
661c7f5
Compare
|
Reopening to re-trigger CI workflows on fresh SHA. |
661c7f5 to
2d1bc62
Compare
time.monotonic() was stored in each log tuple but all three consumers (sent_types, ordered_log, first_log_idx) unpacked it as _ — dead data. Remove the timestamp field, drop import time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tests
Drop the unreachable `self._agent_turn_pending = False` at py:429 (line 420
already clears it unconditionally before the branch is entered).
Add optional `wsFactory` to `OpenAIRealtimeAgentAdapterInit` (mirrors the
ElevenLabs `webSocketFactory` pattern) so tests can inject a fake WS that
the adapter drives via its real `.on("message", ...)` listener path.
Rewrite `openai-realtime-response-guard.test.ts` to use the factory +
`FakeWS extends EventEmitter` instead of casting to the private
`_enqueueEvent` method. All 4 guard tests pass; existing Python tests
unaffected.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…statement Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8441aab to
47cc632
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/scenario/voice/adapters/openai_realtime.py (1)
345-357:⚠️ Potential issue | 🟠 Major | ⚡ Quick winClear deferred/active response flags on disconnect.
With the new
send_textdeferral path (Lines 718-723),_deferred_response_createcan be set and persist.disconnect()(Lines 345-357) only closes_ws; it does not reset_response_active/_deferred_response_create, so adapter reuse may carry stale lifecycle state into the next connection.🔧 Suggested teardown reset
async def disconnect(self) -> None: """Close the WebSocket if open.""" if self._ws is not None: try: await self._ws.close() except Exception: # Best-effort: connection may already be half-closed or in an # error state when disconnect() is called. We're tearing down # regardless — propagating here would just leak the WS reference. pass finally: self._ws = None + self._response_active = False + self._deferred_response_create = False logger.debug("OpenAIRealtimeAgentAdapter: disconnected")Also applies to: 718-723
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/scenario/voice/adapters/openai_realtime.py` around lines 345 - 357, The disconnect() method closes the WebSocket but does not reset the _response_active and _deferred_response_create flags, causing stale lifecycle state to persist when the adapter is reused. After closing the WebSocket in the finally block of the disconnect() method, reset both _response_active and _deferred_response_create to their initial values (likely False and None respectively) to ensure a clean state for subsequent connections.
🧹 Nitpick comments (1)
javascript/src/agents/realtime/realtime-event-handler.ts (1)
180-194: ⚡ Quick winMove
isResponseActive()into the public-method section.
isResponseActive()(Lines 188-194) is public but currently sits below privatereset()(Lines 180-186). Keep public methods grouped above private helpers for consistent class structure.♻️ Suggested reorder
- /** - * Resets the internal state for the next response - */ - private reset(): void { - this._active = false; - this.responseResolver = null; - this.errorRejecter = null; - this.currentResponse = ""; - this.currentAudioChunks = []; - } - /** * Returns true while a response is in flight (between response.created * and response.done). Used by RealtimeAgentAdapter to guard response.create. */ isResponseActive(): boolean { return this._active; } + + /** + * Resets the internal state for the next response + */ + private reset(): void { + this._active = false; + this.responseResolver = null; + this.errorRejecter = null; + this.currentResponse = ""; + this.currentAudioChunks = []; + }As per coding guidelines,
**/*.ts: "In TypeScript classes, place public methods first, private methods at the bottom, and group related methods together".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@javascript/src/agents/realtime/realtime-event-handler.ts` around lines 180 - 194, The public method isResponseActive() is currently positioned after the private method reset(), which violates the class organization guideline requiring public methods to be grouped before private methods. Move the isResponseActive() method (which checks if _active is true and includes its JSDoc comment) above the reset() method so that all public methods appear before private helper methods in the class structure.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@javascript/src/agents/__tests__/realtime-response-guard.test.ts`:
- Around line 116-117: The test uses fixed 20ms setTimeout delays at lines 116
and 136 to allow async operations to complete, which creates timing-dependent
flakiness. Replace these arbitrary sleep delays with deterministic
synchronization mechanisms such as awaiting actual promises from the code under
test, using jest.runAllTimers() if using fake timers, or checking that mock
functions have been called with expected arguments before proceeding with
assertions instead of relying on wall-clock delays.
In `@javascript/src/agents/realtime/realtime-agent.adapter.ts`:
- Around line 188-193: The guard condition checking `isResponseActive()`
suppresses the `response.create` event, but this can drop new turns when the
in-flight response completes. Instead of suppressing the send, implement
deferred send behavior by storing the pending `response.create` event (similar
to a `_deferredResponseCreate` pattern used in voice adapters) and firing it
when the current response completes via `response.done` or `response.cancelled`
events. Apply this deferred pattern to both occurrences of the
`isResponseActive()` guard at the specified locations to ensure suppressed
creates are replayed rather than dropped.
In
`@javascript/src/voice/adapters/__tests__/openai-realtime-response-guard.test.ts`:
- Around line 148-149: Replace the hard-coded setTimeout delays at three
locations in the openai-realtime-response-guard.test.ts file (where Promise
timeouts of 20ms are used) with deterministic synchronization mechanisms.
Instead of relying on arbitrary delays, use test utilities such as
jest.useFakeTimers() and jest.advanceTimersByTime() to control time
deterministically, or implement event-based synchronization where the test waits
for actual completion signals from the code under test rather than hoping a
fixed delay is sufficient. This will make the tests reproducible and eliminate
race conditions that can cause flaky test failures.
In `@javascript/src/voice/adapters/openai-realtime.ts`:
- Around line 180-189: The session-lifecycle state flags `_responseActive` and
`_deferredResponseCreate` are not being reset during teardown/disconnect,
causing stale state to persist when the adapter instance is reused for a new
connection. Add logic in the teardown or disconnect method to reset both
`_responseActive` and `_deferredResponseCreate` back to their initial false
state to ensure a clean slate for subsequent connections.
In `@python/tests/voice/demo_response_create_guard.py`:
- Around line 233-245: The return statement at the end of the function is
missing the item_ok validation check. The variable item_ok is calculated to
verify that "conversation.item.create" was sent, but it is not included in the
return statement alongside guard_ok and deferred_ok. Update the return statement
to include all three conditions (guard_ok and deferred_ok and item_ok) so that
the function properly validates all required behaviors before reporting success.
---
Outside diff comments:
In `@python/scenario/voice/adapters/openai_realtime.py`:
- Around line 345-357: The disconnect() method closes the WebSocket but does not
reset the _response_active and _deferred_response_create flags, causing stale
lifecycle state to persist when the adapter is reused. After closing the
WebSocket in the finally block of the disconnect() method, reset both
_response_active and _deferred_response_create to their initial values (likely
False and None respectively) to ensure a clean state for subsequent connections.
---
Nitpick comments:
In `@javascript/src/agents/realtime/realtime-event-handler.ts`:
- Around line 180-194: The public method isResponseActive() is currently
positioned after the private method reset(), which violates the class
organization guideline requiring public methods to be grouped before private
methods. Move the isResponseActive() method (which checks if _active is true and
includes its JSDoc comment) above the reset() method so that all public methods
appear before private helper methods in the class structure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 98a7449d-261f-47ab-b347-066c42764439
📒 Files selected for processing (8)
javascript/src/agents/__tests__/realtime-response-guard.test.tsjavascript/src/agents/realtime/realtime-agent.adapter.tsjavascript/src/agents/realtime/realtime-event-handler.tsjavascript/src/voice/adapters/__tests__/openai-realtime-response-guard.test.tsjavascript/src/voice/adapters/openai-realtime.tspython/scenario/voice/adapters/openai_realtime.pypython/tests/voice/demo_response_create_guard.pypython/tests/voice/test_realtime_response_create_guard.py
…_ok, setTimeout too short - demo_response_create_guard.py Scenario C: `return guard_ok and deferred_ok` dropped `item_ok`, so the demo could print FAIL for item check then return True (false proof). Fixed to `return guard_ok and deferred_ok and item_ok`. - realtime-response-guard.test.ts AC-JS4/JS5: widen 20ms flush wait to 100ms and add comment explaining it is a preamble flush, not a timed window. Prevents flake on loaded CI runners. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…k flush The prior commit widened the pre-assert sleep to 100ms, which dead-heated the adapter's responseTimeout:100 in makeAdapter — on CI the adapter's internal waitForResponse timeout fired before the test's 100ms wait elapsed and fired response.done, throwing "Agent response timeout after 100ms" and failing ci-checks (24.x). The send decision in handleInitialResponse/handleAudioInput is synchronous, before the method's first await (waitForResponse). So a microtask flush (await Promise.resolve()) followed by a prompt response.done is deterministic and settles well inside the timeout — no wall-clock race. Test runs in 7ms. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…econnect CodeRabbit flagged that `_responseActive`/`_deferredResponseCreate` (the #662 guard flags) were never cleared in teardown, so a reused adapter instance could carry stale deferred state into its next connection — suppressing the first turn after a reconnect. Fix: - disconnect(): clear both guard flags on teardown. - connect(): clear both guard flags AND `_closeReason`. The latter is required — `_nextEvent` rejects immediately when `_closeReason` is set, so resetting only the guard flags would leave a reconnected adapter unable to receive (cosmetic half-fix). Test: new "resets response-lifecycle state on disconnect and reconnects clean" asserts the flags clear on disconnect and that a reconnected instance reaches its own receiveAudio timeout (proving the stale close reason was cleared) rather than rejecting instantly. Full voice-adapter suite: 33/33 green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… macrotask yields CodeRabbit flagged the hard-coded 20/5/10ms delays in the receiveAudio guard tests as brittle. They are not race-prone — receiveAudio's preamble and _nextEvent setup are synchronous and server events flow through an ordered queue, so each delay is a macrotask boundary that drains the loop's microtask chain regardless of magnitude or runner load. Reduce all three to setTimeout 0 to drop the arbitrary magnitudes while keeping the robust macrotask-drain semantics. (Not queueMicrotask, per CodeRabbit's suggestion: a single microtask tick can under-drain a multi-tick continuation; a zero-delay macrotask drains the whole queue.) Deterministic across repeated runs: 5/5 x3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
🐕 PR Hound review brief — assigned reviewer @rogeriochaves Review mode: Deep Review Inspection targets (max 3, each names a file or behavior):
Author questions (optional, max 3):
|
|
Automated low-risk assessment This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.
This PR requires a manual review before merging. |
Why
The OpenAI Realtime API rejects (or silently drops) a `response.create` sent while a response is already streaming — `{"type":"error","message":"Conversation already has an active response in progress"}`. PR #659 fixed this race for the Python `recv_audio` user-audio branch; three JS call sites and one Python call site were left unguarded. Closes #662.
What changed
Test plan
All tests are hermetic mock-transport / mock-WS — no live OpenAI key required.
Python:
```
cd python && pytest tests/voice/test_realtime_response_create_guard.py -v
```
Covers AC-PY1 (deferred when active), AC-PY2 (fires after `response.done`), AC-DEFER (consumed exactly once), and AC-REG1 (normal path unchanged).
JavaScript:
```
cd javascript && npm test -- --testPathPattern="openai-realtime.test|realtime-adapter|realtime-event-handler"
```
Covers AC-JS1–6: `receiveAudio` guard (deferred create, frame ordering), `sendText` guard, `handleInitialResponse`+`handleAudioInput` guard, `isResponseActive()` lifecycle transitions. AC-REG3: existing frame-ordering tests in `realtime-adapter-frames.test.ts` still pass.
Human verification
For a skeptical reviewer who wants to confirm this independently — no OpenAI API key required:
git fetch origin && git checkout fix/issue662-realtime-js-response-create.cd python && pytest tests/voice/test_realtime_response_create_guard.py -v. Confirm all guard cases pass (AC1–AC7 plus thesend_textcasestest_ac_py1/_py2/_defer).cd javascript && npm test -- realtime-response-guard. ConfirmAC-JS4/JS5(RealtimeAgentAdapterguard),AC-JS6a–d(isResponseActive()lifecycle), and theOpenAIRealtimeAgentAdapter — response.create guard (#662)cases pass.python/scenario/voice/adapters/openai_realtime.py, delete theif self._response_active: self._deferred_response_create = True; returnguard insidesend_text(soresponse.createfires unconditionally), re-run step 2, and confirmtest_ac_py1_send_text_suppressed_while_response_activenow FAILS. Restore the guard → it passes again.cd python && python3 tests/voice/demo_response_create_guard.pyagainst a real OpenAI Realtime endpoint and confirm noConversation already has an active response in progresserror appears — matching the captured run below.How I can prove I was successful
Scope: backend-only, no UI surface. Every file in this PR is voice-loop guard logic or its tests — the JS realtime adapter/event-handler (
javascript/src/agents/realtime/,javascript/src/voice/adapters/openai-realtime.ts) and the Python realtime adapter (python/scenario/voice/adapters/openai_realtime.py). The PR's Files changed tab is entirely.ts/.py(8 files, no.tsx/.jsx/.vue/CSS/HTML), so there is no frontend surface to screenshot. Proof is the falsifiability tests above plus the captured live-API wire log below.Guard demonstration — hermetic stub + captured live-API run:
Script: `python/tests/voice/demo_response_create_guard.py` (run via `cd python && python3 tests/voice/demo_response_create_guard.py`)
The committed demo exercises the production
OpenAIRealtimeAgentAdaptercode through aLoggingWSstub that replays scripted server events and captures every send/recv. No OpenAI API key required. The stub makes the guard observable without a network connection.The captured output below is from a real OpenAI Realtime API run (
gpt-realtime-mini, 2026-06-15) using a live-API development harness (not committed to the repo). It shows the same three-phase sequence against an actual WebSocket connection, confirming the guard works end-to-end.Three phases exercise the previously-unguarded `send_text()` call site:
Captured live run output (2026-06-15, gpt-realtime-mini):
```
response.create guard — LIVE API DEMO
PR #669 langwatch/scenario
Real OpenAI Realtime WebSocket — gpt-realtime-mini
[t=0.001s] Connecting to wss://api.openai.com/v1/realtime?model=gpt-realtime-mini ...
[t=0.547s] Connected.
[t=0.547s] → SEND session.update
[t=0.555s] ← RECV session.created
────────────────────────────────────────────────────────────────
PHASE 1: Agent-initiated response (real audio exchange)
────────────────────────────────────────────────────────────────
[t=0.555s] Calling recv_audio(timeout=30) — waiting for first audio chunk ...
[t=0.555s] → SEND response.create
[t=0.681s] ← RECV session.updated
[t=0.689s] ← RECV response.created
[t=0.973s] ← RECV response.output_item.added
[t=0.973s] ← RECV conversation.item.added
[t=1.046s] ← RECV response.content_part.added
[t=1.046s] ← RECV response.output_audio_transcript.delta
[t=1.061s] ← RECV response.output_audio_transcript.delta
[t=1.212s] ← RECV response.output_audio_transcript.delta
[t=1.219s] ← RECV response.output_audio_transcript.delta
[t=1.464s] ← RECV response.output_audio.delta (19200 bytes)
[t=1.464s] *** First audio chunk received (19200 bytes) ***
_response_active = True
────────────────────────────────────────────────────────────────
PHASE 2: send_text() while _response_active=True
────────────────────────────────────────────────────────────────
[t=1.464s] _response_active before send_text = True
[t=1.464s] _deferred_response_create before = False
[t=1.464s] GUARD CONDITION MET — calling send_text() ...
[t=1.464s] → SEND conversation.item.create
[t=1.464s] _deferred_response_create after = True
[t=1.464s] response.create count after send_text: 1 (was 1)
────────────────────────────────────────────────────────────────
PHASE 3: Drain — watching for response.done + deferred create
────────────────────────────────────────────────────────────────
[t=1.464s] ← RECV response.output_audio.delta (19200 bytes)
... [23 more audio deltas from original response] ...
[t=4.627s] ← RECV response.done
[t=4.627s] → SEND response.create ← DEFERRED create fires after response.done
[t=4.641s] ← RECV rate_limits.updated
[t=4.765s] ← RECV response.created
... [13 more audio deltas from deferred response] ...
[t=6.464s] ← RECV response.done
────────────────────────────────────────────────────────────────
WIRE LOG ASSESSMENT
────────────────────────────────────────────────────────────────
Guard suppressed response.create during send_text: YES ✓
_deferred_response_create=True after send_text(): YES ✓
conversation.item.create was sent: YES ✓
Deferred response.create fired after response.done: YES ✓
response.create count before first response.done: 1 (expected 1) ✓
(log idx: response.done=100, deferred create=101)
ALL CHECKS PASS — guard is operating correctly on real API.
```
The key sequence from the full wire log:
Anything surprising?
`response.cancelled` clears `_active` / `_responseActive` — on `response.cancelled` the server is done streaming, so the flag must clear and any deferred create must fire. Both JS paths clear on cancel as well as done. This isn't obvious from the issue spec (which only mentions `response.done`) but is required for correctness if the user interrupts.
`_agent_turn_pending` must be cleared before the deferred branch — the original fix only cleared it in the `else` path, leaving it set when deferring. A subsequent `recv_audio` call would then fire a spurious bare `response.create` via the `elif _agent_turn_pending` guard. Fixed in the review-findings commit.
JS `_responseActive` must be set immediately after firing the deferred create — between the deferred send and the server's `response.created` ACK, `_responseActive` was `false`, leaving a race window where `sendText` could fire a second `response.create`. Fixed by mirroring Python's pattern of setting the flag immediately.
Verification Report
send_textdefersresponse.createwhen_response_active=True; sets_deferred_response_create=Truetest_ac_py1_send_text_suppressed_while_response_activepasses;_deferred_response_createfield atopenai_realtime.py:153send_textcreate fires afterresponse.done(ordering:log_create > log_done)test_ac_py2_send_text_deferred_create_fires_after_response_donepasses with interleaved-log ordering checkrecv_audiotest_ac_defer_send_text_deferred_create_fires_exactly_oncepasses;count_of("response.create") == 1receiveAudiopreamble: commit sent,response.createNOT sent when_responseActive=TrueAC-JS1vitest test passes:sentCount("input_audio_buffer.commit")==1,sentCount("response.create")==0receiveAudioresponse.createfires afterresponse.doneAC-JS2vitest test passes: ordered-frame assertion via FakeWS EventEmitter recv queuesendTextdoes not sendresponse.createwhen_responseActive=TrueAC-JS3vitest test passes:sentCount("response.create")==0handleInitialResponsedoes not sendresponse.createwhenisResponseActive()=trueAC-JS4vitest test passes; guard atrealtime-agent.adapter.ts:188handleAudioInputdoes not sendresponse.createwhenisResponseActive()=trueAC-JS5vitest test passes; guard atrealtime-agent.adapter.ts:232isResponseActive()transitions:false→response.created→true→response.done→falserealtime-event-handler.ts:192response.createimmediatelytest_ac5_normal_path_commit_then_createPASS; JS: normal-path branches in all guard tests PASStest_realtime_response_create_guard.pystill passpytest tests/voice/test_realtime_response_create_guard.py→ 10 passed, 0 failedrealtime-adapter-frames.test.tsstill passvitest run realtime-adapter-frames.test.ts→ 2 passed"active response in progress"error surfaces as rejectedError/RuntimeErrortest_ac6_server_rejection_raises_runtime_errorPASS; JS:AC-ERR1vitest test PASSVerdict: 13/13 verified — ALL PASS. Demo script (
demo_response_create_guard.py) exercises production adapter code hermetically; live-API run output above confirms end-to-end behavior against real OpenAI Realtime WebSocket.