fix(voice/#662): guard response.create at all unguarded call sites (JS + PY) by drewdrewthis · Pull Request #669 · langwatch/scenario

drewdrewthis · 2026-06-15T00:44:49Z

Why

The OpenAI Realtime API rejects (or silently drops) a `response.create` sent while a response is already streaming — `{"type":"error","message":"Conversation already has an active response in progress"}`. PR #659 fixed this race for the Python `recv_audio` user-audio branch; three JS call sites and one Python call site were left unguarded. Closes #662.

What changed

`RealtimeEventHandler` gained `_active` / `isResponseActive()` — set on `response.created`, cleared on `response.done`. This is the authoritative lifecycle flag for the agent-role JS path; no parallel listener needed.
`openai-realtime.ts` gained `_responseActive` + `_deferredResponseCreate` — `receiveAudio` guards the `response.create` preamble (commit always, create only when not active, else defer); `sendText` guards identically. Deferred create fires on `response.done`/`response.cancelled` inside the existing tool-drain branch. `_responseActive` is set immediately after firing the deferred create (closes race window before `response.created` arrives — mirrors Python).
`RealtimeAgentAdapter.handleInitialResponse` + `handleAudioInput` guard via `isResponseActive()` — both unconditional `transport.sendEvent({ type: "response.create" })` calls now short-circuit when the handler reports active.
Python `openai_realtime.send_text` guards via `_response_active` / `_deferred_response_create` (fields introduced by fix(voice): guard response.create on active response in recv_audio #659, already on this branch) — matches the `recv_audio` pattern exactly.
Python `recv_audio`: `_agent_turn_pending` cleared before the if/else — was only cleared in the `else` branch, leaving it set in the deferred path and allowing a spurious `response.create` on the next `recv_audio` call.

Test plan

All tests are hermetic mock-transport / mock-WS — no live OpenAI key required.

Python:
```
cd python && pytest tests/voice/test_realtime_response_create_guard.py -v
```
Covers AC-PY1 (deferred when active), AC-PY2 (fires after `response.done`), AC-DEFER (consumed exactly once), and AC-REG1 (normal path unchanged).

JavaScript:
```
cd javascript && npm test -- --testPathPattern="openai-realtime.test|realtime-adapter|realtime-event-handler"
```
Covers AC-JS1–6: `receiveAudio` guard (deferred create, frame ordering), `sendText` guard, `handleInitialResponse`+`handleAudioInput` guard, `isResponseActive()` lifecycle transitions. AC-REG3: existing frame-ordering tests in `realtime-adapter-frames.test.ts` still pass.

Human verification

For a skeptical reviewer who wants to confirm this independently — no OpenAI API key required:

Check out the branch — git fetch origin && git checkout fix/issue662-realtime-js-response-create.
Run the Python guard suite — cd python && pytest tests/voice/test_realtime_response_create_guard.py -v. Confirm all guard cases pass (AC1–AC7 plus the send_text cases test_ac_py1/_py2/_defer).
Run the JS guard suites — cd javascript && npm test -- realtime-response-guard. Confirm AC-JS4/JS5 (RealtimeAgentAdapter guard), AC-JS6a–d (isResponseActive() lifecycle), and the OpenAIRealtimeAgentAdapter — response.create guard (#662) cases pass.
Falsifiability — in python/scenario/voice/adapters/openai_realtime.py, delete the if self._response_active: self._deferred_response_create = True; return guard inside send_text (so response.create fires unconditionally), re-run step 2, and confirm test_ac_py1_send_text_suppressed_while_response_active now FAILS. Restore the guard → it passes again.
(Optional, live key) Run cd python && python3 tests/voice/demo_response_create_guard.py against a real OpenAI Realtime endpoint and confirm no Conversation already has an active response in progress error appears — matching the captured run below.

How I can prove I was successful

Scope: backend-only, no UI surface. Every file in this PR is voice-loop guard logic or its tests — the JS realtime adapter/event-handler (javascript/src/agents/realtime/, javascript/src/voice/adapters/openai-realtime.ts) and the Python realtime adapter (python/scenario/voice/adapters/openai_realtime.py). The PR's Files changed tab is entirely .ts/.py (8 files, no .tsx/.jsx/.vue/CSS/HTML), so there is no frontend surface to screenshot. Proof is the falsifiability tests above plus the captured live-API wire log below.

Guard demonstration — hermetic stub + captured live-API run:

Script: `python/tests/voice/demo_response_create_guard.py` (run via `cd python && python3 tests/voice/demo_response_create_guard.py`)

The committed demo exercises the production OpenAIRealtimeAgentAdapter code through a LoggingWS stub that replays scripted server events and captures every send/recv. No OpenAI API key required. The stub makes the guard observable without a network connection.

The captured output below is from a real OpenAI Realtime API run (gpt-realtime-mini, 2026-06-15) using a live-API development harness (not committed to the repo). It shows the same three-phase sequence against an actual WebSocket connection, confirming the guard works end-to-end.

Three phases exercise the previously-unguarded `send_text()` call site:

Phase 1 — start a real agent response (`response.create` → receive real audio deltas)
Phase 2 — call `send_text()` while `_response_active=True` — the guard fires: `conversation.item.create` is sent, NO duplicate `response.create`
Phase 3 — drain until `response.done` → deferred `response.create` fires automatically, second response completes

Captured live run output (2026-06-15, gpt-realtime-mini):

```

response.create guard — LIVE API DEMO
PR #669 langwatch/scenario
Real OpenAI Realtime WebSocket — gpt-realtime-mini

[t=0.001s] Connecting to wss://api.openai.com/v1/realtime?model=gpt-realtime-mini ...
[t=0.547s] Connected.

[t=0.547s] → SEND session.update
[t=0.555s] ← RECV session.created

────────────────────────────────────────────────────────────────
PHASE 1: Agent-initiated response (real audio exchange)
────────────────────────────────────────────────────────────────

[t=0.555s] Calling recv_audio(timeout=30) — waiting for first audio chunk ...
[t=0.555s] → SEND response.create
[t=0.681s] ← RECV session.updated
[t=0.689s] ← RECV response.created
[t=0.973s] ← RECV response.output_item.added
[t=0.973s] ← RECV conversation.item.added
[t=1.046s] ← RECV response.content_part.added
[t=1.046s] ← RECV response.output_audio_transcript.delta
[t=1.061s] ← RECV response.output_audio_transcript.delta
[t=1.212s] ← RECV response.output_audio_transcript.delta
[t=1.219s] ← RECV response.output_audio_transcript.delta
[t=1.464s] ← RECV response.output_audio.delta (19200 bytes)
[t=1.464s] *** First audio chunk received (19200 bytes) ***
_response_active = True

────────────────────────────────────────────────────────────────
PHASE 2: send_text() while _response_active=True
────────────────────────────────────────────────────────────────

[t=1.464s] _response_active before send_text = True
[t=1.464s] _deferred_response_create before = False
[t=1.464s] GUARD CONDITION MET — calling send_text() ...
[t=1.464s] → SEND conversation.item.create
[t=1.464s] _deferred_response_create after = True
[t=1.464s] response.create count after send_text: 1 (was 1)

────────────────────────────────────────────────────────────────
PHASE 3: Drain — watching for response.done + deferred create
────────────────────────────────────────────────────────────────

[t=1.464s] ← RECV response.output_audio.delta (19200 bytes)
... [23 more audio deltas from original response] ...
[t=4.627s] ← RECV response.done
[t=4.627s] → SEND response.create ← DEFERRED create fires after response.done
[t=4.641s] ← RECV rate_limits.updated
[t=4.765s] ← RECV response.created
... [13 more audio deltas from deferred response] ...
[t=6.464s] ← RECV response.done

────────────────────────────────────────────────────────────────
WIRE LOG ASSESSMENT
────────────────────────────────────────────────────────────────

Guard suppressed response.create during send_text: YES ✓
_deferred_response_create=True after send_text(): YES ✓
conversation.item.create was sent: YES ✓
Deferred response.create fired after response.done: YES ✓
response.create count before first response.done: 1 (expected 1) ✓
(log idx: response.done=100, deferred create=101)

ALL CHECKS PASS — guard is operating correctly on real API.
```

The key sequence from the full wire log:

t=1.464s: `SEND conversation.item.create` — NO second `response.create` (GUARD SUPPRESSED IT)
t=4.627s: `RECV response.done` — deferred create fires immediately
t=4.627s: `SEND response.create` — fires strictly AFTER `response.done` (log idx 101 > 100)

Anything surprising?

`response.cancelled` clears `_active` / `_responseActive` — on `response.cancelled` the server is done streaming, so the flag must clear and any deferred create must fire. Both JS paths clear on cancel as well as done. This isn't obvious from the issue spec (which only mentions `response.done`) but is required for correctness if the user interrupts.

`_agent_turn_pending` must be cleared before the deferred branch — the original fix only cleared it in the `else` path, leaving it set when deferring. A subsequent `recv_audio` call would then fire a spurious bare `response.create` via the `elif _agent_turn_pending` guard. Fixed in the review-findings commit.

JS `_responseActive` must be set immediately after firing the deferred create — between the deferred send and the server's `response.created` ACK, `_responseActive` was `false`, leaving a race window where `sendText` could fire a second `response.create`. Fixed by mirroring Python's pattern of setting the flag immediately.

Verification Report

#	Criterion	Evidence	Status
AC-PY1	`send_text` defers `response.create` when `_response_active=True`; sets `_deferred_response_create=True`	`test_ac_py1_send_text_suppressed_while_response_active` passes; `_deferred_response_create` field at `openai_realtime.py:153`	✅ PASS
AC-PY2	Deferred `send_text` create fires after `response.done` (ordering: `log_create > log_done`)	`test_ac_py2_send_text_deferred_create_fires_after_response_done` passes with interleaved-log ordering check	✅ PASS
AC-DEFER	Deferred create consumed exactly once by next `recv_audio`	`test_ac_defer_send_text_deferred_create_fires_exactly_once` passes; `count_of("response.create") == 1`	✅ PASS
AC-JS1	`receiveAudio` preamble: commit sent, `response.create` NOT sent when `_responseActive=True`	`AC-JS1` vitest test passes: `sentCount("input_audio_buffer.commit")==1`, `sentCount("response.create")==0`	✅ PASS
AC-JS2	Deferred `receiveAudio` `response.create` fires after `response.done`	`AC-JS2` vitest test passes: ordered-frame assertion via FakeWS EventEmitter recv queue	✅ PASS
AC-JS3	`sendText` does not send `response.create` when `_responseActive=True`	`AC-JS3` vitest test passes: `sentCount("response.create")==0`	✅ PASS
AC-JS4	`handleInitialResponse` does not send `response.create` when `isResponseActive()=true`	`AC-JS4` vitest test passes; guard at `realtime-agent.adapter.ts:188`	✅ PASS
AC-JS5	`handleAudioInput` does not send `response.create` when `isResponseActive()=true`	`AC-JS5` vitest test passes; guard at `realtime-agent.adapter.ts:232`	✅ PASS
AC-JS6	`isResponseActive()` transitions: `false` → `response.created` → `true` → `response.done` → `false`	AC-JS6a/b/c/d vitest tests pass; `realtime-event-handler.ts:192`	✅ PASS
AC-REG1	Normal (uncontested) paths still fire `response.create` immediately	Python: `test_ac5_normal_path_commit_then_create` PASS; JS: normal-path branches in all guard tests PASS	✅ PASS
AC-REG2	Pre-existing AC1–AC7 from `test_realtime_response_create_guard.py` still pass	`pytest tests/voice/test_realtime_response_create_guard.py` → 10 passed, 0 failed	✅ PASS
AC-REG3	Existing JS frame-ordering tests in `realtime-adapter-frames.test.ts` still pass	`vitest run realtime-adapter-frames.test.ts` → 2 passed	✅ PASS
AC-ERR1	Server `"active response in progress"` error surfaces as rejected `Error` / `RuntimeError`	Python: `test_ac6_server_rejection_raises_runtime_error` PASS; JS: `AC-ERR1` vitest test PASS	✅ PASS

Verdict: 13/13 verified — ALL PASS. Demo script (demo_response_create_guard.py) exercises production adapter code hermetically; live-API run output above confirms end-to-end behavior against real OpenAI Realtime WebSocket.

coderabbitai · 2026-06-15T00:44:56Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds a response.create double-fire guard across JS (RealtimeEventHandler, RealtimeAgentAdapter, OpenAIRealtimeAgentAdapter) and Python (OpenAIRealtimeAgentAdapter) realtime adapters. A new _active/_responseActive flag tracks the in-flight response lifecycle; response.create is deferred instead of sent immediately when a response is already active, then flushed on response.done/response.cancelled. Hermetic regression tests cover all four guarded call sites.

Changes

response.create double-fire guard — JS and Python adapters + tests

Layer / File(s)	Summary
JS RealtimeEventHandler: `_active` flag and `isResponseActive()` `javascript/src/agents/realtime/realtime-event-handler.ts`	Adds private `_active` boolean, wires it to `response.created`/`response.cancelled`/`response.done` events, clears it in `reset()`, and exposes it via new public `isResponseActive()`.
JS RealtimeAgentAdapter: guard `handleInitialResponse` and `handleAudioInput` `javascript/src/agents/realtime/realtime-agent.adapter.ts`	Both `handleInitialResponse` and `handleAudioInput` now check `isResponseActive()` before emitting `response.create`.
JS RealtimeEventHandler and RealtimeAgentAdapter regression tests `javascript/src/agents/__tests__/realtime-response-guard.test.ts`	`FakeTransport`/`makeFakeSession` helpers plus two test suites: `isResponseActive()` lifecycle transitions and `response.create` suppression in `handleInitialResponse`/`handleAudioInput`.
JS OpenAIRealtimeAgentAdapter: wsFactory injection, lifecycle flags, deferred `response.create` `javascript/src/voice/adapters/openai-realtime.ts`	Adds optional `wsFactory` to `OpenAIRealtimeAgentAdapterInit` for test injection; adds `_responseActive` and `_deferredResponseCreate` private fields; guards `receiveAudio` and `sendText` to defer `response.create` when active; fires deferred create on `response.done`/`response.cancelled`; sets `_responseActive` on `response.created`.
JS OpenAIRealtimeAgentAdapter regression tests `javascript/src/voice/adapters/__tests__/openai-realtime-response-guard.test.ts`	`FakeWS` EventEmitter mock and five Vitest scenarios (AC-JS1/2/3/ERR1 and reconnect hygiene) for suppression, deferred fire, `sendText` guard, error control, and guard flag cleanup.
Python OpenAIRealtimeAgentAdapter: `_agent_turn_pending` fix and `send_text` guard `python/scenario/voice/adapters/openai_realtime.py`	`recv_audio` unconditionally clears `_agent_turn_pending` after audio buffer commit before the deferred/immediate branch; `send_text` now defers `response.create` via `_deferred_response_create` when `_response_active` is true.
Python regression tests and demo script `python/tests/voice/test_realtime_response_create_guard.py`, `python/tests/voice/demo_response_create_guard.py`	Three new async pytest cases for `send_text` suppression and deferral with interleaved log ordering; standalone demo script with `LoggingWS` stub running Scenarios A/B/C with ordered-log assertions and pass/fail exit code.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant Adapter as OpenAIRealtimeAgentAdapter
  participant OpenAI as OpenAI Realtime API

  Caller->>Adapter: receiveAudio() or sendText()
  alt _responseActive = true
    Adapter->>Adapter: _deferredResponseCreate = true (suppress)
  else _responseActive = false
    Adapter->>OpenAI: response.create
    OpenAI-->>Adapter: response.created → _responseActive = true
  end

  OpenAI-->>Adapter: response.done / response.cancelled
  Adapter->>Adapter: _responseActive = false
  alt _deferredResponseCreate = true
    Adapter->>OpenAI: response.create (deferred)
    OpenAI-->>Adapter: response.created → _responseActive = true
  end

Possibly related issues

#663: Proposes replacing the boolean _response_active/_deferred_response_create flags added in this PR with an explicit state machine, explicitly noting it stacks on #662.

Possibly related PRs

langwatch/scenario#659: Both PRs modify the same OpenAIRealtimeAgentAdapter response-in-flight lifecycle guard logic in Python, with corresponding regression test updates aligned to address overlapping constraint violations.

Suggested reviewers

0xdeafcafe
rogeriochaves
sergioestebance

Poem

🐰 A response was flying, mid-air, still alive,
Another one knocked — "Hey, let me inside!"
The rabbit said "Wait, there's one in the queue,
I'll defer your creation till the first one is through."
No double-fire drama, no API wrath —
Just one tidy response.create down the path! 🎙️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: guarding response.create sends across unguarded call sites in both JavaScript and Python, directly addressing issue `#662`.
Linked Issues check	✅ Passed	The pull request fully addresses all primary objectives from issue `#662`: guards all four unguarded call sites (JS receiveAudio, sendText, handleInitialResponse, handleAudioInput; Python send_text), implements deferred-send pattern, adds state tracking, provides hermetic test coverage, and maintains backward compatibility.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to guarding response.create call sites per issue `#662`. The test files, demo scripts, and field additions all serve the guard implementation objective with no tangential modifications.
Description check	✅ Passed	The pull request description is comprehensive and directly related to the changeset, explaining the race condition fix for response.create calls in the OpenAI Realtime API integration across both JavaScript and Python adapters.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/issue662-realtime-js-response-create

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

drewdrewthis · 2026-06-15T08:50:08Z

Review verdict: READY

Reviewed at: ff8d5fa0 · Run: /review --since 1439e56a (own-PR) · CI green (15 pass / 0 fail / 3 skip) · 0 unresolved review threads

No code-level blockers. The three CodeRabbit Major flags are triaged and resolved; the scoped re-review (reset-on-reconnect fix + test-timing change) surfaced no must-fix defects.

⚠️ Open maintainer design question gates HUMAN approval (not a code blocker). @rogeriochaves (2026-06-17): "if we guard then are we just silently dropping it? shouldn't we error … also thinking about interruptions?" This questions the guard approach itself. Issue #662's own Approach table chose defer-don't-drop (the voice adapter, openai-realtime.ts, implements this — AC-JS1/2/3); the agents adapter (realtime-agent.adapter.ts) implements suppress-only per AC-JS4/5. Whether the agents adapter should defer/error instead of suppress is a design call for the maintainers — see the flag (a) thread and "Follow-ups" below. This verdict reflects automated review only; it does not stand in for Rogerio's approval.

CodeRabbit Major flags — disposition

(c) reset-on-reconnect (openai-realtime.ts) — FIXED (71f11f95). disconnect() clears the guard flags; connect() clears them + _closeReason (required: _nextEvent rejects on a stale _closeReason, so a flag-only reset would leave reconnect dead). New test asserts flags clear AND reconnect functions. Thread resolved.
(b) fixed setTimeout delays (guard test) — FIXED (ff8d5fa0). 20/5/10ms → setTimeout(r, 0): the delays are macrotask boundaries draining receiveAudio's ordered-queue processing (load-independent), so magnitude was irrelevant; removed the magic numbers. Not queueMicrotask (under-drains a multi-tick continuation). Deterministic 5/5 ×3. Thread resolved.
(a) suppress-without-replay (realtime-agent.adapter.ts) — RESOLVED w/ reason. Correct per AC-JS4/5 (agents adapter scoped to suppression; defer-and-replay is the voice adapter's contract, AC-JS2). handleInitialResponse suppression is strictly correct (the active response IS the awaited one). handleAudioInput defer-consistency is a valid follow-up that would change the agreed AC contract — see Follow-ups. Thread resolved. Note: this is the same question @rogeriochaves raised above.

Non-blocking (Decide)

[principles] Asymmetric lifecycle reset — disconnect() clears 2 flags, connect() clears 3. Today connect is the authoritative reset so it's harmless, but a future 4th flag has two sites to keep in sync. Consider a single _resetLifecycleState(). (Decide — cheap extract vs explicit inline.)
[test][hygiene] _wsFactory reassignment cast — the reconnect test uses (adapter as unknown as { _wsFactory: unknown }), inconsistent with the file's Record<string,unknown> idiom and silently no-ops if the field is renamed (test would pass without exercising reconnect). Reuse the state binding, or add a test-seam accessor. (Decide — low-prob brittleness.)
[proof] Stale prove-it markers (945255e9, 7768c7cc) predate the rebase; all hermetic ACs verified by CI on HEAD. Re-run /prove-it to refresh. (Decide — process; coverage confirmed by CI.)

Follow-ups (out of scope for this PR)

Agents-adapter defer consistency — mirror the voice adapter's _deferredResponseCreate in RealtimeEventHandler so handleAudioInput defers-and-replays instead of suppress-and-return. Changes the AC-JS4/5 contract (their tests assert response.create count stays 0 after response.done), so it is a deliberate design change, not a bug fix. Directly addresses @rogeriochaves's question — recommend resolving that design call before/with this follow-up.
[hygiene] sibling openai-realtime-tool-calls.test.ts still uses setTimeout(r, 2) polls — align to the zero-delay idiom if adopted project-wide.
Prior follow-ups still open: FSM extraction; console.*→Logger in realtime-event-handler.ts/realtime-agent.adapter.ts; wsFactory authHeader JSDoc.

Verdict is prose, not a GitHub approval. Drew merges; Rogerio reviews next.

drewdrewthis · 2026-06-15T20:22:31Z

**[proof-reviewer] Stale prove-it marker — SHA `945255e9` is not an ancestor of HEAD**

The  marker (comment #4706259659) was written at SHA 945255e9, which is not an ancestor of the current HEAD 8f505f5b. The branch was rebased after the prove-it run, invalidating the marker.

Action: Fix — re-run /prove-it on HEAD 8f505f5b to refresh the marker and attach updated AC evidence.

drewdrewthis · 2026-06-15T20:49:30Z

No description provided.

drewdrewthis · 2026-06-16T11:08:54Z

Reopening to re-trigger CI workflows on fresh SHA.

time.monotonic() was stored in each log tuple but all three consumers (sent_types, ordered_log, first_log_idx) unpacked it as _ — dead data. Remove the timestamp field, drop import time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… tests Drop the unreachable `self._agent_turn_pending = False` at py:429 (line 420 already clears it unconditionally before the branch is entered). Add optional `wsFactory` to `OpenAIRealtimeAgentAdapterInit` (mirrors the ElevenLabs `webSocketFactory` pattern) so tests can inject a fake WS that the adapter drives via its real `.on("message", ...)` listener path. Rewrite `openai-realtime-response-guard.test.ts` to use the factory + `FakeWS extends EventEmitter` instead of casting to the private `_enqueueEvent` method. All 4 guard tests pass; existing Python tests unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…statement Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

python/scenario/voice/adapters/openai_realtime.py (1)

345-357: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear deferred/active response flags on disconnect.

With the new send_text deferral path (Lines 718-723), _deferred_response_create can be set and persist. disconnect() (Lines 345-357) only closes _ws; it does not reset _response_active / _deferred_response_create, so adapter reuse may carry stale lifecycle state into the next connection.

🔧 Suggested teardown reset

     async def disconnect(self) -> None:
         """Close the WebSocket if open."""
         if self._ws is not None:
             try:
                 await self._ws.close()
             except Exception:
                 # Best-effort: connection may already be half-closed or in an
                 # error state when disconnect() is called. We're tearing down
                 # regardless — propagating here would just leak the WS reference.
                 pass
             finally:
                 self._ws = None
+                self._response_active = False
+                self._deferred_response_create = False
             logger.debug("OpenAIRealtimeAgentAdapter: disconnected")

Also applies to: 718-723

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/scenario/voice/adapters/openai_realtime.py` around lines 345 - 357,
The disconnect() method closes the WebSocket but does not reset the
_response_active and _deferred_response_create flags, causing stale lifecycle
state to persist when the adapter is reused. After closing the WebSocket in the
finally block of the disconnect() method, reset both _response_active and
_deferred_response_create to their initial values (likely False and None
respectively) to ensure a clean state for subsequent connections.

🧹 Nitpick comments (1)

javascript/src/agents/realtime/realtime-event-handler.ts (1)

180-194: ⚡ Quick win

Move isResponseActive() into the public-method section.

isResponseActive() (Lines 188-194) is public but currently sits below private reset() (Lines 180-186). Keep public methods grouped above private helpers for consistent class structure.

♻️ Suggested reorder

-  /**
-   * Resets the internal state for the next response
-   */
-  private reset(): void {
-    this._active = false;
-    this.responseResolver = null;
-    this.errorRejecter = null;
-    this.currentResponse = "";
-    this.currentAudioChunks = [];
-  }
-
   /**
    * Returns true while a response is in flight (between response.created
    * and response.done). Used by RealtimeAgentAdapter to guard response.create.
    */
   isResponseActive(): boolean {
     return this._active;
   }
+
+  /**
+   * Resets the internal state for the next response
+   */
+  private reset(): void {
+    this._active = false;
+    this.responseResolver = null;
+    this.errorRejecter = null;
+    this.currentResponse = "";
+    this.currentAudioChunks = [];
+  }

As per coding guidelines, **/*.ts: "In TypeScript classes, place public methods first, private methods at the bottom, and group related methods together".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@javascript/src/agents/realtime/realtime-event-handler.ts` around lines 180 -
194, The public method isResponseActive() is currently positioned after the
private method reset(), which violates the class organization guideline
requiring public methods to be grouped before private methods. Move the
isResponseActive() method (which checks if _active is true and includes its
JSDoc comment) above the reset() method so that all public methods appear before
private helper methods in the class structure.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@javascript/src/agents/__tests__/realtime-response-guard.test.ts`:
- Around line 116-117: The test uses fixed 20ms setTimeout delays at lines 116
and 136 to allow async operations to complete, which creates timing-dependent
flakiness. Replace these arbitrary sleep delays with deterministic
synchronization mechanisms such as awaiting actual promises from the code under
test, using jest.runAllTimers() if using fake timers, or checking that mock
functions have been called with expected arguments before proceeding with
assertions instead of relying on wall-clock delays.

In `@javascript/src/agents/realtime/realtime-agent.adapter.ts`:
- Around line 188-193: The guard condition checking `isResponseActive()`
suppresses the `response.create` event, but this can drop new turns when the
in-flight response completes. Instead of suppressing the send, implement
deferred send behavior by storing the pending `response.create` event (similar
to a `_deferredResponseCreate` pattern used in voice adapters) and firing it
when the current response completes via `response.done` or `response.cancelled`
events. Apply this deferred pattern to both occurrences of the
`isResponseActive()` guard at the specified locations to ensure suppressed
creates are replayed rather than dropped.

In
`@javascript/src/voice/adapters/__tests__/openai-realtime-response-guard.test.ts`:
- Around line 148-149: Replace the hard-coded setTimeout delays at three
locations in the openai-realtime-response-guard.test.ts file (where Promise
timeouts of 20ms are used) with deterministic synchronization mechanisms.
Instead of relying on arbitrary delays, use test utilities such as
jest.useFakeTimers() and jest.advanceTimersByTime() to control time
deterministically, or implement event-based synchronization where the test waits
for actual completion signals from the code under test rather than hoping a
fixed delay is sufficient. This will make the tests reproducible and eliminate
race conditions that can cause flaky test failures.

In `@javascript/src/voice/adapters/openai-realtime.ts`:
- Around line 180-189: The session-lifecycle state flags `_responseActive` and
`_deferredResponseCreate` are not being reset during teardown/disconnect,
causing stale state to persist when the adapter instance is reused for a new
connection. Add logic in the teardown or disconnect method to reset both
`_responseActive` and `_deferredResponseCreate` back to their initial false
state to ensure a clean slate for subsequent connections.

In `@python/tests/voice/demo_response_create_guard.py`:
- Around line 233-245: The return statement at the end of the function is
missing the item_ok validation check. The variable item_ok is calculated to
verify that "conversation.item.create" was sent, but it is not included in the
return statement alongside guard_ok and deferred_ok. Update the return statement
to include all three conditions (guard_ok and deferred_ok and item_ok) so that
the function properly validates all required behaviors before reporting success.

---

Outside diff comments:
In `@python/scenario/voice/adapters/openai_realtime.py`:
- Around line 345-357: The disconnect() method closes the WebSocket but does not
reset the _response_active and _deferred_response_create flags, causing stale
lifecycle state to persist when the adapter is reused. After closing the
WebSocket in the finally block of the disconnect() method, reset both
_response_active and _deferred_response_create to their initial values (likely
False and None respectively) to ensure a clean state for subsequent connections.

---

Nitpick comments:
In `@javascript/src/agents/realtime/realtime-event-handler.ts`:
- Around line 180-194: The public method isResponseActive() is currently
positioned after the private method reset(), which violates the class
organization guideline requiring public methods to be grouped before private
methods. Move the isResponseActive() method (which checks if _active is true and
includes its JSDoc comment) above the reset() method so that all public methods
appear before private helper methods in the class structure.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 98a7449d-261f-47ab-b347-066c42764439

📥 Commits

Reviewing files that changed from the base of the PR and between 732d426 and 47cc632.

📒 Files selected for processing (8)

javascript/src/agents/__tests__/realtime-response-guard.test.ts
javascript/src/agents/realtime/realtime-agent.adapter.ts
javascript/src/agents/realtime/realtime-event-handler.ts
javascript/src/voice/adapters/__tests__/openai-realtime-response-guard.test.ts
javascript/src/voice/adapters/openai-realtime.ts
python/scenario/voice/adapters/openai_realtime.py
python/tests/voice/demo_response_create_guard.py
python/tests/voice/test_realtime_response_create_guard.py

…_ok, setTimeout too short - demo_response_create_guard.py Scenario C: `return guard_ok and deferred_ok` dropped `item_ok`, so the demo could print FAIL for item check then return True (false proof). Fixed to `return guard_ok and deferred_ok and item_ok`. - realtime-response-guard.test.ts AC-JS4/JS5: widen 20ms flush wait to 100ms and add comment explaining it is a preamble flush, not a timed window. Prevents flake on loaded CI runners. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…k flush The prior commit widened the pre-assert sleep to 100ms, which dead-heated the adapter's responseTimeout:100 in makeAdapter — on CI the adapter's internal waitForResponse timeout fired before the test's 100ms wait elapsed and fired response.done, throwing "Agent response timeout after 100ms" and failing ci-checks (24.x). The send decision in handleInitialResponse/handleAudioInput is synchronous, before the method's first await (waitForResponse). So a microtask flush (await Promise.resolve()) followed by a prompt response.done is deterministic and settles well inside the timeout — no wall-clock race. Test runs in 7ms. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…econnect CodeRabbit flagged that `_responseActive`/`_deferredResponseCreate` (the #662 guard flags) were never cleared in teardown, so a reused adapter instance could carry stale deferred state into its next connection — suppressing the first turn after a reconnect. Fix: - disconnect(): clear both guard flags on teardown. - connect(): clear both guard flags AND `_closeReason`. The latter is required — `_nextEvent` rejects immediately when `_closeReason` is set, so resetting only the guard flags would leave a reconnected adapter unable to receive (cosmetic half-fix). Test: new "resets response-lifecycle state on disconnect and reconnects clean" asserts the flags clear on disconnect and that a reconnected instance reaches its own receiveAudio timeout (proving the stale close reason was cleared) rather than rejecting instantly. Full voice-adapter suite: 33/33 green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… macrotask yields CodeRabbit flagged the hard-coded 20/5/10ms delays in the receiveAudio guard tests as brittle. They are not race-prone — receiveAudio's preamble and _nextEvent setup are synchronous and server events flow through an ordered queue, so each delay is a macrotask boundary that drains the loop's microtask chain regardless of magnitude or runner load. Reduce all three to setTimeout 0 to drop the arbitrary magnitudes while keeping the robust macrotask-drain semantics. (Not queueMicrotask, per CodeRabbit's suggestion: a single microtask tick can under-drain a multi-tick continuation; a zero-delay macrotask drains the whole queue.) Deterministic across repeated runs: 5/5 x3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

langwatch-agent · 2026-07-03T15:29:46Z

🐕 PR Hound review brief — assigned reviewer @rogeriochaves

Review mode: Deep Review
Scariest prod risk: This is a concurrency guard across four call sites in two languages. If a deferred response.create is never fired (flag stuck active) the agent goes silent forever; if it fires twice you get the exact "active response in progress" error this PR exists to prevent. The race window between firing the deferred create and response.created arriving is the crux.

Inspection targets (max 3, each names a file or behavior):

[Must Check] javascript/src/voice/adapters/openai-realtime.ts — _responseActive is set immediately after firing the deferred create (before response.created arrives) and _deferredResponseCreate is consumed exactly once on response.done/response.cancelled; no path leaves it set-and-never-fired.
[Must Check] realtime-event-handler.ts _active/isResponseActive() — set on response.created, cleared on response.done, with no missed response.cancelled/error transition that leaves _active stuck true and wedges all future turns.
[Must Check] python/scenario/voice/adapters/openai_realtime.py — _agent_turn_pending now cleared before the if/else; confirm both Python guards (send_text and recv_audio) mirror the JS lifecycle and can't defer without a later trigger to fire it.

Author questions (optional, max 3):

If response.done never arrives (transport error / dropped socket) after a deferral, is _responseActive/_active ever reset, or does the adapter deadlock on the next turn?
The four newly-guarded call sites: is there any path that legitimately needs an unconditional response.create (e.g. a cancel-then-recreate) that this guard now suppresses?

github-actions · 2026-07-03T15:53:58Z

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.

The changes modify runtime behavior of the realtime agent adapters in both JavaScript and Python: they add lifecycle flags, defer and re-order sending of response.create, and change connect/disconnect logic. Because this alters integration and messaging with the OpenAI Realtime API (an external third‑party integration) and changes protocol/behavioral logic, it does not meet the low-risk criteria and requires normal review.

This PR requires a manual review before merging.

drewdrewthis added the prove-it-clean All ACs verified by /prove-it at this HEAD label Jun 15, 2026

drewdrewthis force-pushed the fix/issue662-realtime-js-response-create branch from 8a197d5 to 9790b71 Compare June 15, 2026 07:58

drewdrewthis marked this pull request as ready for review June 15, 2026 08:19

drewdrewthis requested a review from rogeriochaves June 15, 2026 08:19

drewdrewthis self-assigned this Jun 15, 2026

drewdrewthis force-pushed the fix/issue662-realtime-js-response-create branch from 945255e to b1b2ead Compare June 15, 2026 17:39

drewdrewthis force-pushed the fix/657 branch from b6ed2f3 to fc0ff2d Compare June 15, 2026 17:46