Skip to content

fix(voice/#662): guard response.create at all unguarded call sites (JS + PY)#669

Open
drewdrewthis wants to merge 16 commits into
mainfrom
fix/issue662-realtime-js-response-create
Open

fix(voice/#662): guard response.create at all unguarded call sites (JS + PY)#669
drewdrewthis wants to merge 16 commits into
mainfrom
fix/issue662-realtime-js-response-create

Conversation

@drewdrewthis

@drewdrewthis drewdrewthis commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Why

The OpenAI Realtime API rejects (or silently drops) a `response.create` sent while a response is already streaming — `{"type":"error","message":"Conversation already has an active response in progress"}`. PR #659 fixed this race for the Python `recv_audio` user-audio branch; three JS call sites and one Python call site were left unguarded. Closes #662.

What changed

  • `RealtimeEventHandler` gained `_active` / `isResponseActive()` — set on `response.created`, cleared on `response.done`. This is the authoritative lifecycle flag for the agent-role JS path; no parallel listener needed.
  • `openai-realtime.ts` gained `_responseActive` + `_deferredResponseCreate` — `receiveAudio` guards the `response.create` preamble (commit always, create only when not active, else defer); `sendText` guards identically. Deferred create fires on `response.done`/`response.cancelled` inside the existing tool-drain branch. `_responseActive` is set immediately after firing the deferred create (closes race window before `response.created` arrives — mirrors Python).
  • `RealtimeAgentAdapter.handleInitialResponse` + `handleAudioInput` guard via `isResponseActive()` — both unconditional `transport.sendEvent({ type: "response.create" })` calls now short-circuit when the handler reports active.
  • Python `openai_realtime.send_text` guards via `_response_active` / `_deferred_response_create` (fields introduced by fix(voice): guard response.create on active response in recv_audio #659, already on this branch) — matches the `recv_audio` pattern exactly.
  • Python `recv_audio`: `_agent_turn_pending` cleared before the if/else — was only cleared in the `else` branch, leaving it set in the deferred path and allowing a spurious `response.create` on the next `recv_audio` call.

Test plan

All tests are hermetic mock-transport / mock-WS — no live OpenAI key required.

Python:
```
cd python && pytest tests/voice/test_realtime_response_create_guard.py -v
```
Covers AC-PY1 (deferred when active), AC-PY2 (fires after `response.done`), AC-DEFER (consumed exactly once), and AC-REG1 (normal path unchanged).

JavaScript:
```
cd javascript && npm test -- --testPathPattern="openai-realtime.test|realtime-adapter|realtime-event-handler"
```
Covers AC-JS1–6: `receiveAudio` guard (deferred create, frame ordering), `sendText` guard, `handleInitialResponse`+`handleAudioInput` guard, `isResponseActive()` lifecycle transitions. AC-REG3: existing frame-ordering tests in `realtime-adapter-frames.test.ts` still pass.

Human verification

For a skeptical reviewer who wants to confirm this independently — no OpenAI API key required:

  1. Check out the branchgit fetch origin && git checkout fix/issue662-realtime-js-response-create.
  2. Run the Python guard suitecd python && pytest tests/voice/test_realtime_response_create_guard.py -v. Confirm all guard cases pass (AC1–AC7 plus the send_text cases test_ac_py1/_py2/_defer).
  3. Run the JS guard suitescd javascript && npm test -- realtime-response-guard. Confirm AC-JS4/JS5 (RealtimeAgentAdapter guard), AC-JS6a–d (isResponseActive() lifecycle), and the OpenAIRealtimeAgentAdapter — response.create guard (#662) cases pass.
  4. Falsifiability — in python/scenario/voice/adapters/openai_realtime.py, delete the if self._response_active: self._deferred_response_create = True; return guard inside send_text (so response.create fires unconditionally), re-run step 2, and confirm test_ac_py1_send_text_suppressed_while_response_active now FAILS. Restore the guard → it passes again.
  5. (Optional, live key) Run cd python && python3 tests/voice/demo_response_create_guard.py against a real OpenAI Realtime endpoint and confirm no Conversation already has an active response in progress error appears — matching the captured run below.

How I can prove I was successful

Scope: backend-only, no UI surface. Every file in this PR is voice-loop guard logic or its tests — the JS realtime adapter/event-handler (javascript/src/agents/realtime/, javascript/src/voice/adapters/openai-realtime.ts) and the Python realtime adapter (python/scenario/voice/adapters/openai_realtime.py). The PR's Files changed tab is entirely .ts/.py (8 files, no .tsx/.jsx/.vue/CSS/HTML), so there is no frontend surface to screenshot. Proof is the falsifiability tests above plus the captured live-API wire log below.

Guard demonstration — hermetic stub + captured live-API run:

Script: `python/tests/voice/demo_response_create_guard.py` (run via `cd python && python3 tests/voice/demo_response_create_guard.py`)

The committed demo exercises the production OpenAIRealtimeAgentAdapter code through a LoggingWS stub that replays scripted server events and captures every send/recv. No OpenAI API key required. The stub makes the guard observable without a network connection.

The captured output below is from a real OpenAI Realtime API run (gpt-realtime-mini, 2026-06-15) using a live-API development harness (not committed to the repo). It shows the same three-phase sequence against an actual WebSocket connection, confirming the guard works end-to-end.

Three phases exercise the previously-unguarded `send_text()` call site:

  1. Phase 1 — start a real agent response (`response.create` → receive real audio deltas)
  2. Phase 2 — call `send_text()` while `_response_active=True` — the guard fires: `conversation.item.create` is sent, NO duplicate `response.create`
  3. Phase 3 — drain until `response.done` → deferred `response.create` fires automatically, second response completes

Captured live run output (2026-06-15, gpt-realtime-mini):

```

response.create guard — LIVE API DEMO
PR #669 langwatch/scenario
Real OpenAI Realtime WebSocket — gpt-realtime-mini

[t=0.001s] Connecting to wss://api.openai.com/v1/realtime?model=gpt-realtime-mini ...
[t=0.547s] Connected.

[t=0.547s] → SEND session.update
[t=0.555s] ← RECV session.created

────────────────────────────────────────────────────────────────
PHASE 1: Agent-initiated response (real audio exchange)
────────────────────────────────────────────────────────────────

[t=0.555s] Calling recv_audio(timeout=30) — waiting for first audio chunk ...
[t=0.555s] → SEND response.create
[t=0.681s] ← RECV session.updated
[t=0.689s] ← RECV response.created
[t=0.973s] ← RECV response.output_item.added
[t=0.973s] ← RECV conversation.item.added
[t=1.046s] ← RECV response.content_part.added
[t=1.046s] ← RECV response.output_audio_transcript.delta
[t=1.061s] ← RECV response.output_audio_transcript.delta
[t=1.212s] ← RECV response.output_audio_transcript.delta
[t=1.219s] ← RECV response.output_audio_transcript.delta
[t=1.464s] ← RECV response.output_audio.delta (19200 bytes)
[t=1.464s] *** First audio chunk received (19200 bytes) ***
_response_active = True

────────────────────────────────────────────────────────────────
PHASE 2: send_text() while _response_active=True
────────────────────────────────────────────────────────────────

[t=1.464s] _response_active before send_text = True
[t=1.464s] _deferred_response_create before = False
[t=1.464s] GUARD CONDITION MET — calling send_text() ...
[t=1.464s] → SEND conversation.item.create
[t=1.464s] _deferred_response_create after = True
[t=1.464s] response.create count after send_text: 1 (was 1)

────────────────────────────────────────────────────────────────
PHASE 3: Drain — watching for response.done + deferred create
────────────────────────────────────────────────────────────────

[t=1.464s] ← RECV response.output_audio.delta (19200 bytes)
... [23 more audio deltas from original response] ...
[t=4.627s] ← RECV response.done
[t=4.627s] → SEND response.create ← DEFERRED create fires after response.done
[t=4.641s] ← RECV rate_limits.updated
[t=4.765s] ← RECV response.created
... [13 more audio deltas from deferred response] ...
[t=6.464s] ← RECV response.done

────────────────────────────────────────────────────────────────
WIRE LOG ASSESSMENT
────────────────────────────────────────────────────────────────

Guard suppressed response.create during send_text: YES ✓
_deferred_response_create=True after send_text(): YES ✓
conversation.item.create was sent: YES ✓
Deferred response.create fired after response.done: YES ✓
response.create count before first response.done: 1 (expected 1) ✓
(log idx: response.done=100, deferred create=101)

ALL CHECKS PASS — guard is operating correctly on real API.
```

The key sequence from the full wire log:

  • t=1.464s: `SEND conversation.item.create` — NO second `response.create` (GUARD SUPPRESSED IT)
  • t=4.627s: `RECV response.done` — deferred create fires immediately
  • t=4.627s: `SEND response.create` — fires strictly AFTER `response.done` (log idx 101 > 100)

Anything surprising?

`response.cancelled` clears `_active` / `_responseActive` — on `response.cancelled` the server is done streaming, so the flag must clear and any deferred create must fire. Both JS paths clear on cancel as well as done. This isn't obvious from the issue spec (which only mentions `response.done`) but is required for correctness if the user interrupts.

`_agent_turn_pending` must be cleared before the deferred branch — the original fix only cleared it in the `else` path, leaving it set when deferring. A subsequent `recv_audio` call would then fire a spurious bare `response.create` via the `elif _agent_turn_pending` guard. Fixed in the review-findings commit.

JS `_responseActive` must be set immediately after firing the deferred create — between the deferred send and the server's `response.created` ACK, `_responseActive` was `false`, leaving a race window where `sendText` could fire a second `response.create`. Fixed by mirroring Python's pattern of setting the flag immediately.

Verification Report

# Criterion Evidence Status
AC-PY1 send_text defers response.create when _response_active=True; sets _deferred_response_create=True test_ac_py1_send_text_suppressed_while_response_active passes; _deferred_response_create field at openai_realtime.py:153 ✅ PASS
AC-PY2 Deferred send_text create fires after response.done (ordering: log_create > log_done) test_ac_py2_send_text_deferred_create_fires_after_response_done passes with interleaved-log ordering check ✅ PASS
AC-DEFER Deferred create consumed exactly once by next recv_audio test_ac_defer_send_text_deferred_create_fires_exactly_once passes; count_of("response.create") == 1 ✅ PASS
AC-JS1 receiveAudio preamble: commit sent, response.create NOT sent when _responseActive=True AC-JS1 vitest test passes: sentCount("input_audio_buffer.commit")==1, sentCount("response.create")==0 ✅ PASS
AC-JS2 Deferred receiveAudio response.create fires after response.done AC-JS2 vitest test passes: ordered-frame assertion via FakeWS EventEmitter recv queue ✅ PASS
AC-JS3 sendText does not send response.create when _responseActive=True AC-JS3 vitest test passes: sentCount("response.create")==0 ✅ PASS
AC-JS4 handleInitialResponse does not send response.create when isResponseActive()=true AC-JS4 vitest test passes; guard at realtime-agent.adapter.ts:188 ✅ PASS
AC-JS5 handleAudioInput does not send response.create when isResponseActive()=true AC-JS5 vitest test passes; guard at realtime-agent.adapter.ts:232 ✅ PASS
AC-JS6 isResponseActive() transitions: falseresponse.createdtrueresponse.donefalse AC-JS6a/b/c/d vitest tests pass; realtime-event-handler.ts:192 ✅ PASS
AC-REG1 Normal (uncontested) paths still fire response.create immediately Python: test_ac5_normal_path_commit_then_create PASS; JS: normal-path branches in all guard tests PASS ✅ PASS
AC-REG2 Pre-existing AC1–AC7 from test_realtime_response_create_guard.py still pass pytest tests/voice/test_realtime_response_create_guard.py → 10 passed, 0 failed ✅ PASS
AC-REG3 Existing JS frame-ordering tests in realtime-adapter-frames.test.ts still pass vitest run realtime-adapter-frames.test.ts → 2 passed ✅ PASS
AC-ERR1 Server "active response in progress" error surfaces as rejected Error / RuntimeError Python: test_ac6_server_rejection_raises_runtime_error PASS; JS: AC-ERR1 vitest test PASS ✅ PASS

Verdict: 13/13 verified — ALL PASS. Demo script (demo_response_create_guard.py) exercises production adapter code hermetically; live-API run output above confirms end-to-end behavior against real OpenAI Realtime WebSocket.

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a response.create double-fire guard across JS (RealtimeEventHandler, RealtimeAgentAdapter, OpenAIRealtimeAgentAdapter) and Python (OpenAIRealtimeAgentAdapter) realtime adapters. A new _active/_responseActive flag tracks the in-flight response lifecycle; response.create is deferred instead of sent immediately when a response is already active, then flushed on response.done/response.cancelled. Hermetic regression tests cover all four guarded call sites.

Changes

response.create double-fire guard — JS and Python adapters + tests

Layer / File(s) Summary
JS RealtimeEventHandler: _active flag and isResponseActive()
javascript/src/agents/realtime/realtime-event-handler.ts
Adds private _active boolean, wires it to response.created/response.cancelled/response.done events, clears it in reset(), and exposes it via new public isResponseActive().
JS RealtimeAgentAdapter: guard handleInitialResponse and handleAudioInput
javascript/src/agents/realtime/realtime-agent.adapter.ts
Both handleInitialResponse and handleAudioInput now check isResponseActive() before emitting response.create.
JS RealtimeEventHandler and RealtimeAgentAdapter regression tests
javascript/src/agents/__tests__/realtime-response-guard.test.ts
FakeTransport/makeFakeSession helpers plus two test suites: isResponseActive() lifecycle transitions and response.create suppression in handleInitialResponse/handleAudioInput.
JS OpenAIRealtimeAgentAdapter: wsFactory injection, lifecycle flags, deferred response.create
javascript/src/voice/adapters/openai-realtime.ts
Adds optional wsFactory to OpenAIRealtimeAgentAdapterInit for test injection; adds _responseActive and _deferredResponseCreate private fields; guards receiveAudio and sendText to defer response.create when active; fires deferred create on response.done/response.cancelled; sets _responseActive on response.created.
JS OpenAIRealtimeAgentAdapter regression tests
javascript/src/voice/adapters/__tests__/openai-realtime-response-guard.test.ts
FakeWS EventEmitter mock and five Vitest scenarios (AC-JS1/2/3/ERR1 and reconnect hygiene) for suppression, deferred fire, sendText guard, error control, and guard flag cleanup.
Python OpenAIRealtimeAgentAdapter: _agent_turn_pending fix and send_text guard
python/scenario/voice/adapters/openai_realtime.py
recv_audio unconditionally clears _agent_turn_pending after audio buffer commit before the deferred/immediate branch; send_text now defers response.create via _deferred_response_create when _response_active is true.
Python regression tests and demo script
python/tests/voice/test_realtime_response_create_guard.py, python/tests/voice/demo_response_create_guard.py
Three new async pytest cases for send_text suppression and deferral with interleaved log ordering; standalone demo script with LoggingWS stub running Scenarios A/B/C with ordered-log assertions and pass/fail exit code.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant Adapter as OpenAIRealtimeAgentAdapter
  participant OpenAI as OpenAI Realtime API

  Caller->>Adapter: receiveAudio() or sendText()
  alt _responseActive = true
    Adapter->>Adapter: _deferredResponseCreate = true (suppress)
  else _responseActive = false
    Adapter->>OpenAI: response.create
    OpenAI-->>Adapter: response.created → _responseActive = true
  end

  OpenAI-->>Adapter: response.done / response.cancelled
  Adapter->>Adapter: _responseActive = false
  alt _deferredResponseCreate = true
    Adapter->>OpenAI: response.create (deferred)
    OpenAI-->>Adapter: response.created → _responseActive = true
  end
Loading

Possibly related issues

  • #663: Proposes replacing the boolean _response_active/_deferred_response_create flags added in this PR with an explicit state machine, explicitly noting it stacks on #662.

Possibly related PRs

  • langwatch/scenario#659: Both PRs modify the same OpenAIRealtimeAgentAdapter response-in-flight lifecycle guard logic in Python, with corresponding regression test updates aligned to address overlapping constraint violations.

Suggested reviewers

  • 0xdeafcafe
  • rogeriochaves
  • sergioestebance

Poem

🐰 A response was flying, mid-air, still alive,
Another one knocked — "Hey, let me inside!"
The rabbit said "Wait, there's one in the queue,
I'll defer your creation till the first one is through."
No double-fire drama, no API wrath —
Just one tidy response.create down the path! 🎙️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: guarding response.create sends across unguarded call sites in both JavaScript and Python, directly addressing issue #662.
Linked Issues check ✅ Passed The pull request fully addresses all primary objectives from issue #662: guards all four unguarded call sites (JS receiveAudio, sendText, handleInitialResponse, handleAudioInput; Python send_text), implements deferred-send pattern, adds state tracking, provides hermetic test coverage, and maintains backward compatibility.
Out of Scope Changes check ✅ Passed All changes are directly scoped to guarding response.create call sites per issue #662. The test files, demo scripts, and field additions all serve the guard implementation objective with no tangential modifications.
Description check ✅ Passed The pull request description is comprehensive and directly related to the changeset, explaining the race condition fix for response.create calls in the OpenAI Realtime API integration across both JavaScript and Python adapters.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/issue662-realtime-js-response-create

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@drewdrewthis drewdrewthis added the prove-it-clean All ACs verified by /prove-it at this HEAD label Jun 15, 2026
@drewdrewthis drewdrewthis force-pushed the fix/issue662-realtime-js-response-create branch from 8a197d5 to 9790b71 Compare June 15, 2026 07:58
@drewdrewthis drewdrewthis marked this pull request as ready for review June 15, 2026 08:19
@drewdrewthis drewdrewthis self-assigned this Jun 15, 2026
@drewdrewthis

drewdrewthis commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator Author

Review verdict: READY

Reviewed at: ff8d5fa0 · Run: /review --since 1439e56a (own-PR) · CI green (15 pass / 0 fail / 3 skip) · 0 unresolved review threads

No code-level blockers. The three CodeRabbit Major flags are triaged and resolved; the scoped re-review (reset-on-reconnect fix + test-timing change) surfaced no must-fix defects.

⚠️ Open maintainer design question gates HUMAN approval (not a code blocker). @rogeriochaves (2026-06-17): "if we guard then are we just silently dropping it? shouldn't we error … also thinking about interruptions?" This questions the guard approach itself. Issue #662's own Approach table chose defer-don't-drop (the voice adapter, openai-realtime.ts, implements this — AC-JS1/2/3); the agents adapter (realtime-agent.adapter.ts) implements suppress-only per AC-JS4/5. Whether the agents adapter should defer/error instead of suppress is a design call for the maintainers — see the flag (a) thread and "Follow-ups" below. This verdict reflects automated review only; it does not stand in for Rogerio's approval.

CodeRabbit Major flags — disposition

  1. (c) reset-on-reconnect (openai-realtime.ts) — FIXED (71f11f95). disconnect() clears the guard flags; connect() clears them + _closeReason (required: _nextEvent rejects on a stale _closeReason, so a flag-only reset would leave reconnect dead). New test asserts flags clear AND reconnect functions. Thread resolved.
  2. (b) fixed setTimeout delays (guard test) — FIXED (ff8d5fa0). 20/5/10ms → setTimeout(r, 0): the delays are macrotask boundaries draining receiveAudio's ordered-queue processing (load-independent), so magnitude was irrelevant; removed the magic numbers. Not queueMicrotask (under-drains a multi-tick continuation). Deterministic 5/5 ×3. Thread resolved.
  3. (a) suppress-without-replay (realtime-agent.adapter.ts) — RESOLVED w/ reason. Correct per AC-JS4/5 (agents adapter scoped to suppression; defer-and-replay is the voice adapter's contract, AC-JS2). handleInitialResponse suppression is strictly correct (the active response IS the awaited one). handleAudioInput defer-consistency is a valid follow-up that would change the agreed AC contract — see Follow-ups. Thread resolved. Note: this is the same question @rogeriochaves raised above.

Non-blocking (Decide)

  • [principles] Asymmetric lifecycle resetdisconnect() clears 2 flags, connect() clears 3. Today connect is the authoritative reset so it's harmless, but a future 4th flag has two sites to keep in sync. Consider a single _resetLifecycleState(). (Decide — cheap extract vs explicit inline.)
  • [test][hygiene] _wsFactory reassignment cast — the reconnect test uses (adapter as unknown as { _wsFactory: unknown }), inconsistent with the file's Record<string,unknown> idiom and silently no-ops if the field is renamed (test would pass without exercising reconnect). Reuse the state binding, or add a test-seam accessor. (Decide — low-prob brittleness.)
  • [proof] Stale prove-it markers (945255e9, 7768c7cc) predate the rebase; all hermetic ACs verified by CI on HEAD. Re-run /prove-it to refresh. (Decide — process; coverage confirmed by CI.)

Follow-ups (out of scope for this PR)

  • Agents-adapter defer consistency — mirror the voice adapter's _deferredResponseCreate in RealtimeEventHandler so handleAudioInput defers-and-replays instead of suppress-and-return. Changes the AC-JS4/5 contract (their tests assert response.create count stays 0 after response.done), so it is a deliberate design change, not a bug fix. Directly addresses @rogeriochaves's question — recommend resolving that design call before/with this follow-up.
  • [hygiene] sibling openai-realtime-tool-calls.test.ts still uses setTimeout(r, 2) polls — align to the zero-delay idiom if adopted project-wide.
  • Prior follow-ups still open: FSM extraction; console.*Logger in realtime-event-handler.ts/realtime-agent.adapter.ts; wsFactory authHeader JSDoc.

Verdict is prose, not a GitHub approval. Drew merges; Rogerio reviews next.

@drewdrewthis drewdrewthis force-pushed the fix/issue662-realtime-js-response-create branch from 945255e to b1b2ead Compare June 15, 2026 17:39
Comment thread python/scenario/voice/adapters/openai_realtime.py Outdated
Comment thread javascript/src/voice/adapters/openai-realtime.ts
Comment thread python/scenario/voice/adapters/openai_realtime.py
Comment thread python/tests/voice/test_realtime_response_create_guard.py
@drewdrewthis drewdrewthis marked this pull request as draft June 15, 2026 18:12
@drewdrewthis drewdrewthis marked this pull request as ready for review June 15, 2026 18:19
@drewdrewthis drewdrewthis force-pushed the fix/issue662-realtime-js-response-create branch from 7d30b81 to 9538bba Compare June 15, 2026 18:20
Comment thread python/scenario/voice/adapters/openai_realtime.py Outdated
Comment thread javascript/src/voice/adapters/__tests__/openai-realtime-response-guard.test.ts Outdated
@drewdrewthis

drewdrewthis commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator Author
**[proof-reviewer] Stale prove-it marker — SHA `945255e9` is not an ancestor of HEAD**

The <!-- prove-it-clean: 945255e90d12244c2e9eecf05ea067eaf996ec8d --> marker (comment #4706259659) was written at SHA 945255e9, which is not an ancestor of the current HEAD 8f505f5b. The branch was rebased after the prove-it run, invalidating the marker.

Action: Fix — re-run /prove-it on HEAD 8f505f5b to refresh the marker and attach updated AC evidence.

@drewdrewthis drewdrewthis removed the prove-it-clean All ACs verified by /prove-it at this HEAD label Jun 15, 2026
@drewdrewthis

Copy link
Copy Markdown
Collaborator Author

No description provided.

@drewdrewthis drewdrewthis added prove-it-clean All ACs verified by /prove-it at this HEAD ai-reviewed /review was run on this PR (multi-agent: principles, hygiene, test, security) labels Jun 15, 2026
@drewdrewthis drewdrewthis force-pushed the fix/issue662-realtime-js-response-create branch from 7768c7c to 661c7f5 Compare June 16, 2026 10:33
@drewdrewthis drewdrewthis marked this pull request as draft June 16, 2026 10:59
@drewdrewthis drewdrewthis marked this pull request as ready for review June 16, 2026 10:59
@drewdrewthis

Copy link
Copy Markdown
Collaborator Author

Reopening to re-trigger CI workflows on fresh SHA.

@drewdrewthis drewdrewthis reopened this Jun 16, 2026
@drewdrewthis drewdrewthis force-pushed the fix/issue662-realtime-js-response-create branch 2 times, most recently from 661c7f5 to 2d1bc62 Compare June 16, 2026 14:51
Ubuntu and others added 3 commits June 17, 2026 13:04
time.monotonic() was stored in each log tuple but all three consumers
(sent_types, ordered_log, first_log_idx) unpacked it as _ — dead data.
Remove the timestamp field, drop import time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tests

Drop the unreachable `self._agent_turn_pending = False` at py:429 (line 420
already clears it unconditionally before the branch is entered).

Add optional `wsFactory` to `OpenAIRealtimeAgentAdapterInit` (mirrors the
ElevenLabs `webSocketFactory` pattern) so tests can inject a fake WS that
the adapter drives via its real `.on("message", ...)` listener path.

Rewrite `openai-realtime-response-guard.test.ts` to use the factory +
`FakeWS extends EventEmitter` instead of casting to the private
`_enqueueEvent` method.  All 4 guard tests pass; existing Python tests
unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…statement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@drewdrewthis drewdrewthis force-pushed the fix/issue662-realtime-js-response-create branch from 8441aab to 47cc632 Compare June 17, 2026 13:04

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/scenario/voice/adapters/openai_realtime.py (1)

345-357: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear deferred/active response flags on disconnect.

With the new send_text deferral path (Lines 718-723), _deferred_response_create can be set and persist. disconnect() (Lines 345-357) only closes _ws; it does not reset _response_active / _deferred_response_create, so adapter reuse may carry stale lifecycle state into the next connection.

🔧 Suggested teardown reset
     async def disconnect(self) -> None:
         """Close the WebSocket if open."""
         if self._ws is not None:
             try:
                 await self._ws.close()
             except Exception:
                 # Best-effort: connection may already be half-closed or in an
                 # error state when disconnect() is called. We're tearing down
                 # regardless — propagating here would just leak the WS reference.
                 pass
             finally:
                 self._ws = None
+                self._response_active = False
+                self._deferred_response_create = False
             logger.debug("OpenAIRealtimeAgentAdapter: disconnected")

Also applies to: 718-723

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/scenario/voice/adapters/openai_realtime.py` around lines 345 - 357,
The disconnect() method closes the WebSocket but does not reset the
_response_active and _deferred_response_create flags, causing stale lifecycle
state to persist when the adapter is reused. After closing the WebSocket in the
finally block of the disconnect() method, reset both _response_active and
_deferred_response_create to their initial values (likely False and None
respectively) to ensure a clean state for subsequent connections.
🧹 Nitpick comments (1)
javascript/src/agents/realtime/realtime-event-handler.ts (1)

180-194: ⚡ Quick win

Move isResponseActive() into the public-method section.

isResponseActive() (Lines 188-194) is public but currently sits below private reset() (Lines 180-186). Keep public methods grouped above private helpers for consistent class structure.

♻️ Suggested reorder
-  /**
-   * Resets the internal state for the next response
-   */
-  private reset(): void {
-    this._active = false;
-    this.responseResolver = null;
-    this.errorRejecter = null;
-    this.currentResponse = "";
-    this.currentAudioChunks = [];
-  }
-
   /**
    * Returns true while a response is in flight (between response.created
    * and response.done). Used by RealtimeAgentAdapter to guard response.create.
    */
   isResponseActive(): boolean {
     return this._active;
   }
+
+  /**
+   * Resets the internal state for the next response
+   */
+  private reset(): void {
+    this._active = false;
+    this.responseResolver = null;
+    this.errorRejecter = null;
+    this.currentResponse = "";
+    this.currentAudioChunks = [];
+  }

As per coding guidelines, **/*.ts: "In TypeScript classes, place public methods first, private methods at the bottom, and group related methods together".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@javascript/src/agents/realtime/realtime-event-handler.ts` around lines 180 -
194, The public method isResponseActive() is currently positioned after the
private method reset(), which violates the class organization guideline
requiring public methods to be grouped before private methods. Move the
isResponseActive() method (which checks if _active is true and includes its
JSDoc comment) above the reset() method so that all public methods appear before
private helper methods in the class structure.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@javascript/src/agents/__tests__/realtime-response-guard.test.ts`:
- Around line 116-117: The test uses fixed 20ms setTimeout delays at lines 116
and 136 to allow async operations to complete, which creates timing-dependent
flakiness. Replace these arbitrary sleep delays with deterministic
synchronization mechanisms such as awaiting actual promises from the code under
test, using jest.runAllTimers() if using fake timers, or checking that mock
functions have been called with expected arguments before proceeding with
assertions instead of relying on wall-clock delays.

In `@javascript/src/agents/realtime/realtime-agent.adapter.ts`:
- Around line 188-193: The guard condition checking `isResponseActive()`
suppresses the `response.create` event, but this can drop new turns when the
in-flight response completes. Instead of suppressing the send, implement
deferred send behavior by storing the pending `response.create` event (similar
to a `_deferredResponseCreate` pattern used in voice adapters) and firing it
when the current response completes via `response.done` or `response.cancelled`
events. Apply this deferred pattern to both occurrences of the
`isResponseActive()` guard at the specified locations to ensure suppressed
creates are replayed rather than dropped.

In
`@javascript/src/voice/adapters/__tests__/openai-realtime-response-guard.test.ts`:
- Around line 148-149: Replace the hard-coded setTimeout delays at three
locations in the openai-realtime-response-guard.test.ts file (where Promise
timeouts of 20ms are used) with deterministic synchronization mechanisms.
Instead of relying on arbitrary delays, use test utilities such as
jest.useFakeTimers() and jest.advanceTimersByTime() to control time
deterministically, or implement event-based synchronization where the test waits
for actual completion signals from the code under test rather than hoping a
fixed delay is sufficient. This will make the tests reproducible and eliminate
race conditions that can cause flaky test failures.

In `@javascript/src/voice/adapters/openai-realtime.ts`:
- Around line 180-189: The session-lifecycle state flags `_responseActive` and
`_deferredResponseCreate` are not being reset during teardown/disconnect,
causing stale state to persist when the adapter instance is reused for a new
connection. Add logic in the teardown or disconnect method to reset both
`_responseActive` and `_deferredResponseCreate` back to their initial false
state to ensure a clean slate for subsequent connections.

In `@python/tests/voice/demo_response_create_guard.py`:
- Around line 233-245: The return statement at the end of the function is
missing the item_ok validation check. The variable item_ok is calculated to
verify that "conversation.item.create" was sent, but it is not included in the
return statement alongside guard_ok and deferred_ok. Update the return statement
to include all three conditions (guard_ok and deferred_ok and item_ok) so that
the function properly validates all required behaviors before reporting success.

---

Outside diff comments:
In `@python/scenario/voice/adapters/openai_realtime.py`:
- Around line 345-357: The disconnect() method closes the WebSocket but does not
reset the _response_active and _deferred_response_create flags, causing stale
lifecycle state to persist when the adapter is reused. After closing the
WebSocket in the finally block of the disconnect() method, reset both
_response_active and _deferred_response_create to their initial values (likely
False and None respectively) to ensure a clean state for subsequent connections.

---

Nitpick comments:
In `@javascript/src/agents/realtime/realtime-event-handler.ts`:
- Around line 180-194: The public method isResponseActive() is currently
positioned after the private method reset(), which violates the class
organization guideline requiring public methods to be grouped before private
methods. Move the isResponseActive() method (which checks if _active is true and
includes its JSDoc comment) above the reset() method so that all public methods
appear before private helper methods in the class structure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 98a7449d-261f-47ab-b347-066c42764439

📥 Commits

Reviewing files that changed from the base of the PR and between 732d426 and 47cc632.

📒 Files selected for processing (8)
  • javascript/src/agents/__tests__/realtime-response-guard.test.ts
  • javascript/src/agents/realtime/realtime-agent.adapter.ts
  • javascript/src/agents/realtime/realtime-event-handler.ts
  • javascript/src/voice/adapters/__tests__/openai-realtime-response-guard.test.ts
  • javascript/src/voice/adapters/openai-realtime.ts
  • python/scenario/voice/adapters/openai_realtime.py
  • python/tests/voice/demo_response_create_guard.py
  • python/tests/voice/test_realtime_response_create_guard.py

Comment thread javascript/src/agents/__tests__/realtime-response-guard.test.ts Outdated
Comment thread javascript/src/agents/realtime/realtime-agent.adapter.ts
Comment thread javascript/src/voice/adapters/__tests__/openai-realtime-response-guard.test.ts Outdated
Comment thread javascript/src/voice/adapters/openai-realtime.ts
Comment thread python/tests/voice/demo_response_create_guard.py
Comment thread python/tests/voice/demo_response_create_guard.py Outdated
Comment thread javascript/src/agents/__tests__/realtime-response-guard.test.ts
Comment thread javascript/src/agents/__tests__/realtime-response-guard.test.ts Outdated
Ubuntu and others added 4 commits June 17, 2026 13:36
…_ok, setTimeout too short

- demo_response_create_guard.py Scenario C: `return guard_ok and deferred_ok` dropped
  `item_ok`, so the demo could print FAIL for item check then return True (false proof).
  Fixed to `return guard_ok and deferred_ok and item_ok`.
- realtime-response-guard.test.ts AC-JS4/JS5: widen 20ms flush wait to 100ms and
  add comment explaining it is a preamble flush, not a timed window. Prevents flake
  on loaded CI runners.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…k flush

The prior commit widened the pre-assert sleep to 100ms, which dead-heated the
adapter's responseTimeout:100 in makeAdapter — on CI the adapter's internal
waitForResponse timeout fired before the test's 100ms wait elapsed and fired
response.done, throwing "Agent response timeout after 100ms" and failing
ci-checks (24.x).

The send decision in handleInitialResponse/handleAudioInput is synchronous,
before the method's first await (waitForResponse). So a microtask flush
(await Promise.resolve()) followed by a prompt response.done is deterministic
and settles well inside the timeout — no wall-clock race. Test runs in 7ms.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…econnect

CodeRabbit flagged that `_responseActive`/`_deferredResponseCreate` (the #662
guard flags) were never cleared in teardown, so a reused adapter instance could
carry stale deferred state into its next connection — suppressing the first turn
after a reconnect.

Fix:
- disconnect(): clear both guard flags on teardown.
- connect(): clear both guard flags AND `_closeReason`. The latter is required —
  `_nextEvent` rejects immediately when `_closeReason` is set, so resetting only
  the guard flags would leave a reconnected adapter unable to receive (cosmetic
  half-fix).

Test: new "resets response-lifecycle state on disconnect and reconnects clean"
asserts the flags clear on disconnect and that a reconnected instance reaches
its own receiveAudio timeout (proving the stale close reason was cleared) rather
than rejecting instantly. Full voice-adapter suite: 33/33 green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… macrotask yields

CodeRabbit flagged the hard-coded 20/5/10ms delays in the receiveAudio guard
tests as brittle. They are not race-prone — receiveAudio's preamble and
_nextEvent setup are synchronous and server events flow through an ordered
queue, so each delay is a macrotask boundary that drains the loop's microtask
chain regardless of magnitude or runner load. Reduce all three to setTimeout 0
to drop the arbitrary magnitudes while keeping the robust macrotask-drain
semantics. (Not queueMicrotask, per CodeRabbit's suggestion: a single microtask
tick can under-drain a multi-tick continuation; a zero-delay macrotask drains
the whole queue.) Deterministic across repeated runs: 5/5 x3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@langwatch-agent

Copy link
Copy Markdown
Contributor

🐕 PR Hound review brief — assigned reviewer @rogeriochaves

Review mode: Deep Review
Scariest prod risk: This is a concurrency guard across four call sites in two languages. If a deferred response.create is never fired (flag stuck active) the agent goes silent forever; if it fires twice you get the exact "active response in progress" error this PR exists to prevent. The race window between firing the deferred create and response.created arriving is the crux.

Inspection targets (max 3, each names a file or behavior):

  • [Must Check] javascript/src/voice/adapters/openai-realtime.ts_responseActive is set immediately after firing the deferred create (before response.created arrives) and _deferredResponseCreate is consumed exactly once on response.done/response.cancelled; no path leaves it set-and-never-fired.
  • [Must Check] realtime-event-handler.ts _active/isResponseActive() — set on response.created, cleared on response.done, with no missed response.cancelled/error transition that leaves _active stuck true and wedges all future turns.
  • [Must Check] python/scenario/voice/adapters/openai_realtime.py_agent_turn_pending now cleared before the if/else; confirm both Python guards (send_text and recv_audio) mirror the JS lifecycle and can't defer without a later trigger to fire it.

Author questions (optional, max 3):

  • If response.done never arrives (transport error / dropped socket) after a deferral, is _responseActive/_active ever reset, or does the adapter deadlock on the next turn?
  • The four newly-guarded call sites: is there any path that legitimately needs an unconditional response.create (e.g. a cancel-then-recreate) that this guard now suppresses?

@langwatch-agent langwatch-agent added the review: deep PR Hound review mode label Jul 3, 2026
@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.

The changes modify runtime behavior of the realtime agent adapters in both JavaScript and Python: they add lifecycle flags, defer and re-order sending of response.create, and change connect/disconnect logic. Because this alters integration and messaging with the OpenAI Realtime API (an external third‑party integration) and changes protocol/behavioral logic, it does not meet the low-risk criteria and requires normal review.

This PR requires a manual review before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-reviewed /review was run on this PR (multi-agent: principles, hygiene, test, security) hound-checked Triaged by the pr-hound agent at the current head SHA prove-it-clean All ACs verified by /prove-it at this HEAD review: deep PR Hound review mode slack-requested Slack PR review request posted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenAI Realtime JS adapters send response.create unconditionally — same active-response race as #657

3 participants