Follow-up from #654.
The session config the control plane sends to OpenAI (`createOpenAIRealtimeCall` in `control-plane/internal/handlers/sessions.go`) is currently minimal: `type`, `model`, `instructions`, `audio.output.voice`, `tool_choice`. There's no way for a session author to configure turn detection / server-side VAD / interruption (barge-in), which are table stakes for a usable voice UX.
Why it matters
- Without server VAD + turn detection config, the app can't tune when the model considers a turn complete, how much silence ends a turn, or whether the user can interrupt the model mid-utterance. These are the difference between a demo and something people will actually talk to.
Acceptance criteria (behavior)
- `@app.session(...)` (and the TS/Go equivalents) accept turn-detection / VAD options (e.g. `turn_detection` type, threshold, silence duration, `create_response` / `interrupt_response`) and the control plane forwards them into the OpenAI realtime session config.
- Defaults produce a sane interruptible voice experience (barge-in works out of the box).
- Validation rejects unsupported combinations explicitly (consistent with the existing provider/transport validation philosophy — no silent inference).
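A minimal sketch of how the three criteria above could fit together on the SDK side. Everything here is hypothetical: the `TurnDetection` class, `build_session_config`, and the specific validation rule (rejecting `threshold` / `silence_duration_ms` with `semantic_vad`) are illustrations, not existing code. The placement under `audio.input.turn_detection` and the `"realtime"` / `"auto"` values are assumptions that should be checked against the OpenAI reference.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TurnDetection:
    """Hypothetical session-author options; names mirror OpenAI's turn_detection fields."""

    type: str = "server_vad"
    threshold: Optional[float] = None          # None = leave the provider default
    silence_duration_ms: Optional[int] = None  # None = leave the provider default
    create_response: bool = True
    interrupt_response: bool = True            # barge-in enabled by default

    def validate(self) -> None:
        # Reject unsupported combinations explicitly instead of silently dropping fields.
        if self.type not in ("server_vad", "semantic_vad"):
            raise ValueError(f"unsupported turn_detection type: {self.type!r}")
        if self.type == "semantic_vad":
            # Assumed rule for illustration: these knobs only apply to server_vad.
            for name in ("threshold", "silence_duration_ms"):
                if getattr(self, name) is not None:
                    raise ValueError(f"{name} is not supported with semantic_vad")
        if self.threshold is not None and not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")
        if self.silence_duration_ms is not None and self.silence_duration_ms <= 0:
            raise ValueError("silence_duration_ms must be positive")

    def to_payload(self) -> Dict[str, Any]:
        self.validate()
        payload: Dict[str, Any] = {
            "type": self.type,
            "create_response": self.create_response,
            "interrupt_response": self.interrupt_response,
        }
        # Only forward the tuning knobs the author actually set.
        if self.threshold is not None:
            payload["threshold"] = self.threshold
        if self.silence_duration_ms is not None:
            payload["silence_duration_ms"] = self.silence_duration_ms
        return payload


def build_session_config(
    model: str,
    instructions: str,
    voice: str,
    turn_detection: Optional[TurnDetection] = None,
) -> Dict[str, Any]:
    """Build the session config the control plane would forward (shape is assumed)."""
    td = turn_detection if turn_detection is not None else TurnDetection()
    return {
        "type": "realtime",    # assumed value
        "model": model,
        "instructions": instructions,
        "audio": {
            "input": {"turn_detection": td.to_payload()},  # assumed placement
            "output": {"voice": voice},
        },
        "tool_choice": "auto",  # assumed value
    }
```

With no options supplied, the payload carries `interrupt_response: True`, so barge-in works out of the box; passing `TurnDetection(type="semantic_vad", threshold=0.5)` raises a `ValueError` rather than guessing what the author meant.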
Ref: OpenAI realtime session `turn_detection` config.