Fix/elevenlabs multi stream input by Jacob-Lasky · Pull Request #25 · deepgram-devs/flask-agent-function-calling-demo

Jacob-Lasky · 2026-03-02T21:49:59Z

Switches the ElevenLabs WebSocket URL from stream-input to multi-stream-input. The old endpoint doesn't support barge-in and causes FailedToSpeak after turn 1: Deepgram connects once, ElevenLabs closes the socket after the greeting, and Deepgram never reconnects. Also removes voice_id from the provider block (Deepgram rejects it when a custom endpoint is set) and language_code (not supported by eleven_turbo_v2_5).
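A minimal sketch of the speak block this change implies, following the structure described in the commits below (voice_id in the URL path, model_id as a query parameter, API key in an xi-api-key header, no voice_id or language_code in the provider). The function name, the `wss://api.elevenlabs.io/v1/text-to-speech` base URL, and the exact Deepgram key names (`provider.type`, `model_id`, `endpoint.url`, `endpoint.headers`) are assumptions, not code copied from the PR.

```python
from urllib.parse import quote, urlencode

# Assumed ElevenLabs WebSocket base; the PR only names the path suffixes.
ELEVENLABS_TTS_WS = "wss://api.elevenlabs.io/v1/text-to-speech"


def elevenlabs_speak_block(voice_id: str, model_id: str, api_key: str) -> dict:
    """Hypothetical builder for a Deepgram Voice Agent speak block backed by ElevenLabs."""
    # multi-stream-input (not stream-input) so barge-in and later turns keep working;
    # model_id rides in the query string so reconnections still know the model.
    query = urlencode({"model_id": model_id})
    url = f"{ELEVENLABS_TTS_WS}/{quote(voice_id, safe='')}/multi-stream-input?{query}"
    return {
        # No voice_id (rejected alongside a custom endpoint) and no language_code.
        "provider": {"type": "eleven_labs", "model_id": model_id},
        "endpoint": {"url": url, "headers": {"xi-api-key": api_key}},
    }
```

Keeping the voice in the URL path means the provider object carries only what Deepgram needs to pick a model, which is why voice_id can be dropped from it.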

Note

Medium Risk
Moderate risk: introduces filesystem-backed config CRUD, rewrites agent settings construction (including ElevenLabs endpoint wiring), and adjusts SocketIO/threaded asyncio startup, all of which can affect session startup and runtime behavior.

Overview
Moves demo configuration to JSON and exposes config CRUD. AgentTemplates is rewritten to load configs/*.json, build agent Settings dynamically (sorted/filtered configs, default/disabled support), and client.py adds GET/POST/DELETE /configs for managing configs on disk.
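The "sorted/filtered configs, default/disabled support" behavior can be sketched as a loader that reads every `configs/*.json`, skips files marked `disabled: true`, and sorts `default: true` first. This is a hypothetical reconstruction, not the PR's `AgentTemplates.load_all()`; using the file stem as the config id and breaking ties alphabetically by `name` are assumptions.

```python
import json
from pathlib import Path


def load_all(configs_dir="configs") -> list:
    """Hypothetical sketch: read configs/*.json, drop disabled ones, put default first."""
    configs = []
    for path in sorted(Path(configs_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if data.get("disabled"):
            continue  # e.g. hey-manny.json: kept on disk, hidden from the selector
        data.setdefault("id", path.stem)  # assumed: config_id is the file stem
        configs.append(data)
    # default:true sorts first; the alphabetical-by-name tiebreak is an assumption.
    configs.sort(key=lambda c: (not c.get("default", False), str(c.get("name", c["id"])).lower()))
    return configs
```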

Session startup now uses config-driven defaults and adds hotword support. The start_voice_agent SocketIO handler accepts config_id (fallback to legacy industry), loads the selected config to default voiceModel/voiceName/language, and tweaks SocketIO/threaded asyncio setup (CORS enabled; dedicated event loop policy) to reduce eventlet loop conflicts. Hotword detection is added via new functions in common/agent_functions.py and is injected into the agent prompt/functions list when a config specifies hotword.

Frontend is redesigned and integrated with the new config API. templates/index.html switches to Deepgram design system layout, loads configs from /configs into selectable cards, adds a slide-in builder form that POSTs to /configs, and updates Start Session to emit config_id plus browser-audio capture/playback. static/style.css is reduced to minimal design-system overrides.

Deployment/config hygiene updates. Adds multiple demo JSON files (including ttsProvider: eleven_labs config fields), updates fly.toml with concurrency and health checks, and expands .gitignore/.dockerignore to ignore .claude/ and .planning/ while keeping fly.toml in the Docker context.

Written by Cursor Bugbot for commit ea09850. This will update automatically on new commits.

…Manny stub)
- configs/dubai-real-estate.json: Sophia luxury concierge, Emirates Premium Properties, aura-2-amalthea-en
- configs/hey-saga.json: Saga smart city assistant with Hey Saga hotword, aura-2-arcas-en
- configs/deepgram.json: Deepgram tech support, system prompt sourced from prompt_templates.py
- configs/hey-manny.json: Filipino BPO champion, en-PH, phone_ui mode, Hey Manny hotword
- configs/bpo-tagalog.json: Luna Tagalog BPO agent, language tl

- Mark Phase 1 as Complete in STATE.md
- Add 01-SUMMARY.md with verification results and decisions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Replace static/style.css with minimal overrides; Deepgram design system loaded via CDN
- Rewrite templates/index.html with dg-columns 3-panel layout (sidebar | conversation | event log)
- Demo selector renders dg-card--selectable cards populated from GET /configs
- Builder slide-in panel with full form: name, company, personality, system prompt, greeting, language, voice model, hotword, mode, function toggles
- Builder POSTs to /configs and refreshes card grid without page reload; edit pre-populates form
- Start/stop button uses dg-btn--primary with mic icon; dg-status component shows connection state
- Force dark mode via :root { color-scheme: dark; }; brand green #13ef95 used for agent messages
- All existing SocketIO audio logic preserved verbatim: audio capture, resampling, playback, event handlers
- Font Awesome icons loaded from CDN; vanilla JS only, no frameworks

…d session start
- handle_start_voice_agent now accepts config_id (new) or industry (legacy) with fallback
- Loads full JSON config via AgentTemplates.load(config_id) for voice model/language defaults
- Frontend sends config_id alongside industry for backward compat
- Frontend guards: requires selectedConfig before starting session
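The config_id/industry fallback in the handler can be sketched as a small pure function. `resolve_session_config` is hypothetical, `load_config` stands in for `AgentTemplates.load(config_id)`, and the fallback defaults (`aura-2-thalia-en`, `en`) are assumptions taken from the defaults quoted elsewhere in this PR.

```python
def resolve_session_config(payload: dict, load_config) -> dict:
    """Hypothetical sketch: config_id first, legacy industry second."""
    config_id = payload.get("config_id") or payload.get("industry")
    if not config_id:
        raise ValueError("start_voice_agent needs config_id (or legacy industry)")
    config = load_config(config_id)  # stands in for AgentTemplates.load(config_id)
    return {
        "config_id": config_id,
        # The loaded config supplies the voice/language defaults for the session.
        "voiceModel": config.get("voiceModel", "aura-2-thalia-en"),
        "language": config.get("language", "en"),
    }
```

Because the frontend sends config_id alongside industry, a new client always wins the `or`, while an old client that only sends industry still starts a session.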

- Wire config_id from frontend through SocketIO handler to VoiceAgent
- Load full JSON config in handler for voice model/language defaults
- Frontend guards Start Session when no config selected
- All 5 demo configs verified via smoke test
- No audio/WebSocket processing logic modified
- Phase 4 status updated to Complete in STATE.md

Chrome's autoplay policy suspends AudioContext created outside a user gesture (e.g. inside a SocketIO callback). Move audioOutputContext creation into startAudioCapture() which runs from the Start button click, ensuring it's always created within a trusted user event. Firefox was unaffected due to more lenient autoplay handling. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

With 2 Fly machines, SocketIO was falling back to HTTP polling when requests hit different machines (Invalid session errors). Binary audio chunks can't be encoded in polling payloads, causing:
TypeError: can only concatenate str (not "bytes") to str
Fix: force transports: ['websocket'] on the client and allow_upgrades=False on the server. WebSocket connections are long-lived and stick to one machine, eliminating the cross-machine session issue entirely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- deepgram.json marked default:true, sorts first via load_all()
- All English configs language fixed: en-US/en-PH -> en (matches TTS API)
- Builder language select now dynamically populated from TTS models
- Builder language change listener repopulates voice models correctly
- openBuilder() repopulates voices for config language before setting value
- Auto-select first config on page load
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Replace {{agentName}} in systemPrompt/greeting with voice name at init
- Use DefaultEventLoopPolicy().new_event_loop() to avoid eventlet conflict

NOTE: stop/restart thread tracking fix is incomplete; see next commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- On stop: close the Deepgram WebSocket before clearing voice_agent, and use call_soon_threadsafe for task cancellation (thread-safe)
- On start: track thread in voice_agent_thread global, join previous thread (2s timeout) before starting new one to prevent loop conflicts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
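The start/stop pattern from these two commits (a fresh loop from the stdlib default policy, thread-safe cancellation, joining the previous thread with a 2 s timeout) can be sketched as below. `AgentThread` is a hypothetical stand-in for the `voice_agent_thread` global; closing the Deepgram WebSocket before cancellation is omitted, and `asyncio.DefaultEventLoopPolicy` is deprecated as of Python 3.14, so newer code may need a different way to bypass a patched policy.

```python
import asyncio
import threading


class AgentThread:
    """Hypothetical sketch: one agent coroutine on a dedicated thread and event loop."""

    def __init__(self):
        self.thread = None
        self.loop = None
        self.task = None
        self._started = threading.Event()

    def start(self, coro_factory):
        # Stop and join any previous run (2 s timeout) so two loops never overlap.
        if self.thread is not None and self.thread.is_alive():
            self.stop()
            self.thread.join(timeout=2)
        self._started.clear()
        self.thread = threading.Thread(target=self._run, args=(coro_factory,), daemon=True)
        self.thread.start()
        self._started.wait(timeout=2)

    def _run(self, coro_factory):
        # A loop from the stdlib default policy, not whatever policy eventlet installed.
        loop = asyncio.DefaultEventLoopPolicy().new_event_loop()
        asyncio.set_event_loop(loop)
        self.loop = loop
        self.task = loop.create_task(coro_factory())
        self._started.set()
        try:
            loop.run_until_complete(self.task)
        except asyncio.CancelledError:
            pass  # normal shutdown path after stop()
        finally:
            loop.close()

    def stop(self):
        # Task.cancel() is not thread-safe; hand it to the loop's own thread.
        loop, task = self.loop, self.task
        if loop is None or task is None:
            return
        try:
            loop.call_soon_threadsafe(task.cancel)
        except RuntimeError:
            pass  # loop already closed: nothing left to cancel
```

Calling `task.cancel()` directly from the SocketIO handler's thread would touch the loop's internals from outside it; `call_soon_threadsafe` queues the cancel and wakes the loop instead.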

Every demo now introduces itself using the selected voice model's name. Also sets Thalia as the default voice for the Deepgram Tech Support demo. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…on call agentName fix: use caller-provided voiceName/voiceModel for {{agentName}} substitution instead of the config's default voice model. Previously the config's voiceModel (e.g. Thalia) was always used regardless of selection. Hotword: when a config has a 'hotword' field, the agent is put into hotword mode. check_hotword is added to the functions list and the system prompt instructs the LLM to call it before every response. The Python implementation checks if the hotword appears in the transcript and returns either {active: false} (stay silent) or {active: true, query: "..."} (respond to the extracted query after the hotword). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
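The agentName fix can be sketched as "caller's voiceName first, otherwise derive a name from the voice model id". Both helpers are hypothetical, and deriving the name assumes Deepgram's `aura-2-<name>-<lang>` model naming seen in the configs above (e.g. `aura-2-thalia-en` becomes Thalia).

```python
def agent_name(voice_model: str, voice_name=None) -> str:
    """Hypothetical helper: prefer the caller's voiceName, else derive it from the model id."""
    if voice_name:
        return voice_name.capitalize()
    # Assumes the "aura-2-<name>-<lang>" pattern, e.g. aura-2-thalia-en -> Thalia.
    parts = voice_model.split("-")
    if len(parts) >= 4 and parts[0] == "aura":
        return parts[2].capitalize()
    return voice_model  # unknown pattern: fall back to the raw model id


def render(template: str, voice_model: str, voice_name=None) -> str:
    # Substitute {{agentName}} in a systemPrompt or greeting.
    return template.replace("{{agentName}}", agent_name(voice_model, voice_name))
```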

STT transcribes 'Hey Saga' as 'Hey, Saga.' with commas and periods. Use a regex pattern that allows punctuation/whitespace between hotword words so 'hey saga' matches 'Hey, Saga.' or 'Hey Saga!' etc. Also clarify function description to pass only the current utterance, not accumulated conversation history. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
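A sketch of the punctuation-tolerant match: each hotword word is escaped and the words are joined with `\W+`, so any run of commas, periods, or spaces between them is accepted. The return shape `{active: false}` / `{active: true, query: "..."}` follows the commit description; the implementation itself, including how the query is trimmed, is an assumption.

```python
import re


def hotword_pattern(hotword: str):
    # "hey saga" -> \bhey\W+saga\b, so "Hey, Saga." and "Hey Saga!" both match.
    words = (re.escape(w) for w in hotword.split())
    return re.compile(r"\b" + r"\W+".join(words) + r"\b", re.IGNORECASE)


def check_hotword(utterance: str, hotword: str) -> dict:
    """Sketch of check_hotword; pass only the current utterance, not the history."""
    match = hotword_pattern(hotword).search(utterance)
    if match is None:
        return {"active": False}  # stay silent
    # The query is whatever follows the hotword, minus leading punctuation.
    query = utterance[match.end():].lstrip(" ,.!?;:")
    return {"active": True, "query": query}
```

The `\b` anchors keep the hotword from matching inside another word, so "they saga" does not trigger on "hey saga".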

Once the hotword fires, subsequent turns pass through check_hotword without needing the hotword again. A 30-second inactivity timeout resets to hotword-only mode. _last_activity_time updates on each turn while the conversation is active, so as long as the user keeps talking the session stays open. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The LLM can now call close_hotword_session when the user signals they're done (thanks, got it, okay, that's all, etc.), resetting _conversation_active to False so the agent returns to silent hotword-only listening immediately rather than waiting for the 30-second inactivity timeout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
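The session behavior from these two commits (stay open after the hotword, refresh `_last_activity_time` each turn, drop back to hotword-only after 30 s of inactivity or on close_hotword_session) can be sketched as a small state machine. `HotwordSession` and its injectable `clock` are hypothetical; only `_conversation_active`, `_last_activity_time`, and the 30-second timeout come from the commit messages.

```python
import re
import time


class HotwordSession:
    """Hypothetical sketch of hotword gating with a 30 s inactivity timeout."""

    TIMEOUT_S = 30.0

    def __init__(self, hotword: str, clock=time.monotonic):
        words = (re.escape(w) for w in hotword.split())
        self._pattern = re.compile(r"\b" + r"\W+".join(words) + r"\b", re.IGNORECASE)
        self._clock = clock  # injectable for tests
        self._conversation_active = False
        self._last_activity_time = 0.0

    def check(self, utterance: str) -> dict:
        now = self._clock()
        if self._conversation_active and now - self._last_activity_time > self.TIMEOUT_S:
            self._conversation_active = False  # timed out: back to hotword-only listening
        if self._conversation_active:
            self._last_activity_time = now  # every turn keeps the session open
            return {"active": True, "query": utterance}
        match = self._pattern.search(utterance)
        if match is None:
            return {"active": False}
        self._conversation_active = True
        self._last_activity_time = now
        return {"active": True, "query": utterance[match.end():].lstrip(" ,.!?;:")}

    def close(self) -> dict:
        # What a close_hotword_session call would do: end the session immediately.
        self._conversation_active = False
        return {"active": False}
```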

- hey-manny.json: add disabled:true (hidden from demo selector, config preserved)
- load_all() now filters configs with disabled:true
- dubai-real-estate.json: default voice changed from Amalthea to Pandora
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- agent_templates.py: conditional listen provider adds language field for non-English STT (Nova-3 + language="tl" etc)
- agent_templates.py: conditional speak provider builds ElevenLabs provider when config has ttsProvider="eleven_labs"
- bpo-tagalog.json: add ttsProvider, elevenLabsVoiceId, elevenLabsModel fields
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
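The conditional listen provider can be sketched as adding a `language` field only for non-English configs. `listen_block` is hypothetical, and the exact key names (`provider.type`, `model`, `language`) are assumptions about the Deepgram agent settings, since the commit does not show the emitted JSON.

```python
def listen_block(language: str = "en", model: str = "nova-3") -> dict:
    """Hypothetical sketch of the conditional listen provider described in this commit."""
    provider = {"type": "deepgram", "model": model}
    if language != "en":
        # Only non-English configs (e.g. bpo-tagalog, language "tl") pin the STT language.
        provider["language"] = language
    return {"provider": provider}
```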

Feminine voice: G1AxVA91PtrWu96MHgTC Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

voice_id goes in the endpoint URL path, api_key goes in endpoint headers as xi-api-key — not inline in the provider object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ElevenLabs uses 'fil' not 'tl' for Tagalog — sending an unrecognized language_code causes the WebSocket to close unexpectedly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Deepgram's internal ElevenLabsSpeakProvider struct has a voice_id field that must be populated for multi-turn reconnections to work. Without it, voice_id is None and ElevenLabs returns audio:null on the second turn. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…vements
- Switch ElevenLabs endpoint from wss://stream-input to wss://multi-stream-input (stream-input doesn't support barge-in; multi-stream-input does)
- Remove voice_id from ElevenLabs provider block (Deepgram rejects it when a custom endpoint is set)
- Hide language/voice model selects from sidebar (set automatically from config)
- Demo cards now show voice name and TTS provider info
- Capitalize voice name in card display
- Preserve default:true flag when editing configs so default card stays on top
- Fix voiceModel/voiceName not updating after config save (re-select with fresh data on loadConfigs so currentVoiceModel/voiceName stay in sync)
- Remove Mode field from builder modal (always voice_agent)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

cursor · 2026-03-02T22:12:07Z

common/agent_templates.py

+        if voiceModel != "aura-2-thalia-en":
+            voice_model = voiceModel
+        if language != "en":
+            config_language = language


Sentinel-based override ignores explicit default voice/language selection

Medium Severity

The override logic uses hardcoded default values ("aura-2-thalia-en" and "en") as sentinels to detect whether the caller explicitly chose a voice model or language. Since handle_start_voice_agent always forwards whatever the frontend sends (which could be exactly these defaults), a user who explicitly selects "aura-2-thalia-en" from the voice dropdown for a config that uses a different model (e.g. "aura-2-arcas-en" in hey-saga) will have their choice silently ignored — the config's model is used instead.
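One way to avoid the sentinel, sketched under the assumption that the frontend omits (or sends null for) fields the user did not touch and the handler forwards those as `None`: treat `None` as "no explicit choice" so that any string, including the defaults, overrides the config. `resolve_voice` is a hypothetical helper, not the PR's code.

```python
DEFAULT_VOICE_MODEL = "aura-2-thalia-en"
DEFAULT_LANGUAGE = "en"


def resolve_voice(config: dict, voice_model=None, language=None) -> tuple:
    """Sketch: None means "no explicit choice"; any string, even a default, wins."""
    model = voice_model if voice_model is not None else config.get("voiceModel", DEFAULT_VOICE_MODEL)
    lang = language if language is not None else config.get("language", DEFAULT_LANGUAGE)
    return model, lang
```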

cursor · 2026-03-02T22:12:07Z

common/agent_templates.py

+        if path.exists():
+            path.unlink()
+            return True
+        return False


Path traversal in config CRUD allows arbitrary file access

High Severity

The save, load, and delete static methods construct file paths using unsanitized config_id values (e.g., CONFIGS_DIR / f"{config_id}.json"). A config_id containing ../ can escape the configs/ directory. Since POST /configs and DELETE /configs/<config_id> are unauthenticated Flask routes on a publicly deployed Fly.io app, an attacker can write arbitrary JSON files or delete files anywhere the process has permissions.

Additional Locations (1)

client.py#L597-L614
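A minimal mitigation sketch for the save/load/delete paths: whitelist the id format, then confirm the resolved file sits directly inside the configs directory. `config_path` and the id regex are hypothetical; they fit the existing ids (deepgram, hey-saga, bpo-tagalog, …) but are not taken from the PR, and they do nothing about the routes being unauthenticated.

```python
import re
from pathlib import Path

CONFIGS_DIR = Path("configs")  # assumed location, as in CONFIGS_DIR / f"{config_id}.json"
_CONFIG_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")


def config_path(config_id, base: Path = CONFIGS_DIR) -> Path:
    """Hypothetical guard for save/load/delete: reject ids that could leave configs/."""
    if not isinstance(config_id, str) or not _CONFIG_ID.fullmatch(config_id):
        raise ValueError(f"invalid config_id: {config_id!r}")
    root = base.resolve()
    path = (root / f"{config_id}.json").resolve()
    # Second check: the resolved file (after symlinks) must be directly inside root.
    if path.parent != root:
        raise ValueError(f"config_id escapes {root}: {config_id!r}")
    return path
```

The regex alone already rules out `/` and `..`; the resolved-parent check is a second line of defense against symlinks inside the directory.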

cursor · 2026-03-02T22:12:07Z

templates/index.html

+    document.getElementById('builder-form').addEventListener('submit', async (e) => {
+      e.preventDefault();
+      const form = e.target;
+      const data = Object.fromEntries(new FormData(form));


FormData flattening corrupts multi-value functions field to string

Medium Severity

The builder form has multiple checkboxes all sharing name="functions", but the submit handler uses Object.fromEntries(new FormData(form)) which silently drops all but the last value for duplicate keys. This saves "functions": "end_call" (a single string) instead of the expected array, corrupting the config's functions field on every save via the builder form.

Additional Locations (1)

templates/index.html#L819-L823
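The client-side fix is to read the checkboxes with `FormData.getAll('functions')` before serializing. As a defensive complement on the Python side, a hypothetical guard in the POST /configs handler could always store `functions` as a list. It stops the type corruption but cannot recover the values `Object.fromEntries` already dropped.

```python
def normalize_functions(config: dict) -> dict:
    """Hypothetical server-side guard for POST /configs: always store functions as a list."""
    value = config.get("functions")
    if value is None:
        functions = []
    elif isinstance(value, str):
        functions = [value] if value else []  # the flattened "end_call" case
    elif isinstance(value, (list, tuple)):
        functions = [str(v) for v in value]
    else:
        raise ValueError(f"functions must be a list of names, got {type(value).__name__}")
    return {**config, "functions": list(dict.fromkeys(functions))}  # de-duplicate, keep order
```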

Jake Lasky and others added 25 commits February 26, 2026 09:53

docs: mark Phase 3 (Demo JSON Configs) as Complete in STATE.md

5d60904

docs(01-01): complete Phase 1 Backend JSON Config System plan

b828748

Use {{agentName}} in all config greetings and system prompts

51f0f60

Set ElevenLabs Tagalog voice ID for bpo-tagalog config

2639b0d

Fix ElevenLabs speak provider structure per Deepgram VA API spec

133126b

Remove language_code from ElevenLabs provider; let voice auto-detect

a45d645

Add model_id as URL query param for ElevenLabs reconnections

c02590e

chore: ignore .claude/ and .planning/ directories

ea09850

cursor bot reviewed Mar 2, 2026

jeniya-tabassum approved these changes Mar 2, 2026

Jacob-Lasky wants to merge 25 commits into main from fix/elevenlabs-multi-stream-input
