…Manny stub)
- configs/dubai-real-estate.json: Sophia luxury concierge, Emirates Premium Properties, aura-2-amalthea-en
- configs/hey-saga.json: Saga smart city assistant with Hey Saga hotword, aura-2-arcas-en
- configs/deepgram.json: Deepgram tech support, system prompt sourced from prompt_templates.py
- configs/hey-manny.json: Filipino BPO champion, en-PH, phone_ui mode, Hey Manny hotword
- configs/bpo-tagalog.json: Luna Tagalog BPO agent, language tl
- Mark Phase 1 as Complete in STATE.md
- Add 01-SUMMARY.md with verification results and decisions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace static/style.css with minimal overrides; Deepgram design system loaded via CDN
- Rewrite templates/index.html with dg-columns 3-panel layout (sidebar | conversation | event log)
- Demo selector renders dg-card--selectable cards populated from GET /configs
- Builder slide-in panel with full form: name, company, personality, system prompt, greeting, language, voice model, hotword, mode, function toggles
- Builder POSTs to /configs and refreshes card grid without page reload; edit pre-populates form
- Start/stop button uses dg-btn--primary with mic icon; dg-status component shows connection state
- Force dark mode via :root { color-scheme: dark; }; brand green #13ef95 used for agent messages
- All existing SocketIO audio logic preserved verbatim: audio capture, resampling, playback, event handlers
- Font Awesome icons loaded from CDN; vanilla JS only, no frameworks
…d session start
- handle_start_voice_agent now accepts config_id (new) or industry (legacy) with fallback
- Loads full JSON config via AgentTemplates.load(config_id) for voice model/language defaults
- Frontend sends config_id alongside industry for backward compat
- Frontend guards: requires selectedConfig before starting session
- Wire config_id from frontend through SocketIO handler to VoiceAgent
- Load full JSON config in handler for voice model/language defaults
- Frontend guards Start Session when no config selected
- All 5 demo configs verified via smoke test
- No audio/WebSocket processing logic modified
- Phase 4 status updated to Complete in STATE.md
Chrome's autoplay policy suspends AudioContext created outside a user gesture (e.g. inside a SocketIO callback). Move audioOutputContext creation into startAudioCapture() which runs from the Start button click, ensuring it's always created within a trusted user event. Firefox was unaffected due to more lenient autoplay handling. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
With 2 Fly machines, SocketIO was falling back to HTTP polling when requests hit different machines (Invalid session errors). Binary audio chunks can't be encoded in polling payloads, causing:
    TypeError: can only concatenate str (not "bytes") to str
Fix: force transports: ['websocket'] on client, allow_upgrades=False on server. WebSocket connections are long-lived and stick to one machine, eliminating the cross-machine session issue entirely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- deepgram.json marked default:true, sorts first via load_all()
- All English configs language fixed: en-US/en-PH -> en (matches TTS API)
- Builder language select now dynamically populated from TTS models
- Builder language change listener repopulates voice models correctly
- openBuilder() repopulates voices for config language before setting value
- Auto-select first config on page load
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace {{agentName}} in systemPrompt/greeting with voice name at init
- Use DefaultEventLoopPolicy().new_event_loop() to avoid eventlet conflict
NOTE: stop/restart thread tracking fix is incomplete - see next commit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- On stop: close the Deepgram WebSocket before clearing voice_agent, and use call_soon_threadsafe for task cancellation (thread-safe)
- On start: track thread in voice_agent_thread global, join previous thread (2s timeout) before starting new one to prevent loop conflicts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
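The stop/start sequence described in this commit can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the AgentRunner class and its _session coroutine are hypothetical stand-ins for the real voice agent, and asyncio.new_event_loop() is used here in place of the DefaultEventLoopPolicy().new_event_loop() call mentioned above.

```python
import asyncio
import threading


class AgentRunner:
    """Hypothetical helper: one asyncio loop per background thread."""

    def __init__(self):
        self.loop = None
        self.task = None
        self.thread = None
        self._ready = threading.Event()

    async def _session(self):
        # Stand-in for the long-running voice agent session.
        await asyncio.sleep(3600)

    def _run(self):
        self.loop = asyncio.new_event_loop()
        self.task = self.loop.create_task(self._session())
        self._ready.set()
        try:
            self.loop.run_until_complete(self.task)
        except asyncio.CancelledError:
            pass  # Expected when stop() cancels the session.
        finally:
            self.loop.close()

    def start(self):
        # Wait (bounded) for a previous thread so two loops never overlap.
        if self.thread is not None and self.thread.is_alive():
            self.thread.join(timeout=2)
        self._ready.clear()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()
        self._ready.wait(timeout=2)

    def stop(self):
        # Task.cancel() is not thread-safe, so schedule it on the loop's thread.
        if self.thread is not None and self.thread.is_alive():
            if self.loop is not None and not self.loop.is_closed():
                self.loop.call_soon_threadsafe(self.task.cancel)
            self.thread.join(timeout=2)
```

Calling start() again after stop() creates a fresh loop and thread, which is the restart case the commit is concerned with.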
Every demo now introduces itself using the selected voice model's name. Also sets Thalia as the default voice for the Deepgram Tech Support demo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
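A rough sketch of the {{agentName}} substitution follows. Both function names are hypothetical, and deriving the name from the third dash-separated segment of the model ID is an assumption based on the aura-2-<name>-<lang> pattern seen in the configs above.

```python
from typing import Optional


def voice_name_from_model(voice_model: str) -> str:
    # Assumption: model IDs look like "aura-2-<name>-<lang>".
    parts = voice_model.split("-")
    return parts[2].capitalize() if len(parts) >= 4 else voice_model


def render_agent_name(template: str, voice_model: str,
                      voice_name: Optional[str] = None) -> str:
    # Prefer a caller-provided voice name; otherwise derive it from the model ID.
    name = voice_name or voice_name_from_model(voice_model)
    return template.replace("{{agentName}}", name)
```

For example, render_agent_name("Hi, I'm {{agentName}}.", "aura-2-arcas-en") yields "Hi, I'm Arcas."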
…on call
agentName fix: use caller-provided voiceName/voiceModel for {{agentName}}
substitution instead of the config's default voice model. Previously the
config's voiceModel (e.g. Thalia) was always used regardless of selection.
Hotword: when a config has a 'hotword' field, the agent is put into hotword
mode. check_hotword is added to the functions list and the system prompt
instructs the LLM to call it before every response. The Python implementation
checks if the hotword appears in the transcript and returns either
{active: false} (stay silent) or {active: true, query: "..."} (respond to
the extracted query after the hotword).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
STT transcribes 'Hey Saga' as 'Hey, Saga.' with commas and periods. Use a regex pattern that allows punctuation/whitespace between hotword words so 'hey saga' matches 'Hey, Saga.' or 'Hey Saga!' etc. Also clarify function description to pass only the current utterance, not accumulated conversation history. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
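The punctuation-tolerant match can be sketched as below. This is an illustrative version, not the project's implementation: the function names and the exact return shape are assumptions modeled on the {active, query} result described in the earlier hotword commit.

```python
import re


def build_hotword_pattern(hotword: str) -> re.Pattern:
    words = [re.escape(word) for word in hotword.split()]
    # Allow any run of punctuation or whitespace between the hotword's words.
    return re.compile(r"\b" + r"\W+".join(words) + r"\b", re.IGNORECASE)


def check_hotword(transcript: str, hotword: str) -> dict:
    match = build_hotword_pattern(hotword).search(transcript)
    if match is None:
        return {"active": False}
    # Everything after the hotword, minus leading punctuation, is the query.
    query = transcript[match.end():].lstrip(" \t\n,.!?;:-").rstrip()
    return {"active": True, "query": query}
```

With this pattern, "hey saga" matches "Hey, Saga." and "Hey Saga!" alike, and an utterance without the hotword returns {"active": False}.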
Once the hotword fires, subsequent turns pass through check_hotword without needing the hotword again. A 30-second inactivity timeout resets to hotword-only mode. _last_activity_time updates on each turn while the conversation is active, so as long as the user keeps talking the session stays open. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The LLM can now call close_hotword_session when the user signals they're done (thanks, got it, okay, that's all, etc.), resetting _conversation_active to False so the agent returns to silent hotword-only listening immediately rather than waiting for the 30-second inactivity timeout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
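The session behavior from the last two commits (open on hotword, stay open while the user keeps talking, reset after 30 seconds of inactivity or on an explicit close) might look like the sketch below. The HotwordSession class and its injectable clock are hypothetical; only the _conversation_active and _last_activity_time names come from the commit messages.

```python
import time


class HotwordSession:
    """Hypothetical state holder for hotword-gated conversations."""

    def __init__(self, timeout_s: float = 30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock  # Injectable so the timeout can be tested.
        self._conversation_active = False
        self._last_activity_time = 0.0

    def is_open(self) -> bool:
        # Expire the session once the inactivity timeout has passed.
        idle = self._clock() - self._last_activity_time
        if self._conversation_active and idle > self.timeout_s:
            self._conversation_active = False
        return self._conversation_active

    def on_turn(self, hotword_heard: bool) -> bool:
        """Return True if the agent should respond to this turn."""
        if hotword_heard or self.is_open():
            self._conversation_active = True
            self._last_activity_time = self._clock()
            return True
        return False

    def close(self) -> None:
        # Counterpart of close_hotword_session: return to hotword-only mode.
        self._conversation_active = False
```

Each accepted turn refreshes the activity timestamp, so the 30-second window is measured from the most recent turn rather than from the hotword itself.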
- hey-manny.json: add disabled:true (hidden from demo selector, config preserved)
- load_all() now filters configs with disabled:true
- dubai-real-estate.json: default voice changed from Amalthea to Pandora
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
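For illustration, a load_all() that skips disabled configs and sorts the default one first could look like this. It is a standalone sketch that assumes each config is a JSON object with optional boolean default and disabled keys. The explicit configs_dir parameter and the id field filled from the filename are assumptions, not the project's actual signature.

```python
import json
from pathlib import Path


def load_all(configs_dir: Path) -> list:
    configs = []
    for path in sorted(configs_dir.glob("*.json")):
        config = json.loads(path.read_text(encoding="utf-8"))
        if config.get("disabled"):
            continue  # Hidden from the demo selector; the file stays on disk.
        config.setdefault("id", path.stem)
        configs.append(config)
    # Stable sort: configs marked default come first, the rest keep filename order.
    configs.sort(key=lambda c: not c.get("default", False))
    return configs
```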
- agent_templates.py: conditional listen provider adds language field for non-English STT (Nova-3 + language="tl" etc)
- agent_templates.py: conditional speak provider builds ElevenLabs provider when config has ttsProvider="eleven_labs"
- bpo-tagalog.json: add ttsProvider, elevenLabsVoiceId, elevenLabsModel fields
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Feminine voice: G1AxVA91PtrWu96MHgTC Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
voice_id goes in the endpoint URL path, api_key goes in endpoint headers as xi-api-key — not inline in the provider object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ElevenLabs uses 'fil' not 'tl' for Tagalog — sending an unrecognized language_code causes the WebSocket to close unexpectedly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deepgram's internal ElevenLabsSpeakProvider struct has a voice_id field that must be populated for multi-turn reconnections to work. Without it, voice_id is None and ElevenLabs returns audio:null on the second turn. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vements
- Switch ElevenLabs endpoint from wss://stream-input to wss://multi-stream-input (stream-input doesn't support barge-in; multi-stream-input does)
- Remove voice_id from ElevenLabs provider block (Deepgram rejects it when a custom endpoint is set)
- Hide language/voice model selects from sidebar (set automatically from config)
- Demo cards now show voice name and TTS provider info
- Capitalize voice name in card display
- Preserve default:true flag when editing configs so default card stays on top
- Fix voiceModel/voiceName not updating after config save (re-select with fresh data on loadConfigs so currentVoiceModel/voiceName stay in sync)
- Remove Mode field from builder modal (always voice_agent)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
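Putting the ElevenLabs commits together, the final speak settings might be assembled roughly as follows. Treat this as a sketch under assumptions: build_speak_settings is a hypothetical name, the api.elevenlabs.io host and URL layout are not stated in this PR, and the provider/endpoint key names (type, model_id, model, url, headers) reflect a reading of the commit messages rather than a confirmed schema.

```python
def build_speak_settings(config: dict, eleven_labs_api_key: str = "") -> dict:
    if config.get("ttsProvider") == "eleven_labs":
        voice_id = config["elevenLabsVoiceId"]
        return {
            # No voice_id or language_code in the provider block.
            "provider": {
                "type": "eleven_labs",
                "model_id": config["elevenLabsModel"],
            },
            "endpoint": {
                # voice_id goes in the URL path (host is an assumption).
                "url": (
                    "wss://api.elevenlabs.io/v1/text-to-speech/"
                    f"{voice_id}/multi-stream-input"
                ),
                # The API key travels as a header, not inline in the provider.
                "headers": {"xi-api-key": eleven_labs_api_key},
            },
        }
    # Default path: a Deepgram voice model taken from the config.
    return {
        "provider": {
            "type": "deepgram",
            "model": config.get("voiceModel", "aura-2-thalia-en"),
        }
    }
```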
Cursor Bugbot has reviewed your changes and found 3 potential issues.
    if voiceModel != "aura-2-thalia-en":
        voice_model = voiceModel
    if language != "en":
        config_language = language
Sentinel-based override ignores explicit default voice/language selection
Medium Severity
The override logic uses hardcoded default values ("aura-2-thalia-en" and "en") as sentinels to detect whether the caller explicitly chose a voice model or language. Since handle_start_voice_agent always forwards whatever the frontend sends (which could be exactly these defaults), a user who explicitly selects "aura-2-thalia-en" from the voice dropdown for a config that uses a different model (e.g. "aura-2-arcas-en" in hey-saga) will have their choice silently ignored — the config's model is used instead.
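One way to address this finding is to use None, rather than a real default value, to mean "the caller did not choose". The sketch below is illustrative only: resolve_voice_settings is a hypothetical helper, and it assumes the frontend omits the field (or sends null) when the user has made no explicit selection.

```python
from typing import Optional, Tuple


def resolve_voice_settings(
    config: dict,
    voice_model: Optional[str] = None,
    language: Optional[str] = None,
) -> Tuple[str, str]:
    # None means "not provided"; any explicit value wins, even a default one.
    resolved_model = (
        voice_model if voice_model is not None
        else config.get("voiceModel", "aura-2-thalia-en")
    )
    resolved_language = (
        language if language is not None
        else config.get("language", "en")
    )
    return resolved_model, resolved_language
```

With this shape, an explicit "aura-2-thalia-en" selection overrides a config that specifies "aura-2-arcas-en", while an omitted value still falls back to the config.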
    if path.exists():
        path.unlink()
        return True
    return False
Path traversal in config CRUD allows arbitrary file access
High Severity
The save, load, and delete static methods construct file paths using unsanitized config_id values (e.g., CONFIGS_DIR / f"{config_id}.json"). A config_id containing ../ can escape the configs/ directory. Since POST /configs and DELETE /configs/<config_id> are unauthenticated Flask routes on a publicly deployed Fly.io app, an attacker can write arbitrary JSON files or delete files anywhere the process has permissions.
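A possible mitigation is to validate config_id before building any path, then confirm the resolved path still sits inside the configs directory. This is a hedged sketch: safe_config_path and the allowed-character rule are assumptions, and it does not address the missing authentication on the routes.

```python
import re
from pathlib import Path

CONFIGS_DIR = Path("configs")
# Assumed rule: letters, digits, underscore, hyphen; no dots or slashes.
_CONFIG_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")


def safe_config_path(config_id: str, base: Path = CONFIGS_DIR) -> Path:
    if not isinstance(config_id, str) or not _CONFIG_ID_RE.fullmatch(config_id):
        raise ValueError(f"invalid config id: {config_id!r}")
    base_resolved = base.resolve()
    path = (base_resolved / f"{config_id}.json").resolve()
    # Defense in depth: reject anything that resolves outside the base directory.
    if base_resolved not in path.parents:
        raise ValueError(f"config path escapes {base}: {config_id!r}")
    return path
```

The save, load, and delete methods could each call this helper first and turn a ValueError into an HTTP 400 response.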
    document.getElementById('builder-form').addEventListener('submit', async (e) => {
        e.preventDefault();
        const form = e.target;
        const data = Object.fromEntries(new FormData(form));
FormData flattening corrupts multi-value functions field to string
Medium Severity
The builder form has multiple checkboxes all sharing name="functions", but the submit handler uses Object.fromEntries(new FormData(form)) which silently drops all but the last value for duplicate keys. This saves "functions": "end_call" (a single string) instead of the expected array, corrupting the config's functions field on every save via the builder form.
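The direct fix belongs in the browser, where FormData's getAll('functions') returns every checked value. As a defensive complement, the server could also normalize the field before saving. The sketch below shows only that server-side step; normalize_functions is a hypothetical helper, not part of the PR.

```python
def normalize_functions(value) -> list:
    # Missing or empty field: no functions enabled.
    if value is None or value == "":
        return []
    # A single string (the flattened-form case) becomes a one-item list.
    if isinstance(value, str):
        return [value]
    # Lists and tuples pass through, coerced to strings.
    return [str(item) for item in value]
```

This keeps the stored functions field a list even if a client sends a bare string, though it cannot recover values the client already dropped.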


Switches the ElevenLabs WebSocket URL from stream-input to multi-stream-input. The old endpoint doesn't support barge-in and causes FailedToSpeak after turn 1 — Deepgram connects once, ElevenLabs closes the socket after the greeting, and Deepgram never reconnects. Also removes voice_id from the provider block (rejected by Deepgram when a custom endpoint is set) and language_code (not supported by eleven_turbo_v2_5).
Note
Medium Risk
Moderate risk: introduces filesystem-backed config CRUD, rewrites agent settings construction (including ElevenLabs endpoint wiring), and adjusts SocketIO/threaded asyncio startup, all of which can affect session startup and runtime behavior.
Overview
Moves demo configuration to JSON and exposes config CRUD. AgentTemplates is rewritten to load configs/*.json, build agent Settings dynamically (sorted/filtered configs, default/disabled support), and client.py adds GET/POST/DELETE /configs for managing configs on disk.

Session startup now uses config-driven defaults and adds hotword support. The start_voice_agent SocketIO handler accepts config_id (fallback to legacy industry), loads the selected config to default voiceModel/voiceName/language, and tweaks SocketIO/threaded asyncio setup (CORS enabled; dedicated event loop policy) to reduce eventlet loop conflicts. Hotword detection is added via new functions in common/agent_functions.py and is injected into the agent prompt/functions list when a config specifies hotword.

Frontend is redesigned and integrated with the new config API. templates/index.html switches to Deepgram design system layout, loads configs from /configs into selectable cards, adds a slide-in builder form that POSTs to /configs, and updates Start Session to emit config_id plus browser-audio capture/playback. static/style.css is reduced to minimal design-system overrides.

Deployment/config hygiene updates. Adds multiple demo JSON files (including ttsProvider: eleven_labs config fields), updates fly.toml with concurrency and health checks, and expands .gitignore/.dockerignore to ignore .claude/ and .planning/ while keeping fly.toml in the Docker context.

Written by Cursor Bugbot for commit ea09850. This will update automatically on new commits.