v1.9.2 #252
Merged
Update Translations
Update TTS Cast pluginVersion from 1.9.1 to 1.9.2 and bump google-genai in requirements.txt from 2.3.0 to 2.4.0. This advances the plugin release and brings a newer google-genai dependency for fixes/compatibility.
Introduce streaming mode for Gemini TTS to reduce perceived latency and stream audio to Chromecast devices. Changes include:
- Add streamingDefault config (UI checkbox, install/update defaults, CLI arg) to enable streaming by default.
- Daemon: support a streaming path in getTTS: prefetch the first chunk, create a named pipe (mkfifo) and spawn a geminiTTSStream writer thread; cast uses a live stream_type when appropriate.
- Add a geminiTTS(streaming=True) path that returns a stream iterator and the first chunk for format detection; add geminiTTSStream to write PCM chunks to the pipe and optionally cache a WAV file.
- Audio proxy: add handling for type=stream that validates UUID.l16 names, serves audio/L16 (24k/mono), streams pipe contents and cleans up the pipe.
- castToGoogleHome: accept a streamType param and propagate it to media metadata.
- Utils and settings updates: new temporary stream folder, VSCode settings and mime entries updated.
Includes error handling, non-blocking pipe writes with retry, and logging for debugging.
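The prefetch, named pipe and writer-thread flow described above can be sketched roughly as follows. This is a minimal illustration, not the daemon's code: start_stream, write_chunks_to_fifo and read_stream are hypothetical names, the chunk source is a plain iterable standing in for the Gemini stream, and read_stream stands in for the PHP audio proxy. It assumes a POSIX system where os.mkfifo is available.

```python
import os
import threading
import uuid


def write_chunks_to_fifo(pipe_path, first_chunk, remaining_chunks):
    """Writer thread body: send the prefetched chunk, then the rest of the stream."""
    # Opening a FIFO for writing blocks until a reader opens the other end.
    with open(pipe_path, "wb") as pipe:
        pipe.write(first_chunk)
        for chunk in remaining_chunks:
            pipe.write(chunk)


def start_stream(chunks, stream_dir):
    """Prefetch the first chunk, create a FIFO and spawn the writer thread."""
    iterator = iter(chunks)
    first_chunk = next(iterator)  # prefetch: raises StopIteration on an empty stream
    pipe_path = os.path.join(stream_dir, f"{uuid.uuid4()}.l16")
    os.mkfifo(pipe_path)
    writer = threading.Thread(
        target=write_chunks_to_fifo,
        args=(pipe_path, first_chunk, iterator),
        daemon=True,
    )
    writer.start()
    return pipe_path, writer


def read_stream(pipe_path):
    """Stand-in for the audio proxy: drain the FIFO, then remove it."""
    with open(pipe_path, "rb") as pipe:
        data = pipe.read()  # returns once the writer closes its end
    os.unlink(pipe_path)
    return data
```

Prefetching before creating the pipe means a failed or empty stream is detected before any URL is handed to the cast device.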
PHP: Replace hardcoded audio/L16 Content-Type with validated rate and channels from query params and remove the static 'l16' MIME entry. This ensures the proxy emits correct L16 headers (allowed rates: 8000,16000,22050,24000,44100,48000; channels: 1 or 2) and avoids malformed responses. Python: Include sampleRate and channels in the generated pipe URL and pass them into the TTS streaming thread. Change geminiTTS result checks from "is not None" to "isinstance(..., bytes)" before writing files to ensure only binary audio is written. These changes improve correctness when streaming different L16 configurations and prevent non-bytes values from being treated as audio.
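The validation rule described here lives in the PHP proxy; the sketch below only illustrates the same rule in Python. The function name l16_content_type is hypothetical, and the exact header layout is assumed from the 'audio/L16;rate=...;channels=...' form mentioned elsewhere in this PR.

```python
ALLOWED_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
ALLOWED_CHANNELS = {1, 2}


def l16_content_type(rate_param, channels_param):
    """Return an audio/L16 Content-Type value, or None if a parameter is invalid."""
    try:
        rate = int(rate_param)
        channels = int(channels_param)
    except (TypeError, ValueError):
        return None  # missing or non-numeric query parameter
    if rate not in ALLOWED_RATES or channels not in ALLOWED_CHANNELS:
        return None
    return f"audio/L16;rate={rate};channels={channels}"
```

Returning None for any rejected input lets the caller answer with an error status instead of emitting a malformed header.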
Pass the new ttsTestStreaming flag from the PHP UI to the daemon and add a UI checkbox + new Gemini 3.5 Flash model option. Update daemon socket handling to accept ttsStreaming, extend generateTestTTS signature, and implement a streaming path that creates a FIFO pipe and casts live audio (geminiTTS stream) to devices; keep fallback to file-based generation. Also add http_options timeout to genai.Client creation calls to increase request timeouts and improve reliability.
Move the "Style de voix (Gemini TTS)" text input in plugin_info/configuration.php so it appears after the "Tester avec le mode streaming" option. This is a UI reorder only—the field, its data key (ttsTestGeminiStyle), placeholder and tooltip are unchanged; no functional logic was modified.
Propagate and retain the Gemini TTS stream client to prevent premature socket/HTTPX closure. Prefetch now returns an additional streamClient value; callers unpack it and pass it into geminiTTSStream. geminiTTSStream signature gains a trailing _client parameter (kept alive for httpx sockets), and threading invocations were updated to include this argument. No other functional changes.
Harden the PHP proxy and make streaming more robust and observable. Adds core include, strict filename/UUID regex validation for TTS/stream/sounds, and extensive logging for invalid params, missing pipes, open/read errors and stream lifecycle (start/bytes sent/finished). Replaces previous IP-based restriction with validation-based protection. Implements real-time chunked streaming by disabling PHP output buffers, reading the pipe in loops, setting Content-Encoding: identity to avoid compression, and cleaning up the pipe after use. Also logs and returns proper HTTP codes on errors. In the daemon, catch BrokenPipeError during Gemini TTS streaming, log a warning and return None to handle client disconnects gracefully.
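As an illustration of validation-based protection, a strict pattern check for stream names might look like the sketch below. The proxy's actual regex is not shown in this PR, so the pattern here is an assumption: a canonical lowercase, hyphenated UUID followed by the .l16 extension. is_valid_stream_name is a hypothetical helper.

```python
import re

# Assumed pattern: canonical lowercase UUID plus ".l16"; the proxy's real regex may differ.
STREAM_NAME = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.l16"
)


def is_valid_stream_name(name):
    """Accept only names that match the pattern in full, with no path components."""
    # fullmatch rejects trailing newlines and any extra leading or trailing characters.
    return isinstance(name, str) and STREAM_NAME.fullmatch(name) is not None
```

Because the whole name must match, inputs such as `../` sequences or absolute paths cannot reach the filesystem lookup.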
Switch the HTTP stream Content-Type to audio/wav and prepend a WAV RIFF header for Gemini TTS streaming so clients correctly interpret 16-bit little-endian PCM. audio/L16 (RFC 2586) expects big-endian, but Gemini returns PCM LE; using WAV avoids endian mismatch. The daemon builds a minimal WAV header (channels, sampwidth=2, framerate=sampleRate) with the wave module, patches the RIFF and data chunk sizes to 0x7FFFFFFF as placeholders for an unknown/ongoing stream, and writes the header to the pipe before sending audio chunks. Modified files: core/php/ttscast.audio.proxy.php and resources/ttscastd/ttscastd.py.
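The header construction described above could be sketched like this. The function name streaming_wav_header is hypothetical; the offsets 4 and 40 assume the canonical 44-byte PCM header that Python's wave module writes.

```python
import io
import struct
import wave

STREAM_SIZE_PLACEHOLDER = 0x7FFFFFFF  # stands in for an unknown, ongoing length


def streaming_wav_header(sample_rate, channels):
    """Build a 44-byte WAV header whose size fields are placeholders."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"")  # forces the header to be written with no audio data
    header = bytearray(buffer.getvalue())
    struct.pack_into("<I", header, 4, STREAM_SIZE_PLACEHOLDER)   # RIFF chunk size
    struct.pack_into("<I", header, 40, STREAM_SIZE_PLACEHOLDER)  # data chunk size
    return bytes(header)
```

The resulting bytes are written to the pipe once, ahead of the little-endian PCM chunks, so the receiver reads the stream as an ordinary WAV file of very large declared length.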
Replace dynamic 'audio/L16;rate=...;channels=...' MIME strings with a fixed 'audio/wav' MIME type because the proxy delivers a WAV RIFF stream (PCM LE 16-bit). This change updates both the TestTTS prefetch path and the main TTS streaming path in resources/ttscastd/ttscastd.py so the stream is handled with the correct format.
Send HTTP headers and flush output before opening the FIFO to avoid Chromecast timeouts and start streaming immediately. Enable implicit flush and remove chunked transfer header so PCM data isn't buffered/compressed. Add debug logging around pipe open and streaming. Also reduce the ttscastd pipe-writer retry sleep from 50ms to 5ms to decrease startup latency before the first write.
Enable LINEAR16 options in the configuration UI and make TTS generation respect the gCloudAudioEncoding and gCloudSampleRate settings. Updated plugin_info/configuration.php to allow selecting LINEAR16 (24k/48k). Updated resources/ttscastd/ttscastd.py to choose file extension and MIME type based on gCloudAudioEncoding, and to use myConfig.gCloudSampleRate (instead of hardcoded 48000) for Google Cloud TTS AudioConfig and WAV file generation. This re-enables WAV output and allows configurable sample rates for LINEAR16 audio.
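A rough sketch of selecting the file extension and MIME type from the encoding setting, and of writing a WAV at a configurable sample rate, is shown below. The mapping table and both function names are illustrative assumptions, not the daemon's code; only the LINEAR16 to WAV pairing is stated in this PR.

```python
import io
import wave

# Illustrative mapping; the daemon's actual table is not shown in this PR.
AUDIO_FORMATS = {
    "LINEAR16": ("wav", "audio/wav"),
    "MP3": ("mp3", "audio/mpeg"),
}


def audio_file_format(encoding):
    """Return (extension, MIME type) for an encoding name, or None if unknown."""
    return AUDIO_FORMATS.get(encoding)


def pcm_to_wav_bytes(pcm, sample_rate, channels=1):
    """Wrap raw 16-bit PCM in a WAV container at the configured sample rate."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)  # taken from configuration, not hardcoded
        wav.writeframes(pcm)
    return buffer.getvalue()
```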
Insert info-level timing logs into resources/ttscastd/ttscastd.py to help measure streaming latency and caching. Adds t0_start before calling Gemini streaming TTS, t1_castStart after spawning the streaming thread and before casting to Google Home, and t2_cacheWritten after the streamed WAV is written to cache. These timestamps aid debugging and performance analysis of the Gemini TTS streaming path.
Ensure the FIFO opened for GeminiTTSStream writes is set to blocking mode by calling os.set_blocking(fd, True). This restores blocking behavior after the descriptor was opened non-blocking so writes will block if the pipe buffer is full, providing proper backpressure during streaming. Includes a type-ignore and POSIX/Debian note in the source comment.
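The open-then-switch pattern can be sketched as follows: open the FIFO non-blocking and retry until a reader appears, then restore blocking mode so later writes apply backpressure. open_fifo_for_writing and its timeout parameter are hypothetical; the 5 ms retry sleep mirrors the value mentioned in this PR. This assumes a POSIX system, where opening a FIFO write-only and non-blocking fails with ENXIO while no reader is attached.

```python
import errno
import os
import time


def open_fifo_for_writing(pipe_path, timeout=10.0, retry_sleep=0.005):
    """Open a FIFO for writing without hanging forever, then make it blocking."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(pipe_path, os.O_WRONLY | os.O_NONBLOCK)
            break
        except OSError as exc:
            # ENXIO: no reader has opened the FIFO yet. Anything else is a real error.
            if exc.errno != errno.ENXIO or time.monotonic() >= deadline:
                raise
            time.sleep(retry_sleep)
    # Restore blocking mode so a full pipe buffer stalls the writer instead of
    # raising BlockingIOError.
    os.set_blocking(fd, True)
    return fd
```

The non-blocking open lets the writer give up if no client ever connects, while blocking writes afterwards keep the producer paced to the consumer.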
PHP: Only enable the ttsTestStreaming flag when Gemini test is selected (ttsTestGemini == '1'), forcing it to '0' otherwise to prevent streaming being offered for non-Gemini tests. Python: Improve Gemini streaming timing logs in ttscastd.py by capturing a single timestamp (_t) and logging both the epoch float and a human-readable HH:MM:SS.mmm representation for t0_start, t1_castStart and t2_cacheWritten. This provides more precise and consistent timing info for debugging streaming flows.
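A minimal sketch of the timing format described above: one captured timestamp rendered both as an epoch float and as HH:MM:SS.mmm. The helper name format_timing and the exact layout of the message are assumptions; the daemon's log format may differ.

```python
import time
from datetime import datetime


def format_timing(label, timestamp):
    """Render one timestamp as an epoch float and a local HH:MM:SS.mmm string."""
    # strftime's %f gives microseconds; dropping the last three digits leaves milliseconds.
    human = datetime.fromtimestamp(timestamp).strftime("%H:%M:%S.%f")[:-3]
    return f"{label}={timestamp:.6f} ({human})"


# Capture once so both representations describe the same instant.
_t = time.time()
message = format_timing("t0_start", _t)
```

Capturing the timestamp once avoids the small drift that two separate clock reads would introduce between the two representations.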
Remove the previous global "Streaming TTS par défaut" form group and add a Gemini-specific "Streaming Gemini TTS par défaut" form group under the "IA & TTS - Gemini" section. The new block uses the same config key (data-l1key="streamingDefault") and retains the existing tooltips about daemon restart and streaming behavior, grouping the streaming option with Gemini TTS settings.
Add a warning in TTSCast when the 'streaming' option is requested but the active TTS engine is not 'geminitts' and myConfig.streamingDefault is false. This logs a clear message after sending plain-text TTS results to inform callers that streaming mode is only supported by the Gemini TTS engine and that they should remove the 'streaming:true' option from the TTS call.
Move the streamingDefault config into the Gemini TTS section and add a clarifying comment ('Gemini TTS uniquement — ignoré pour les autres moteurs'). This consolidates the Gemini-specific setting with related options and clarifies its scope; no functional change intended beyond organization and documentation.
This pull request introduces support for streaming Gemini TTS (Text-to-Speech) with low-latency playback and improves configuration options, security, and compatibility. The most significant changes are the addition of a streaming mode for Gemini TTS, enhancements to the audio proxy for secure streaming, and updates to the configuration UI and installation scripts.
Gemini TTS Streaming Support:
streamingDefault), test options (ttsTestStreaming), and the necessary backend logic to handle streaming requests.
Audio Proxy Security and Streaming Implementation:
ttscast.audio.proxy.php to:
Configuration and UI Improvements:
Dependency and Internal Updates:
google-genai Python package to version 2.4.0 for improved compatibility.
.vscode/settings.json and updating the plugin version to 1.9.2.
These changes enable faster, more flexible TTS playback and improve the maintainability and security of the plugin.