feat: add on-device TTS using Supertonic #335
Conversation
📝 Walkthrough

Adds a cross-platform on-device TTS feature: Rust/Tauri backend (ONNX pipeline + Tauri commands), React TTS context + hook, UI for model download/management and per-message playback, build targets and AppImage media bundling, and a TTS research doc.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as TTSDownloadDialog
    participant React as TTS Context (useTTS)
    participant Tauri as Tauri IPC
    participant Rust as TTS Backend (tts.rs)
    participant FS as File System
    UI->>React: startDownload()
    React->>Tauri: invoke tts_download_models()
    Tauri->>Rust: download request
    Rust->>FS: stream & save model files
    loop progress events
        Rust-->>Tauri: emit download progress
        Tauri-->>React: tts-download-progress event
        React-->>UI: update progress UI
    end
    Rust-->>Tauri: download complete
    React->>Tauri: invoke tts_load_models()
    Tauri->>Rust: load ONNX sessions / init engine
    Rust-->>Tauri: load complete
    Tauri-->>React: ready
    React-->>UI: status = ready
```

```mermaid
sequenceDiagram
    participant User as User
    participant UI as TTSButton
    participant React as TTS Context
    participant Tauri as Tauri IPC
    participant Rust as TTS Backend
    participant Audio as Browser Audio
    User->>UI: click play(text)
    UI->>React: speak(text, id)
    React->>Tauri: invoke tts_synthesize(text)
    Tauri->>Rust: synthesize request
    Rust->>Rust: preprocess → duration → encode → denoise → vocoder
    Rust-->>Tauri: TTSSynthesizeResponse (audio_base64)
    Tauri-->>React: deliver response
    React->>Audio: create Blob/URL and play
    Audio-->>React: onended
    React-->>UI: update playing state
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (4 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches

📜 Recent review details — Configuration used: Organization UI, Review profile: CHILL, Plan: Pro

📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
Deploying maple with Cloudflare Pages

| | |
| --- | --- |
| Latest commit: | 8004fa1 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://7230bc57.maple-ca8.pages.dev |
| Branch Preview URL: | https://feature-tts-supertonic.maple-ca8.pages.dev |
Greptile Summary

This PR adds on-device text-to-speech functionality using Supertonic, a fast ONNX-based TTS engine. The implementation downloads ~265MB of models from HuggingFace on first use and performs all synthesis locally on the desktop app.

Key Changes:
Previous Review Feedback Addressed:
Minor Issues:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant UI as React UI
    participant TTSContext
    participant Tauri as Tauri Bridge
    participant Rust as Rust Backend
    participant HF as HuggingFace
    participant ONNX as ONNX Runtime
    Note over User,ONNX: Initial Setup Flow
    User->>UI: Opens TTS Dialog
    UI->>TTSContext: checkStatus()
    TTSContext->>Tauri: invoke("tts_get_status")
    Tauri->>Rust: tts_get_status()
    Rust->>Rust: Check if models exist
    Rust-->>Tauri: TTSStatusResponse
    Tauri-->>TTSContext: {models_downloaded, models_loaded}
    TTSContext-->>UI: Update status
    alt Models not downloaded
        User->>UI: Click Download
        UI->>TTSContext: startDownload()
        TTSContext->>Tauri: invoke("tts_download_models")
        Tauri->>Rust: tts_download_models()
        loop For each model file
            Rust->>HF: HTTP GET model file
            HF-->>Rust: Stream chunks
            Rust->>Rust: Write to temp file
            Rust->>Tauri: emit("tts-download-progress")
            Tauri-->>TTSContext: Progress event
            TTSContext-->>UI: Update progress bar
        end
        Rust->>Rust: Flush and rename temp files
        Rust-->>Tauri: Success
    end
    Note over User,ONNX: Model Loading Flow
    TTSContext->>Tauri: invoke("tts_load_models")
    Tauri->>Rust: tts_load_models()
    Rust->>Rust: Load tts.json config
    Rust->>Rust: Load unicode_indexer.json
    Rust->>ONNX: Load duration_predictor.onnx
    Rust->>ONNX: Load text_encoder.onnx
    Rust->>ONNX: Load vector_estimator.onnx
    Rust->>ONNX: Load vocoder.onnx
    Rust->>Rust: Load F2.json voice style
    Rust->>Rust: Store in TTSState
    Rust-->>Tauri: Success
    Tauri-->>TTSContext: Models loaded
    TTSContext-->>UI: status = "ready"
    Note over User,ONNX: Text-to-Speech Synthesis
    User->>UI: Click speaker icon
    UI->>TTSContext: speak(text, messageId)
    TTSContext->>TTSContext: preprocessTextForTTS()
    TTSContext->>Tauri: invoke("tts_synthesize", {text})
    Tauri->>Rust: tts_synthesize(text)
    Rust->>Rust: chunk_text(text, 300)
    loop For each chunk
        Rust->>Rust: preprocess_text(chunk)
        Rust->>Rust: UnicodeProcessor.call()
        Rust->>ONNX: Run duration_predictor
        ONNX-->>Rust: Duration predictions
        Rust->>ONNX: Run text_encoder
        ONNX-->>Rust: Text embeddings
        Rust->>Rust: sample_noisy_latent()
        loop 10 denoising steps
            Rust->>ONNX: Run vector_estimator
            ONNX-->>Rust: Denoised latent
        end
        Rust->>ONNX: Run vocoder
        ONNX-->>Rust: Audio waveform
        Rust->>Rust: Concatenate with silence
    end
    Rust->>Rust: wav_to_base64()
    Rust-->>Tauri: TTSSynthesizeResponse
    Tauri-->>TTSContext: {audio_base64, sample_rate}
    TTSContext->>TTSContext: base64ToBlob()
    TTSContext->>TTSContext: Create AudioContext
    TTSContext->>TTSContext: Decode audio buffer
    TTSContext->>User: Play audio via Web Audio API
    TTSContext-->>UI: isPlaying = true
    Note over User,ONNX: Stop/Cleanup
    User->>UI: Click stop or unmount
    UI->>TTSContext: stop()
    TTSContext->>TTSContext: sourceNode.stop()
    TTSContext->>TTSContext: audioContext.close()
    TTSContext->>TTSContext: URL.revokeObjectURL()
    TTSContext-->>UI: isPlaying = false
```
10 files reviewed, 2 comments
Actionable comments posted: 3
🧹 Nitpick comments (4)
frontend/src/components/UnifiedChat.tsx (1)
179-233: TTSButton implementation looks solid with proper environment check and state handling.

The component correctly:
- Returns null when not in Tauri environment (line 193-195)
- Handles different TTS states appropriately
- Provides proper aria-labels for accessibility
However, there's a potential issue with the async `handleClick` function: the `speak` call on line 207 is awaited but errors are not handled. Consider adding error handling:

```diff
 const handleClick = async () => {
   if (status === "not_downloaded" || status === "error") {
     onNeedsSetup();
     return;
   }
   if (status === "ready") {
     if (isThisPlaying) {
       stop();
     } else {
-      await speak(text, messageId);
+      try {
+        await speak(text, messageId);
+      } catch (error) {
+        console.error("TTS playback failed:", error);
+      }
     }
   }
 };
```

frontend/src-tauri/src/tts.rs (3)
115-219: Consider compiling regexes once to improve performance.

The `preprocess_text` function creates multiple `Regex` objects on every call. Since this function is called for each text chunk during synthesis, this causes repeated compilation overhead.

Consider using `lazy_static` or `once_cell::sync::Lazy` to compile regexes once:

```rust
use once_cell::sync::Lazy;

static EMOJI_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\x{1F600}-\x{1F64F}...]").unwrap()
});
static DIACRITICS_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\u{0302}...]").unwrap()
});
// ... other regexes

fn preprocess_text(text: &str) -> String {
    let mut text: String = text.nfkd().collect();
    text = EMOJI_PATTERN.replace_all(&text, "").to_string();
    // ...
}
```

This is especially important since `chunk_text` may split long messages into many chunks, each processed separately.
734-771: Mutex held during CPU-intensive synthesis may cause contention.

The mutex guard is held at line 739 and only released at line 759, after the potentially long-running `synthesize()` call completes. For longer texts, ONNX inference across multiple chunks could take several seconds, blocking all other TTS commands.

Consider whether the ONNX sessions truly need mutable access. If `Session::run` only requires `&self`, you could restructure to minimize lock duration:

```rust
// Clone what's needed, release lock quickly
let (tts_data, style) = {
    let guard = state.lock().map_err(|e| e.to_string())?;
    let style = guard.style.as_ref().ok_or("Voice style not loaded")?.clone();
    // If possible, clone or Arc the session references
    // ...
};
// Synthesize outside the lock
let audio = /* ... */;
```

Alternatively, if mutable access is truly required by ONNX Runtime, this is acceptable for now but document the limitation. For production, consider using `tokio::task::spawn_blocking` to avoid blocking the async runtime.
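The "clone what's needed, release the lock early" pattern suggested here can be shown as a self-contained toy. Names like `Style` and `synthesize_outside_lock` are illustrative stand-ins, not the PR's actual types:

```rust
use std::sync::{Arc, Mutex};

// Stand-in for the small voice-style data the review says is cheap to clone.
#[derive(Clone)]
struct Style(Vec<f32>);

// Hold the lock only while cloning the style, then run the long
// "inference" with the lock already released.
fn synthesize_outside_lock(state: &Arc<Mutex<Option<Style>>>) -> Result<usize, String> {
    let style = {
        let guard = state.lock().map_err(|e| e.to_string())?;
        guard.clone().ok_or("Voice style not loaded")?
    }; // guard dropped here — other callers can now take the lock

    // Placeholder for the expensive synthesis step.
    Ok(style.0.len() * 2)
}

fn main() {
    let state = Arc::new(Mutex::new(Some(Style(vec![0.1, 0.2, 0.3]))));
    assert_eq!(synthesize_outside_lock(&state).unwrap(), 6);
    println!("ok");
}
```

The key point is the inner block scope: the `MutexGuard` is dropped at the closing brace, so only the cheap clone happens under the lock.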
536-566: Consider configuring ONNX session options for better performance.

The ONNX sessions are created with default settings. For a better user experience, you might want to configure the execution:

```rust
use ort::session::SessionBuilder;

let dp_ort = SessionBuilder::new()?
    .with_intra_threads(4)? // Limit thread usage
    .commit_from_file(models_dir.join("duration_predictor.onnx"))?;
```

This is optional but could help with:
- Controlling CPU usage during inference
- Potentially enabling GPU acceleration in the future
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`frontend/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- frontend/src-tauri/Cargo.toml (1 hunks)
- frontend/src-tauri/src/lib.rs (4 hunks)
- frontend/src-tauri/src/pdf_extractor.rs (1 hunks)
- frontend/src-tauri/src/tts.rs (1 hunks)
- frontend/src/app.tsx (2 hunks)
- frontend/src/components/TTSDownloadDialog.tsx (1 hunks)
- frontend/src/components/UnifiedChat.tsx (8 hunks)
- frontend/src/services/tts/TTSContext.tsx (1 hunks)
- frontend/src/services/tts/index.ts (1 hunks)
- justfile (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js,jsx}: Use 2-space indentation, double quotes, and a 100-character line limit for formatting
Use camelCase for variable and function names
Use try/catch with specific error types for error handling
Files:
- frontend/src/services/tts/index.ts
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/app.tsx
- frontend/src/services/tts/TTSContext.tsx
- frontend/src/components/UnifiedChat.tsx
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use strict TypeScript typing and avoid `any` when possible
Files:
- frontend/src/services/tts/index.ts
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/app.tsx
- frontend/src/services/tts/TTSContext.tsx
- frontend/src/components/UnifiedChat.tsx
🧠 Learnings (1)
📚 Learning: 2025-07-19T21:31:44.925Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-19T21:31:44.925Z
Learning: Applies to src/**/*.{tsx} : Use React context for global state management when needed
Applied to files:
frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (4)
frontend/src/components/TTSDownloadDialog.tsx (1)
frontend/src/services/tts/TTSContext.tsx (1)
`useTTS` (251-257)
frontend/src/app.tsx (5)
- frontend/src/services/tts/index.ts (1): `TTSProvider` (1-1)
- frontend/src/components/BillingServiceProvider.tsx (1): `BillingServiceProvider` (5-14)
- frontend/src/components/ProxyEventListener.tsx (1): `ProxyEventListener` (7-58)
- frontend/src/components/UpdateEventListener.tsx (1): `UpdateEventListener` (11-65)
- frontend/src/components/DeepLinkHandler.tsx (1): `DeepLinkHandler` (9-109)
frontend/src/services/tts/TTSContext.tsx (1)
frontend/src/services/tts/index.ts (3)
`TTSStatus` (2-2), `TTSProvider` (1-1), `useTTS` (1-1)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (7)
`new` (86-88), `new` (341-346), `tts_get_status` (602-620), `tts_download_models` (631-704), `tts_load_models` (707-725), `tts_synthesize` (735-771), `tts_unload_models` (774-780)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-ios
- GitHub Check: build-linux
- GitHub Check: build-android
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (16)
justfile (1)
46-57: Desktop debug build targets look consistent and useful

The new debug and debug-no-cc recipes mirror the existing release targets and should help diagnose desktop issues without impacting current workflows.
frontend/src/services/tts/TTSContext.tsx (1)
60-72: Overall TTS provider shape and status lifecycle look solid

The context cleanly encapsulates TTS state (status, errors, progress, playback) and exposes a small surface (checkStatus, startDownload, speak, stop). Checking `isTauri()` once, defaulting to `"not_available"` off-Tauri, and auto-checking status on mount is consistent with the existing platform utilities and React-context usage.

frontend/src/app.tsx (1)
17-17: TTSProvider placement in the provider tree looks appropriate

Wrapping `BillingServiceProvider` (and thus `ProxyEventListener`, `UpdateEventListener`, `DeepLinkHandler`, and `InnerApp`) with `TTSProvider` ensures all consumers of `useTTS` sit under the context without altering existing provider ordering. This is a good integration point.

Also applies to: 102-109
frontend/src-tauri/Cargo.toml (1)
42-53: TTS dependency set looks coherent; please verify cross-platform builds and versions

The added crates (ONNX Runtime via `ort`, `ndarray` with `rayon`, `reqwest` with streaming, `dirs`, etc.) are consistent with a local Supertonic TTS pipeline. Since they're in the shared `[dependencies]` section (not cfg-gated), they'll be compiled for all targets, including Android/iOS.

Please verify:
- That `ort = "2.0.0-rc.7"` and the other new crates build cleanly for your mobile targets.
- That there are no unexpected size or licensing issues introduced by this set.
Consider adding cfg-gating or feature flags later if you decide TTS should remain desktop-only.
frontend/src-tauri/src/pdf_extractor.rs (1)
19-27: PDF extractor changes are formatting-only and safe

The adjustments here are purely stylistic (parameter layout and whitespace) and do not affect decoding, file-type branching, or response construction.
Also applies to: 33-38, 43-51
frontend/src/services/tts/index.ts (1)
1-2: Barrel re-exports for TTS are clean and useful

Re-exporting `TTSProvider`, `useTTS`, and `TTSStatus` from the TTS module simplifies imports for consumers and keeps the public surface centralized.

frontend/src-tauri/src/lib.rs (2)
5-8: TTS state management and command wiring on desktop look correct

Registering `mod tts;`, managing `tts::TTSState::new()`, and adding the TTS commands to the desktop `invoke_handler` cleanly integrates the backend TTS pipeline. Scoping these commands to the desktop builder matches the "desktop-only TTS" goal.

Also applies to: 39-54
387-399: Update-ready event emission and logging structure are clear

The `app_handle.emit("update-ready", UpdateReadyPayload { ... })` block, along with the success/failure logging, is straightforward and keeps the updater behavior unchanged while improving observability.

frontend/src/components/TTSDownloadDialog.tsx (1)
1-160: TTS download/setup dialog behavior matches the context state machine well

The dialog cleanly maps `TTSStatus` into UX states (not available, error, processing, ready, initial), wires `startDownload` appropriately, and prevents accidental closure while work is in progress. The progress bar and messaging around the one-time ~265MB download and local-only processing are clear.

frontend/src/components/UnifiedChat.tsx (3)
412-427: LGTM! The MessageList signature update correctly adds the `onTTSSetupOpen` callback with proper typing. The integration follows the existing pattern for handling dialogs in this component.

626-633: LGTM! The TTSButton integration mirrors the existing CopyButton pattern, reusing the same text extraction logic. The button is properly placed within the hover-reveal action group for assistant messages.
3023-3024: LGTM! The TTS setup dialog follows the same pattern as other dialogs in the component (e.g., WebSearchInfoDialog, ContextLimitDialog). State management is consistent with the rest of the codebase.
frontend/src-tauri/src/tts.rs (4)
335-347: LGTM! The `TTSState` design with `Mutex<Self>` is appropriate here. While using `std::sync::Mutex` in async contexts can be problematic if held across `.await` points, the current implementation correctly performs all mutex-guarded operations synchronously before any awaits.
281-333: Text chunking logic is functional but has the same regex compilation overhead.

The function works correctly for splitting text into manageable chunks. The regex objects (`para_re`, `sentence_re`) are recreated on each call; the same recommendation as for `preprocess_text` applies.

Note: The sentence boundary regex `([.!?])\s+` may not handle all edge cases (e.g., abbreviations like "Dr. Smith"), but this is acceptable for TTS where minor imperfections in chunk boundaries won't significantly affect output quality.
568-588: LGTM! The WAV encoding function correctly clamps audio samples to prevent overflow and uses proper 16-bit PCM encoding. The in-memory buffer approach is efficient for the expected audio sizes.
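For context, the clamp-and-quantize step this comment approves reduces to something like the following stand-alone sketch (illustrative only, not the PR's actual `wav_to_base64` code):

```rust
// Map float samples in [-1.0, 1.0] to 16-bit PCM, clamping out-of-range
// values first so the cast can never overflow or wrap.
fn to_pcm16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}

fn main() {
    let pcm = to_pcm16(&[0.0, 1.5, -2.0, 0.5]);
    assert_eq!(pcm[0], 0);
    assert_eq!(pcm[1], i16::MAX);   // 1.5 clamped to 1.0
    assert_eq!(pcm[2], -32767);     // -2.0 clamped to -1.0
    assert_eq!(pcm[3], 16383);      // 0.5 * 32767, truncated
    println!("{:?}", pcm);
}
```

Without the clamp, a sample like `1.5` would cast to a saturating but silently wrong value; clamping makes the saturation explicit.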
706-725: LGTM! The model loading function is straightforward. While it doesn't explicitly check if models are downloaded first, the error messages from `load_tts_engine` and `load_voice_style` will be informative if files are missing. The frontend's `TTSContext` guards against calling this before download completes.
Force-pushed fc94dfa to e607185 (Compare)
Actionable comments posted: 0
♻️ Duplicate comments (4)
frontend/src/services/tts/TTSContext.tsx (2)
10-10: Use `isTauriDesktop()` instead of `isTauri()` to prevent mobile TTS errors.

According to past review comments, TTS commands are only registered in the desktop builder. Using `isTauri()` returns true for both desktop and mobile Tauri environments, which would cause "unknown command" errors on mobile when invoking `tts_get_status`.

Replace `isTauri` with `isTauriDesktop` throughout this file (lines 10, 62, 78, 112, 118, 163):

```diff
-import { isTauri } from "@/utils/platform";
+import { isTauriDesktop } from "@/utils/platform";
```

And update usage:

```diff
-  const isTauriEnv = isTauri();
+  const isTauriEnv = isTauriDesktop();
```
208-215: Memory leak: Audio object URL not revoked on stop() or unmount.

When `stop()` is called or the provider unmounts during playback, the object URL created in `speak()` is never revoked, causing memory leaks over repeated play/stop cycles.

Add a ref to track the URL and revoke it in `stop()` and the unmount effect:

```diff
 const audioRef = useRef<HTMLAudioElement | null>(null);
+const audioUrlRef = useRef<string | null>(null);
```

In `stop()`:

```diff
 const stop = useCallback(() => {
   if (audioRef.current) {
     audioRef.current.pause();
     audioRef.current = null;
   }
+  if (audioUrlRef.current) {
+    URL.revokeObjectURL(audioUrlRef.current);
+    audioUrlRef.current = null;
+  }
   setIsPlaying(false);
   setCurrentPlayingId(null);
 }, []);
```

And store the URL in `speak()`:

```diff
 const audioUrl = URL.createObjectURL(audioBlob);
+audioUrlRef.current = audioUrl;
```

frontend/src-tauri/src/tts.rs (2)
295-347: Sentence splitting loses punctuation marks.

The regex pattern `r"([.!?])\s+"` uses a capture group, but `split()` discards captured groups, causing sentences to lose their ending punctuation.

Consider using `split_inclusive` or adjusting the pattern:

```diff
- let sentence_re = Regex::new(r"([.!?])\s+").unwrap();
+ let sentence_re = Regex::new(r"(?<=[.!?])\s+").unwrap();
```

(Note: the `regex` crate does not support look-behind, so this replacement pattern would require switching to a crate such as `fancy-regex`.) Or manually preserve punctuation by iterating matches instead of splitting. This affects the naturalness of TTS output since sentences will be synthesized without proper endings.
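Because look-behind is unavailable in the `regex` crate, a stdlib-only alternative is to iterate characters and cut after sentence-enders. A sketch under that approach (`split_sentences` is a hypothetical helper, not the PR's code):

```rust
// Split text into sentences, keeping each sentence's terminal
// punctuation attached, with no regex dependency at all.
fn split_sentences(text: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        current.push(c);
        if matches!(c, '.' | '!' | '?') {
            // Consume the separating whitespace, then cut here.
            while matches!(chars.peek(), Some(w) if w.is_whitespace()) {
                chars.next();
            }
            parts.push(current.trim().to_string());
            current.clear();
        }
    }
    if !current.trim().is_empty() {
        parts.push(current.trim().to_string()); // trailing fragment
    }
    parts
}

fn main() {
    let parts = split_sentences("Hello there! How are you? Fine.");
    assert_eq!(parts, vec!["Hello there!", "How are you?", "Fine."]);
    println!("{:?}", parts);
}
```

Each part retains its `.`, `!`, or `?`, so downstream chunk accumulation can keep the prosody cues intact.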
650-668: Add request timeout and handle partial downloads.

Two reliability concerns from past reviews still apply:

1. No HTTP timeout: If the server hangs, the download blocks indefinitely.
2. Partial file on failure: If the download fails mid-stream, a partial file remains, and the next `tts_get_status` will report models as downloaded.

```diff
+ use std::time::Duration;
  let client = reqwest::Client::builder()
+     .timeout(Duration::from_secs(300))
+     .connect_timeout(Duration::from_secs(30))
      .build()
-     .map_err(|e| format!("Failed to create HTTP client: {}", e))?;
+     .unwrap_or_else(|_| reqwest::Client::new());
```

For partial files, download to a `.part` file and rename on success:

```diff
+ let temp_path = file_path.with_extension("part");
- let mut file = File::create(&file_path)
+ let mut file = File::create(&temp_path)
  ...
+ // After successful download:
+ fs::rename(&temp_path, &file_path)?;
```
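The temp-file-plus-rename idea can be sketched with the standard library alone. Here `save_atomically` is a hypothetical helper: the real code would stream chunks from `reqwest` rather than taking a byte slice:

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Write to a ".part" sibling and rename on success, so a failed
// download never leaves a partial file at the final path.
fn save_atomically(final_path: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let temp_path = final_path.with_extension("part");
    {
        let mut file = File::create(&temp_path)?;
        file.write_all(bytes)?;
        file.sync_all()?; // flush to disk before the rename
    }
    fs::rename(&temp_path, final_path) // atomic on the same filesystem
}

fn main() -> std::io::Result<()> {
    let target = std::env::temp_dir().join("model.onnx");
    save_atomically(&target, b"fake model bytes")?;
    assert_eq!(fs::read(&target)?, b"fake model bytes");
    let _ = fs::remove_file(&target);
    println!("ok");
    Ok(())
}
```

Because the rename only happens after a successful flush, a status check that tests for the final path's existence can no longer be fooled by a truncated download.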
🧹 Nitpick comments (1)
frontend/src/components/UnifiedChat.tsx (1)
626-633: Consider extracting the text content logic to avoid duplication.

The same text extraction logic is duplicated between CopyButton and TTSButton. While acceptable for now, consider extracting this to a helper if more buttons are added:

```typescript
const messageText = message.content
  .filter((p) => "text" in p && p.text)
  .map((p) => ("text" in p ? p.text : ""))
  .join("");
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`frontend/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- frontend/src-tauri/Cargo.toml (1 hunks)
- frontend/src-tauri/src/lib.rs (7 hunks)
- frontend/src-tauri/src/pdf_extractor.rs (1 hunks)
- frontend/src-tauri/src/tts.rs (1 hunks)
- frontend/src/app.tsx (2 hunks)
- frontend/src/components/TTSDownloadDialog.tsx (1 hunks)
- frontend/src/components/UnifiedChat.tsx (8 hunks)
- frontend/src/services/tts/TTSContext.tsx (1 hunks)
- frontend/src/services/tts/index.ts (1 hunks)
- justfile (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- frontend/src/app.tsx
- frontend/src/services/tts/index.ts
- justfile
- frontend/src-tauri/Cargo.toml
- frontend/src-tauri/src/pdf_extractor.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js,jsx}: Use 2-space indentation, double quotes, and a 100-character line limit for formatting
Use camelCase for variable and function names
Use try/catch with specific error types for error handling
Files:
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use strict TypeScript typing and avoid `any` when possible
Files:
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-07-19T21:31:44.925Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-19T21:31:44.925Z
Learning: Applies to src/**/*.{tsx} : Use React context for global state management when needed
Applied to files:
frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (3)
frontend/src/components/TTSDownloadDialog.tsx (3)
- frontend/src/services/tts/TTSContext.tsx (1): `useTTS` (251-257)
- frontend/src/components/ui/dialog.tsx (6): `Dialog` (92-92), `DialogContent` (97-97), `DialogHeader` (98-98), `DialogTitle` (100-100), `DialogDescription` (101-101), `DialogFooter` (99-99)
- frontend/src/components/ui/button.tsx (1): `Button` (62-62)
frontend/src/services/tts/TTSContext.tsx (1)
frontend/src/services/tts/index.ts (3)
`TTSStatus` (2-2), `TTSProvider` (1-1), `useTTS` (1-1)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (8)
`init_tts_backend` (20-30), `new` (100-102), `new` (355-360), `tts_get_status` (616-634), `tts_download_models` (645-718), `tts_load_models` (721-739), `tts_synthesize` (749-785), `tts_unload_models` (788-794)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-ios
- GitHub Check: build-android
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-linux
- GitHub Check: Cloudflare Pages
🔇 Additional comments (11)
frontend/src/components/UnifiedChat.tsx (2)
179-233: TTSButton implementation looks solid.

The component properly:
- Guards rendering with the `isTauriEnv` check
- Handles all TTS status states appropriately
- Provides clear visual feedback (spinner during loading states, stop icon during playback)
- Uses the correct disabled states during transitional operations
3023-3024: TTS dialog integration is clean. The dialog is properly controlled via state and placed appropriately in the component tree.
frontend/src/services/tts/TTSContext.tsx (1)
259-267: Helper function implementation is correct. The `base64ToBlob` function correctly decodes base64 and creates a Blob with the appropriate MIME type.

frontend/src/components/TTSDownloadDialog.tsx (3)
18-24: Dialog implementation is well-structured. The component properly handles all TTS status states with appropriate UI feedback. The `handleDownload` wrapper is a simple passthrough, which is fine but could be inlined if no additional logic is needed.
70-78: Error display is correctly guarded. The condition `hasError && error` ensures the error panel only shows when there's an actual error message to display, preventing empty error states.
93-105: Progress bar behavior is well-designed. The progress bar correctly shows 100% during the loading phase (after download completes), and the percentage text is appropriately hidden during non-download states.
frontend/src-tauri/src/lib.rs (2)
396-407: Minor formatting changes to emit call. These are style/formatting adjustments with no functional impact.
264-273: > Likely an incorrect or invalid review comment.

frontend/src-tauri/src/tts.rs (3)
765-767: Synthesis parameters are hardcoded as per design. The hardcoded values (10 inference steps, 1.2x speed) match the PR description. Consider exposing these as optional parameters in the future for user customization.

550-580: TTS engine loading is straightforward and correct. The ONNX session creation uses default builder settings. Consider adding thread configuration or optimization options if performance tuning is needed later.

18-30: Platform-specific TTS backend initialization is correct. The conditional compilation properly selects tract for iOS/Android and ONNX Runtime for desktop. This aligns with the cross-platform architecture.
10 files reviewed, no comments
Force-pushed e607185 to fc94dfa (Compare)
10 files reviewed, 3 comments
Force-pushed fc94dfa to ca87d46 (Compare)
10 files reviewed, 3 comments
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @docs/tts-research.md:
- Around line 1-30: The Gemini section has many run-on paragraphs and missing
blank lines (e.g., "2026The trajectory", "SuccessOn desktop") and improperly
formatted code and URLs; fix by inserting blank lines between all headings (like
"## Gemini"), subheadings (e.g., "2. Architectural Deconstruction"), and
paragraph blocks so each logical paragraph is separated, correct concatenated
sentences into proper sentences, convert bare URLs into markdown links with
descriptive text (replace bare URLs with [link text](url)), and ensure every
code block uses fenced code blocks with language identifiers and closing ```
markers (e.g., ```rust, ```json) where the diff shows example snippets (the
Cargo.toml and Rust/JavaScript examples) so the document is valid Markdown and
readable.
In @frontend/src-tauri/src/tts.rs:
- Around line 281-333: The sentence-splitting regex in chunk_text discards
terminal punctuation because sentence_re = r"([.!?])\s+" is used with split();
update chunk_text to preserve sentence-ending punctuation by either using a
lookahead regex like r"(?<=[.!?])\s+" for splitting or use
split_inclusive/iterate matches so the punctuation stays part of each part; keep
the surrounding logic (current accumulation, trimming, max_len checks, and
fallback behavior) the same but operate on parts that include their terminal
punctuation so TTS prosody is preserved.
In @frontend/src/services/tts/TTSContext.tsx:
- Around line 161-215: The speak function creates an object URL (audioUrl) but
only revokes it in onended/onerror, causing leaks when stop() is called,
audio.play() throws, or the component unmounts; fix by tracking the created URL
in a ref (e.g., audioUrlRef) and centralize cleanup in a helper (e.g.,
revokeAudioUrl) that revokes URL, clears audioRef and audioUrlRef, and updates
state; call that helper from onended, onerror, the catch block in speak, stop(),
and a useEffect cleanup on unmount so all exit paths revoke the object URL.
🧹 Nitpick comments (3)
docs/tts-research.md (2)
3-27: Trim verbosity in the Gemini section for better readability.

The opening Gemini section (lines 3–27) reads as lengthy, marketing-oriented prose with excessive punctuation and repetitive framing. The content is technically sound, but 25 lines of introductory preamble before substantive technical details may obscure the key findings. Consider condensing the executive summary to 3–5 sentences and moving deeper narrative to subsections if detail is necessary.
Example: Replace the verbose opening with a concise summary:
```diff
-Definitive Comparative Analysis of Local TTS Architectures for Cross-Platform Tauri Integration: Supertonic 2 vs. Chatterbox Turbo1. Executive Strategic Overview: The Local Inference Convergence in 2026The trajectory of computational linguistics... [excessive framing]
+## Executive Summary
+For Tauri developers building cross-platform local TTS, Supertonic 2 (ONNX-native, 44M params) outweighs Chatterbox Turbo (PyTorch-based, 350M params) due to superior mobile compatibility, smaller footprint, and direct Rust integration. Chatterbox excels only when voice cloning or paralinguistic features are essential requirements.
```
29-180: Consolidate or clearly distinguish the three AI research perspectives.

The document contains three separate comprehensive analyses (Gemini, Claude, Grok) covering largely overlapping ground: architecture, model sizes, cross-platform support, performance, and Tauri integration. While multiple perspectives can add depth, the repetition may reduce clarity. If all three are retained for documentation completeness, add a brief preamble explaining their purpose (e.g., "The following sections represent independent research analyses to validate findings across sources").
Alternatively, extract consensus findings into a single unified recommendation, placing detailed comparative tables and architecture discussions in a "Detailed Comparisons" section, and move the three analyses to an appendix or separate file if historical record is important.
frontend/src-tauri/src/tts.rs (1)
734-771: Synthesis command holds mutex during potentially slow inference.

The `tts_synthesize` command acquires a mutex lock at line 739 and holds it through the entire synthesis operation (lines 751-753), which involves multiple ONNX model runs. This blocks other potential TTS operations.

The lock is released before base64 encoding (line 759), which is good, but consider whether the synthesis itself could be done with a shorter lock duration by cloning necessary state upfront.
For the current single-user desktop app context, this is acceptable. If concurrent TTS requests become possible in the future, consider restructuring to minimize lock duration.
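The clone-then-drop pattern this suggestion alludes to can be sketched as follows. Note that `TtsState` and its `String` style field here are simplified stand-ins, not the real tts.rs types — the actual state holds ONNX sessions, which may not be clonable without wrapping them in `Arc`:

```rust
use std::sync::Mutex;

// Simplified stand-in for the real TTS state; `style` models the
// clonable data that synthesis needs from behind the lock.
struct TtsState {
    style: Option<String>,
}

fn synthesize(state: &Mutex<TtsState>, text: &str) -> Result<String, String> {
    // Hold the lock only long enough to clone what synthesis needs.
    let style = {
        let guard = state.lock().map_err(|_| "poisoned lock".to_string())?;
        guard
            .style
            .clone()
            .ok_or_else(|| "models not loaded".to_string())?
    }; // guard dropped here, before any slow inference work

    // Slow inference would run here without blocking other commands.
    Ok(format!("{} [voice: {}]", text, style))
}
```

Wrapping the ONNX sessions themselves in `Arc` would let the engine be "cloned" cheaply the same way, at the cost of needing interior synchronization for any mutable session state.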
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`frontend/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- docs/tts-research.md
- frontend/src-tauri/Cargo.toml
- frontend/src-tauri/src/lib.rs
- frontend/src-tauri/src/tts.rs
- frontend/src/app.tsx
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
- frontend/src/services/tts/index.ts
- justfile
🚧 Files skipped from review as they are similar to previous changes (4)
- frontend/src-tauri/Cargo.toml
- frontend/src/services/tts/index.ts
- justfile
- frontend/src/app.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (`@/*` maps to `./src/*`) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using the `any` type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run `just format`, `just lint`, and `just build` after making TypeScript/React changes to ensure code quality and compilation
Files:
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
- frontend/src/components/TTSDownloadDialog.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (3)
frontend/src/components/UnifiedChat.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
- `useTTS` (251-257)

frontend/src/components/ui/button.tsx (1)
- `Button` (62-62)

frontend/src/components/TTSDownloadDialog.tsx (1)
- `TTSDownloadDialog` (18-160)

frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (7)
- `new` (86-88), `new` (341-346), `tts_get_status` (602-620), `tts_download_models` (631-704), `tts_load_models` (707-725), `tts_synthesize` (735-771), `tts_unload_models` (774-780)

frontend/src/components/TTSDownloadDialog.tsx (2)
frontend/src/components/ui/dialog.tsx (6)
- `Dialog` (92-92), `DialogContent` (97-97), `DialogHeader` (98-98), `DialogTitle` (100-100), `DialogDescription` (101-101), `DialogFooter` (99-99)

frontend/src/components/ui/button.tsx (1)
- `Button` (62-62)
🪛 LanguageTool
docs/tts-research.md
[grammar] ~5-~5: Ensure spelling is correct
Context: ...ligible levels.The 44 Million Parameter AdvantageThe model operates with approximately 44 mi...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[uncategorized] ~5-~5: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...d) can often reside entirely within the System Level Cache (SLC) or high-speed RAM partition...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[typographical] ~5-~5: Consider using an en dash here instead of a hyphen.
Context: ... to be under a certain threshold, often 200MB-4GB depending on OS version) and the Google...
(QB_NEW_EN_DASH_RULE_EN)
[grammar] ~5-~5: Use a hyphen to join words.
Context: ...features. The dynamic nature of its flow matching steps and custom tokenizers mak...
(QB_NEW_EN_HYPHEN)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...he Tauri Framework Context: Integration RealitiesThe user's choice of Tauri as the applicati...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...).3.1 The "Sidecar Pattern" and Desktop SuccessOn desktop operating systems, Tauri suppor...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...obile Wall": Why Sidecars Fail on iOS & AndroidThe user's requirement for iOS and Android ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...fecycle.The Dependency Hell of Embedded PythonThe alternative to a sidecar is embedding t...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...ource path helpers).Android Integration StrategyJNI and Shared Libraries: Android requires ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...g the models.5.1 Real-Time Factor (RTF) BenchmarksThe "Real-Time Factor" measures the speed o...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...s the integration roadmap.6.1 Rust Core ConfigurationThe integration avoids the sidecar pattern ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...hin the Rust process.Step 1: Dependency ManagementIn src-tauri/Cargo.toml:Ini, TOML[dependen...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~13-~13: Ensure spelling is correct
Context: ...back rodio = "0.19" Step 2: Model Asset BundlingThe 268MB model file must be accessible to ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~26-~26: Ensure spelling is correct
Context: ...capability.10. Conclusion and Strategic RecommendationThe comparative analysis yields a definitiv...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~162-~162: Consider using a shorter alternative to avoid wordiness.
Context: ...ox models (350-500 million parameters), making it better suited for resource-constrained devices...
(MADE_IT_JJR)
🪛 markdownlint-cli2 (0.18.1)
docs/tts-research.md
26-26: Bare URL used
(MD034, no-bare-urls)
26-26: Bare URL used
(MD034, no-bare-urls)
169-169: Bare URL used
(MD034, no-bare-urls)
175-175: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Greptile Review
- GitHub Check: build-linux
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-ios
- GitHub Check: build-android
- GitHub Check: Cloudflare Pages
🔇 Additional comments (9)
docs/tts-research.md (2)
200-220: Verify performance benchmarks and resource claims against primary sources.The document cites specific metrics (e.g., RTF of 0.006 on M4 Pro WebGPU at line 119, ~5GB VRAM for Chatterbox at line 199, 0.3 RTF on Raspberry Pi at line 120). While generally reasonable, these claims lack inline citations to benchmark reports or official documentation. For a research document, adding references or footnotes (e.g., linking to the GitHub repos' own benchmarks or published evaluations) strengthens credibility.
233-246: Ensure all citations are accurate and up-to-date.The "Key Citations" section (lines 233–246) lists GitHub repos, blog posts, and videos. These are helpful, but verify:
- GitHub repo URLs match the latest stable branches (e.g., Supertone Inc. repo path).
- Blog and video dates align with the document's "January 2026" context (e.g., "Supertonic TTS is incredible" post should be recent).
- All links remain active; add a note if any are known to be archived or updated.
frontend/src-tauri/src/lib.rs (1)
6-6: TTS module integration looks correct for desktop-only functionality.The TTS module, state management, and command handlers are appropriately scoped to the desktop build path. The mobile invoke handler (lines 261-263) correctly excludes TTS commands, aligning with the PR objective that TTS is desktop-only.
Also applies to: 38-38, 48-52
frontend/src/components/UnifiedChat.tsx (2)
192-246: Well-structured TTSButton component with proper state handling. The component correctly:
- Guards against non-Tauri environments by returning null early (lines 206-208)
- Handles all status states appropriately in `handleClick`
- Disables the button during transitional states (checking, downloading, loading)
- Shows appropriate icons for playing/stopped/loading states
The implementation follows React best practices with proper hook usage and conditional rendering.
756-764: TTS integration wired correctly through the component tree. The TTS setup flow is properly connected:
- `ttsSetupDialogOpen` state manages dialog visibility
- `onTTSSetupOpen` callback is passed through MessageList to TTSButton
- TTSDownloadDialog is rendered with appropriate props
Also applies to: 880-880, 3346-3348
frontend/src/components/TTSDownloadDialog.tsx (1)
18-159: Clean implementation of the TTS download dialog with comprehensive state handling. The dialog properly handles all TTS lifecycle states:
- Prevents dismissal during processing (line 34)
- Shows appropriate messaging for each state (not_available, error, checking, downloading, loading, ready)
- Progress bar and percentage display are conditionally rendered based on state
- Footer buttons adapt to the current status
The component follows coding guidelines with proper path aliases, TypeScript types, and React patterns.
frontend/src/services/tts/TTSContext.tsx (2)
259-266: `base64ToBlob` helper is correctly implemented. The function properly decodes base64 to bytes and creates a Blob with the specified MIME type.
62-62: Use `isTauriDesktop()` instead of `isTauri()` to prevent mobile invoke errors.

TTS commands are only registered in the desktop invoke handler (`lib.rs` lines 48-52), not in the mobile handler (lines 261-263). The current `isTauri()` check returns true for both desktop and mobile Tauri environments, which causes "unknown command" errors when `tts_get_status` is invoked on mobile.

Replace `isTauri()` with `isTauriDesktop()` at all usage points (lines 62, 78, 112, 118, 163) to ensure TTS operations are only attempted on desktop.

Also applies to: 78-78, 112-112, 118-118, 163-163
⛔ Skipped due to learnings
Learnt from: CR Repo: OpenSecretCloud/Maple PR: 0 File: CLAUDE.md:0-0 Timestamp: 2025-12-08T19:55:33.330Z Learning: Applies to src-tauri/**/*.rs : Use `#[tauri::command]` attribute for Tauri command handlers

frontend/src-tauri/src/tts.rs (1)
1-588: Solid TTS engine implementation with proper ONNX integration. The TTS pipeline is well-structured:
- Text preprocessing handles Unicode normalization, emoji removal, and punctuation cleanup
- Multi-stage inference (duration prediction → text encoding → denoising loop → vocoder)
- Proper chunking for long texts with silence between chunks
- WAV encoding with correct sample format
The ONNX session management and ndarray operations follow established patterns.
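As a reference for the "WAV encoding with correct sample format" point above, a minimal 16-bit mono PCM encoder looks roughly like this — a sketch of the standard RIFF/WAVE header layout, not the actual tts.rs code, and the 44.1 kHz rate in the usage is purely illustrative:

```rust
// Encode f32 samples in [-1.0, 1.0] as a 16-bit mono PCM WAV file.
fn encode_wav(samples: &[f32], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32; // 2 bytes per i16 sample
    let mut out = Vec::with_capacity(44 + data_len as usize);

    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // RIFF chunk size
    out.extend_from_slice(b"WAVEfmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format: PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // byte rate
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());

    for &s in samples {
        let clamped = s.clamp(-1.0, 1.0);
        out.extend_from_slice(&((clamped * i16::MAX as f32) as i16).to_le_bytes());
    }
    out
}
```

The clamp step matters: vocoder output can overshoot ±1.0 slightly, and casting an out-of-range float straight to `i16` would wrap or saturate inconsistently.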
10 files reviewed, 1 comment
Force-pushed from eab4fbc to 4898520 (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (1)
frontend/src-tauri/src/tts.rs (1)
115-219: Consider caching compiled regexes for performance. Multiple regexes are compiled on each call to `preprocess_text()`. For better performance, especially with longer texts or frequent calls, consider using `lazy_static` or `once_cell::sync::Lazy` to compile regexes once.

♻️ Example using once_cell

```rust
use once_cell::sync::Lazy;

static EMOJI_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\x{1F600}-\x{1F64F}...]").unwrap()
});

static DIACRITICS_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\u{0302}...]").unwrap()
});

// Then use &*EMOJI_PATTERN in the function
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- frontend/src-tauri/src/tts.rs
- frontend/src/services/tts/TTSContext.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (`@/*` maps to `./src/*`) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using the `any` type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run `just format`, `just lint`, and `just build` after making TypeScript/React changes to ensure code quality and compilation
Files:
frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Greptile Review
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-linux
- GitHub Check: build-ios
- GitHub Check: build-android
- GitHub Check: Cloudflare Pages
🔇 Additional comments (17)
frontend/src/services/tts/TTSContext.tsx (8)
1-13: LGTM! Imports are clean and follow the coding guidelines with path aliases (`@/utils/platform`).
14-56: LGTM! Type definitions are well-structured and align with the Rust backend response types.
60-74: LGTM! Good pattern to check `isTauriDesktop()` once at mount and derive initial status accordingly.
76-108: LGTM! The status checking logic correctly handles the various model states with appropriate error handling.
117-159: LGTM! Download flow correctly manages the event listener lifecycle and handles both success and error paths.
161-213: Memory leak fixes properly implemented. The `speak` callback now correctly revokes object URLs in all exit paths:
- When stopping current audio to play new (lines 168-170)
- On playback end/error (lines 190, 197)
- In the catch block (lines 206-208)

This addresses the previous review feedback.
215-240: LGTM! The `stop()` callback and unmount cleanup effect properly revoke object URLs, addressing the previous memory leak concerns.
264-280: LGTM! The `useTTS` hook follows the standard context hook pattern with proper error handling for missing provider, and the `base64ToBlob` helper is clear and functional.

frontend/src-tauri/src/tts.rs (9)
1-41: LGTM! Imports and constants are well-organized. Model file definitions with expected sizes enable proper progress tracking.
43-113: LGTM! Configuration structs and `UnicodeProcessor` are well-structured. The processor gracefully handles out-of-range Unicode characters by assigning -1.
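The "-1 for out-of-range characters" behavior can be illustrated with a small sketch; the `vocab` map here is hypothetical, standing in for whatever table the real processor derives from the bundled Unicode config:

```rust
use std::collections::HashMap;

// Map each character to its vocabulary index, assigning -1 to any
// character the model's vocabulary does not cover.
fn chars_to_ids(text: &str, vocab: &HashMap<char, i64>) -> Vec<i64> {
    text.chars()
        .map(|c| vocab.get(&c).copied().unwrap_or(-1))
        .collect()
}
```

Downstream stages can then filter or substitute the -1 entries rather than panicking on unseen input.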
281-350: Punctuation preservation fix confirmed. The sentence splitting now correctly preserves terminal punctuation by taking `para[last_end..m.start() + 1]`, which includes the punctuation character. This addresses the previous review feedback about lost sentence endings.
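The slicing idea generalizes; here is a std-only sketch of punctuation-preserving splitting (the real code uses a regex `find_iter` over the paragraph, while this version scans chars directly):

```rust
// Split a paragraph into sentences, keeping the terminal punctuation
// attached to each sentence so TTS prosody cues survive.
fn split_sentences(para: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut last_end = 0;
    for (i, c) in para.char_indices() {
        if matches!(c, '.' | '!' | '?') {
            let end = i + c.len_utf8();
            let s = para[last_end..end].trim();
            if !s.is_empty() {
                sentences.push(s.to_string()); // punctuation kept
            }
            last_end = end;
        }
    }
    let tail = para[last_end..].trim();
    if !tail.is_empty() {
        sentences.push(tail.to_string()); // unterminated trailing text
    }
    sentences
}
```

A plain `split('.')` would drop the terminator, which is exactly the bug the review confirms was fixed.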
352-511: LGTM! The TTS engine implementation is well-structured with proper state management. The synthesis flow correctly:
- Chunks text for manageable processing
- Adds inter-chunk silence for natural pacing
- Runs the multi-stage ONNX pipeline (duration prediction → text encoding → denoising → vocoder)
521-551: Voice style is hardcoded to F2. The voice style is hardcoded to `"F2.json"` (line 522). Per the PR summary, this is intentional ("default/currently using F2"). If voice selection is planned for the future, consider accepting the style name as a parameter or adding a TODO comment.
647-738: Download reliability improvements confirmed. The download implementation now includes all the previously requested reliability improvements:
- HTTP timeouts (300s total, 30s connect) at lines 655-658
- Partial file cleanup (`.part` extension) at lines 664, 682
- Atomic file finalization (flush + rename) at lines 727-731
This prevents corrupted model files from being treated as valid.
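The temp-file-then-rename flow is worth spelling out, since it is the part that makes partial downloads harmless. A compact sketch, simplified from the behavior the review describes (the real code streams chunks and emits progress events; the `.part` naming here is illustrative):

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Write bytes to a temporary ".part" file, flush, then rename into place.
// A crash mid-download leaves only the ".part" file, never a truncated
// file at the final path.
fn save_atomically(dest: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let part = dest.with_extension("part");
    {
        let mut f = File::create(&part)?;
        f.write_all(bytes)?; // in tts.rs this would be chunked streaming
        f.flush()?;
    }
    fs::rename(&part, dest)?; // atomic on the same filesystem
    Ok(())
}
```

Because status checks test only the final path, a leftover `.part` file from an interrupted run simply triggers a re-download rather than a failed model load.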
768-805: Lock is held during synthesis. The mutex guard is held at line 773 through the synthesis call at lines 785-787, which could take a noticeable amount of time for longer texts. While the guard is correctly dropped before encoding (line 793), this means concurrent TTS commands (e.g., status checks) will block during synthesis.
For a single-user desktop app this is likely acceptable, but consider documenting this behavior or adding a comment noting the intentional lock scope.
618-637: LGTM! The status command correctly reports download and load state. The file existence check combined with the `.part` file mechanism in downloads should be sufficient to detect incomplete downloads.
740-759: LGTM! Model loading and unloading commands are straightforward with proper error handling and logging.
Also applies to: 807-814
Actionable comments posted: 0
🧹 Nitpick comments (1)
frontend/src-tauri/src/tts.rs (1)
618-637: Consider: Optional file integrity verification. The status check uses simple file existence (line 626). While the `.part` temp-file download pattern mitigates partial-file issues, you could optionally verify file sizes match `expected_size` from `MODEL_FILES` for additional robustness against corruption.

Optional enhancement for file integrity

```diff
 let models_downloaded = MODEL_FILES
     .iter()
-    .all(|(name, _, _)| models_dir.join(name).exists());
+    .all(|(name, _, expected_size)| {
+        if let Ok(metadata) = models_dir.join(name).metadata() {
+            metadata.len() == *expected_size
+        } else {
+            false
+        }
+    });
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- frontend/src-tauri/src/lib.rs
- frontend/src-tauri/src/tts.rs
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
🚧 Files skipped from review as they are similar to previous changes (2)
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (`@/*` maps to `./src/*`) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using the `any` type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run `just format`, `just lint`, and `just build` after making TypeScript/React changes to ensure code quality and compilation
Files:
frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (1)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (8)
- `new` (86-88), `new` (358-363), `tts_get_status` (619-637), `tts_download_models` (648-738), `tts_load_models` (741-759), `tts_synthesize` (769-805), `tts_unload_models` (808-814), `tts_delete_models` (817-834)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Greptile Review
- GitHub Check: build-android
- GitHub Check: build-ios
- GitHub Check: build-linux
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (9)
frontend/src/services/tts/TTSContext.tsx (4)
1-58: LGTM: Clean type definitions and imports. The imports follow the project's path alias conventions, and the TypeScript type definitions are strict and well-structured. The exported `TTSStatus` type provides a clear public API surface.
62-77: Well-addressed: Correct desktop-only detection. The use of `isTauriDesktop()` (line 64) correctly ensures TTS functionality is only available on desktop Tauri environments, preventing the "unknown command" errors that would occur on mobile. This addresses the concern raised in previous reviews.
190-269: Excellent: Comprehensive audio resource cleanup. The audio playback implementation now properly revokes object URLs in all exit paths:
- When stopping previous audio (line 198)
- On playback end (line 219) and error (line 226)
- In the error catch block (line 236)
- In the explicit `stop()` function (line 248)
- On component unmount (line 265)
This comprehensively addresses the memory leak concerns raised in previous reviews.
302-310: LGTM: Correct base64-to-blob conversion. The helper function uses standard browser APIs correctly to convert the base64-encoded audio data from the backend into a playable Blob.
frontend/src-tauri/src/tts.rs (4)
1-79: LGTM: Well-structured configuration and dependencies. The model definitions, constants, and configuration types are clearly organized. The `TOTAL_MODEL_SIZE` of ~265 MB aligns with the PR description, and the serde annotations correctly support JSON serialization for config files and voice styles.
281-350: Well-fixed: Sentence chunking now preserves punctuation. The `chunk_text` function now uses `find_iter` with manual slice construction (line 308: `para[last_end..m.start() + 1]`) rather than `split()`, ensuring terminal punctuation is preserved in each sentence. This addresses the concern raised in previous reviews about lost punctuation affecting TTS prosody.
647-738: Excellent: Robust download implementation with timeouts and atomic completion. The download function now includes:
- HTTP timeouts (300s total, 30s connect) to prevent indefinite hangs (lines 656-657)
- Atomic file completion using a `.part` temp file pattern (line 664)
- Explicit flush and rename on success (lines 727-731)

This prevents both indefinite blocking and partial-file corruption issues raised in previous reviews. The existence check (line 667) correctly tests the final `file_path`, not the temp file.
740-834: LGTM: Clean command implementations with proper locking. The Tauri commands are well-structured:
- `tts_load_models` correctly loads both engine and style into shared state
- `tts_synthesize` properly clones the style (line 780) to avoid borrow conflicts and drops the guard early (line 793) to minimize lock contention during encoding
- `tts_delete_models` correctly unloads models before filesystem removal

The hardcoded synthesis parameters (10 steps, 1.2× speed at line 786) align with the PR description's configuration.
frontend/src-tauri/src/lib.rs (1)
6-6: LGTM: Correct desktop-only TTS integration. The TTS module is properly integrated into the desktop builder only:
- Module declaration (line 6)
- State initialization with `TTSState::new()` (line 38)
- Six TTS commands registered in the desktop `invoke_handler` (lines 48-53)

The mobile builder (lines 262-264) correctly excludes TTS commands, preventing "unknown command" errors when the frontend's `isTauriDesktop()` check correctly gates TTS usage.

Also applies to: 38-53
10 files reviewed, 1 comment
Actionable comments posted: 1
🧹 Nitpick comments (7)
frontend/src-tauri/src/tts.rs (7)
114-241: Consider refactoring into smaller helper functions. This 127-line function handles normalization, markdown removal, emoji removal, symbol replacement, punctuation fixing, and more. Breaking it into focused helpers (e.g., `remove_markdown`, `normalize_symbols`, `fix_punctuation`) would improve readability and testability.
434-537: Optional: Extract pipeline stages into helper methods. The 103-line `infer` method orchestrates a multi-stage pipeline (duration prediction, text encoding, latent sampling, denoising loop, vocoding). Consider extracting stages like `predict_duration`, `encode_text`, `denoise_latent`, and `generate_waveform` to improve readability.
548-578: Make voice style configurable. The voice style is hardcoded to F2.json (line 549), but the PR summary mentions four available styles (F1, F2, M1, M2). Consider adding a parameter to allow users to select their preferred voice, or document why F2 is the default-only choice.

Example: Add voice parameter

```diff
-fn load_voice_style(models_dir: &Path) -> Result<Style> {
-    let style_path = models_dir.join("F2.json");
+fn load_voice_style(models_dir: &Path, voice: &str) -> Result<Style> {
+    let style_path = models_dir.join(format!("{}.json", voice));
     let file = File::open(&style_path).context("Failed to open voice style file")?;
     // ... rest of function
```

Then expose voice selection in Tauri commands or config.
651-653: Consider verifying file size in addition to existence. The status check only verifies file existence. If a previous download was interrupted despite the temp file safeguards, or if files become corrupted on disk, `models_downloaded` will return true but loading will fail.

Optional: Add size verification

```diff
 let models_downloaded = MODEL_FILES
     .iter()
-    .all(|(name, _, _)| models_dir.join(name).exists());
+    .all(|(name, _, expected_size)| {
+        let path = models_dir.join(name);
+        path.exists() && path.metadata().map(|m| m.len() == *expected_size).unwrap_or(false)
+    });
```
795-838: Consider making synthesis parameters configurable. The `total_step` (10) and `speed` (1.2) parameters are hardcoded at line 815. While these match the PR's documented defaults, exposing them as optional parameters would allow users to trade quality/speed or adjust playback rate.

Example: Add optional parameters

```diff
 #[tauri::command]
 pub async fn tts_synthesize(
     text: String,
+    speed: Option<f32>,
+    steps: Option<usize>,
     state: tauri::State<'_, Mutex<TTSState>>,
 ) -> Result<TTSSynthesizeResponse, String> {
     // ... existing code ...
+    let speed = speed.unwrap_or(1.2);
+    let steps = steps.unwrap_or(10);
+
     let audio = tts
-        .synthesize(&text, &style, 10, 1.2)
+        .synthesize(&text, &style, steps, speed)
         .map_err(|e| format!("TTS synthesis failed: {}", e))?;
```
674-765: Optional: Add cancellation support for long downloads.Downloading 265 MB can take significant time on slower connections. Consider adding cancellation support so users can abort if needed. Tauri supports abort signals through event listeners.
767-786: Consider checking available memory before loading models. Loading ~265 MB of ONNX models could cause issues on memory-constrained devices. While this is an on-device feature for desktop, checking available memory and providing a clear error message would improve user experience.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- frontend/src-tauri/src/tts.rs
- frontend/src/services/tts/TTSContext.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
- frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Greptile Review
- GitHub Check: build-ios
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-linux
- GitHub Check: build-android
- GitHub Check: Cloudflare Pages
two small nits:
Actionable comments posted: 0
🧹 Nitpick comments (4)
frontend/src-tauri/src/tts.rs (2)
549-550: Consider making voice selection configurable. The voice style is hardcoded to `F2.json`, but all four voice models (F1, F2, M1, M2) are downloaded. Consider exposing voice selection as a parameter in the `tts_load_models` or `tts_synthesize` commands to allow users to choose their preferred voice style.

Example: Add voice parameter to load command

```diff
 #[tauri::command]
-pub async fn tts_load_models(state: tauri::State<'_, Mutex<TTSState>>) -> Result<(), String> {
+pub async fn tts_load_models(
+    state: tauri::State<'_, Mutex<TTSState>>,
+    voice: Option<String>
+) -> Result<(), String> {
     let models_dir = get_tts_models_dir().map_err(|e| e.to_string())?;
+    let voice_name = voice.unwrap_or_else(|| "F2".to_string());

-    let style = load_voice_style(&models_dir)
+    let style = load_voice_style(&models_dir, &voice_name)
         .map_err(|e| format!("Failed to load voice style: {}", e))?;
```

Then update `load_voice_style` signature:

```diff
-fn load_voice_style(models_dir: &Path) -> Result<Style> {
-    let style_path = models_dir.join("F2.json");
+fn load_voice_style(models_dir: &Path, voice_name: &str) -> Result<Style> {
+    let style_path = models_dir.join(format!("{}.json", voice_name));
```
815-816: Consider exposing synthesis parameters for user customization. The inference steps (10) and speed multiplier (1.2) are hardcoded, but users might want to adjust the quality/speed tradeoff. Consider adding optional parameters to `tts_synthesize`:

Example: Add optional parameters

```diff
 #[tauri::command]
 pub async fn tts_synthesize(
     text: String,
     state: tauri::State<'_, Mutex<TTSState>>,
+    inference_steps: Option<usize>,
+    speed: Option<f32>
 ) -> Result<TTSSynthesizeResponse, String> {
+    let steps = inference_steps.unwrap_or(10);
+    let speed_multiplier = speed.unwrap_or(1.2);
+
     let audio = tts
-        .synthesize(&text, &style, 10, 1.2)
+        .synthesize(&text, &style, steps, speed_multiplier)
         .map_err(|e| format!("TTS synthesis failed: {}", e))?;
```

Typical ranges: `inference_steps: 5-20` (higher = better quality, slower), `speed: 0.8-1.5` (lower = slower speech, more natural).

frontend/src/components/TTSDownloadDialog.tsx (2)
56-66: Simplify nested ternary logic for better readability. The deeply nested ternary operators (4 levels) make this code harder to read and maintain. Consider extracting the description text into a helper function or using early returns.

Refactor suggestion

```diff
+  const getDialogDescription = () => {
+    if (isNotAvailable) {
+      return "TTS is only available in the desktop app.";
+    }
+    if (isReady) {
+      return "TTS is ready! You can now listen to assistant messages.";
+    }
+    if (hasError) {
+      return "There was an error setting up TTS.";
+    }
+    if (isProcessing) {
+      return "Setting up TTS. Please keep this window open.";
+    }
+    return `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`;
+  };
+
   <DialogDescription className="text-base">
-    {isNotAvailable
-      ? "TTS is only available in the desktop app."
-      : isReady
-        ? "TTS is ready! You can now listen to assistant messages."
-        : hasError
-          ? "There was an error setting up TTS."
-          : isProcessing
-            ? "Setting up TTS. Please keep this window open."
-            : `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`}
+    {getDialogDescription()}
   </DialogDescription>
```
170-199: Footer button logic follows a similar nested ternary pattern. While this section is somewhat more manageable than the description text, consider applying the same extraction pattern for consistency and maintainability if refactoring the DialogDescription (lines 56-66).
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- frontend/src-tauri/src/tts.rs
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (`@/*` maps to `./src/*`) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using the `any` type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run `just format`, `just lint`, and `just build` after making TypeScript/React changes to ensure code quality and compilation
Files:
- frontend/src/components/UnifiedChat.tsx
- frontend/src/components/TTSDownloadDialog.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
frontend/src/components/UnifiedChat.tsx
🧬 Code graph analysis (2)
frontend/src/components/UnifiedChat.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
- useTTS (304-310)

frontend/src/components/ui/button.tsx (1)
- Button (62-62)

frontend/src/components/TTSDownloadDialog.tsx (1)
- TTSDownloadDialog (18-203)
frontend/src/components/TTSDownloadDialog.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
- useTTS (304-310)

frontend/src/components/ui/dialog.tsx (6)
- Dialog (92-92), DialogContent (97-97), DialogHeader (98-98), DialogTitle (100-100), DialogDescription (101-101), DialogFooter (99-99)

frontend/src/components/ui/button.tsx (1)
- Button (62-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Greptile Review
- GitHub Check: build-linux
- GitHub Check: build-ios
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-android
- GitHub Check: Cloudflare Pages
🔇 Additional comments (8)
frontend/src-tauri/src/tts.rs (3)
100-104: LGTM! OOV handling now uses padding token.

The change from `-1` to `0` for out-of-vocabulary characters correctly uses the padding token instead of an invalid sentinel value, which prevents potential embedding lookup failures in the ONNX models.
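As a quick illustration of the padding-token fallback described above (hypothetical names — the real lookup lives in the Rust pipeline):

```typescript
// Sketch of OOV handling: characters missing from the vocabulary map to
// index 0 (the padding token) instead of an invalid sentinel like -1,
// so every index is a valid embedding row.
function textToIndices(text: string, vocab: Map<string, number>): number[] {
  const PAD = 0; // padding token doubles as the OOV fallback
  return Array.from(text, (ch) => vocab.get(ch) ?? PAD);
}
```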
304-373: Sentence boundary handling preserves punctuation correctly.

The implementation at line 331 uses `para[last_end..m.start() + 1]`, which correctly includes the terminal punctuation character in each sentence chunk, addressing the previous concern about lost punctuation.
675-766: LGTM! Download reliability improvements properly implemented.

The implementation now includes:
- HTTP timeouts (300s request, 30s connect) to prevent indefinite hangs
- Temp file with `.part` extension to avoid treating partial downloads as complete
- Cleanup of partial downloads before retry (line 710)
- Explicit flush and atomic rename (lines 754-759) to finalize downloads
These changes address all previous reliability concerns.
frontend/src/components/UnifiedChat.tsx (5)
208-215: LGTM! Timer cleanup properly implemented.

The `useEffect` cleanup function now properly clears the `longPressTimer` on component unmount, preventing the timeout from firing after unmount and attempting to call `onManage()`. This addresses the previous review concern.
217-220: Correct platform check for TTS availability.

The early return for non-Tauri environments properly ensures the TTS button only renders in the desktop app, where the TTS backend is available.
222-235: Click handler logic correctly manages TTS states.

The implementation properly handles three cases:
- Not ready/error → trigger setup flow
- Playing current message → stop playback
- Ready and idle → start playback
790-800: LGTM! Copy and TTS buttons now always visible.

The buttons are rendered without hover states, making them consistently visible as requested in the PR feedback. The flex layout cleanly presents both action buttons together.
3382-3383: TTS dialog properly integrated into component hierarchy.

The dialog is correctly wired with state managed at the `UnifiedChat` level and callbacks passed to child components. The integration follows React best practices for component composition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10 files reviewed, 6 comments
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @frontend/src/services/tts/TTSContext.tsx:
- Around line 230-266: The code creates an object URL (audioUrl) from audioBlob
but only revokes it in the source.onended handler, so if an error occurs before
playback the URL leaks; update the try/catch flow in the function that uses
base64ToBlob, audioUrl, audioContextRef, and sourceNodeRef so that audioUrl is
revoked in the catch block (and any early-return/error paths) — ensure you track
audioUrl in a variable scoped outside the try and call
URL.revokeObjectURL(audioUrl) in the catch before cleaning up
audioContextRef.current and sourceNodeRef.current, guarding against a
null/undefined audioUrl.
- Around line 271-286: The stop() function can leak object URLs because it stops
the AudioBufferSourceNode directly and onended may not fire, so revoke the audio
URL created in speak(); add an audioUrlRef (e.g., audioUrlRef.current set in
speak() where audioUrl is created) and in stop() check audioUrlRef.current, call
URL.revokeObjectURL(audioUrlRef.current) and clear it, then proceed to
stop/close and null refs (sourceNodeRef, audioContextRef) and reset state; also
ensure the onended handler clears and revokes audioUrlRef.current as well to
avoid double-revoke issues.
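The suggested `audioUrlRef` fix can be sketched as a small framework-agnostic helper (the ref shape mirrors React's `useRef`; the injectable `revoke` callback stands in for `URL.revokeObjectURL` so the idea is testable outside a browser):

```typescript
type Ref<T> = { current: T };

// Sketch of the audioUrlRef pattern: one owner for the object URL, with a
// single release() that is safe to call from stop(), the catch block in
// speak(), the onended handler, and unmount cleanup alike.
function createAudioUrlTracker(revoke: (url: string) => void) {
  const audioUrlRef: Ref<string | null> = { current: null };
  return {
    track(url: string): void {
      audioUrlRef.current = url; // set in speak() right after createObjectURL
    },
    release(): void {
      if (audioUrlRef.current) {
        revoke(audioUrlRef.current); // URL.revokeObjectURL in the component
        audioUrlRef.current = null; // null-out guards against double-revoke
      }
    },
  };
}
```

Because `release()` clears the ref, calling it from both `stop()` and `onended` cannot revoke the same URL twice.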
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- frontend/src-tauri/src/tts.rs
- frontend/src/services/tts/TTSContext.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (`@/*` maps to `./src/*`) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using the `any` type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run `just format`, `just lint`, and `just build` after making TypeScript/React changes to ensure code quality and compilation
Files:
frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build-linux
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: Greptile Review
- GitHub Check: build-android
- GitHub Check: build-ios
- GitHub Check: Cloudflare Pages
🔇 Additional comments (14)
frontend/src/services/tts/TTSContext.tsx (6)
1-58: LGTM on imports and type definitions.

The imports correctly use path aliases, and the type definitions are well-structured with proper TypeScript strictness. The use of `isTauriDesktop` is correct for desktop-only TTS functionality.
62-77: Appropriate state and ref initialization.

The Web Audio API refs (`audioContextRef`, `sourceNodeRef`) correctly replace the previous `HTMLAudioElement` approach to avoid hijacking macOS media controls. State initialization is clean.
79-118: Status checking logic is well-implemented.

The status flow correctly handles all states (ready → loading → not_downloaded) with proper error handling. The auto-load when models are downloaded but not loaded provides good UX.
120-162: Download flow handles event listener cleanup correctly.

The progress event listener is properly cleaned up in both success (line 141) and error (lines 157-159) paths.
338-361: Helper functions are well-implemented.

`preprocessTextForTTS` correctly handles both complete and streaming (unclosed) think blocks. The separation is appropriate — frontend strips reasoning content, backend handles TTS-specific text normalization. The `base64ToBlob` implementation is standard and correct.
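A minimal sketch of the think-block stripping described above (assuming a `<think>…</think>` tag syntax; the real `preprocessTextForTTS` may differ in detail):

```typescript
// Strip reasoning blocks before sending text to TTS. Handles both complete
// blocks and a streaming response that ends mid-block (unclosed <think>).
function stripThinkBlocks(text: string): string {
  // Drop complete <think>…</think> blocks (non-greedy, spans newlines).
  let out = text.replace(/<think>[\s\S]*?<\/think>/g, "");
  // A streaming response can end mid-block; drop the unclosed tail too.
  const open = out.indexOf("<think>");
  if (open !== -1) {
    out = out.slice(0, open);
  }
  return out.trim();
}
```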
288-305: Unmount cleanup should also revoke the object URL.

Same issue as `stop()` — if the component unmounts while audio is playing, the `audioUrl` is never revoked.

Likely an incorrect or invalid review comment.
frontend/src-tauri/src/tts.rs (8)
1-49: Regex pre-compilation is well-implemented.

All regexes are now compiled once using `once_cell::sync::Lazy`, addressing the previous performance concern. Good coverage of markdown patterns, emojis, diacritics, and punctuation normalization.
50-74: Model file definitions are adequate for progress tracking.

The expected sizes are used for progress estimation rather than validation. Minor discrepancies between actual and expected sizes will only affect progress bar accuracy, not functionality.
131-137: OOV handling correctly uses padding token.

Using `0` (the padding token) for out-of-vocabulary characters is the right approach, as it avoids potential embedding lookup failures that could occur with invalid indices like `-1`.
304-387: Text chunking properly preserves sentence punctuation.

The sentence splitting at line 330 now correctly includes the punctuation mark (`m.start() + 1`). The `split_by_words` fallback handles long sentences gracefully. The remaining edge case (a single word exceeding `max_len`) is acceptable since it's rare and preferable to truncating words.
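The punctuation-preserving split can be illustrated in TypeScript (a sketch of the algorithm, not the Rust implementation itself — the `m.index + 1` slice mirrors the `m.start() + 1` described above):

```typescript
// Split text on sentence boundaries while keeping the terminal punctuation
// character with its sentence, matching the Rust regex r"([.!?])\s+".
function splitSentences(text: string): string[] {
  const re = /[.!?]\s+/g;
  const out: string[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    // m.index points at the punctuation char; include it in the chunk.
    out.push(text.slice(last, m.index + 1));
    last = re.lastIndex; // skip the whitespace between sentences
  }
  if (last < text.length) out.push(text.slice(last)); // trailing sentence
  return out;
}
```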
691-782: Download implementation is robust with proper safeguards.

The implementation correctly addresses all previous concerns:
- HTTP timeouts (300s overall, 30s connect) prevent indefinite blocking
- Temp file pattern (`.part` extension) with atomic rename prevents partial file corruption
- Explicit flush before rename ensures data integrity
- Previous partial downloads are cleaned up before retry
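The temp-file-plus-atomic-rename pattern praised here can be sketched in Node.js for illustration (the actual implementation is in Rust; `saveAtomically` is a hypothetical name):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Write to "<dest>.part", flush, then rename. A crash mid-download leaves
// only the .part file behind — never a truncated file at the final path.
function saveAtomically(dest: string, data: Uint8Array): void {
  const tmp = dest + ".part"; // plain concat sidesteps with_extension() pitfalls
  fs.mkdirSync(path.dirname(dest), { recursive: true });
  const fd = fs.openSync(tmp, "w");
  try {
    fs.writeSync(fd, data);
    fs.fsyncSync(fd); // explicit flush before the rename
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmp, dest); // atomic on the same filesystem
}
```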
812-855: Synthesis command has good lock handling.

Cloning `style` to avoid borrow conflicts and dropping the guard before CPU-intensive base64 encoding minimizes lock contention. The synthesis parameters (10 steps, 1.2× speed) align with PR objectives.
563-566: Voice selection acknowledged as future work.

The TODO comment appropriately documents that voice selection is currently hardcoded to F2. This is acceptable for the initial implementation.
866-884: Model deletion correctly unloads before removing files.

The sequence (unload from memory → delete directory) ensures no file handles are held when deletion occurs.
903ad6d to
016e4e4
Compare
10 files reviewed, 2 comments
Actionable comments posted: 0
🧹 Nitpick comments (2)
frontend/src-tauri/src/tts.rs (1)
832-832: Consider making TTS parameters configurable.

The synthesis currently uses hardcoded values: 10 inference steps and 1.2× speed. Consider adding optional parameters to `tts_synthesize` to allow users to adjust quality (inference steps) and playback speed based on their preferences.

frontend/src/components/TTSDownloadDialog.tsx (1)
56-66: Consider refactoring nested ternary for readability.

The description text uses deeply nested ternary operators. While functional, extracting this logic into a helper function would improve readability:
♻️ Optional refactor
```tsx
const getDescription = () => {
  if (isNotAvailable) return "TTS is only available in the desktop app.";
  if (isReady) return "TTS is ready! You can now listen to assistant messages.";
  if (hasError) return "There was an error setting up TTS.";
  if (isProcessing) return "Setting up TTS. Please keep this window open.";
  return `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`;
};

// Then in JSX:
<DialogDescription className="text-base">
  {getDescription()}
</DialogDescription>
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
frontend/src-tauri/Cargo.lock is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- docs/tts-research.md
- frontend/src-tauri/Cargo.toml
- frontend/src-tauri/src/lib.rs
- frontend/src-tauri/src/tts.rs
- frontend/src/app.tsx
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
- frontend/src/services/tts/index.ts
- justfile
🚧 Files skipped from review as they are similar to previous changes (4)
- frontend/src/app.tsx
- justfile
- frontend/src-tauri/src/lib.rs
- frontend/src/services/tts/index.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (`@/*` maps to `./src/*`) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using the `any` type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run `just format`, `just lint`, and `just build` after making TypeScript/React changes to ensure code quality and compilation
Files:
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (1)
frontend/src/components/TTSDownloadDialog.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
- useTTS (317-323)

frontend/src/components/ui/dialog.tsx (6)
- Dialog (92-92), DialogContent (97-97), DialogHeader (98-98), DialogTitle (100-100), DialogDescription (101-101), DialogFooter (99-99)

frontend/src/components/ui/button.tsx (1)
- Button (62-62)
🪛 LanguageTool
docs/tts-research.md
[grammar] ~5-~5: Ensure spelling is correct
Context: ...ligible levels.The 44 Million Parameter AdvantageThe model operates with approximately 44 mi...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[uncategorized] ~5-~5: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...d) can often reside entirely within the System Level Cache (SLC) or high-speed RAM partition...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[grammar] ~5-~5: Use a hyphen to join words.
Context: ...features. The dynamic nature of its flow matching steps and custom tokenizers mak...
(QB_NEW_EN_HYPHEN)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...he Tauri Framework Context: Integration RealitiesThe user's choice of Tauri as the applicati...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...It compiles down to a native binary.The Webview (Frontend): Written in web technologies...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...).3.1 The "Sidecar Pattern" and Desktop SuccessOn desktop operating systems, Tauri suppor...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...obile Wall": Why Sidecars Fail on iOS & AndroidThe user's requirement for iOS and Android ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...fecycle.The Dependency Hell of Embedded PythonThe alternative to a sidecar is embedding t...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...ource path helpers).Android Integration StrategyJNI and Shared Libraries: Android requires ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...g the models.5.1 Real-Time Factor (RTF) BenchmarksThe "Real-Time Factor" measures the speed o...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...s the integration roadmap.6.1 Rust Core ConfigurationThe integration avoids the sidecar pattern ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...hin the Rust process.Step 1: Dependency ManagementIn src-tauri/Cargo.toml:Ini, TOML[dependen...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~13-~13: Ensure spelling is correct
Context: ...back rodio = "0.19" Step 2: Model Asset BundlingThe 268MB model file must be accessible to ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~26-~26: Ensure spelling is correct
Context: ...capability.10. Conclusion and Strategic RecommendationThe comparative analysis yields a definitiv...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~162-~162: Consider using a shorter alternative to avoid wordiness.
Context: ...ox models (350-500 million parameters), making it better suited for resource-constrained devices...
(MADE_IT_JJR)
🪛 markdownlint-cli2 (0.18.1)
docs/tts-research.md
26-26: Bare URL used
(MD034, no-bare-urls)
26-26: Bare URL used
(MD034, no-bare-urls)
169-169: Bare URL used
(MD034, no-bare-urls)
175-175: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Greptile Review
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-linux
- GitHub Check: build-android
- GitHub Check: build-ios
🔇 Additional comments (2)
frontend/src-tauri/Cargo.toml (1)
44-44: Update `ort` from 2.0.0-rc.7 to a stable release or confirm pre-release acceptability.

ort 2.0.0 has not been stabilized — it remains in pre-release with the latest version at 2.0.0-rc.10. Using a pre-release version in production carries stability and support risks. Either upgrade to rc.10 if staying on pre-release is acceptable, or switch to a stable version if available.
frontend/src/components/UnifiedChat.tsx (1)
192-278: Well-implemented TTS button component.

The implementation demonstrates several good practices:
- Proper cleanup: The `useEffect` at lines 208-215 correctly cleans up the long-press timer on unmount, addressing the concern from the previous review
- Long-press handling: The pointer event handlers properly clear the timer on both `pointerUp` and `pointerLeave`, providing good UX
- Platform awareness: Returns `null` for non-Tauri environments, ensuring the feature is desktop-only
- Accessibility: Includes a descriptive `aria-label` for screen readers
- State management: Correctly uses the TTS context hook for shared state
The logic flow is clear and handles all the necessary states (not downloaded, loading, playing, etc.).
Greptile Overview
Greptile Summary
This PR adds on-device text-to-speech functionality using Supertonic, a lightweight TTS engine that runs inference via ONNX Runtime. The implementation is desktop-only, downloading ~265MB of models from HuggingFace on first use and processing all audio locally.
Key Changes
- Rust TTS Engine (`frontend/src-tauri/src/tts.rs`): Complete ONNX-based inference pipeline with text preprocessing, chunking, denoising loop, and WAV encoding
- React Integration (`TTSContext.tsx`): Context provider managing download, model loading, and audio playback via Web Audio API
- UI Components: Download dialog with progress tracking and speaker buttons on assistant messages (desktop only)
- Platform Detection: Properly scoped to desktop environments using `isTauriDesktop()` checks
Issues Found
- Race condition in `tts_synthesize`: `sample_rate` is read while holding the lock but used after the lock is released, so it could mismatch the audio data if TTS is unloaded concurrently
- Performance: Multiple large array clones inside the 10-iteration denoising loop (lines 515-520) allocate significant memory
- File handling: `.with_extension("part")` may produce unexpected results for multi-dot filenames
- Cleanup edge case: Event listener cleanup not guaranteed in the error path (lines 152-161)
- RC dependency: Using the `ort = "2.0.0-rc.7"` release candidate in production
Architecture Notes
The implementation follows a sound architecture pattern: desktop-only feature with proper platform detection, atomic file downloads with .part extension, and clean separation between Rust inference and React playback. The 10-step denoising with 1.2x speed and 50ms chunk pause delivers high-quality output as documented.
Confidence Score: 3/5
- This PR has multiple performance and correctness issues that should be addressed before merging
- The implementation introduces a race condition in audio synthesis that could cause crashes or corrupted audio, uses an unstable RC dependency, and has performance concerns with repeated large allocations in hot paths. While the architecture is sound and the feature works, these issues warrant fixes before production deployment.
- Pay close attention to `frontend/src-tauri/src/tts.rs` (race condition on lines 839-843, performance issues in the denoising loop) and `frontend/src/services/tts/TTSContext.tsx` (event listener cleanup)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| frontend/src-tauri/src/tts.rs | 3/5 | TTS engine implementation with multiple clones in hot paths and potential race condition in audio data handling |
| frontend/src/services/tts/TTSContext.tsx | 4/5 | TTS React context with proper cleanup but potential memory leak in error paths |
| frontend/src-tauri/Cargo.toml | 3/5 | Dependencies added for TTS - using release candidate version which may have stability concerns |
| frontend/src/components/TTSDownloadDialog.tsx | 5/5 | Clean download UI component with proper state management |
| frontend/src/components/UnifiedChat.tsx | 5/5 | TTS button integration into chat UI, properly scoped to desktop only |
| frontend/src/app.tsx | 5/5 | TTSProvider added to React component tree |
Sequence Diagram
sequenceDiagram
participant User
participant React as React Frontend<br/>(TTSContext)
participant Tauri as Tauri Backend<br/>(Rust)
participant HF as HuggingFace<br/>(Models)
participant ONNX as ONNX Runtime<br/>(Inference)
User->>React: Open TTS Dialog
React->>Tauri: tts_get_status()
Tauri->>Tauri: Check models_downloaded
Tauri->>Tauri: Check models_loaded
Tauri-->>React: Status response
alt Models not downloaded
User->>React: Click Download
React->>Tauri: tts_download_models()
loop For each model file
Tauri->>HF: HTTP GET model file
HF-->>Tauri: Stream chunks
Tauri->>Tauri: Write to .part file
Tauri->>React: Emit progress event
React->>User: Update progress bar
end
Tauri->>Tauri: Rename .part to final
Tauri-->>React: Download complete
React->>Tauri: tts_load_models()
Tauri->>ONNX: Load 4 ONNX sessions
Tauri->>Tauri: Load voice style JSON
Tauri-->>React: Models loaded
end
User->>React: Click speaker icon
React->>Tauri: tts_synthesize(text)
Tauri->>Tauri: Preprocess text
Tauri->>Tauri: Chunk text (300 chars)
loop For each chunk
Tauri->>ONNX: Duration prediction
ONNX-->>Tauri: Duration values
Tauri->>ONNX: Text encoding
ONNX-->>Tauri: Text embeddings
Tauri->>Tauri: Sample noisy latent
loop 10 denoising steps
Tauri->>ONNX: Vector estimation
ONNX-->>Tauri: Denoised latent
end
Tauri->>ONNX: Vocoder (latent→wav)
ONNX-->>Tauri: Audio samples
end
Tauri->>Tauri: Concatenate chunks
Tauri->>Tauri: Encode to base64 WAV
Tauri-->>React: Audio base64 + metadata
React->>React: Decode to AudioBuffer
React->>React: Play via Web Audio API
React-->>User: Audio playback
Additional Comments (1)
Greptile Overview
Greptile Summary
Adds on-device text-to-speech using Supertonic, enabling assistant message audio playback on desktop with ~265MB one-time model download from HuggingFace.
Key Changes
- Rust TTS engine (`tts.rs`): Complete ONNX Runtime integration with text preprocessing, model download/management, and audio synthesis
- React TTS context: State management with Web Audio API playback, platform detection, and proper resource cleanup
- UI integration: TTS button on assistant messages with long-press settings access, download dialog with progress tracking
- Desktop-only: Properly scoped with `#[cfg(desktop)]` flags and runtime platform checks
Technical Highlights
- Uses ONNX Runtime for cross-platform inference without Python dependencies
- Implements streaming synthesis with 50ms silence between chunks for natural pacing
- Handles partial download resume by cleaning up `.part` files
- Web Audio API prevents hijacking system media controls
- Comprehensive markdown/emoji stripping in text preprocessing
Minor Suggestions
- Voice currently hardcoded to F2 - consider adding user preference API for F1/F2/M1/M2 voices
- File validation could check that the downloaded size matches `expected_size` to catch corrupted partial downloads
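That size-validation suggestion could look like the following sketch (hypothetical helper name; `expected_size` values come from the model manifest described in the PR):

```typescript
import * as fs from "node:fs";

// Reject a downloaded model whose on-disk size doesn't match the manifest's
// expected size, deleting it so the next attempt re-downloads from scratch.
function validateDownload(filePath: string, expectedSize: number): boolean {
  if (!fs.existsSync(filePath)) return false;
  const actual = fs.statSync(filePath).size;
  if (actual !== expectedSize) {
    fs.rmSync(filePath); // treat a size mismatch as a corrupted partial download
    return false;
  }
  return true;
}
```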
Confidence Score: 5/5
- This PR is safe to merge with minimal risk - well-architected feature addition with proper platform scoping
- Score reflects solid implementation across Rust and TypeScript, proper resource cleanup, desktop-only scoping preventing mobile issues, comprehensive error handling, and good architectural decisions (ONNX vs PyTorch, Web Audio API)
- No files require special attention - implementation is production-ready with only minor enhancement suggestions
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| frontend/src-tauri/src/tts.rs | 4/5 | new TTS engine with ONNX inference, text preprocessing, model download/management - solid implementation with minor suggestions for voice selection and file validation |
| frontend/src/services/tts/TTSContext.tsx | 5/5 | React context for TTS state management with proper cleanup, Web Audio API integration, and Tauri platform detection - well structured |
| frontend/src/components/TTSDownloadDialog.tsx | 5/5 | TTS setup dialog with download progress UI, model management, and clear user messaging - clean implementation |
| frontend/src/components/UnifiedChat.tsx | 5/5 | integrated TTS button with long-press management dialog, proper cleanup, and desktop-only rendering - well integrated |
| frontend/src-tauri/src/lib.rs | 5/5 | registered TTS module and commands with proper desktop-only compilation flags |
| frontend/src-tauri/Cargo.toml | 5/5 | added TTS dependencies (ort, ndarray, hound, etc.) as desktop-only with proper target configuration |
Sequence Diagram
sequenceDiagram
participant User
participant UI as UnifiedChat UI
participant TTS as TTSContext
participant Tauri as Rust Backend
participant Models as TTS Models
Note over User,Models: First Time Setup Flow
User->>UI: Click TTS button (models not downloaded)
UI->>TTS: Check status
TTS->>Tauri: tts_get_status()
Tauri-->>TTS: models_downloaded: false
TTS-->>UI: status: not_downloaded
UI->>User: Show TTSDownloadDialog
User->>UI: Click Download
UI->>TTS: startDownload()
TTS->>Tauri: tts_download_models()
loop For each model file
Tauri->>Models: Download from HuggingFace
Models-->>Tauri: Chunk stream
Tauri-->>TTS: Emit tts-download-progress event
TTS-->>UI: Update progress bar
end
Tauri-->>TTS: Download complete
TTS->>Tauri: tts_load_models()
Tauri->>Models: Load ONNX sessions + voice style
Models-->>Tauri: Sessions initialized
Tauri-->>TTS: Models loaded
TTS-->>UI: status: ready
Note over User,Models: Speech Synthesis Flow
User->>UI: Click TTS button (ready state)
UI->>TTS: speak(text, messageId)
TTS->>TTS: preprocessTextForTTS (remove think blocks)
TTS->>Tauri: tts_synthesize(text)
Tauri->>Tauri: chunk_text() + preprocess_text()
loop For each chunk
Tauri->>Tauri: UnicodeProcessor.call()
Tauri->>Models: Duration prediction (dp_ort)
Tauri->>Models: Text encoding (text_enc_ort)
Tauri->>Models: Sample noisy latent
loop Denoising steps (10 iterations)
Tauri->>Models: Vector estimation (vector_est_ort)
end
Tauri->>Models: Vocoder inference
Tauri->>Tauri: Append audio chunk + silence
end
Tauri->>Tauri: wav_to_base64()
Tauri-->>TTS: audio_base64, sample_rate
TTS->>TTS: base64ToBlob() + Web Audio API setup
TTS->>UI: Update isPlaying state
UI-->>User: Audio playback
TTS-->>UI: onended: isPlaying = false
Actionable comments posted: 1
🧹 Nitpick comments (8)
frontend/src-tauri/src/tts.rs (8)
50-50: Consider pinning the model version instead of using "main".

The HuggingFace URL references the `main` branch, which means the models can change unexpectedly. This creates risks:
- Breaking changes in model format or structure
- Inconsistent behavior across installations
- Difficulty reproducing issues
Consider pinning to a specific commit SHA or release tag for stability.
Example:
```diff
-const HUGGINGFACE_BASE_URL: &str = "https://huggingface.co/Supertone/supertonic/resolve/main";
+const HUGGINGFACE_BASE_URL: &str = "https://huggingface.co/Supertone/supertonic/resolve/abc123def456"; // Pin to specific commit
```
193-195: Consider handling "e.g." and "i.e." without trailing comma.

The current implementation only replaces `"e.g.,"` and `"i.e.,"` (with comma), but these abbreviations often appear without commas. Consider adding replacements for both variants:

```diff
 text = text.replace("@", " at ");
-text = text.replace("e.g.,", "for example, ");
-text = text.replace("i.e.,", "that is, ");
+text = text.replace("e.g.", "for example");
+text = text.replace("i.e.", "that is");
```

Note: This will match both "e.g.," and "e.g." since the comma will be preserved if present.
310-310: Consider moving static regex to module level.

The `RE_PARA` static is defined inside the `chunk_text` function. While functionally correct, it's inconsistent with the other regex patterns defined at module level (lines 19-48). Moving it to the top improves consistency and makes all regex patterns discoverable in one place.

```diff
 static RE_SENTENCE: Lazy<Regex> = Lazy::new(|| Regex::new(r"([.!?])\s+").unwrap());
+static RE_PARA: Lazy<Regex> = Lazy::new(|| Regex::new(r"\n\s*\n").unwrap());

 fn chunk_text(text: &str, max_len: usize) -> Vec<String> {
     let text = text.trim();
     if text.is_empty() {
         return vec![String::new()];
     }
-    static RE_PARA: Lazy<Regex> = Lazy::new(|| Regex::new(r"\n\s*\n").unwrap());
     let paragraphs: Vec<&str> = RE_PARA.split(text).collect();
```
516-541: Denoising loop clones large array on each iteration.

Line 518 clones `xt` (a 3D array) on every iteration of the denoising loop. For 10 steps (default), this creates 10 copies of a potentially large tensor. While necessary for ONNX Runtime's value creation, this could impact memory usage for long audio.

Consider documenting this behavior or exploring whether ONNX Runtime supports views/references to reduce allocations.
Note: This is primarily a performance consideration for very long text synthesis. Current 50ms chunk approach (line 423) likely keeps memory usage reasonable.
563-564: TODO: Voice selection API is not implemented.

The comment indicates voice selection is hardcoded to F2, with F1, M1, and M2 available but not selectable. Since the PR summary mentions "supports four voice styles", this limitation should be addressed or tracked.
Do you want me to:
- Generate a Tauri command to support runtime voice selection?
- Open a new issue to track this enhancement?
608-614: Add error context to ONNX session loading.

The ONNX session loading uses `?` for error propagation without additional context. If model loading fails, users won't know which specific model file caused the issue. Consider adding context:

```diff
-    let dp_ort =
-        Session::builder()?.commit_from_file(models_dir.join("duration_predictor.onnx"))?;
-    let text_enc_ort =
-        Session::builder()?.commit_from_file(models_dir.join("text_encoder.onnx"))?;
-    let vector_est_ort =
-        Session::builder()?.commit_from_file(models_dir.join("vector_estimator.onnx"))?;
-    let vocoder_ort = Session::builder()?.commit_from_file(models_dir.join("vocoder.onnx"))?;
+    let dp_ort = Session::builder()?
+        .commit_from_file(models_dir.join("duration_predictor.onnx"))
+        .context("Failed to load duration_predictor.onnx")?;
+    let text_enc_ort = Session::builder()?
+        .commit_from_file(models_dir.join("text_encoder.onnx"))
+        .context("Failed to load text_encoder.onnx")?;
+    let vector_est_ort = Session::builder()?
+        .commit_from_file(models_dir.join("vector_estimator.onnx"))
+        .context("Failed to load vector_estimator.onnx")?;
+    let vocoder_ort = Session::builder()?
+        .commit_from_file(models_dir.join("vocoder.onnx"))
+        .context("Failed to load vocoder.onnx")?;
```
724-776: Consider adding retry logic for transient network failures.

The download implementation handles errors but doesn't retry on transient failures (network timeouts, temporary server issues). For large model downloads (~265 MB), this could frustrate users with unstable connections.
Consider implementing exponential backoff retry for network errors:
- Retry 3-5 times with increasing delays
- Distinguish between retryable errors (timeouts, 5xx) and non-retryable (404, authentication)
- Preserve progress by supporting HTTP range requests for resumable downloads
Example pattern:
```rust
for attempt in 0..MAX_RETRIES {
    match download_with_timeout(&client, &url, &temp_path).await {
        Ok(_) => break,
        Err(e) if is_retryable(&e) && attempt < MAX_RETRIES - 1 => {
            tokio::time::sleep(Duration::from_secs(2_u64.pow(attempt))).await;
            continue;
        }
        Err(e) => return Err(e),
    }
}
```
833-833: Hardcoded synthesis parameters should be configurable.

The TTS synthesis uses fixed values:
- `total_step: 10` (inference steps)
- `speed: 1.2` (playback speed multiplier)

These significantly affect output quality and duration but cannot be adjusted by users. Consider exposing these as optional parameters in the Tauri command signature or a configuration API.

```diff
 #[tauri::command]
 pub async fn tts_synthesize(
     text: String,
+    speed: Option<f32>,
+    steps: Option<usize>,
     state: tauri::State<'_, Mutex<TTSState>>,
 ) -> Result<TTSSynthesizeResponse, String> {
     // ...
     let audio = tts
-        .synthesize(&text, &style, 10, 1.2)
+        .synthesize(&text, &style, steps.unwrap_or(10), speed.unwrap_or(1.2))
         .map_err(|e| format!("TTS synthesis failed: {}", e))?;
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`frontend/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (3)
- `frontend/src-tauri/Cargo.toml`
- `frontend/src-tauri/src/tts.rs`
- `frontend/src/services/tts/TTSContext.tsx`
🚧 Files skipped from review as they are similar to previous changes (2)
- frontend/src-tauri/Cargo.toml
- frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-ios
- GitHub Check: build-linux
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-android
🔇 Additional comments (10)
frontend/src-tauri/src/tts.rs (10)
75-111: LGTM! Data structures are well-designed.

The config and style structures properly model the TTS pipeline requirements. Good use of `#[serde(rename)]` for the `dtype` field to handle reserved keywords.
117-145: LGTM! Unicode processing handles OOV characters safely.

The bounds checking on line 132 prevents index-out-of-bounds errors, and the fallback to the padding token (0) for unknown characters is a safe default. The masking logic correctly handles variable-length text inputs.
628-648: LGTM! WAV encoding properly handles float-to-int conversion.

The clamping on line 640 prevents overflow when converting float samples to 16-bit integers. This is essential for avoiding audio artifacts.
661-680: LGTM! Status check is efficient and correct.

The implementation properly checks file existence and state without unnecessary I/O. The lock is held briefly, only to check state.
698-702: Verify timeout values are appropriate for large models.

The HTTP client uses a 300-second (5-minute) timeout for the entire request and a 30-second connect timeout. For users on slow connections downloading 265 MB, 5 minutes may be insufficient.

Consider whether the timeout should be calculated from an expected minimum download speed (e.g., a 1 Mbps floor implies ~35 minutes for 265 MB) or removed entirely for the download phase (keeping only the connect timeout).
785-804: LGTM! Model loading correctly manages state.

The implementation properly loads models and updates state under the mutex. Previous model instances will be dropped automatically when replaced, ensuring proper resource cleanup.
844-844: Good practice: dropping the lock before slow encoding.

Releasing the state lock before base64 encoding (line 844) is excellent for preventing lock contention during the potentially slow encoding operation. This allows other commands (like status checks) to proceed concurrently.
858-865: LGTM! Unload correctly releases model resources.

Setting the `Option` fields to `None` will drop the ONNX sessions and free GPU/CPU resources. Simple and correct.
867-885: LGTM! Delete models safely unloads before filesystem cleanup.

The implementation correctly unloads models from memory before deleting files, preventing use-after-free scenarios. The scoped lock (lines 870-874) is good practice to minimize lock duration.
23-23: The regex analysis contains a logical error. The pattern `r"_([^_\s][^_]*)_"` actually will match single-character italic text like `_a_`.

Breaking down the pattern:
- `[^_\s]` matches exactly one character (not underscore or whitespace)
- `[^_]*` matches zero or more additional characters
- Therefore `_a_` matches with capture group = "a"

The design choice to exclude whitespace in the first character (`[^_\s]`) is intentional: it prevents false matches on patterns like `_ something_`. This pattern is consistent with the other format regex patterns in the file (`RE_ITALIC`, `RE_BOLD2`, etc.), all of which require a minimum of 1 character.

No change needed.
Greptile Overview
Greptile Summary
Adds on-device text-to-speech using Supertonic ONNX models, enabling users to listen to assistant messages on desktop. The implementation is well-architected with proper separation between Rust inference backend and React UI layer.
Key Changes
- Rust TTS engine: 905-line implementation using ONNX Runtime with text preprocessing, chunking, and streaming synthesis
- React state management: TTSContext handles download progress, model lifecycle, and audio playback using Web Audio API
- Desktop-only feature: Properly scoped with conditional compilation and platform detection
- One-time download: ~265MB models downloaded from HuggingFace on first use, stored in app data directory
- Smart preprocessing: Removes markdown formatting, emojis, and `<think>` blocks before synthesis
Issues Found
- Logic issue: Potential panic in `tts.rs` line 459 if the `text_ids` array is empty; needs validation before accessing `text_ids[0]`
Strengths
- Comprehensive error handling throughout download and synthesis flows
- Proper cleanup of audio resources and event listeners
- Good UX with progress tracking and clear status messaging
- Well-researched technical choice documented in `tts-research.md`
Confidence Score: 4/5
- Safe to merge with minor fix recommended - the empty input validation issue is unlikely to occur in practice but should be addressed
- Score reflects solid architecture and implementation quality, with one logical issue (empty array access) that could cause a panic. The hardcoded parameters are design choices rather than bugs. Desktop-only scoping is appropriate, error handling is thorough, and the feature is well-integrated.
- Pay close attention to `frontend/src-tauri/src/tts.rs` line 459: add validation before accessing `text_ids[0]` to prevent a potential panic
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| frontend/src-tauri/src/tts.rs | 4/5 | New TTS engine using Supertonic ONNX models. Well-structured with proper error handling, text preprocessing, and streaming synthesis. Minor issue: potential panic on empty input. |
| frontend/src/services/tts/TTSContext.tsx | 5/5 | React context managing TTS state and playback. Clean implementation with proper cleanup, Web Audio API usage, and preprocessing to remove chain-of-thought blocks. |
| frontend/src/components/TTSDownloadDialog.tsx | 5/5 | UI dialog for TTS setup with progress tracking and model management. Good UX with disabled state handling during downloads and clear messaging. |
| frontend/src/components/UnifiedChat.tsx | 5/5 | Added TTS button to assistant messages with long-press for settings. Desktop-only feature with proper platform detection and state management. |
| frontend/src-tauri/Cargo.toml | 5/5 | Added desktop-only TTS dependencies (ONNX Runtime, ndarray, reqwest, etc.). Proper conditional compilation for desktop platforms. |
Sequence Diagram
sequenceDiagram
participant User
participant UI as UnifiedChat UI
participant TTS as TTSContext
participant Tauri as Rust Backend
participant ONNX as ONNX Runtime
participant HF as HuggingFace
Note over User,HF: Initial Setup Flow
User->>UI: Opens TTS setup dialog
UI->>TTS: checkStatus()
TTS->>Tauri: invoke("tts_get_status")
Tauri-->>TTS: {models_downloaded: false}
TTS-->>UI: status: "not_downloaded"
User->>UI: Clicks "Download"
UI->>TTS: startDownload()
TTS->>Tauri: invoke("tts_download_models")
loop For each model file
Tauri->>HF: HTTP GET model file
HF-->>Tauri: Stream chunks
Tauri->>UI: emit("tts-download-progress")
UI-->>User: Update progress bar
end
Tauri->>Tauri: Save models to app data dir
Tauri-->>TTS: Download complete
TTS->>Tauri: invoke("tts_load_models")
Tauri->>ONNX: Load ONNX sessions
ONNX-->>Tauri: Sessions ready
Tauri-->>TTS: Models loaded
TTS-->>UI: status: "ready"
Note over User,HF: Text-to-Speech Flow
User->>UI: Clicks speaker icon
UI->>TTS: speak(text, messageId)
TTS->>TTS: preprocessTextForTTS()
Note right of TTS: Remove <think> blocks
TTS->>Tauri: invoke("tts_synthesize", {text})
Tauri->>Tauri: chunk_text(300 chars)
loop For each chunk
Tauri->>Tauri: preprocess_text()
Tauri->>ONNX: Run duration_predictor
ONNX-->>Tauri: duration
Tauri->>ONNX: Run text_encoder
ONNX-->>Tauri: text_emb
Tauri->>Tauri: sample_noisy_latent()
loop 10 denoising steps
Tauri->>ONNX: Run vector_estimator
ONNX-->>Tauri: denoised_latent
end
Tauri->>ONNX: Run vocoder
ONNX-->>Tauri: wav_data
end
Tauri->>Tauri: wav_to_base64()
Tauri-->>TTS: {audio_base64, sample_rate}
TTS->>TTS: base64ToBlob()
TTS->>TTS: AudioContext.decodeAudioData()
TTS->>TTS: source.start()
TTS-->>User: Audio playback starts
User->>UI: Clicks stop icon
UI->>TTS: stop()
TTS->>TTS: source.stop()
TTS-->>User: Audio stops
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @frontend/src-tauri/src/tts.rs:
- Around line 50-71: Extend MODEL_FILES to include a SHA256 hex string for each
entry (e.g. change the tuple type used by MODEL_FILES to include the checksum
field) and update tts_download_models to compute the SHA256 of the downloaded
file bytes (using a standard crate like sha2::Sha256 and hex encoding) before
accepting the file; after writing the file (or on an in-memory buffer), compute
hex(SHA256(bytes)) and compare to the checksum from MODEL_FILES, and if they
differ delete/ignore the file and return or retry the download with an error,
otherwise proceed as before; retain HUGGINGFACE_BASE_URL and the existing
name/path/size checks but make checksum verification mandatory in
tts_download_models so corrupted or tampered downloads are rejected.
- Around line 709-729: The current skip logic only checks meta.len() > 0 and can
treat partial downloads as complete; update the check to compare the file size
to the expected_size (e.g., meta.len() == expected_size or meta.len() >=
expected_size) and only skip if the sizes match; when skipping, increment
total_downloaded by meta.len() (or expected_size if you choose ==) and emit
DownloadProgress using the actual size so progress is accurate; if the size does
not match expected_size treat the file as invalid and remove it as already
implemented.
- Around line 516-541: The loop unnecessarily clones xt when calling
Value::from_array(xt.clone()); change this to pass xt by value
(Value::from_array(xt)) to avoid the extra allocation and ensure ownership
semantics are correct, and adjust subsequent usage so xt is reassigned after
extracting "denoised_latent". Also hoist current_step_array and
current_step_value creation out of the loop and update their contents in-place
each iteration (e.g., via ndarray mapv_inplace or Zip) instead of reallocating
them each time; update references around vector_est_ort.run(...) to use the
moved/updated current_step_value.
🧹 Nitpick comments (5)
frontend/src-tauri/src/tts.rs (5)
304-387: Consider simplifying the chunking logic.

The `chunk_text` function implements sophisticated paragraph/sentence/word-aware chunking, but the nested conditional logic (lines 314-379) is complex, with high cyclomatic complexity. While the implementation appears correct, it may be difficult to maintain.

♻️ Optional refactor: Extract helper functions

Consider extracting sub-functions:
- `chunk_paragraph(para: &str, max_len: usize) -> Vec<String>`
- `chunk_sentence(sentence: &str, max_len: usize) -> Vec<String>`

This would improve readability and testability while preserving the existing behavior.
413-447: Consider making synthesis parameters configurable.

The function hardcodes `max_len = 300` (line 421) and `silence_duration = 0.05` (line 423). While these values align with the PR objectives (a 50 ms pause), consider exposing them as configuration parameters for future flexibility (e.g., adjusting pause duration per voice style or use case).
562-565: Track the TODO for the voice selection API.

The voice is currently hardcoded to F2 (line 565). The PR objectives mention that four voices (F1, F2, M1, M2) are supported. This TODO should be addressed in a future PR to enable user voice selection.
Do you want me to open a new issue to track implementing the voice selection API?
778-787: Strengthen download validation with expected_size.

The code validates `file_downloaded != expected_len` only when `content_length` is `Some` (line 778). However, `expected_size` from `MODEL_FILES` is always available. Consider also validating against `expected_size` as a fallback or additional check, to catch cases where the server's Content-Length header is missing or incorrect.

🔍 Suggested enhancement

```diff
 if let Some(expected_len) = expected_len {
     if file_downloaded != expected_len {
         drop(file);
         let _ = fs::remove_file(&temp_path);
         return Err(format!(
             "Incomplete download for {}: expected {} bytes, got {}",
             file_name, expected_len, file_downloaded
         ));
     }
 }
+// Also validate against our known expected size
+if file_downloaded != *expected_size {
+    drop(file);
+    let _ = fs::remove_file(&temp_path);
+    return Err(format!(
+        "Downloaded file size mismatch for {}: expected {} bytes, got {}",
+        file_name, expected_size, file_downloaded
+    ));
+}
```
852-854: Consider exposing synthesis parameters to the frontend.

The synthesis call hardcodes `total_step = 10` and `speed = 1.2` (line 853). While these align with the PR objectives, consider allowing the frontend to adjust these parameters (a quality vs. speed trade-off) in a future enhancement.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
frontend/src-tauri/src/tts.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-linux
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-ios
- GitHub Check: build-android
- GitHub Check: Cloudflare Pages
🔇 Additional comments (12)
frontend/src-tauri/src/tts.rs (12)
113-145: LGTM: Unicode processor is well-implemented.

The bounds checking (line 132) and graceful OOV handling (line 136) prevent potential panics and ensure robust text-to-token conversion.
147-220: LGTM: Comprehensive text preprocessing for TTS.

The preprocessing pipeline appropriately handles markdown removal, Unicode normalization, symbol replacement, and punctuation formatting. The auto-punctuation logic (lines 216-218) correctly checks for existing punctuation before adding a period.
222-302: LGTM: Helper functions are correctly implemented.

The mask generation, noisy latent sampling, and word-based text splitting logic are all sound and handle edge cases appropriately (e.g., empty text, zero lengths).
389-411: State management design is appropriate.

The `TTSState` and `TextToSpeech` structs correctly separate public state (with a Mutex for thread-safety) from internal implementation details. Returning `Mutex<Self>` from `new()` is unconventional but works well with Tauri's state management pattern.
596-626: ONNX Runtime error handling relies on proper file validation.

The ONNX session creation (lines 608-614) will propagate errors via `?` if model files are corrupted or invalid. This is appropriate, but ensure that the download verification (in `tts_download_models`) is robust enough to prevent corrupted files from reaching this point. This relates to the earlier comment about adding checksum verification.
628-648: LGTM: WAV encoding is correctly implemented.

The sample clamping (line 640) prevents overflow when converting to 16-bit integers, and the scoped writer (lines 637-645) ensures proper WAV finalization before encoding.
661-680: LGTM: Status check is straightforward and correct.

The function appropriately checks file existence and state loading without holding locks unnecessarily.
789-796: LGTM: Atomic file finalization pattern.

The sequence of `flush()`, `sync_all()`, `drop()`, and `rename()` (lines 790-796) ensures data is fully written to disk before replacing the final file. This prevents corruption if the process is interrupted.
805-824: LGTM: Efficient lock usage in model loading.

The implementation correctly loads models (lines 811-814) before acquiring the state lock (line 817), minimizing lock contention and preventing blocking other operations during the expensive I/O and ONNX session initialization.
863-867: LGTM: Excellent lock release pattern.

Dropping the state lock (line 864) before the expensive base64 encoding prevents blocking other TTS operations during encoding. This is a performance best practice.
878-905: LGTM: Model cleanup commands are correctly implemented.

The `tts_delete_models` function correctly unloads models from memory before deleting files (lines 890-894), preventing use-after-delete issues.
23-23: The italic regex pattern analysis contains errors.

The pattern `r"_([^_\s][^_]*)_"` will actually match `_a_` and `_a b_`, contrary to the review comment. The pattern requires the first character to be non-whitespace and non-underscore, then allows zero or more non-underscore characters (which includes spaces). Both test cases satisfy this: `_a_` matches with a single character, and `_a b_` matches with a space in the middle. The pattern correctly rejects `_ test_` because the underscore cannot be followed by whitespace.

If there are legitimate concerns about the italic matching behavior (such as intentionally not matching single-character italics), those should be evaluated separately against the actual intended use cases in the codebase.
Likely an incorrect or invalid review comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @frontend/src-tauri/Cargo.toml:
- Around line 42-54: The Cargo dependency for the ONNX Runtime crate is pinned
to the unstable release candidate "ort = \"2.0.0-rc.10\"" which may introduce
breaking changes; update the dependency declaration for the desktop target in
Cargo.toml to use the stable release "ort = \"1.16.3\"" instead (modify the
entry under target.'cfg(any(target_os = "macos", target_os = "linux", target_os
= "windows"))'.dependencies where the ort line appears), run cargo update/build
to verify compatibility, and if any 2.0-specific APIs were used adjust code to
the 1.16.3 API surface or keep the RC only if those features are required.
🧹 Nitpick comments (3)
frontend/src-tauri/src/tts.rs (3)
477-479: Consider making chunk size and silence duration configurable.

The chunk size (300 characters) and inter-chunk silence (0.05 seconds) are currently hardcoded. While these values are reasonable, making them configurable parameters would provide flexibility for different use cases.
622-625: Reminder: Voice selection is currently hardcoded to F2.

The TODO comment indicates that a voice selection API is needed, and the PR objectives acknowledge this limitation. The implementation currently supports four voices (F1, F2, M1, M2) but defaults to F2.
Would you like me to help design a voice selection API, or should this be tracked in a separate issue?
922-965: Good mutex management with early lock release.

The implementation correctly:
- Clones the style to avoid borrow conflicts (lines 929-934)
- Drops the guard before expensive encoding operation (line 953), minimizing lock contention
- Validates inputs (empty text/audio)
The hardcoded inference parameters (10 steps, 1.2× speed) match the PR objectives but could be made configurable in the future.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`frontend/src-tauri/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (2)
- `frontend/src-tauri/Cargo.toml`
- `frontend/src-tauri/src/tts.rs`
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-linux
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-android
- GitHub Check: build-ios
- GitHub Check: Cloudflare Pages
🔇 Additional comments (7)
frontend/src-tauri/src/tts.rs (7)
51-53: Excellent security practice: pinned model revision.

Pinning the HuggingFace revision to a specific commit ensures reproducibility and prevents supply-chain attacks through model substitution.
55-119: Strong integrity verification with size and SHA-256 checks.

Each model file includes both an expected size and a SHA-256 checksum, providing robust verification during download to prevent corruption or tampering.
178-201: Good defensive handling of out-of-vocabulary characters.

The implementation correctly handles Unicode characters outside the indexer range by using the padding token (0) rather than panicking. This ensures robustness when processing diverse text inputs.
203-276: Comprehensive text preprocessing for TTS quality.

The preprocessing pipeline is thorough and well-structured:
- Unicode normalization (NFKD) for consistent character handling
- Markdown formatting removal for clean text
- Punctuation and spacing corrections
- Automatic sentence termination when needed
The use of pre-compiled regex patterns (via `Lazy` statics) ensures efficient repeated processing.
360-443: Well-designed hierarchical text chunking strategy.

The chunking logic correctly handles multiple levels:
- Split by paragraphs (preserving structure)
- Split by sentences (preserving natural boundaries)
- Split by words (fallback for oversized segments)
This approach maintains natural speech boundaries while respecting length constraints.
754-892: Robust model download implementation with comprehensive verification.

The download logic includes multiple layers of protection:
- Atomic downloads via temp files with a `.part` extension
- Dual size verification (Content-Length header + actual bytes)
- SHA-256 integrity verification
- Proper file finalization (flush, sync, rename)
- Cleanup of partial downloads on failure
- Progress events for UI feedback
This is production-quality download handling.
977-994: Correct cleanup order in model deletion.

The implementation properly unloads models from memory before deleting files, preventing potential access violations. The existence check before deletion (line 987) handles the edge case where models were never downloaded.
Greptile Overview
Greptile Summary
This PR adds on-device text-to-speech functionality using Supertonic, a fast ONNX-based TTS engine that runs entirely locally without cloud API calls.
Key Changes
- Rust backend (`tts.rs`): Implements Supertonic TTS with ONNX Runtime, downloading ~265 MB of models from HuggingFace with SHA256 verification, text preprocessing (markdown removal, Unicode normalization), and streaming synthesis with 10-step denoising
- React integration: New `TTSContext` managing download progress, model loading, and Web Audio API playback; `TTSDownloadDialog` provides polished UX for the one-time model download
- Desktop-only feature: Properly gated with `#[cfg(desktop)]` in Rust and `isTauriEnv` checks in React; speaker buttons appear only in the Tauri desktop environment
- Smart preprocessing: Removes `<think>` blocks from assistant messages before synthesis to avoid reading internal reasoning aloud
Architecture Highlights
- Models pinned to specific HuggingFace revision with hardcoded SHA256 checksums for integrity
- Streaming download with progress events emitted to frontend
- ONNX Runtime enables cross-platform inference (macOS/Linux/Windows) with potential for NPU acceleration
- Voice style (F2), inference steps (10), and speed (1.2x) are currently hardcoded but functional
- Clean separation: Rust handles compute-heavy synthesis, React handles UI/audio playback via Web Audio API
Testing Notes
Desktop builds are marked as working in the PR description. iOS/Android builds noted as needing testing, which is appropriate since TTS is intentionally desktop-only (dependencies are platform-gated in Cargo.toml).
Confidence Score: 4/5
- Safe to merge with minor suggestions for future enhancements
- Code is well-structured with proper error handling, security best practices (SHA256 verification), and appropriate platform gating. The implementation is production-ready for desktop. Score is 4 (not 5) due to hardcoded configuration values that would benefit from being user-configurable in future iterations.
- All files look solid. `tts.rs` has a few hardcoded values (voice, inference params) that are noted for future enhancement but don't block merging.
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| frontend/src-tauri/src/tts.rs | 4/5 | Implements on-device TTS using Supertonic ONNX models with SHA256-verified downloads, text preprocessing, and streaming synthesis. Well-structured with proper error handling, though voice/parameter configuration is hardcoded. |
| frontend/src-tauri/src/lib.rs | 5/5 | Added TTS module import and registered 6 TTS Tauri commands (get_status, download_models, load_models, synthesize, unload_models, delete_models). Desktop-only via #[cfg(desktop)]. Clean integration. |
| frontend/src/services/tts/TTSContext.tsx | 5/5 | React context managing TTS lifecycle with download progress tracking, Web Audio API playback, and automatic cleanup. Properly handles Tauri environment detection and preprocesses text to remove <think> blocks before synthesis. |
| frontend/src/components/TTSDownloadDialog.tsx | 5/5 | Polished download UI with progress tracking, model deletion, and clear error states. Prevents dialog close during processing. Good UX for 265MB model download flow. |
| frontend/src/app.tsx | 5/5 | Added TTSProvider to app provider tree, positioned correctly between TooltipProvider and BillingServiceProvider. Minimal, appropriate change. |
| frontend/src/components/UnifiedChat.tsx | 4/5 | Added speaker button component to assistant messages with long-press settings gesture. Desktop-only via isTauriEnv check. Integrates cleanly with existing message rendering. |
| frontend/src-tauri/Cargo.toml | 5/5 | Added 12 TTS dependencies for desktop targets only via target.'cfg(...)'. Includes ONNX Runtime, audio encoding (hound), HTTP streaming (reqwest), and cryptography (sha2). Appropriate platform gating. |
| docs/tts-research.md | 5/5 | Comprehensive research document comparing Supertonic vs Chatterbox for Tauri integration. Details architecture, performance, and cross-platform deployment. Justifies Supertonic selection for mobile compatibility. |
Sequence Diagram
sequenceDiagram
participant User
participant UI as React UI<br/>(UnifiedChat)
participant TTS as TTSContext<br/>(React)
participant Dialog as TTSDownloadDialog
participant Tauri as Tauri IPC
participant Rust as Rust Backend<br/>(tts.rs)
participant HF as HuggingFace
participant ONNX as ONNX Runtime
participant Audio as Web Audio API
Note over User,Audio: Initial Setup Flow
User->>UI: Opens app
UI->>TTS: Auto-check status on mount
TTS->>Tauri: invoke("tts_get_status")
Tauri->>Rust: tts_get_status()
Rust->>Rust: Check if models exist<br/>and have correct size
Rust-->>Tauri: TTSStatusResponse
Tauri-->>TTS: {models_downloaded, models_loaded}
alt Models not downloaded
TTS->>TTS: Set status="not_downloaded"
User->>UI: Clicks speaker button
UI->>Dialog: Show download dialog
User->>Dialog: Clicks "Download"
Dialog->>TTS: startDownload()
TTS->>Tauri: invoke("tts_download_models")
Tauri->>Rust: tts_download_models()
loop For each model file
Rust->>HF: HTTP GET model file
HF-->>Rust: Stream chunks
Rust->>Rust: SHA256 hash verification
Rust->>Tauri: emit("tts-download-progress")
Tauri->>TTS: Download progress event
TTS->>Dialog: Update progress bar
end
Rust-->>Tauri: Download complete
TTS->>Tauri: invoke("tts_load_models")
Tauri->>Rust: tts_load_models()
Rust->>ONNX: Load 4 ONNX models<br/>(text_encoder, duration_predictor,<br/>vector_estimator, vocoder)
Rust->>Rust: Load F2 voice style
Rust->>Rust: Store in TTSState Mutex
Rust-->>Tauri: Success
Tauri-->>TTS: Models loaded
TTS->>TTS: Set status="ready"
else Models downloaded but not loaded
TTS->>Tauri: invoke("tts_load_models")
Tauri->>Rust: tts_load_models()
Rust->>ONNX: Load models into memory
Rust-->>Tauri: Success
TTS->>TTS: Set status="ready"
end
Note over User,Audio: Speech Synthesis Flow
User->>UI: Clicks speaker button<br/>on assistant message
UI->>TTS: speak(text, messageId)
TTS->>TTS: preprocessTextForTTS()<br/>(remove <think> blocks)
TTS->>Tauri: invoke("tts_synthesize", {text})
Tauri->>Rust: tts_synthesize(text)
Rust->>Rust: preprocess_text()<br/>(remove markdown, normalize)
Rust->>Rust: chunk_text(text, 300)<br/>(split into sentences)
loop For each chunk
Rust->>Rust: UnicodeProcessor.call()<br/>(tokenize text)
Rust->>ONNX: duration_predictor.run()<br/>(predict phoneme duration)
Rust->>ONNX: text_encoder.run()<br/>(encode text with F2 style)
Rust->>Rust: sample_noisy_latent()<br/>(initialize with gaussian noise)
loop 10 denoising steps
Rust->>ONNX: vector_estimator.run()<br/>(denoise latent)
end
Rust->>ONNX: vocoder.run()<br/>(latent → waveform)
Rust->>Rust: Append 50ms silence
end
Rust->>Rust: wav_to_base64()<br/>(encode as 16-bit WAV)
Rust-->>Tauri: TTSSynthesizeResponse<br/>{audio_base64, sample_rate}
Tauri-->>TTS: Audio data
TTS->>TTS: base64ToBlob()
TTS->>Audio: Create AudioContext
TTS->>Audio: decodeAudioData()
TTS->>Audio: createBufferSource()
TTS->>Audio: source.start()
Audio-->>User: Plays audio
User->>UI: Clicks stop button
UI->>TTS: stop()
TTS->>Audio: source.stop()
TTS->>Audio: audioContext.close()
TTS->>TTS: Cleanup resources
- Add Rust TTS module with Supertonic ONNX inference
- Download models (~265MB) from HuggingFace on first use
- Add TTSContext for React state management
- Add TTSDownloadDialog for model setup
- Add speaker button to assistant messages (desktop only)
- Settings: F2 voice, 10 inference steps, 1.2x speed

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Preserve sentence punctuation when chunking text
- Add download timeout (5min) and partial file cleanup with .part temp files
- Fix audio URL memory leaks in stop/unmount/error handlers
- Use isTauriDesktop instead of isTauri (TTS is desktop-only)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Long-press speaker button to open TTS management dialog
- Add delete option to free up storage (~264 MB)
- Add tts_delete_models Rust command
- Show storage info when TTS is ready

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Add TypeScript preprocessing to remove <think>...</think> blocks
- Add Rust preprocessing to strip markdown formatting (**bold**, *italic*, etc.)
- Remove code blocks, headers, emojis from TTS input
- Handle unclosed think tags (streaming edge case)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Add #[cfg(desktop)] to tts module declaration
- Move TTS dependencies to desktop-only target in Cargo.toml
- Prevents TTS code from being compiled for iOS/Android

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Add useEffect cleanup for longPressTimer on unmount
- Use 0 (padding token) instead of -1 for OOV characters
- Make copy/speaker buttons always visible (not hover-only)
- Fix progress bar colors (use primary/20 background)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Clear mediaSession metadata and action handlers so TTS audio doesn't intercept system play/pause and track skip controls.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

- Use lazy_static regexes (once_cell) for one-time compilation
- Replace while loops with single-pass regex for duplicate quotes
- Add split_by_words() to handle text longer than max_len in chunks
- Add TODO for voice selection (currently hardcoded to F2)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Replace HTMLAudioElement with Web Audio API (AudioContext + AudioBufferSourceNode) which doesn't register with the Media Session API.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
7d53d03 to 6509a68
@TestFlight build

🚀 TestFlight deployment triggered! Check the Actions tab for progress.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @docs/tts-research.md:
- Line 26: Replace the bare URLs at the referenced locations (lines 26, 169,
175) with markdown links by wrapping each URL in [descriptive text](https://...)
or at minimum using the URL itself as link text like
[https://huggingface.co/...](https://huggingface.co/...), ensuring any
occurrences of raw "https://..." in the content are converted to proper markdown
link syntax for improved rendering and accessibility; search for the raw URLs in
the doc and update them accordingly.
- Around line 159-247: The "Grok" section duplicates content already covered in
the "Claude" analysis; remove or consolidate it into a single authoritative
analysis to avoid redundancy. Find the "Grok" header and its full subsection
(the duplicated comparisons of Supertonic vs Chatterbox, tech stack,
performance, platform insights, recommendations, and citations) and either
delete that entire block or merge its unique points into the existing "Claude"
section (preserve any unique facts such as Supertonic 2 release notes or
specific citations), then update references/citation list accordingly so only
one consolidated TTS analysis remains.
- Around line 5-26: The doc has missing blank lines and inconsistent header
markup—ensure each section header (e.g., "The 44 Million Parameter Advantage",
"The ONNX-Native Runtime", "The Sidecar Pattern", "The Mobile Wall", etc.) is
followed by a single blank line before its paragraph content and convert nested
section labels (e.g., subsection titles like "Supertonic 2: The Principles of
Architectural Distillation", "Chatterbox Turbo: The Llama-Based Heavyweight",
"3.1 The "Sidecar Pattern" and Desktop Success") into proper Markdown headers
(use ##/###/#### to reflect hierarchy) so formatting is consistent and readable
throughout the document.
🧹 Nitpick comments (9)
docs/tts-research.md (2)
29-156: The "Claude" section is better structured but may still be redundant.

This analysis is significantly more readable and concise than the "Gemini" section. However, since the PR implements only Supertonic and not Chatterbox, roughly 40% of this content (comparisons, Chatterbox recommendations) is surplus and adds maintenance burden.
If this document is intended as decision justification, keep it; if it's permanent developer reference documentation, consider refactoring to focus primarily on Supertonic integration and relegating Chatterbox to a brief "considered alternatives" section.
5-26: Correct grammar and spelling issues flagged by LanguageTool.

Static analysis identified several grammar issues:
- Line 5: Missing line break between "levels.The" (should be "levels. The")
- Line 5: "System Level Cache" should be "system-level cache" (hyphenated compound modifier)
- Various other section headers have similar spacing/punctuation issues throughout the Gemini section
These are primarily whitespace and hyphenation issues that make the document appear less polished.
frontend/src/app.tsx (1)
17-17: Consider using path alias for consistency.

The import uses a relative path "./services/tts/TTSContext". Per coding guidelines, TypeScript/React files should use path aliases (`@/*` maps to `./src/*`) for imports.

♻️ Suggested change

-import { TTSProvider } from "./services/tts/TTSContext";
+import { TTSProvider } from "@/services/tts";

Note: Import from the index file (`@/services/tts`) rather than directly from TTSContext for cleaner imports.

frontend/src/services/tts/TTSContext.tsx (1)
208-272: Consider reusing AudioContext for better performance.

The `speak` function creates a new `AudioContext` on each invocation (line 235). While this works correctly and contexts are properly closed, reusing a single context would be more efficient and avoid potential issues with browser limits on the number of AudioContext instances.

♻️ Potential optimization

Consider maintaining a single `AudioContext` instance and reusing it:

// At the provider level
const audioContextRef = useRef<AudioContext | null>(null);

// In speak function
const getAudioContext = () => {
  if (!audioContextRef.current || audioContextRef.current.state === 'closed') {
    audioContextRef.current = new AudioContext();
  }
  return audioContextRef.current;
};

const audioContext = getAudioContext();
// ... rest of the code

This is a minor optimization and not critical for functionality.
frontend/src/components/UnifiedChat.tsx (1)
65-65: Use centralized TTS import path.

The import uses `@/services/tts/TTSContext` directly. For consistency, consider importing from the barrel export `@/services/tts` instead, which re-exports the same functions.

♻️ Suggested change

-import { useTTS } from "@/services/tts/TTSContext";
+import { useTTS } from "@/services/tts";

frontend/src-tauri/src/tts.rs (4)
121-129: Consider using the `hex` crate for encoding.

This custom hex encoding implementation works correctly, but the `hex` crate is the idiomatic choice in the Rust ecosystem and would simplify maintenance.

♻️ Alternative using the hex crate

Add to Cargo.toml:

hex = "0.4"

Replace the function:

-fn bytes_to_hex(bytes: &[u8]) -> String {
-    const HEX: &[u8; 16] = b"0123456789abcdef";
-    let mut out = String::with_capacity(bytes.len() * 2);
-    for &b in bytes {
-        out.push(HEX[(b >> 4) as usize] as char);
-        out.push(HEX[(b & 0x0f) as usize] as char);
-    }
-    out
-}
+use hex;
+
+fn bytes_to_hex(bytes: &[u8]) -> String {
+    hex::encode(bytes)
+}
203-276: Consider decomposing this function for better maintainability.

The `preprocess_text` function handles multiple text transformations (normalization, markdown removal, symbol replacement, punctuation fixes) in a single 73-line function. Breaking it into smaller, focused helper functions would improve readability and testability.

For example:

fn preprocess_text(text: &str) -> String {
    let text = normalize_unicode(text);
    let text = strip_markdown(&text);
    let text = normalize_symbols(&text);
    let text = fix_punctuation_spacing(&text);
    let text = cleanup_whitespace(&text);
    ensure_ending_punctuation(&text)
}
477-479: Hardcoded synthesis parameters limit configurability.

The chunk size (`max_len = 300`) and silence duration (0.05) are hardcoded. While these initial values are reasonable, making them configurable would improve flexibility for future tuning.

This aligns with the hardcoded voice selection (F2) and synthesis parameters (10 steps, 1.2× speed) mentioned in the PR objectives: all reasonable choices for the initial implementation, and good candidates for future parameterization.
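To illustrate what a parameterized chunker could look like, here is a TypeScript sketch (not the Rust code in tts.rs; the sentence regex and greedy packing strategy are assumptions) that preserves sentence-ending punctuation while packing sentences into chunks up to a configurable limit:

```typescript
// Split text into sentences (keeping their ending punctuation), then
// greedily pack sentences into chunks no longer than maxLen characters.
function chunkText(text: string, maxLen: number): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    if (current && (current + " " + sentence).length > maxLen) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? current + " " + sentence : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

A sentence longer than maxLen would still come through as one oversized chunk here; the PR's split_by_words() fallback addresses that case separately.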
505-541: Add validation for the speed parameter.

Line 539 divides duration by speed without validation. If `speed` is zero or negative, this will panic or produce unexpected results. While currently hardcoded to 1.2, adding defensive validation would prevent future issues.

♻️ Suggested validation

fn infer(
    &mut self,
    text_list: &[String],
    style: &Style,
    total_step: usize,
    speed: f32,
) -> Result<(Vec<f32>, Vec<f32>)> {
+   if speed <= 0.0 {
+       return Err(anyhow::anyhow!("Speed must be positive, got {}", speed));
+   }
+
    let bsz = text_list.len();
    let (text_ids, text_mask) = self.text_processor.call(text_list);
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
- docs/tts-research.md
- frontend/src-tauri/Cargo.toml
- frontend/src-tauri/src/lib.rs
- frontend/src-tauri/src/tts.rs
- frontend/src-tauri/tauri.conf.json
- frontend/src/app.tsx
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src/components/UnifiedChat.tsx
- frontend/src/services/tts/TTSContext.tsx
- frontend/src/services/tts/index.ts
- justfile
🚧 Files skipped from review as they are similar to previous changes (4)
- frontend/src/components/TTSDownloadDialog.tsx
- frontend/src-tauri/Cargo.toml
- justfile
- frontend/src-tauri/tauri.conf.json
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use path aliases (`@/*` maps to `./src/*`) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid usinganytype
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run `just format`, `just lint`, and `just build` after making TypeScript/React changes to ensure code quality and compilation
Files:
frontend/src/services/tts/TTSContext.tsx
frontend/src/services/tts/index.ts
frontend/src/components/UnifiedChat.tsx
frontend/src/app.tsx
🧠 Learnings (4)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management
Applied to files:
frontend/src/services/tts/TTSContext.tsx
frontend/src/components/UnifiedChat.tsx
📚 Learning: 2025-09-21T01:46:39.084Z
Learnt from: AnthonyRonning
Repo: OpenSecretCloud/Maple PR: 246
File: frontend/src/utils/platform/index.ts:28-44
Timestamp: 2025-09-21T01:46:39.084Z
Learning: In TypeScript/JavaScript projects, when creating a convenience re-export index file (e.g., `src/utils/platform/index.ts`) that aggregates exports from a sibling module (e.g., `src/utils/platform.ts`), the relative import path `../platform` is correct and necessary. Using an absolute path like `@/utils/platform` from the index file would create a circular dependency since the absolute path resolves to the index file itself due to Node.js module resolution preferring index files in directories.
Applied to files:
frontend/src/services/tts/index.ts
📚 Learning: 2025-09-21T01:46:39.084Z
Learnt from: AnthonyRonning
Repo: OpenSecretCloud/Maple PR: 246
File: frontend/src/utils/platform/index.ts:28-44
Timestamp: 2025-09-21T01:46:39.084Z
Learning: In TypeScript/JavaScript projects, when creating a convenience re-export index file (e.g., `src/utils/platform/index.ts`) that aggregates exports from a sibling module (e.g., `src/utils/platform.ts`), the relative import path `../platform` is correct and necessary. Using an absolute path like `@/utils/platform` from the index file would create a circular dependency since the absolute path resolves to the index file itself due to Node.js module resolution preferring index files in directories. This is a standard and proper pattern for module organization.
Applied to files:
frontend/src/services/tts/index.ts
📚 Learning: 2025-08-30T22:07:39.291Z
Learnt from: AnthonyRonning
Repo: OpenSecretCloud/Maple PR: 212
File: frontend/src/billing/billingApi.ts:652-674
Timestamp: 2025-08-30T22:07:39.291Z
Learning: In frontend/src/billing/billingApi.ts and similar TypeScript files, the team prefers to rely on TypeScript's type inference through function return type declarations rather than adding explicit type casts to response.json() calls. This reduces code verbosity while maintaining type safety.
Applied to files:
frontend/src/app.tsx
🧬 Code graph analysis (4)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (8)
new (174-176), new (451-456), tts_get_status (722-744), tts_download_models (755-892), tts_load_models (895-913), tts_synthesize (923-965), tts_unload_models (968-974), tts_delete_models (977-994)
frontend/src/services/tts/TTSContext.tsx (1)
frontend/src/services/tts/index.ts (3)
TTSStatus (2-2), TTSProvider (1-1), useTTS (1-1)
frontend/src/components/UnifiedChat.tsx (2)
frontend/src/services/tts/TTSContext.tsx (1)
useTTS (322-328)
frontend/src/components/TTSDownloadDialog.tsx (1)
TTSDownloadDialog (18-203)
frontend/src/app.tsx (5)
frontend/src/services/tts/TTSContext.tsx (1)
TTSProvider (62-320)
frontend/src/components/BillingServiceProvider.tsx (1)
BillingServiceProvider (5-14)
frontend/src/components/ProxyEventListener.tsx (1)
ProxyEventListener (7-58)
frontend/src/components/UpdateEventListener.tsx (1)
UpdateEventListener (11-65)
frontend/src/components/DeepLinkHandler.tsx (1)
DeepLinkHandler (9-109)
🪛 LanguageTool
docs/tts-research.md
[grammar] ~5-~5: Ensure spelling is correct
Context: ...ligible levels.The 44 Million Parameter AdvantageThe model operates with approximately 44 mi...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[uncategorized] ~5-~5: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...d) can often reside entirely within the System Level Cache (SLC) or high-speed RAM partition...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[typographical] ~5-~5: Consider using an en dash here instead of a hyphen.
Context: ... to be under a certain threshold, often 200MB-4GB depending on OS version) and the Google...
(QB_NEW_EN_DASH_RULE_EN)
[grammar] ~5-~5: Use a hyphen to join words.
Context: ...features. The dynamic nature of its flow matching steps and custom tokenizers mak...
(QB_NEW_EN_HYPHEN)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...he Tauri Framework Context: Integration RealitiesThe user's choice of Tauri as the applicati...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...It compiles down to a native binary.The Webview (Frontend): Written in web technologies...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...).3.1 The "Sidecar Pattern" and Desktop SuccessOn desktop operating systems, Tauri suppor...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...obile Wall": Why Sidecars Fail on iOS & AndroidThe user's requirement for iOS and Android ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...fecycle.The Dependency Hell of Embedded PythonThe alternative to a sidecar is embedding t...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...ource path helpers).Android Integration StrategyJNI and Shared Libraries: Android requires ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...g the models.5.1 Real-Time Factor (RTF) BenchmarksThe "Real-Time Factor" measures the speed o...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...s the integration roadmap.6.1 Rust Core ConfigurationThe integration avoids the sidecar pattern ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~5-~5: Ensure spelling is correct
Context: ...hin the Rust process.Step 1: Dependency ManagementIn src-tauri/Cargo.toml:Ini, TOML[dependen...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~13-~13: Ensure spelling is correct
Context: ...back rodio = "0.19" Step 2: Model Asset BundlingThe 268MB model file must be accessible to ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~26-~26: Ensure spelling is correct
Context: ...capability.10. Conclusion and Strategic RecommendationThe comparative analysis yields a definitiv...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~162-~162: Consider using a shorter alternative to avoid wordiness.
Context: ...ox models (350-500 million parameters), making it better suited for resource-constrained devices...
(MADE_IT_JJR)
🪛 markdownlint-cli2 (0.18.1)
docs/tts-research.md
26-26: Bare URL used
(MD034, no-bare-urls)
26-26: Bare URL used
(MD034, no-bare-urls)
169-169: Bare URL used
(MD034, no-bare-urls)
175-175: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-android
- GitHub Check: build-macos (universal-apple-darwin)
- GitHub Check: build-linux
- GitHub Check: build-ios
- GitHub Check: Cloudflare Pages
🔇 Additional comments (14)
docs/tts-research.md (1)
97-102: Correct voice count: Supertonic 2 includes 10 preset voices (5 male, 5 female), not 11.

The multilingual support claim (English, Korean, Spanish, Portuguese, and French) and January 6, 2026 release date are accurate according to the HuggingFace model card. However, the voice count is 10 total voices: 5 female (F1–F5) and 5 male (M1–M5), not 11 as stated.
Likely an incorrect or invalid review comment.
frontend/src/app.tsx (1)
102-109: LGTM! TTS context properly integrated.

The TTSProvider correctly wraps the application tree, making TTS functionality available to all child components. The nesting order is appropriate with TTSProvider providing context to BillingServiceProvider and downstream components.
frontend/src/services/tts/index.ts (1)
1-2: LGTM! Proper re-export pattern.

The re-export pattern correctly uses relative imports from the sibling TTSContext module. This barrel export enables cleaner imports throughout the codebase (e.g., `import { useTTS } from "@/services/tts"`).

Based on learnings, this relative import pattern is correct and necessary to avoid circular dependencies.
frontend/src-tauri/src/lib.rs (1)
6-7: LGTM! TTS backend properly integrated for desktop builds.

The TTS module is correctly:

- Conditionally compiled for desktop only via `#[cfg(desktop)]`
- Managed in application state with `TTSState::new()`
- Exposed through six Tauri commands (get_status, download_models, load_models, synthesize, unload_models, delete_models)
The non-desktop build path correctly excludes TTS, aligning with the desktop-only requirement mentioned in PR objectives.
Also applies to: 39-39, 49-54
frontend/src/services/tts/TTSContext.tsx (4)
1-12: LGTM! Imports are clean and properly organized.

All necessary React hooks and Tauri APIs are imported. The imports follow the coding guidelines with proper path aliases.
62-126: LGTM! Provider initialization and status checking are well-implemented.

The TTSProvider correctly:
- Checks Tauri desktop environment once at mount (line 64)
- Initializes status based on environment (line 67)
- Auto-checks status on mount for Tauri environments (lines 122-126)
- Handles model loading automatically when models are downloaded but not loaded (lines 100-110)
Error handling and state transitions are clean and well-structured.
333-343: LGTM! Text preprocessing correctly handles think blocks.

The `preprocessTextForTTS` function appropriately:

- Removes completed `<think>...</think>` blocks (line 337)
- Handles unclosed `<think>` tags for streaming edge cases (line 340)
- Returns trimmed text suitable for speech synthesis
This ensures users don't hear the model's internal reasoning when using TTS.
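The described behavior can be sketched as a standalone function (an approximation for illustration, not the exact code at lines 333-343):

```typescript
// Strip completed <think>...</think> blocks, then drop everything from
// an unclosed <think> tag onward (the streaming edge case), and trim.
function preprocessTextForTTS(text: string): string {
  let out = text.replace(/<think>[\s\S]*?<\/think>/g, "");
  const open = out.indexOf("<think>");
  if (open !== -1) {
    out = out.slice(0, open);
  }
  return out.trim();
}
```

Handling the unclosed tag matters because during streaming the closing `</think>` may not have arrived yet.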
275-297: LGTM! Cleanup effect is comprehensive and correct.

The unmount cleanup properly handles:
- Event listener cleanup (unlistenRef)
- Audio source node cleanup (sourceNodeRef)
- AudioContext closure (audioContextRef)
- Object URL revocation (audioUrlRef)
All potential resource leaks are prevented.
frontend/src/components/UnifiedChat.tsx (4)
192-278: LGTM! TTSButton implementation is solid.

The TTSButton component is well-implemented with:

- Proper TTS state management via `useTTS` hook
- Click handler for speak/stop (lines 222-235)
- Long press (500ms) to open manage dialog (lines 237-248)
- Timer cleanup in useEffect (lines 209-215)
- Desktop-only rendering via `isTauriEnv` check (lines 218-220)
- Appropriate disabled states during processing (lines 250-254)
- Accessible aria-labels (line 267)
- Visual feedback with loading spinner and play/stop icons (lines 269-275)

The pointer event handling (`onPointerDown`, `onPointerUp`, `onPointerLeave`) provides good cross-platform support.
790-800: Consider reviewer feedback on button visibility.

Based on the PR objectives, a reviewer (marksftw) noted that the copy and speaker buttons are only visible on hover and recommended making them always visible for better discoverability, citing ChatGPT as an example.
Currently, the buttons are rendered in a flex container (line 792), but I don't see explicit hover-only classes in this code. The visibility may be controlled by the Button component's default styling. Consider verifying whether these controls should be made always visible instead of hover-only.
Could you verify the current visibility behavior of the copy and TTS buttons in the assistant messages? If they're currently hover-only, consider making them always visible as suggested by the reviewer for improved discoverability.
468-478: LGTM! MessageList TTS integration is clean.

The TTS callbacks are properly typed and passed through to MessageList:

- `onTTSSetupOpen` triggers the TTS setup dialog
- `onTTSManage` also triggers the setup dialog (for management)

The implementation correctly wires the dialog state to these callbacks.
Also applies to: 2721-2722
3382-3383: LGTM! TTS dialog properly integrated.

The `TTSDownloadDialog` is correctly rendered with controlled open state, enabling users to download and manage TTS models.

frontend/src-tauri/src/tts.rs (2)
754-892: Excellent download integrity verification! 🔒

This implementation demonstrates strong security practices:

- Pinned revision (line 52) ensures reproducible downloads
- SHA-256 checksums (lines 868-876) prevent tampering or corruption
- Dual size validation catches both incomplete downloads and mismatches with expected constants
- Atomic file operations (flush → sync → rename from `.part`, lines 879-885) prevent partial writes
- Partial download cleanup (line 796) maintains clean state on retry
The layered verification approach (HTTP Content-Length → expected size → SHA-256) provides defense in depth. This addresses the "fix: pin and verify TTS model downloads" commit mentioned in the PR objectives.
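For readers who want the shape of that layered check without reading the Rust, a TypeScript analogue (illustrative only; the sizes and hashes used below are test values, not the pinned model constants):

```typescript
import { createHash } from "node:crypto";

// Reject a downloaded buffer unless both the byte count and the
// SHA-256 digest (lowercase hex) match the pinned expectations.
function verifyDownload(
  data: Uint8Array,
  expectedSize: number,
  expectedSha256Hex: string,
): boolean {
  if (data.length !== expectedSize) {
    return false; // truncated or oversized download
  }
  const digest = createHash("sha256").update(data).digest("hex");
  return digest === expectedSha256Hex;
}
```

Checking the size first is a cheap fast-fail before the hash computation, mirroring the layered approach described above.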
922-965: Good concurrency pattern with early lock release.

The implementation correctly:

- Clones the style reference to avoid borrow conflicts (lines 929-934)
- Releases the Mutex guard before encoding (line 953), minimizing lock hold time during the expensive base64 encoding operation

The hardcoded synthesis parameters (`total_step=10`, `speed=1.2`) match the PR objectives and are appropriate for the initial implementation. These could be exposed as API parameters in future iterations for greater flexibility.
Greptile Overview
Greptile Summary
Implements on-device TTS using Supertonic with ONNX Runtime inference, including secure model downloads with pinned HuggingFace revisions and SHA256 verification. The implementation is well-architected with proper separation of concerns between Rust backend and React frontend.
Key Changes:
- Rust backend (`tts.rs`): 994 lines implementing text preprocessing, ONNX inference pipeline (duration prediction, text encoding, denoising, vocoding), and Tauri commands for model lifecycle management
- React context (`TTSContext.tsx`): State management with download progress tracking, Web Audio API playback, and automatic model loading
- UI components: Download dialog with progress indicators and speaker buttons on assistant messages (desktop-only)
- Security: Pinned model revision and SHA256 checksum verification for all 10 model files (~265MB total)
- Platform: Desktop-only feature with conditional compilation, proper Linux AppImage media framework bundling
Issues Found:
- Critical: Missing `AudioContext.resume()` handling for browser autoplay policy (line 233 in TTSContext.tsx)
- Minor: Long-press timer cleanup issue in UnifiedChat.tsx that could leak on early unmount (line 206)
- Enhancement opportunities: Hardcoded voice (F2), inference parameters (10 steps, 1.2x speed), and chunk silence (50ms) with no configuration API
Confidence Score: 4/5
- Safe to merge with one critical fix needed for AudioContext autoplay handling
- The implementation is solid with excellent security practices (pinned models, SHA256 verification), robust error handling, and proper resource management. However, the missing AudioContext.resume() handling is a critical issue that will cause silent failures on many browsers due to autoplay policies. The timer cleanup issue is minor but should be addressed. Once the AudioContext issue is fixed, this is production-ready.
- Pay close attention to `frontend/src/services/tts/TTSContext.tsx` for the AudioContext autoplay fix and `frontend/src/components/UnifiedChat.tsx` for the timer cleanup
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| frontend/src-tauri/src/tts.rs | 4/5 | Comprehensive TTS implementation using ONNX Runtime with robust security (pinned models, SHA256 verification), but hardcoded voice and inference parameters. Found minor issues with error handling. |
| frontend/src/services/tts/TTSContext.tsx | 3/5 | React context for TTS state management with good error handling, but missing AudioContext.resume() handling for browser autoplay policy. Clean resource management. |
| frontend/src/components/TTSDownloadDialog.tsx | 5/5 | Well-designed UI component for TTS setup with clear status states, progress indicators, and error handling. No issues found. |
| frontend/src/components/UnifiedChat.tsx | 3/5 | Added TTS button component to chat messages. Minor issue with long-press timer cleanup logic that could leak if component unmounts during timer. |
| frontend/src-tauri/src/lib.rs | 5/5 | Proper conditional compilation for desktop-only TTS feature with all commands registered correctly. No issues. |
| frontend/src-tauri/Cargo.toml | 5/5 | TTS dependencies properly scoped to desktop platforms only with appropriate feature flags. No issues. |
Sequence Diagram
sequenceDiagram
participant User
participant UI as TTSDownloadDialog
participant Context as TTSContext
participant Tauri as Tauri IPC
participant Rust as TTS Module (Rust)
participant HF as HuggingFace
Note over User,HF: Initial Setup Flow
User->>UI: Opens TTS Dialog
UI->>Context: checkStatus()
Context->>Tauri: invoke("tts_get_status")
Tauri->>Rust: tts_get_status()
Rust-->>Tauri: {models_downloaded, models_loaded}
Tauri-->>Context: Status response
Context-->>UI: status="not_downloaded"
User->>UI: Clicks Download
UI->>Context: startDownload()
Context->>Context: Set status="downloading"
Context->>Tauri: listen("tts-download-progress")
Context->>Tauri: invoke("tts_download_models")
loop For each model file
Rust->>HF: Download model file
HF-->>Rust: Stream chunks
Rust->>Rust: Verify SHA256 checksum
Rust->>Tauri: emit("tts-download-progress")
Tauri-->>Context: Progress event
Context-->>UI: Update progress bar
end
Rust-->>Tauri: Download complete
Context->>Context: Set status="loading"
Context->>Tauri: invoke("tts_load_models")
Tauri->>Rust: tts_load_models()
Rust->>Rust: Load ONNX models into memory
Rust->>Rust: Load voice style (F2)
Rust-->>Tauri: Models loaded
Tauri-->>Context: Success
Context-->>UI: status="ready"
Note over User,HF: TTS Synthesis Flow
User->>UI: Clicks speaker button on message
UI->>Context: speak(text, messageId)
Context->>Context: Preprocess text (remove think blocks)
Context->>Tauri: invoke("tts_synthesize", {text})
Tauri->>Rust: tts_synthesize(text)
Rust->>Rust: preprocess_text() - remove markdown
Rust->>Rust: chunk_text() - split by sentences
loop For each chunk
Rust->>Rust: Unicode indexing
Rust->>Rust: Duration prediction (ONNX)
Rust->>Rust: Text encoding (ONNX)
Rust->>Rust: Sample noisy latent
loop Denoising steps (10 iterations)
Rust->>Rust: Vector estimation (ONNX)
end
Rust->>Rust: Vocoder synthesis (ONNX)
Rust->>Rust: Append 50ms silence
end
Rust->>Rust: Encode WAV to base64
Rust-->>Tauri: {audio_base64, sample_rate}
Tauri-->>Context: Synthesis response
Context->>Context: Decode base64 to Blob
Context->>Context: Create AudioContext
Context->>Context: Decode audio data
Context->>Context: Play via Web Audio API
Context-->>UI: Update isPlaying state
Note over Context: Audio playback complete
Context->>Context: Cleanup AudioContext
Context-->>UI: isPlaying=false
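The synthesis flow above begins with the frontend preprocessing the message text (removing think blocks) before invoking the backend. A minimal sketch of what such a step might look like, assuming a `<think>…</think>` convention for hidden reasoning; the function name and exact rules are illustrative, not the PR's actual implementation:

```typescript
// Illustrative sketch of the frontend preprocessing step: strip
// <think>…</think> reasoning blocks, fenced code, and inline markdown
// markers before handing the text to tts_synthesize.
export function preprocessTextForTTS(text: string): string {
  return text
    .replace(/<think>[\s\S]*?<\/think>/g, "") // drop hidden reasoning blocks
    .replace(/```[\s\S]*?```/g, "")           // drop fenced code blocks
    .replace(/[*_#`]/g, "")                   // drop inline markdown markers
    .replace(/\s+/g, " ")                     // collapse whitespace
    .trim();
}
```

The backend's `preprocess_text()` performs a further markdown-stripping pass of its own, so this frontend pass only needs to remove chat-specific artifacts such as think blocks.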
✅ TestFlight deployment completed successfully!
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Greptile Overview
Greptile Summary
Adds on-device text-to-speech functionality using Supertonic TTS engine with ONNX Runtime for desktop platforms. The implementation downloads ~265MB of models from HuggingFace on first use, performs all synthesis locally, and integrates cleanly into the chat UI.
Key Changes
- Backend (Rust): Comprehensive TTS engine with ONNX model loading, text preprocessing (markdown stripping), chunking, and WAV synthesis with SHA256 verification for downloads
- Frontend (React): TTSContext for state management, TTSDownloadDialog for setup flow, and speaker buttons in chat messages with play/stop controls
- Architecture: Desktop-only feature using conditional compilation, with proper cleanup of audio resources and download progress tracking
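The SHA256 verification mentioned above lives in the Rust backend; as an illustration of the same check, here is a sketch in TypeScript using Node's `crypto` module (the helper name and error message are hypothetical):

```typescript
import { createHash } from "node:crypto";

// Illustrative download-verification step: compare the SHA256 of the
// received bytes against a pinned expected digest and reject the file
// on mismatch, so a tampered model download is never loaded.
export function verifySha256(data: Uint8Array, expectedHex: string): void {
  const actual = createHash("sha256").update(data).digest("hex");
  if (actual !== expectedHex.toLowerCase()) {
    throw new Error(`checksum mismatch: expected ${expectedHex}, got ${actual}`);
  }
}
```

Combined with the pinned HuggingFace revision, this means both the file list and the file contents are fixed at build time.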
Issues Found
- State management race condition: `isPlaying` state is set before async synthesis completes and could be left inconsistent on error
- Mutex contention: TTS synthesis holds the lock during expensive ONNX inference, blocking concurrent status checks
- Minor UX issues: long-press gesture timing could conflict with click, AudioContext accumulation, and a hardcoded F2 voice style
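The race condition flagged above can be avoided by only flipping the playing flag once synthesis has resolved, and always clearing it in a `finally`. A minimal sketch of that pattern, with hypothetical `synthesize`/`play`/`setPlaying` callbacks standing in for the real context internals:

```typescript
type SetPlaying = (playing: boolean) => void;

// Sketch of the suggested fix: don't mark the message as playing until
// synthesis has actually produced audio, and always clear the flag
// afterwards, even when synthesis or playback throws.
export async function speakSafely(
  text: string,
  synthesize: (text: string) => Promise<ArrayBuffer>,
  play: (audio: ArrayBuffer) => Promise<void>,
  setPlaying: SetPlaying,
): Promise<void> {
  const audio = await synthesize(text); // may reject; flag is untouched
  setPlaying(true);                     // only set once audio exists
  try {
    await play(audio);
  } finally {
    setPlaying(false);                  // never left stuck on error
  }
}
```

With this ordering, a failed `tts_synthesize` call leaves the UI state untouched instead of stranding a message in the "playing" state.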
Security & Quality
- Strong security: SHA256 checksums verify all downloads, pinned HuggingFace revision prevents supply chain attacks
- Good error handling and cleanup throughout
- Desktop-only compilation prevents mobile build issues
Confidence Score: 4/5
- Safe to merge with minor race condition in state management that should be addressed
- The implementation is well-structured with strong security practices (SHA256 verification, pinned revisions) and comprehensive error handling. However, there is a state management race condition in TTSContext where `isPlaying` is set before synthesis completes, and the mutex locking strategy in Rust could cause UX lag during synthesis. These are non-critical issues that don't affect core functionality but should be improved for production quality.
- Pay close attention to `frontend/src/services/tts/TTSContext.tsx` (state race condition at line 222) and `frontend/src-tauri/src/tts.rs` (mutex contention at line 927)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| frontend/src-tauri/src/tts.rs | 4/5 | New TTS engine using ONNX Runtime - comprehensive implementation with model download, SHA256 verification, text preprocessing, and WAV synthesis |
| frontend/src/services/tts/TTSContext.tsx | 4/5 | React context for TTS state management - handles model lifecycle, download progress, audio playback with Web Audio API |
| frontend/src/components/TTSDownloadDialog.tsx | 5/5 | Clean UI component for TTS setup dialog with download progress, status indicators, and model management |
| frontend/src/components/UnifiedChat.tsx | 4/5 | Integrated TTS button into chat messages with play/stop controls and long-press management gesture |
| frontend/src-tauri/Cargo.toml | 5/5 | Added TTS dependencies (ONNX, ndarray, hound, etc.) with desktop-only target configuration |
Sequence Diagram
sequenceDiagram
participant User
participant UI as UnifiedChat
participant TTS as TTSContext
participant Rust as tts.rs (Tauri)
participant HF as HuggingFace
participant ONNX as ONNX Runtime
Note over User,ONNX: First-Time Setup Flow
User->>UI: Click speaker icon
UI->>TTS: Check status
TTS->>Rust: tts_get_status()
Rust-->>TTS: models_downloaded: false
TTS->>UI: Show download dialog
User->>UI: Confirm download
UI->>TTS: startDownload()
TTS->>Rust: tts_download_models()
loop For each model file
Rust->>HF: Download model file
HF-->>Rust: Stream file chunks
Rust->>Rust: Verify SHA256 checksum
Rust->>TTS: Emit progress event
TTS->>UI: Update progress bar
end
Rust-->>TTS: Download complete
TTS->>Rust: tts_load_models()
Rust->>ONNX: Load 4 ONNX models
ONNX-->>Rust: Models ready
Rust->>Rust: Load voice style (F2)
Rust-->>TTS: Status: ready
TTS->>UI: Show ready state
Note over User,ONNX: Speech Synthesis Flow
User->>UI: Click speaker icon
UI->>TTS: speak(text, messageId)
TTS->>TTS: preprocessTextForTTS()
TTS->>Rust: tts_synthesize(text)
Rust->>Rust: preprocess_text() - strip markdown
Rust->>Rust: chunk_text() - split into sentences
loop For each text chunk
Rust->>ONNX: duration_predictor.run()
Rust->>ONNX: text_encoder.run()
Rust->>Rust: sample_noisy_latent()
loop Denoising steps (10)
Rust->>ONNX: vector_estimator.run()
end
Rust->>ONNX: vocoder.run()
Rust->>Rust: Append WAV chunk
end
Rust->>Rust: wav_to_base64()
Rust-->>TTS: { audio_base64, sample_rate }
TTS->>TTS: base64ToBlob()
TTS->>TTS: Create AudioContext
TTS->>TTS: source.start()
TTS-->>UI: Playing
UI-->>User: Audio plays
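The `chunk_text()` step in the diagram splits preprocessed text into sentence-sized pieces before per-chunk inference. A rough TypeScript illustration of sentence chunking with a max-length budget; the real Rust implementation and its limits may differ:

```typescript
// Illustrative sentence chunker: split on sentence-ending punctuation,
// then greedily pack sentences into chunks no longer than maxLen
// characters, so each ONNX pass sees a bounded input.
export function chunkText(text: string, maxLen = 300): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && (current + sentence).length > maxLen) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Bounding chunk length keeps per-chunk latency predictable, and the 50ms of silence appended after each chunk masks the seams when the WAV segments are concatenated.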
Summary
Adds text-to-speech functionality to the Maple desktop app using Supertonic, a fast on-device TTS engine. Fixes #363
Features
Technical Details
Testing
Notes
Models are stored in `~/Library/Application Support/cloud.opensecret.maple/tts_models/` on macOS
Summary by CodeRabbit
New Features
Documentation
Chores