
Conversation

@AnthonyRonning AnthonyRonning commented Dec 4, 2025

Summary

Adds text-to-speech functionality to the Maple desktop app using Supertonic, a fast on-device TTS engine. Fixes #363

Features

  • On-device TTS: All processing happens locally, no cloud API calls
  • One-time download: ~265MB models downloaded from HuggingFace on first use
  • Speaker button: Added to assistant messages (desktop only)
  • High quality: F2 voice, 10 inference steps, 1.2x speed, 50ms pause between chunks

Technical Details

  • Rust backend using ONNX Runtime for inference
  • React context for state management
  • Downloads models to app data directory
  • Supports 4 voice styles (F1, F2, M1, M2) - currently using F2
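For orientation, here is a minimal sketch of how the desktop-only command surface could be wired up. The command names match this PR; the struct fields and bodies below are assumptions for illustration, not the actual tts.rs code.

// Hedged sketch only: illustrative wiring for the TTS commands, not the real implementation.
use std::sync::Mutex;
use serde::Serialize;
use tauri::State;

#[derive(Default)]
pub struct TTSState {
    pub models_loaded: bool,
}

#[derive(Serialize)]
pub struct TTSStatusResponse {
    pub models_downloaded: bool,
    pub models_loaded: bool,
}

#[tauri::command]
pub fn tts_get_status(state: State<'_, Mutex<TTSState>>) -> Result<TTSStatusResponse, String> {
    let guard = state.lock().map_err(|e| e.to_string())?;
    // The real command also checks whether the model files exist on disk.
    Ok(TTSStatusResponse {
        models_downloaded: false,
        models_loaded: guard.models_loaded,
    })
}

// In lib.rs the commands are registered on the desktop builder only, roughly:
// .manage(Mutex::new(TTSState::default()))
// .invoke_handler(tauri::generate_handler![
//     tts_get_status, tts_download_models, tts_load_models,
//     tts_synthesize, tts_unload_models, tts_delete_models
// ])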

Testing

  • Desktop build works
  • iOS build (need to test)
  • Android build (need to test)

Notes

  • TTS button only appears in desktop app (Tauri environment)
  • Models stored in ~/Library/Application Support/cloud.opensecret.maple/tts_models/ on macOS

Summary by CodeRabbit

  • New Features

    • On-device TTS: download/load/unload/delete models with progress, synthesize WAV (base64) and play/stop audio; per-message read‑aloud controls integrated into chat and app-wide TTS provider/hook.
  • Documentation

    • Added comprehensive TTS research and integration guidance.
  • Chores

    • Added desktop debug build targets and enabled Linux AppImage media bundling.

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Dec 4, 2025

📝 Walkthrough


Adds a cross-platform on-device TTS feature: Rust/Tauri backend (ONNX pipeline + Tauri commands), React TTS context + hook, UI for model download/management and per-message playback, build targets and AppImage media bundling, and a TTS research doc.

Changes

Cohort / File(s) Summary
Rust TTS Backend
frontend/src-tauri/Cargo.toml, frontend/src-tauri/src/lib.rs, frontend/src-tauri/src/tts.rs
New desktop-target dependencies and a full TTS module: model discovery/download, load/unload, ONNX multi-stage synthesis pipeline, TTSState lifecycle, ONNX sessions, file management, streaming progress events, and Tauri commands (tts_get_status, tts_download_models, tts_load_models, tts_synthesize, tts_unload_models, tts_delete_models).
Frontend TTS Context
frontend/src/services/tts/TTSContext.tsx, frontend/src/services/tts/index.ts
New TTSProvider, useTTS hook and TTSStatus type; manages desktop detection, IPC invokes/events, download progress, synthesize invocation, base64 → audio playback, audio lifecycle, and exposes actions (checkStatus, startDownload, deleteModels, speak, stop).
UI Components & Chat Integration
frontend/src/components/TTSDownloadDialog.tsx, frontend/src/components/UnifiedChat.tsx, frontend/src/app.tsx
Adds TTSDownloadDialog modal and per-message TTSButton; extends message-list props with TTS callbacks (onTTSSetupOpen, onTTSManage); wraps app with TTSProvider and wires TTS flows into chat UI.
Build & Bundling
justfile, frontend/src-tauri/tauri.conf.json
Adds desktop-build-debug and desktop-build-debug-no-cc just targets; enables bundle.linux.appimage.bundleMediaFramework: true in Tauri config.
Documentation
docs/tts-research.md
New research doc comparing local TTS options, deployment strategies, mobile/desktop constraints, benchmarks and an integration roadmap.

Sequence Diagram(s)

sequenceDiagram
    participant UI as TTSDownloadDialog
    participant React as TTS Context\n(useTTS)
    participant Tauri as Tauri IPC
    participant Rust as TTS Backend\n(tts.rs)
    participant FS as File System

    UI->>React: startDownload()
    React->>Tauri: invoke tts_download_models()
    Tauri->>Rust: download request
    Rust->>FS: stream & save model files
    loop progress events
      Rust-->>Tauri: emit download progress
      Tauri-->>React: tts-download-progress event
      React-->>UI: update progress UI
    end
    Rust-->>Tauri: download complete
    React->>Tauri: invoke tts_load_models()
    Tauri->>Rust: load ONNX sessions / init engine
    Rust-->>Tauri: load complete
    Tauri-->>React: ready
    React-->>UI: status = ready
sequenceDiagram
    participant User as User
    participant UI as TTSButton
    participant React as TTS Context
    participant Tauri as Tauri IPC
    participant Rust as TTS Backend
    participant Audio as Browser Audio

    User->>UI: click play(text)
    UI->>React: speak(text, id)
    React->>Tauri: invoke tts_synthesize(text)
    Tauri->>Rust: synthesize request
    Rust->>Rust: preprocess → duration → encode → denoise → vocoder
    Rust-->>Tauri: TTSSynthesizeResponse (audio_base64)
    Tauri-->>React: deliver response
    React->>Audio: create Blob/URL and play
    Audio-->>React: onended
    React-->>UI: update playing state
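The download flow above hinges on progress events streamed from Rust to the webview. A minimal sketch of that piece, assuming Tauri 2's Emitter trait; the payload fields are placeholders, not the PR's actual struct:

// Hedged sketch: emit a download-progress event to the frontend listener.
// The event name matches the diagram; the payload fields are assumptions.
use serde::Serialize;
use tauri::{AppHandle, Emitter};

#[derive(Clone, Serialize)]
struct DownloadProgress {
    file: String,
    downloaded_bytes: u64,
    total_bytes: u64,
}

fn report_progress(app: &AppHandle, file: &str, downloaded: u64, total: u64) -> tauri::Result<()> {
    app.emit(
        "tts-download-progress",
        DownloadProgress {
            file: file.to_string(),
            downloaded_bytes: downloaded,
            total_bytes: total,
        },
    )
}

On the React side, the TTS context listens for the same "tts-download-progress" event and updates the dialog's progress bar.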

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐇 I hopped through crates and ONNX nights,

I fetched the models, threaded all the bytes,
I taught the app to hum and sing,
Click play — the desktop hears the spring,
A rabbit stitched the audio wings.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main feature addition: on-device TTS implementation using Supertonic.
Linked Issues check ✅ Passed The PR fully implements the requirements from issue #363: adds TTS functionality for voice audio responses using on-device local processing.
Out of Scope Changes check ✅ Passed All changes are directly related to TTS feature implementation. Added documentation and build targets support the core feature and are not out of scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6509a68 and 8004fa1.

📒 Files selected for processing (1)
  • frontend/src/services/tts/TTSContext.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-android
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-linux
  • GitHub Check: build-ios
  • GitHub Check: Cloudflare Pages

Comment @coderabbitai help to get the list of available commands and usage tips.


cloudflare-workers-and-pages bot commented Dec 4, 2025

Deploying maple with Cloudflare Pages

Latest commit: 8004fa1
Status: ✅  Deploy successful!
Preview URL: https://7230bc57.maple-ca8.pages.dev
Branch Preview URL: https://feature-tts-supertonic.maple-ca8.pages.dev

View logs


greptile-apps bot commented Dec 4, 2025

Greptile Summary

This PR adds on-device text-to-speech functionality using Supertonic, a fast ONNX-based TTS engine. The implementation downloads ~265MB of models from HuggingFace on first use and performs all synthesis locally on the desktop app.

Key Changes:

  • Rust backend (tts.rs): Complete TTS pipeline with ONNX Runtime for inference, Unicode text processing, model downloading with progress tracking, and WAV encoding
  • React context (TTSContext.tsx): State management for TTS lifecycle, download progress, and Web Audio API playback
  • UI integration: Download dialog, per-message speaker buttons with long-press for settings
  • Desktop-only: Dependencies properly scoped to desktop platforms; gracefully degrades on web/mobile

Previous Review Feedback Addressed:

  • Regex compilation optimized using Lazy static (no longer recompiled on every call)
  • Memory leaks fixed: object URLs properly revoked on stop and unmount
  • File integrity: explicit flush before renaming temp files to final paths
  • Sentence splitting preserves punctuation correctly

Minor Issues:

  • State inconsistency when preprocessed text is empty (one logic issue found)
  • Voice selection hardcoded to F2 (style improvement suggested)
  • Duplicate markdown removal split between frontend and backend (acceptable design choice)

Confidence Score: 4/5

  • This PR is safe to merge with one minor logic fix recommended
  • The implementation is solid with comprehensive error handling, proper resource cleanup, and desktop-only scoping. Previous review concerns have been addressed. One logic issue exists (state inconsistency on empty text) but won't cause crashes. The ONNX models are downloaded from the official Supertonic HuggingFace repo with integrity checks.
  • frontend/src/services/tts/TTSContext.tsx - fix state inconsistency when preprocessed text is empty (line 217)

Important Files Changed

Filename Overview
frontend/src-tauri/src/tts.rs Core TTS implementation with ONNX inference, text preprocessing, model downloading, and WAV encoding. Previous review feedback addressed (regex compilation, URL revocation, flush on download).
frontend/src/services/tts/TTSContext.tsx React context managing TTS state, download progress, and audio playback via Web Audio API. All memory leak issues from previous reviews have been fixed.
frontend/src/components/TTSDownloadDialog.tsx Modal dialog for TTS setup with download progress, status indicators, and model management. Clean implementation with proper state handling.
frontend/src-tauri/Cargo.toml Added desktop-only TTS dependencies (ort, ndarray, hound, etc.) properly scoped to desktop platforms only.

Sequence Diagram

sequenceDiagram
    participant User
    participant UI as React UI
    participant TTSContext
    participant Tauri as Tauri Bridge
    participant Rust as Rust Backend
    participant HF as HuggingFace
    participant ONNX as ONNX Runtime

    Note over User,ONNX: Initial Setup Flow
    User->>UI: Opens TTS Dialog
    UI->>TTSContext: checkStatus()
    TTSContext->>Tauri: invoke("tts_get_status")
    Tauri->>Rust: tts_get_status()
    Rust->>Rust: Check if models exist
    Rust-->>Tauri: TTSStatusResponse
    Tauri-->>TTSContext: {models_downloaded, models_loaded}
    TTSContext-->>UI: Update status

    alt Models not downloaded
        User->>UI: Click Download
        UI->>TTSContext: startDownload()
        TTSContext->>Tauri: invoke("tts_download_models")
        Tauri->>Rust: tts_download_models()
        loop For each model file
            Rust->>HF: HTTP GET model file
            HF-->>Rust: Stream chunks
            Rust->>Rust: Write to temp file
            Rust->>Tauri: emit("tts-download-progress")
            Tauri-->>TTSContext: Progress event
            TTSContext-->>UI: Update progress bar
        end
        Rust->>Rust: Flush and rename temp files
        Rust-->>Tauri: Success
    end

    Note over User,ONNX: Model Loading Flow
    TTSContext->>Tauri: invoke("tts_load_models")
    Tauri->>Rust: tts_load_models()
    Rust->>Rust: Load tts.json config
    Rust->>Rust: Load unicode_indexer.json
    Rust->>ONNX: Load duration_predictor.onnx
    Rust->>ONNX: Load text_encoder.onnx
    Rust->>ONNX: Load vector_estimator.onnx
    Rust->>ONNX: Load vocoder.onnx
    Rust->>Rust: Load F2.json voice style
    Rust->>Rust: Store in TTSState
    Rust-->>Tauri: Success
    Tauri-->>TTSContext: Models loaded
    TTSContext-->>UI: status = "ready"

    Note over User,ONNX: Text-to-Speech Synthesis
    User->>UI: Click speaker icon
    UI->>TTSContext: speak(text, messageId)
    TTSContext->>TTSContext: preprocessTextForTTS()
    TTSContext->>Tauri: invoke("tts_synthesize", {text})
    Tauri->>Rust: tts_synthesize(text)
    Rust->>Rust: chunk_text(text, 300)
    loop For each chunk
        Rust->>Rust: preprocess_text(chunk)
        Rust->>Rust: UnicodeProcessor.call()
        Rust->>ONNX: Run duration_predictor
        ONNX-->>Rust: Duration predictions
        Rust->>ONNX: Run text_encoder
        ONNX-->>Rust: Text embeddings
        Rust->>Rust: sample_noisy_latent()
        loop 10 denoising steps
            Rust->>ONNX: Run vector_estimator
            ONNX-->>Rust: Denoised latent
        end
        Rust->>ONNX: Run vocoder
        ONNX-->>Rust: Audio waveform
        Rust->>Rust: Concatenate with silence
    end
    Rust->>Rust: wav_to_base64()
    Rust-->>Tauri: TTSSynthesizeResponse
    Tauri-->>TTSContext: {audio_base64, sample_rate}
    TTSContext->>TTSContext: base64ToBlob()
    TTSContext->>TTSContext: Create AudioContext
    TTSContext->>TTSContext: Decode audio buffer
    TTSContext->>User: Play audio via Web Audio API
    TTSContext-->>UI: isPlaying = true

    Note over User,ONNX: Stop/Cleanup
    User->>UI: Click stop or unmount
    UI->>TTSContext: stop()
    TTSContext->>TTSContext: sourceNode.stop()
    TTSContext->>TTSContext: audioContext.close()
    TTSContext->>TTSContext: URL.revokeObjectURL()
    TTSContext-->>UI: isPlaying = false


@greptile-apps greptile-apps bot left a comment

10 files reviewed, 2 comments



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
frontend/src/components/UnifiedChat.tsx (1)

179-233: TTSButton implementation looks solid with proper environment check and state handling.

The component correctly:

  • Returns null when not in Tauri environment (line 193-195)
  • Handles different TTS states appropriately
  • Provides proper aria-labels for accessibility

However, there's a potential issue with the async handleClick function:

The speak call on line 207 is awaited but errors are not handled. Consider adding error handling:

  const handleClick = async () => {
    if (status === "not_downloaded" || status === "error") {
      onNeedsSetup();
      return;
    }

    if (status === "ready") {
      if (isThisPlaying) {
        stop();
      } else {
-        await speak(text, messageId);
+        try {
+          await speak(text, messageId);
+        } catch (error) {
+          console.error("TTS playback failed:", error);
+        }
      }
    }
  };
frontend/src-tauri/src/tts.rs (3)

115-219: Consider compiling regexes once to improve performance.

The preprocess_text function creates multiple Regex objects on every call. Since this function is called for each text chunk during synthesis, this causes repeated compilation overhead.

Consider using lazy_static or once_cell::sync::Lazy to compile regexes once:

use once_cell::sync::Lazy;

static EMOJI_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\x{1F600}-\x{1F64F}...]").unwrap()
});

static DIACRITICS_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\u{0302}...]").unwrap()
});

// ... other regexes

fn preprocess_text(text: &str) -> String {
    let mut text: String = text.nfkd().collect();
    text = EMOJI_PATTERN.replace_all(&text, "").to_string();
    // ...
}

This is especially important since chunk_text may split long messages into many chunks, each processed separately.


734-771: Mutex held during CPU-intensive synthesis may cause contention.

The mutex guard is held at line 739 and only released at line 759, after the potentially long-running synthesize() call completes. For longer texts, ONNX inference across multiple chunks could take several seconds, blocking all other TTS commands.

Consider whether the ONNX sessions truly need mutable access. If Session::run only requires &self, you could restructure to minimize lock duration:

// Clone what's needed, release lock quickly
let (tts_data, style) = {
    let guard = state.lock().map_err(|e| e.to_string())?;
    let style = guard.style.as_ref().ok_or("Voice style not loaded")?.clone();
    // If possible, clone or Arc the session references
    // ...
};

// Synthesize outside the lock
let audio = /* ... */;

Alternatively, if mutable access is truly required by ONNX Runtime, this is acceptable for now but document the limitation. For production, consider using tokio::task::spawn_blocking to avoid blocking the async runtime.


536-566: Consider configuring ONNX session options for better performance.

The ONNX sessions are created with default settings. For a better user experience, you might want to configure the execution:

use ort::session::SessionBuilder;

let dp_ort = SessionBuilder::new()?
    .with_intra_threads(4)?  // Limit thread usage
    .commit_from_file(models_dir.join("duration_predictor.onnx"))?;

This is optional but could help with:

  • Controlling CPU usage during inference
  • Potentially enabling GPU acceleration in the future
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad67576 and fc94dfa.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • frontend/src-tauri/Cargo.toml (1 hunks)
  • frontend/src-tauri/src/lib.rs (4 hunks)
  • frontend/src-tauri/src/pdf_extractor.rs (1 hunks)
  • frontend/src-tauri/src/tts.rs (1 hunks)
  • frontend/src/app.tsx (2 hunks)
  • frontend/src/components/TTSDownloadDialog.tsx (1 hunks)
  • frontend/src/components/UnifiedChat.tsx (8 hunks)
  • frontend/src/services/tts/TTSContext.tsx (1 hunks)
  • frontend/src/services/tts/index.ts (1 hunks)
  • justfile (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Use 2-space indentation, double quotes, and a 100-character line limit for formatting
Use camelCase for variable and function names
Use try/catch with specific error types for error handling

Files:

  • frontend/src/services/tts/index.ts
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/app.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/components/UnifiedChat.tsx
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use strict TypeScript typing and avoid any when possible

Files:

  • frontend/src/services/tts/index.ts
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/app.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/components/UnifiedChat.tsx
🧠 Learnings (1)
📚 Learning: 2025-07-19T21:31:44.925Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-19T21:31:44.925Z
Learning: Applies to src/**/*.{tsx} : Use React context for global state management when needed

Applied to files:

  • frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (4)
frontend/src/components/TTSDownloadDialog.tsx (1)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (251-257)
frontend/src/app.tsx (5)
frontend/src/services/tts/index.ts (1)
  • TTSProvider (1-1)
frontend/src/components/BillingServiceProvider.tsx (1)
  • BillingServiceProvider (5-14)
frontend/src/components/ProxyEventListener.tsx (1)
  • ProxyEventListener (7-58)
frontend/src/components/UpdateEventListener.tsx (1)
  • UpdateEventListener (11-65)
frontend/src/components/DeepLinkHandler.tsx (1)
  • DeepLinkHandler (9-109)
frontend/src/services/tts/TTSContext.tsx (1)
frontend/src/services/tts/index.ts (3)
  • TTSStatus (2-2)
  • TTSProvider (1-1)
  • useTTS (1-1)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (7)
  • new (86-88)
  • new (341-346)
  • tts_get_status (602-620)
  • tts_download_models (631-704)
  • tts_load_models (707-725)
  • tts_synthesize (735-771)
  • tts_unload_models (774-780)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-ios
  • GitHub Check: build-linux
  • GitHub Check: build-android
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (16)
justfile (1)

46-57: Desktop debug build targets look consistent and useful

The new debug and debug-no-cc recipes mirror the existing release targets and should help diagnose desktop issues without impacting current workflows.

frontend/src/services/tts/TTSContext.tsx (1)

60-72: Overall TTS provider shape and status lifecycle look solid

The context cleanly encapsulates TTS state (status, errors, progress, playback) and exposes a small surface (checkStatus, startDownload, speak, stop). Checking isTauri() once, defaulting to "not_available" off-Tauri, and auto-checking status on mount is consistent with the existing platform utilities and React-context usage.

frontend/src/app.tsx (1)

17-17: TTSProvider placement in the provider tree looks appropriate

Wrapping BillingServiceProvider (and thus ProxyEventListener, UpdateEventListener, DeepLinkHandler, and InnerApp) with TTSProvider ensures all consumers of useTTS sit under the context without altering existing provider ordering. This is a good integration point.

Also applies to: 102-109

frontend/src-tauri/Cargo.toml (1)

42-53: TTS dependency set looks coherent; please verify cross-platform builds and versions

The added crates (ONNX Runtime via ort, ndarray with rayon, reqwest with streaming, dirs, etc.) are consistent with a local Supertonic TTS pipeline. Since they’re in the shared [dependencies] section (not cfg-gated), they’ll be compiled for all targets, including Android/iOS.

Please verify:

  • That ort = "2.0.0-rc.7" and the other new crates build cleanly for your mobile targets.
  • That there are no unexpected size or licensing issues introduced by this set.

Consider adding cfg-gating or feature flags later if you decide TTS should remain desktop-only.

frontend/src-tauri/src/pdf_extractor.rs (1)

19-27: PDF extractor changes are formatting-only and safe

The adjustments here are purely stylistic (parameter layout and whitespace) and do not affect decoding, file-type branching, or response construction.

Also applies to: 33-38, 43-51

frontend/src/services/tts/index.ts (1)

1-2: Barrel re-exports for TTS are clean and useful

Re-exporting TTSProvider, useTTS, and TTSStatus from the TTS module simplifies imports for consumers and keeps the public surface centralized.

frontend/src-tauri/src/lib.rs (2)

5-8: TTS state management and command wiring on desktop look correct

Registering mod tts;, managing tts::TTSState::new(), and adding the TTS commands to the desktop invoke_handler cleanly integrates the backend TTS pipeline. Scoping these commands to the desktop builder matches the “desktop-only TTS” goal.

Also applies to: 39-54


387-399: Update-ready event emission and logging structure are clear

The app_handle.emit("update-ready", UpdateReadyPayload { ... }) block, along with the success/failure logging, is straightforward and keeps the updater behavior unchanged while improving observability.

frontend/src/components/TTSDownloadDialog.tsx (1)

1-160: TTS download/setup dialog behavior matches the context state machine well

The dialog cleanly maps TTSStatus into UX states (not available, error, processing, ready, initial), wires startDownload appropriately, and prevents accidental closure while work is in progress. The progress bar and the messaging around the one-time ~265MB download and local-only processing are clear.

frontend/src/components/UnifiedChat.tsx (3)

412-427: LGTM!

The MessageList signature update correctly adds the onTTSSetupOpen callback with proper typing. The integration follows the existing pattern for handling dialogs in this component.


626-633: LGTM!

The TTSButton integration mirrors the existing CopyButton pattern, reusing the same text extraction logic. The button is properly placed within the hover-reveal action group for assistant messages.


3023-3024: LGTM!

The TTS setup dialog follows the same pattern as other dialogs in the component (e.g., WebSearchInfoDialog, ContextLimitDialog). State management is consistent with the rest of the codebase.

frontend/src-tauri/src/tts.rs (4)

335-347: LGTM!

The TTSState design with Mutex<Self> is appropriate here. While using std::sync::Mutex in async contexts can be problematic if held across .await points, the current implementation correctly performs all mutex-guarded operations synchronously before any awaits.


281-333: Text chunking logic is functional but has the same regex compilation overhead.

The function works correctly for splitting text into manageable chunks. The regex objects (para_re, sentence_re) are recreated on each call - same recommendation as preprocess_text applies.

Note: The sentence boundary regex ([.!?])\s+ may not handle all edge cases (e.g., abbreviations like "Dr. Smith"), but this is acceptable for TTS where minor imperfections in chunk boundaries won't significantly affect output quality.


568-588: LGTM!

The WAV encoding function correctly clamps audio samples to prevent overflow and uses proper 16-bit PCM encoding. The in-memory buffer approach is efficient for the expected audio sizes.
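For reference, the pattern being described reduces to something like the following sketch using the hound crate; this is an illustration of the approach, not the code under review:

// Hedged sketch: clamp f32 samples and encode mono 16-bit PCM WAV into an in-memory buffer.
use std::io::Cursor;
use hound::{SampleFormat, WavSpec, WavWriter};

fn encode_wav_16bit(samples: &[f32], sample_rate: u32) -> Result<Vec<u8>, hound::Error> {
    let spec = WavSpec {
        channels: 1, // mono assumed for this sketch
        sample_rate,
        bits_per_sample: 16,
        sample_format: SampleFormat::Int,
    };
    let mut cursor = Cursor::new(Vec::new());
    {
        let mut writer = WavWriter::new(&mut cursor, spec)?;
        for &s in samples {
            // Clamp to [-1.0, 1.0] before scaling so extreme values cannot overflow i16.
            let clamped = s.clamp(-1.0, 1.0);
            writer.write_sample((clamped * i16::MAX as f32) as i16)?;
        }
        writer.finalize()?;
    }
    Ok(cursor.into_inner())
}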


706-725: LGTM!

The model loading function is straightforward. While it doesn't explicitly check if models are downloaded first, the error messages from load_tts_engine and load_voice_style will be informative if files are missing. The frontend's TTSContext guards against calling this before download completes.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (4)
frontend/src/services/tts/TTSContext.tsx (2)

10-10: Use isTauriDesktop() instead of isTauri() to prevent mobile TTS errors.

According to past review comments, TTS commands are only registered in the desktop builder. Using isTauri() returns true for both desktop and mobile Tauri environments, which would cause "unknown command" errors on mobile when invoking tts_get_status.

Replace isTauri with isTauriDesktop throughout this file (lines 10, 62, 78, 112, 118, 163).

-import { isTauri } from "@/utils/platform";
+import { isTauriDesktop } from "@/utils/platform";

And update usage:

-  const isTauriEnv = isTauri();
+  const isTauriEnv = isTauriDesktop();

208-215: Memory leak: Audio object URL not revoked on stop() or unmount.

When stop() is called or the provider unmounts during playback, the object URL created in speak() is never revoked, causing memory leaks over repeated play/stop cycles.

Add a ref to track the URL and revoke it in stop() and the unmount effect:

  const audioRef = useRef<HTMLAudioElement | null>(null);
+  const audioUrlRef = useRef<string | null>(null);

In stop():

  const stop = useCallback(() => {
    if (audioRef.current) {
      audioRef.current.pause();
      audioRef.current = null;
    }
+    if (audioUrlRef.current) {
+      URL.revokeObjectURL(audioUrlRef.current);
+      audioUrlRef.current = null;
+    }
    setIsPlaying(false);
    setCurrentPlayingId(null);
  }, []);

And store the URL in speak():

  const audioUrl = URL.createObjectURL(audioBlob);
+  audioUrlRef.current = audioUrl;
frontend/src-tauri/src/tts.rs (2)

295-347: Sentence splitting loses punctuation marks.

The regex pattern r"([.!?])\s+" uses a capture group, but split() discards captured groups, causing sentences to lose their ending punctuation.

Consider using split_inclusive, or a look-behind pattern if you move to a regex engine that supports it (Rust's default regex crate does not support look-around):

-    let sentence_re = Regex::new(r"([.!?])\s+").unwrap();
+    let sentence_re = Regex::new(r"(?<=[.!?])\s+").unwrap();

Or, sticking with the regex crate, manually preserve punctuation by iterating matches instead of splitting. Either way, losing the terminal punctuation affects the naturalness of the TTS output, since sentences will be synthesized without proper endings.


650-668: Add request timeout and handle partial downloads.

Two reliability concerns from past reviews still apply:

  1. No HTTP timeout: If the server hangs, the download blocks indefinitely.

  2. Partial file on failure: If download fails mid-stream, a partial file remains. Next tts_get_status will report models as downloaded.

+ use std::time::Duration;

  let client = reqwest::Client::builder()
+     .timeout(Duration::from_secs(300))
+     .connect_timeout(Duration::from_secs(30))
      .build()
-     .map_err(|e| format!("Failed to create HTTP client: {}", e))?;
+     .unwrap_or_else(|_| reqwest::Client::new());

For partial files, download to a .part file and rename on success:

+ let temp_path = file_path.with_extension("part");
- let mut file = File::create(&file_path)
+ let mut file = File::create(&temp_path)
      ...
+ // After successful download:
+ fs::rename(&temp_path, &file_path)?;
🧹 Nitpick comments (1)
frontend/src/components/UnifiedChat.tsx (1)

626-633: Consider extracting the text content logic to avoid duplication.

The same text extraction logic is duplicated between CopyButton and TTSButton. While acceptable for now, consider extracting this to a helper if more buttons are added.

const messageText = message.content
  .filter((p) => "text" in p && p.text)
  .map((p) => ("text" in p ? p.text : ""))
  .join("");
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc94dfa and e607185.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • frontend/src-tauri/Cargo.toml (1 hunks)
  • frontend/src-tauri/src/lib.rs (7 hunks)
  • frontend/src-tauri/src/pdf_extractor.rs (1 hunks)
  • frontend/src-tauri/src/tts.rs (1 hunks)
  • frontend/src/app.tsx (2 hunks)
  • frontend/src/components/TTSDownloadDialog.tsx (1 hunks)
  • frontend/src/components/UnifiedChat.tsx (8 hunks)
  • frontend/src/services/tts/TTSContext.tsx (1 hunks)
  • frontend/src/services/tts/index.ts (1 hunks)
  • justfile (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • frontend/src/app.tsx
  • frontend/src/services/tts/index.ts
  • justfile
  • frontend/src-tauri/Cargo.toml
  • frontend/src-tauri/src/pdf_extractor.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx}: Use 2-space indentation, double quotes, and a 100-character line limit for formatting
Use camelCase for variable and function names
Use try/catch with specific error types for error handling

Files:

  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use strict TypeScript typing and avoid any when possible

Files:

  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-07-19T21:31:44.925Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-19T21:31:44.925Z
Learning: Applies to src/**/*.{tsx} : Use React context for global state management when needed

Applied to files:

  • frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (3)
frontend/src/components/TTSDownloadDialog.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (251-257)
frontend/src/components/ui/dialog.tsx (6)
  • Dialog (92-92)
  • DialogContent (97-97)
  • DialogHeader (98-98)
  • DialogTitle (100-100)
  • DialogDescription (101-101)
  • DialogFooter (99-99)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
frontend/src/services/tts/TTSContext.tsx (1)
frontend/src/services/tts/index.ts (3)
  • TTSStatus (2-2)
  • TTSProvider (1-1)
  • useTTS (1-1)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (8)
  • init_tts_backend (20-30)
  • new (100-102)
  • new (355-360)
  • tts_get_status (616-634)
  • tts_download_models (645-718)
  • tts_load_models (721-739)
  • tts_synthesize (749-785)
  • tts_unload_models (788-794)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-ios
  • GitHub Check: build-android
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-linux
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (11)
frontend/src/components/UnifiedChat.tsx (2)

179-233: TTSButton implementation looks solid.

The component properly:

  • Guards rendering with isTauriEnv check
  • Handles all TTS status states appropriately
  • Provides clear visual feedback (spinner during loading states, stop icon during playback)
  • Uses the correct disabled states during transitional operations

3023-3024: TTS dialog integration is clean.

The dialog is properly controlled via state and placed appropriately in the component tree.

frontend/src/services/tts/TTSContext.tsx (1)

259-267: Helper function implementation is correct.

The base64ToBlob function correctly decodes base64 and creates a Blob with the appropriate MIME type.

frontend/src/components/TTSDownloadDialog.tsx (3)

18-24: Dialog implementation is well-structured.

The component properly handles all TTS status states with appropriate UI feedback. The handleDownload wrapper is a simple passthrough, which is fine but could be inlined if no additional logic is needed.


70-78: Error display is correctly guarded.

The condition hasError && error ensures the error panel only shows when there's an actual error message to display, preventing empty error states.


93-105: Progress bar behavior is well-designed.

The progress bar correctly shows 100% during the loading phase (after download completes), and the percentage text is appropriately hidden during non-download states.

frontend/src-tauri/src/lib.rs (2)

396-407: Minor formatting changes to emit call.

These are style/formatting adjustments with no functional impact.


264-273: > Likely an incorrect or invalid review comment.

frontend/src-tauri/src/tts.rs (3)

765-767: Synthesis parameters are hardcoded as per design.

The hardcoded values (10 inference steps, 1.2x speed) match the PR description. Consider exposing these as optional parameters in the future for user customization.
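If that customization lands later, one possible shape is sketched below. The parameter names and the defaults-as-arguments pattern are assumptions; only the 10-step and 1.2x defaults come from this PR.

// Hedged sketch: optional synthesis parameters with the current hardcoded values as defaults.
use serde::Serialize;

#[derive(Serialize)]
struct TTSSynthesizeResponse {
    audio_base64: String,
    sample_rate: u32,
}

#[tauri::command]
async fn tts_synthesize(
    text: String,
    inference_steps: Option<u32>,
    speed: Option<f32>,
) -> Result<TTSSynthesizeResponse, String> {
    let steps = inference_steps.unwrap_or(10); // current default
    let speed = speed.unwrap_or(1.2); // current default
    // ... the existing synthesis pipeline would run here with `steps` and `speed` ...
    let _ = (text, steps, speed);
    Ok(TTSSynthesizeResponse {
        audio_base64: String::new(), // placeholder
        sample_rate: 0,              // placeholder
    })
}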


550-580: TTS engine loading is straightforward and correct.

The ONNX session creation uses default builder settings. Consider adding thread configuration or optimization options if performance tuning is needed later.


18-30: Platform-specific TTS backend initialization is correct.

The conditional compilation properly selects tract for iOS/Android and ONNX Runtime for desktop. This aligns with the cross-platform architecture.
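A bare-bones illustration of that compile-time split; the function name appears in this PR, but the bodies here are placeholders:

// Hedged sketch: cfg-gated backend selection, mobile (tract) vs. desktop (ONNX Runtime).
#[cfg(any(target_os = "ios", target_os = "android"))]
fn init_tts_backend() -> &'static str {
    // Mobile builds use the tract-based backend.
    "tract"
}

#[cfg(not(any(target_os = "ios", target_os = "android")))]
fn init_tts_backend() -> &'static str {
    // Desktop builds use ONNX Runtime via the `ort` crate.
    "onnxruntime"
}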


@greptile-apps greptile-apps bot left a comment

10 files reviewed, no comments



@greptile-apps greptile-apps bot left a comment

10 files reviewed, 3 comments


@AnthonyRonning AnthonyRonning force-pushed the feature/tts-supertonic branch from fc94dfa to ca87d46 on January 7, 2026 at 18:27

@greptile-apps greptile-apps bot left a comment

10 files reviewed, 3 comments



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @docs/tts-research.md:
- Around line 1-30: The Gemini section has many run-on paragraphs and missing
blank lines (e.g., "2026The trajectory", "SuccessOn desktop") and improperly
formatted code and URLs; fix by inserting blank lines between all headings (like
"## Gemini"), subheadings (e.g., "2. Architectural Deconstruction"), and
paragraph blocks so each logical paragraph is separated, correct concatenated
sentences into proper sentences, convert bare URLs into markdown links with
descriptive text (replace bare URLs with [link text](url)), and ensure every
code block uses fenced code blocks with language identifiers and closing ```
markers (e.g., ```rust, ```json) where the diff shows example snippets (the
Cargo.toml and Rust/JavaScript examples) so the document is valid Markdown and
readable.

In @frontend/src-tauri/src/tts.rs:
- Around line 281-333: The sentence-splitting regex in chunk_text discards
terminal punctuation because sentence_re = r"([.!?])\s+" is used with split();
update chunk_text to preserve sentence-ending punctuation by iterating regex
matches (or using split_inclusive) so the punctuation stays part of each piece
(a look-behind like r"(?<=[.!?])\s+" would also express this, but Rust's regex
crate does not support look-around); keep the surrounding logic (current
accumulation, trimming, max_len checks, and fallback behavior) the same but
operate on parts that include their terminal punctuation so TTS prosody is
preserved.

In @frontend/src/services/tts/TTSContext.tsx:
- Around line 161-215: The speak function creates an object URL (audioUrl) but
only revokes it in onended/onerror, causing leaks when stop() is called,
audio.play() throws, or the component unmounts; fix by tracking the created URL
in a ref (e.g., audioUrlRef) and centralize cleanup in a helper (e.g.,
revokeAudioUrl) that revokes URL, clears audioRef and audioUrlRef, and updates
state; call that helper from onended, onerror, the catch block in speak, stop(),
and a useEffect cleanup on unmount so all exit paths revoke the object URL.
🧹 Nitpick comments (3)
docs/tts-research.md (2)

3-27: Trim verbosity in the Gemini section for better readability.

The opening Gemini section (lines 3–27) reads as lengthy, marketing-oriented prose with excessive punctuation and repetitive framing. The content is technically sound, but 25 lines of introductory preamble before substantive technical details may obscure the key findings. Consider condensing the executive summary to 3–5 sentences and moving deeper narrative to subsections if detail is necessary.

Example: Replace the verbose opening with a concise summary:

-Definitive Comparative Analysis of Local TTS Architectures for Cross-Platform Tauri Integration: Supertonic 2 vs. Chatterbox Turbo1. Executive Strategic Overview: The Local Inference Convergence in 2026The trajectory of computational linguistics... [excessive framing]
+## Executive Summary

+For Tauri developers building cross-platform local TTS, Supertonic 2 (ONNX-native, 44M params) outweighs Chatterbox Turbo (PyTorch-based, 350M params) due to superior mobile compatibility, smaller footprint, and direct Rust integration. Chatterbox excels only when voice cloning or paralinguistic features are essential requirements.

29-180: Consolidate or clearly distinguish the three AI research perspectives.

The document contains three separate comprehensive analyses (Gemini, Claude, Grok) covering largely overlapping ground: architecture, model sizes, cross-platform support, performance, and Tauri integration. While multiple perspectives can add depth, the repetition may reduce clarity. If all three are retained for documentation completeness, add a brief preamble explaining their purpose (e.g., "The following sections represent independent research analyses to validate findings across sources").

Alternatively, extract consensus findings into a single unified recommendation, placing detailed comparative tables and architecture discussions in a "Detailed Comparisons" section, and move the three analyses to an appendix or separate file if historical record is important.

frontend/src-tauri/src/tts.rs (1)

734-771: Synthesis command holds mutex during potentially slow inference.

The tts_synthesize command acquires a mutex lock at line 739 and holds it through the entire synthesis operation (lines 751-753), which involves multiple ONNX model runs. This blocks other potential TTS operations.

The lock is released before base64 encoding (line 759), which is good, but consider whether the synthesis itself could be done with a shorter lock duration by cloning necessary state upfront.

For the current single-user desktop app context, this is acceptable. If concurrent TTS requests become possible in the future, consider restructuring to minimize lock duration.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e607185 and ca87d46.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • docs/tts-research.md
  • frontend/src-tauri/Cargo.toml
  • frontend/src-tauri/src/lib.rs
  • frontend/src-tauri/src/tts.rs
  • frontend/src/app.tsx
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/services/tts/index.ts
  • justfile
🚧 Files skipped from review as they are similar to previous changes (4)
  • frontend/src-tauri/Cargo.toml
  • frontend/src/services/tts/index.ts
  • justfile
  • frontend/src/app.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/components/TTSDownloadDialog.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (3)
frontend/src/components/UnifiedChat.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (251-257)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
frontend/src/components/TTSDownloadDialog.tsx (1)
  • TTSDownloadDialog (18-160)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (7)
  • new (86-88)
  • new (341-346)
  • tts_get_status (602-620)
  • tts_download_models (631-704)
  • tts_load_models (707-725)
  • tts_synthesize (735-771)
  • tts_unload_models (774-780)
frontend/src/components/TTSDownloadDialog.tsx (2)
frontend/src/components/ui/dialog.tsx (6)
  • Dialog (92-92)
  • DialogContent (97-97)
  • DialogHeader (98-98)
  • DialogTitle (100-100)
  • DialogDescription (101-101)
  • DialogFooter (99-99)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
🪛 LanguageTool
docs/tts-research.md

[grammar] ~5-~5: Ensure spelling is correct
Context: ...ligible levels.The 44 Million Parameter AdvantageThe model operates with approximately 44 mi...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[uncategorized] ~5-~5: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...d) can often reside entirely within the System Level Cache (SLC) or high-speed RAM partition...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[typographical] ~5-~5: Consider using an en dash here instead of a hyphen.
Context: ... to be under a certain threshold, often 200MB-4GB depending on OS version) and the Google...

(QB_NEW_EN_DASH_RULE_EN)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...features. The dynamic nature of its flow matching steps and custom tokenizers mak...

(QB_NEW_EN_HYPHEN)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...he Tauri Framework Context: Integration RealitiesThe user's choice of Tauri as the applicati...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...).3.1 The "Sidecar Pattern" and Desktop SuccessOn desktop operating systems, Tauri suppor...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...obile Wall": Why Sidecars Fail on iOS & AndroidThe user's requirement for iOS and Android ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...fecycle.The Dependency Hell of Embedded PythonThe alternative to a sidecar is embedding t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...ource path helpers).Android Integration StrategyJNI and Shared Libraries: Android requires ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...g the models.5.1 Real-Time Factor (RTF) BenchmarksThe "Real-Time Factor" measures the speed o...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...s the integration roadmap.6.1 Rust Core ConfigurationThe integration avoids the sidecar pattern ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...hin the Rust process.Step 1: Dependency ManagementIn src-tauri/Cargo.toml:Ini, TOML[dependen...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~13-~13: Ensure spelling is correct
Context: ...back rodio = "0.19" Step 2: Model Asset BundlingThe 268MB model file must be accessible to ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~26-~26: Ensure spelling is correct
Context: ...capability.10. Conclusion and Strategic RecommendationThe comparative analysis yields a definitiv...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~162-~162: Consider using a shorter alternative to avoid wordiness.
Context: ...ox models (350-500 million parameters), making it better suited for resource-constrained devices...

(MADE_IT_JJR)

🪛 markdownlint-cli2 (0.18.1)
docs/tts-research.md

26-26: Bare URL used

(MD034, no-bare-urls)


26-26: Bare URL used

(MD034, no-bare-urls)


169-169: Bare URL used

(MD034, no-bare-urls)


175-175: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Greptile Review
  • GitHub Check: build-linux
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-ios
  • GitHub Check: build-android
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (9)
docs/tts-research.md (2)

200-220: Verify performance benchmarks and resource claims against primary sources.

The document cites specific metrics (e.g., RTF of 0.006 on M4 Pro WebGPU at line 119, ~5GB VRAM for Chatterbox at line 199, 0.3 RTF on Raspberry Pi at line 120). While generally reasonable, these claims lack inline citations to benchmark reports or official documentation. For a research document, adding references or footnotes (e.g., linking to the GitHub repos' own benchmarks or published evaluations) strengthens credibility.


233-246: Ensure all citations are accurate and up-to-date.

The "Key Citations" section (lines 233–246) lists GitHub repos, blog posts, and videos. These are helpful, but verify:

  1. GitHub repo URLs match the latest stable branches (e.g., Supertone Inc. repo path).
  2. Blog and video dates align with the document's "January 2026" context (e.g., "Supertonic TTS is incredible" post should be recent).
  3. All links remain active; add a note if any are known to be archived or updated.
frontend/src-tauri/src/lib.rs (1)

6-6: TTS module integration looks correct for desktop-only functionality.

The TTS module, state management, and command handlers are appropriately scoped to the desktop build path. The mobile invoke handler (lines 261-263) correctly excludes TTS commands, aligning with the PR objective that TTS is desktop-only.

Also applies to: 38-38, 48-52

frontend/src/components/UnifiedChat.tsx (2)

192-246: Well-structured TTSButton component with proper state handling.

The component correctly:

  • Guards against non-Tauri environments by returning null early (line 206-208)
  • Handles all status states appropriately in handleClick
  • Disables the button during transitional states (checking, downloading, loading)
  • Shows appropriate icons for playing/stopped/loading states

The implementation follows React best practices with proper hook usage and conditional rendering.


756-764: TTS integration wired correctly through the component tree.

The TTS setup flow is properly connected:

  • ttsSetupDialogOpen state manages dialog visibility
  • onTTSSetupOpen callback is passed through MessageList to TTSButton
  • TTSDownloadDialog is rendered with appropriate props

Also applies to: 880-880, 3346-3348

frontend/src/components/TTSDownloadDialog.tsx (1)

18-159: Clean implementation of the TTS download dialog with comprehensive state handling.

The dialog properly handles all TTS lifecycle states:

  • Prevents dismissal during processing (line 34)
  • Shows appropriate messaging for each state (not_available, error, checking, downloading, loading, ready)
  • Progress bar and percentage display are conditionally rendered based on state
  • Footer buttons adapt to the current status

The component follows coding guidelines with proper path aliases, TypeScript types, and React patterns.

frontend/src/services/tts/TTSContext.tsx (2)

259-266: base64ToBlob helper is correctly implemented.

The function properly decodes base64 to bytes and creates a Blob with the specified MIME type.


62-62: Use isTauriDesktop() instead of isTauri() to prevent mobile invoke errors.

TTS commands are only registered in the desktop invoke handler (lib.rs lines 48-52), not in the mobile handler (lines 261-263). The current isTauri() check returns true for both desktop and mobile Tauri environments, which causes "unknown command" errors when tts_get_status is invoked on mobile.

Replace isTauri() with isTauriDesktop() at all usage points (lines 62, 78, 112, 118, 163) to ensure TTS operations are only attempted on desktop.

Also applies to: 78-78, 112-112, 118-118, 163-163

⛔ Skipped due to learnings
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to src-tauri/**/*.rs : Use `#[tauri::command]` attribute for Tauri command handlers
frontend/src-tauri/src/tts.rs (1)

1-588: Solid TTS engine implementation with proper ONNX integration.

The TTS pipeline is well-structured:

  • Text preprocessing handles Unicode normalization, emoji removal, and punctuation cleanup
  • Multi-stage inference (duration prediction → text encoding → denoising loop → vocoder)
  • Proper chunking for long texts with silence between chunks
  • WAV encoding with correct sample format

The ONNX session management and ndarray operations follow established patterns.


@greptile-apps greptile-apps bot left a comment

10 files reviewed, 1 comment


@AnthonyRonning AnthonyRonning force-pushed the feature/tts-supertonic branch from eab4fbc to 4898520 on January 7, 2026 at 19:11

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
frontend/src-tauri/src/tts.rs (1)

115-219: Consider caching compiled regexes for performance.

Multiple regexes are compiled on each call to preprocess_text(). For better performance, especially with longer texts or frequent calls, consider using lazy_static or once_cell::sync::Lazy to compile regexes once.

♻️ Example using once_cell
use once_cell::sync::Lazy;

static EMOJI_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\x{1F600}-\x{1F64F}...]").unwrap()
});

static DIACRITICS_PATTERN: Lazy<Regex> = Lazy::new(|| {
    Regex::new(r"[\u{0302}...]").unwrap()
});

// Then use &*EMOJI_PATTERN in the function
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eab4fbc and 4898520.

📒 Files selected for processing (2)
  • frontend/src-tauri/src/tts.rs
  • frontend/src/services/tts/TTSContext.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Greptile Review
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-linux
  • GitHub Check: build-ios
  • GitHub Check: build-android
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (17)
frontend/src/services/tts/TTSContext.tsx (8)

1-13: LGTM!

Imports are clean and follow the coding guidelines with path aliases (@/utils/platform).


14-56: LGTM!

Type definitions are well-structured and align with the Rust backend response types.


60-74: LGTM!

Good pattern to check isTauriDesktop() once at mount and derive initial status accordingly.


76-108: LGTM!

The status checking logic correctly handles the various model states with appropriate error handling.


117-159: LGTM!

Download flow correctly manages the event listener lifecycle and handles both success and error paths.


161-213: Memory leak fixes properly implemented.

The speak callback now correctly revokes object URLs in all exit paths:

  • When stopping current audio to play new (lines 168-170)
  • On playback end/error (lines 190, 197)
  • In the catch block (lines 206-208)

This addresses the previous review feedback.


215-240: LGTM!

The stop() callback and unmount cleanup effect properly revoke object URLs, addressing the previous memory leak concerns.


264-280: LGTM!

The useTTS hook follows the standard context hook pattern with proper error handling for missing provider, and the base64ToBlob helper is clear and functional.

frontend/src-tauri/src/tts.rs (9)

1-41: LGTM!

Imports and constants are well-organized. Model file definitions with expected sizes enable proper progress tracking.


43-113: LGTM!

Configuration structs and UnicodeProcessor are well-structured. The processor gracefully handles out-of-range Unicode characters by assigning -1.


281-350: Punctuation preservation fix confirmed.

The sentence splitting now correctly preserves terminal punctuation by taking para[last_end..m.start() + 1] which includes the punctuation character. This addresses the previous review feedback about lost sentence endings.
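In isolation, the technique being described looks roughly like this; it is a sketch of the splitting pattern only, not the exact chunk_text code:

// Hedged sketch: split on sentence boundaries while keeping the terminal punctuation,
// by iterating matches and slicing up to and including the punctuation character.
use regex::Regex;

fn split_sentences(para: &str) -> Vec<String> {
    let sentence_re = Regex::new(r"[.!?]\s+").unwrap();
    let mut sentences = Vec::new();
    let mut last_end = 0;
    for m in sentence_re.find_iter(para) {
        // m.start() is the index of the punctuation byte; +1 keeps it in the slice.
        sentences.push(para[last_end..m.start() + 1].to_string());
        last_end = m.end();
    }
    if last_end < para.len() {
        sentences.push(para[last_end..].to_string());
    }
    sentences
}

As noted in the earlier review pass, abbreviations such as "Dr. Smith" will still split after "Dr.", which is an accepted trade-off for TTS chunking.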


352-511: LGTM!

The TTS engine implementation is well-structured with proper state management. The synthesis flow correctly:

  • Chunks text for manageable processing
  • Adds inter-chunk silence for natural pacing
  • Runs the multi-stage ONNX pipeline (duration prediction → text encoding → denoising → vocoder)

521-551: Voice style is hardcoded to F2.

The voice style is hardcoded to "F2.json" (line 522). Per the PR summary, this is intentional ("default/currently using F2"). If voice selection is planned for the future, consider accepting the style name as a parameter or adding a TODO comment.


647-738: Download reliability improvements confirmed.

The download implementation now includes all the previously requested reliability improvements:

  • HTTP timeouts (300s total, 30s connect) at lines 655-658
  • Partial file cleanup (.part extension) at lines 664, 682
  • Atomic file finalization (flush + rename) at lines 727-731

This prevents corrupted model files from being treated as valid.
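For reference, the timeout / .part / rename discipline amounts to roughly the sketch below. It collapses the PR's chunked streaming and progress events into a single bytes() read to keep the example short, and the names are illustrative.

use std::{fs, io::Write, path::Path, time::Duration};
use anyhow::{Context, Result};

// Sketch: timeouts, a `.part` temp file, and an atomic rename once the body
// has been fully written and flushed.
async fn download_to(url: &str, dest: &Path) -> Result<()> {
    let client = reqwest::Client::builder()
        .timeout(Duration::from_secs(300))        // whole-request timeout
        .connect_timeout(Duration::from_secs(30)) // connect timeout
        .build()?;

    let temp = dest.with_extension("part");
    if temp.exists() {
        fs::remove_file(&temp).ok(); // clean up a previous partial download
    }

    let bytes = client.get(url).send().await?.error_for_status()?.bytes().await?;

    let mut file = fs::File::create(&temp).context("create temp file")?;
    file.write_all(&bytes)?;
    file.flush()?; // make sure the data reached the OS buffers
    drop(file);
    fs::rename(&temp, dest).context("finalize download")?; // atomic on the same filesystem
    Ok(())
}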


768-805: Lock is held during synthesis.

The mutex guard is held at line 773 through the synthesis call at lines 785-787, which could take a noticeable amount of time for longer texts. While the guard is correctly dropped before encoding (line 793), this means concurrent TTS commands (e.g., status checks) will block during synthesis.

For a single-user desktop app this is likely acceptable, but consider documenting this behavior or adding a comment noting the intentional lock scope.
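A comment-plus-sketch of the intended lock scope could look like the following; Engine and TtsState are stand-ins for the real types, and only the placement of the guard matters here.

use std::sync::Mutex;

struct Engine;
impl Engine {
    fn synthesize(&self, _text: &str, _style: &str, _steps: usize, _speed: f32) -> Vec<f32> {
        Vec::new()
    }
}

struct TtsState {
    engine: Option<Engine>,
    style: Option<String>,
}

fn synthesize_locked(state: &Mutex<TtsState>, text: &str) -> Result<Vec<f32>, String> {
    let guard = state.lock().map_err(|_| "TTS state poisoned".to_string())?;
    let style = guard
        .style
        .clone()
        .ok_or_else(|| "models not loaded".to_string())?;
    let engine = guard
        .engine
        .as_ref()
        .ok_or_else(|| "models not loaded".to_string())?;

    // NOTE: the guard is intentionally held across synthesis, so concurrent TTS
    // commands (e.g. status checks) wait here; acceptable for a single-user desktop app.
    let audio = engine.synthesize(text, &style, 10, 1.2);
    drop(guard); // release before any slow post-processing (e.g. base64 encoding)

    Ok(audio)
}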


618-637: LGTM!

The status command correctly reports download and load state. The file existence check combined with the .part file mechanism in downloads should be sufficient to detect incomplete downloads.


740-759: LGTM!

Model loading and unloading commands are straightforward with proper error handling and logging.

Also applies to: 807-814

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
frontend/src-tauri/src/tts.rs (1)

618-637: Consider: Optional file integrity verification.

The status check uses simple file existence (line 626). While the .part temp-file download pattern mitigates partial-file issues, you could optionally verify file sizes match expected_size from MODEL_FILES for additional robustness against corruption.

Optional enhancement for file integrity
 let models_downloaded = MODEL_FILES
     .iter()
-    .all(|(name, _, _)| models_dir.join(name).exists());
+    .all(|(name, _, expected_size)| {
+        if let Ok(metadata) = models_dir.join(name).metadata() {
+            metadata.len() == *expected_size
+        } else {
+            false
+        }
+    });
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4898520 and 0214737.

📒 Files selected for processing (5)
  • frontend/src-tauri/src/lib.rs
  • frontend/src-tauri/src/tts.rs
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
🚧 Files skipped from review as they are similar to previous changes (2)
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (1)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (8)
  • new (86-88)
  • new (358-363)
  • tts_get_status (619-637)
  • tts_download_models (648-738)
  • tts_load_models (741-759)
  • tts_synthesize (769-805)
  • tts_unload_models (808-814)
  • tts_delete_models (817-834)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Greptile Review
  • GitHub Check: build-android
  • GitHub Check: build-ios
  • GitHub Check: build-linux
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (9)
frontend/src/services/tts/TTSContext.tsx (4)

1-58: LGTM: Clean type definitions and imports.

The imports follow the project's path alias conventions, and the TypeScript type definitions are strict and well-structured. The exported TTSStatus type provides a clear public API surface.


62-77: Well-addressed: Correct desktop-only detection.

The use of isTauriDesktop() (line 64) correctly ensures TTS functionality is only available on desktop Tauri environments, preventing the "unknown command" errors that would occur on mobile. This addresses the concern raised in previous reviews.


190-269: Excellent: Comprehensive audio resource cleanup.

The audio playback implementation now properly revokes object URLs in all exit paths:

  • When stopping previous audio (line 198)
  • On playback end (line 219) and error (line 226)
  • In the error catch block (line 236)
  • In the explicit stop() function (line 248)
  • On component unmount (line 265)

This comprehensively addresses the memory leak concerns raised in previous reviews.


302-310: LGTM: Correct base64-to-blob conversion.

The helper function uses standard browser APIs correctly to convert the base64-encoded audio data from the backend into a playable Blob.

frontend/src-tauri/src/tts.rs (4)

1-79: LGTM: Well-structured configuration and dependencies.

The model definitions, constants, and configuration types are clearly organized. The TOTAL_MODEL_SIZE of ~265 MB aligns with the PR description, and the serde annotations correctly support JSON serialization for config files and voice styles.


281-350: Well-fixed: Sentence chunking now preserves punctuation.

The chunk_text function now uses find_iter with manual slice construction (line 308: para[last_end..m.start() + 1]) rather than split(), ensuring terminal punctuation is preserved in each sentence. This addresses the concern raised in previous reviews about lost punctuation affecting TTS prosody.


647-738: Excellent: Robust download implementation with timeouts and atomic completion.

The download function now includes:

  • HTTP timeouts (300s total, 30s connect) to prevent indefinite hangs (lines 656-657)
  • Atomic file completion using a .part temp file pattern (line 664)
  • Explicit flush and rename on success (lines 727-731)

This prevents both indefinite blocking and partial-file corruption issues raised in previous reviews. The existence check (line 667) correctly tests the final file_path, not the temp file.


740-834: LGTM: Clean command implementations with proper locking.

The Tauri commands are well-structured:

  • tts_load_models correctly loads both engine and style into shared state
  • tts_synthesize properly clones the style (line 780) to avoid borrow conflicts and drops the guard early (line 793) to minimize lock contention during encoding
  • tts_delete_models correctly unloads models before filesystem removal

The hardcoded synthesis parameters (10 steps, 1.2× speed at line 786) align with the PR description's configuration.

frontend/src-tauri/src/lib.rs (1)

6-6: LGTM: Correct desktop-only TTS integration.

The TTS module is properly integrated into the desktop builder only:

  • Module declaration (line 6)
  • State initialization with TTSState::new() (line 38)
  • Six TTS commands registered in the desktop invoke_handler (lines 48-53)

The mobile builder (lines 262-264) correctly excludes TTS commands, preventing "unknown command" errors when the frontend's isTauriDesktop() check correctly gates TTS usage.

Also applies to: 38-53

@greptile-apps greptile-apps bot left a comment

10 files reviewed, 1 comment

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (7)
frontend/src-tauri/src/tts.rs (7)

114-241: Consider refactoring into smaller helper functions.

This 127-line function handles normalization, markdown removal, emoji removal, symbol replacement, punctuation fixing, and more. Breaking it into focused helpers (e.g., remove_markdown, normalize_symbols, fix_punctuation) would improve readability and testability.


434-537: Optional: Extract pipeline stages into helper methods.

The 103-line infer method orchestrates a multi-stage pipeline (duration prediction, text encoding, latent sampling, denoising loop, vocoding). Consider extracting stages like predict_duration, encode_text, denoise_latent, and generate_waveform to improve readability.


548-578: Make voice style configurable.

The voice style is hardcoded to F2.json (line 549), but the PR summary mentions four available styles (F1, F2, M1, M2). Consider adding a parameter to allow users to select their preferred voice, or document why F2 is the default-only choice.

Example: Add voice parameter
-fn load_voice_style(models_dir: &Path) -> Result<Style> {
-    let style_path = models_dir.join("F2.json");
+fn load_voice_style(models_dir: &Path, voice: &str) -> Result<Style> {
+    let style_path = models_dir.join(format!("{}.json", voice));
     let file = File::open(&style_path).context("Failed to open voice style file")?;
     // ... rest of function

Then expose voice selection in Tauri commands or config.


651-653: Consider verifying file size in addition to existence.

The status check only verifies file existence. If a previous download was interrupted despite the temp file safeguards, or if files become corrupted on disk, models_downloaded will return true but loading will fail.

Optional: Add size verification
 let models_downloaded = MODEL_FILES
     .iter()
-    .all(|(name, _, _)| models_dir.join(name).exists());
+    .all(|(name, _, expected_size)| {
+        let path = models_dir.join(name);
+        path.exists() && path.metadata().map(|m| m.len() == *expected_size).unwrap_or(false)
+    });

795-838: Consider making synthesis parameters configurable.

The total_step (10) and speed (1.2) parameters are hardcoded at line 815. While these match the PR's documented defaults, exposing them as optional parameters would allow users to trade quality/speed or adjust playback rate.

Example: Add optional parameters
 #[tauri::command]
 pub async fn tts_synthesize(
     text: String,
+    speed: Option<f32>,
+    steps: Option<usize>,
     state: tauri::State<'_, Mutex<TTSState>>,
 ) -> Result<TTSSynthesizeResponse, String> {
     // ... existing code ...
     
+    let speed = speed.unwrap_or(1.2);
+    let steps = steps.unwrap_or(10);
+    
     let audio = tts
-        .synthesize(&text, &style, 10, 1.2)
+        .synthesize(&text, &style, steps, speed)
         .map_err(|e| format!("TTS synthesis failed: {}", e))?;

674-765: Optional: Add cancellation support for long downloads.

Downloading 265 MB can take significant time on slower connections. Consider adding cancellation support so users can abort if needed. Tauri supports abort signals through event listeners.
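One possible shape for cancellation is a shared flag flipped by a hypothetical tts_cancel_download command and checked between streamed chunks; this is a sketch of the idea, not part of the PR.

use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

// Sketch: abort a streamed download when a shared cancellation flag is set.
async fn download_with_cancel(url: &str, cancel: Arc<AtomicBool>) -> Result<Vec<u8>, String> {
    let mut response = reqwest::get(url).await.map_err(|e| e.to_string())?;
    let mut body = Vec::new();
    while let Some(chunk) = response.chunk().await.map_err(|e| e.to_string())? {
        if cancel.load(Ordering::Relaxed) {
            // the caller is expected to remove the .part file on cancellation
            return Err("download cancelled".to_string());
        }
        body.extend_from_slice(&chunk);
    }
    Ok(body)
}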


767-786: Consider checking available memory before loading models.

Loading ~265 MB of ONNX models could cause issues on memory-constrained devices. While this is an on-device feature for desktop, checking available memory and providing a clear error message would improve user experience.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0214737 and 803408d.

📒 Files selected for processing (2)
  • frontend/src-tauri/src/tts.rs
  • frontend/src/services/tts/TTSContext.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Greptile Review
  • GitHub Check: build-ios
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-linux
  • GitHub Check: build-android
  • GitHub Check: Cloudflare Pages

@marksftw

marksftw commented Jan 7, 2026

two small nits:

  1. I think the "copy" and "speaker" buttons should always be on the screen. I didn't know where to find the speaker icon at first until I moved the mouse. It looks like ChatGPT has settled on having them ever-present on the screen underneath each response.
  2. The colors on the model download progress bar felt like they should be flipped: in dark mode, blue filling up the white; in light mode, purple filling up black.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
frontend/src-tauri/src/tts.rs (2)

549-550: Consider making voice selection configurable.

The voice style is hardcoded to F2.json, but all four voice models (F1, F2, M1, M2) are downloaded. Consider exposing voice selection as a parameter in the tts_load_models or tts_synthesize commands to allow users to choose their preferred voice style.

Example: Add voice parameter to load command
 #[tauri::command]
-pub async fn tts_load_models(state: tauri::State<'_, Mutex<TTSState>>) -> Result<(), String> {
+pub async fn tts_load_models(
+    state: tauri::State<'_, Mutex<TTSState>>,
+    voice: Option<String>
+) -> Result<(), String> {
     let models_dir = get_tts_models_dir().map_err(|e| e.to_string())?;
+    let voice_name = voice.unwrap_or_else(|| "F2".to_string());
     
-    let style = load_voice_style(&models_dir)
+    let style = load_voice_style(&models_dir, &voice_name)
         .map_err(|e| format!("Failed to load voice style: {}", e))?;

Then update load_voice_style signature:

-fn load_voice_style(models_dir: &Path) -> Result<Style> {
-    let style_path = models_dir.join("F2.json");
+fn load_voice_style(models_dir: &Path, voice_name: &str) -> Result<Style> {
+    let style_path = models_dir.join(format!("{}.json", voice_name));

815-816: Consider exposing synthesis parameters for user customization.

The inference steps (10) and speed multiplier (1.2) are hardcoded, but users might want to adjust the quality/speed tradeoff. Consider adding optional parameters to tts_synthesize:

Example: Add optional parameters
 #[tauri::command]
 pub async fn tts_synthesize(
     text: String,
     state: tauri::State<'_, Mutex<TTSState>>,
+    inference_steps: Option<usize>,
+    speed: Option<f32>
 ) -> Result<TTSSynthesizeResponse, String> {
+    let steps = inference_steps.unwrap_or(10);
+    let speed_multiplier = speed.unwrap_or(1.2);
+    
     let audio = tts
-        .synthesize(&text, &style, 10, 1.2)
+        .synthesize(&text, &style, steps, speed_multiplier)
         .map_err(|e| format!("TTS synthesis failed: {}", e))?;

Typical ranges: inference_steps: 5-20 (higher = better quality, slower), speed: 0.8-1.5 (lower = slower speech, more natural).

frontend/src/components/TTSDownloadDialog.tsx (2)

56-66: Simplify nested ternary logic for better readability.

The deeply nested ternary operators (4 levels) make this code harder to read and maintain. Consider extracting the description text into a helper function or using early returns.

Refactor suggestion
+  const getDialogDescription = () => {
+    if (isNotAvailable) {
+      return "TTS is only available in the desktop app.";
+    }
+    if (isReady) {
+      return "TTS is ready! You can now listen to assistant messages.";
+    }
+    if (hasError) {
+      return "There was an error setting up TTS.";
+    }
+    if (isProcessing) {
+      return "Setting up TTS. Please keep this window open.";
+    }
+    return `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`;
+  };
+
   <DialogDescription className="text-base">
-    {isNotAvailable
-      ? "TTS is only available in the desktop app."
-      : isReady
-        ? "TTS is ready! You can now listen to assistant messages."
-        : hasError
-          ? "There was an error setting up TTS."
-          : isProcessing
-            ? "Setting up TTS. Please keep this window open."
-            : `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`}
+    {getDialogDescription()}
   </DialogDescription>

170-199: Footer button logic follows similar nested ternary pattern.

While this section is somewhat more manageable than the description text, consider applying a similar extraction pattern for consistency and maintainability if refactoring the DialogDescription (lines 56-66).

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcbbd55 and c83843c.

📒 Files selected for processing (3)
  • frontend/src-tauri/src/tts.rs
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/components/TTSDownloadDialog.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/components/UnifiedChat.tsx
🧬 Code graph analysis (2)
frontend/src/components/UnifiedChat.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (304-310)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
frontend/src/components/TTSDownloadDialog.tsx (1)
  • TTSDownloadDialog (18-203)
frontend/src/components/TTSDownloadDialog.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (304-310)
frontend/src/components/ui/dialog.tsx (6)
  • Dialog (92-92)
  • DialogContent (97-97)
  • DialogHeader (98-98)
  • DialogTitle (100-100)
  • DialogDescription (101-101)
  • DialogFooter (99-99)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Greptile Review
  • GitHub Check: build-linux
  • GitHub Check: build-ios
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-android
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (8)
frontend/src-tauri/src/tts.rs (3)

100-104: LGTM! OOV handling now uses padding token.

The change from -1 to 0 for out-of-vocabulary characters correctly uses the padding token instead of an invalid sentinel value, which prevents potential embedding lookup failures in the ONNX models.
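The fallback reads roughly like this sketch, where index_table is an illustrative stand-in for the model's character lookup table.

// Sketch of the out-of-vocabulary handling described above: code points outside
// the lookup table map to 0, the padding token, instead of a sentinel like -1
// that could break embedding lookups.
fn char_to_token(c: char, index_table: &[i64]) -> i64 {
    let cp = c as usize;
    if cp < index_table.len() {
        index_table[cp]
    } else {
        0 // padding token for characters the table does not cover
    }
}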


304-373: Sentence boundary handling preserves punctuation correctly.

The implementation at line 331 uses para[last_end..m.start() + 1] which correctly includes the terminal punctuation character in each sentence chunk, addressing the previous concern about lost punctuation.


675-766: LGTM! Download reliability improvements properly implemented.

The implementation now includes:

  • HTTP timeouts (300s request, 30s connect) to prevent indefinite hangs
  • Temp file with .part extension to avoid treating partial downloads as complete
  • Cleanup of partial downloads before retry (line 710)
  • Explicit flush and atomic rename (lines 754-759) to finalize downloads

These changes address all previous reliability concerns.

frontend/src/components/UnifiedChat.tsx (5)

208-215: LGTM! Timer cleanup properly implemented.

The useEffect cleanup function now properly clears the longPressTimer on component unmount, preventing the timeout from firing after unmount and attempting to call onManage(). This addresses the previous review concern.


217-220: Correct platform check for TTS availability.

The early return for non-Tauri environments properly ensures the TTS button only renders in the desktop app, where the TTS backend is available.


222-235: Click handler logic correctly manages TTS states.

The implementation properly handles three cases:

  1. Not ready/error → trigger setup flow
  2. Playing current message → stop playback
  3. Ready and idle → start playback

790-800: LGTM! Copy and TTS buttons now always visible.

The buttons are rendered without hover states, making them consistently visible as requested in the PR feedback. The flex layout cleanly presents both action buttons together.


3382-3383: TTS dialog properly integrated into component hierarchy.

The dialog is correctly wired with state managed at the UnifiedChat level and callbacks passed to child components. The integration follows React best practices for component composition.

@greptile-apps greptile-apps bot left a comment

10 files reviewed, 6 comments

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @frontend/src/services/tts/TTSContext.tsx:
- Around lines 230-266: The code creates an object URL (audioUrl) from audioBlob
  but only revokes it in the source.onended handler, so the URL leaks if an error
  occurs before playback starts. Track audioUrl in a variable scoped outside the
  try block and, in the catch block (and any early-return/error paths), call
  URL.revokeObjectURL(audioUrl) (guarding against a null/undefined audioUrl)
  before cleaning up audioContextRef.current and sourceNodeRef.current.
- Around lines 271-286: stop() can leak object URLs because it stops the
  AudioBufferSourceNode directly, so onended may never fire. Add an audioUrlRef
  (set in speak() where audioUrl is created); in stop(), revoke and clear
  audioUrlRef.current, then stop/close and null out sourceNodeRef and
  audioContextRef and reset state. Have the onended handler also revoke and clear
  audioUrlRef.current so the URL is never revoked twice. A sketch of this pattern
  follows below.
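A sketch of this revoke-in-every-path pattern, using HTMLAudioElement for brevity (the PR's context plays audio through the Web Audio API instead) and assumed names such as audioUrlRef and base64ToBlob:

import { useCallback, useRef } from "react";

// Illustrative helper; the real base64ToBlob lives in the PR's context module.
function base64ToBlob(base64: string, mimeType: string): Blob {
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  return new Blob([bytes], { type: mimeType });
}

export function useTTSPlayback() {
  const audioUrlRef = useRef<string | null>(null);
  const audioRef = useRef<HTMLAudioElement | null>(null);

  // Revoke the current object URL exactly once, whichever path reaches it first.
  const releaseAudioUrl = useCallback(() => {
    if (audioUrlRef.current) {
      URL.revokeObjectURL(audioUrlRef.current);
      audioUrlRef.current = null;
    }
  }, []);

  const speak = useCallback(
    async (audioBase64: string) => {
      releaseAudioUrl(); // drop any URL left over from a previous playback
      const url = URL.createObjectURL(base64ToBlob(audioBase64, "audio/wav"));
      audioUrlRef.current = url;
      try {
        const audio = new Audio(url);
        audioRef.current = audio;
        audio.onended = releaseAudioUrl; // normal completion
        audio.onerror = releaseAudioUrl; // playback error
        await audio.play();
      } catch (err) {
        releaseAudioUrl(); // errors thrown before playback starts
        throw err;
      }
    },
    [releaseAudioUrl]
  );

  const stop = useCallback(() => {
    audioRef.current?.pause();
    releaseAudioUrl(); // stop() bypasses onended, so revoke here too
  }, [releaseAudioUrl]);

  return { speak, stop };
}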
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2944a8 and 903ad6d.

📒 Files selected for processing (2)
  • frontend/src-tauri/src/tts.rs
  • frontend/src/services/tts/TTSContext.tsx
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build-linux
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: Greptile Review
  • GitHub Check: build-android
  • GitHub Check: build-ios
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (14)
frontend/src/services/tts/TTSContext.tsx (6)

1-58: LGTM on imports and type definitions.

The imports correctly use path aliases, and the type definitions are well-structured with proper TypeScript strictness. The use of isTauriDesktop is correct for desktop-only TTS functionality.


62-77: Appropriate state and ref initialization.

The Web Audio API refs (audioContextRef, sourceNodeRef) correctly replace the previous HTMLAudioElement approach to avoid hijacking macOS media controls. State initialization is clean.


79-118: Status checking logic is well-implemented.

The status flow correctly handles all states (ready → loading → not_downloaded) with proper error handling. The auto-load when models are downloaded but not loaded provides good UX.


120-162: Download flow handles event listener cleanup correctly.

The progress event listener is properly cleaned up in both success (line 141) and error (lines 157-159) paths.


338-361: Helper functions are well-implemented.

The preprocessTextForTTS correctly handles both complete and streaming (unclosed) think blocks. The separation is appropriate — frontend strips reasoning content, backend handles TTS-specific text normalization. base64ToBlob implementation is standard and correct.
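For illustration, the stripping could look like the sketch below; the <think> tag name and the exact regexes are assumptions based on this review, not the PR's code.

// Remove completed reasoning blocks and a trailing unclosed one from streaming output.
function preprocessTextForTTS(text: string): string {
  return text
    .replace(/<think>[\s\S]*?<\/think>/g, "") // completed reasoning blocks
    .replace(/<think>[\s\S]*$/, "") // an unclosed block still streaming
    .trim();
}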


288-305: Unmount cleanup should also revoke the object URL.

Same issue as stop() — if the component unmounts while audio is playing, the audioUrl is never revoked.

Likely an incorrect or invalid review comment.

frontend/src-tauri/src/tts.rs (8)

1-49: Regex pre-compilation is well-implemented.

All regexes are now compiled once using once_cell::sync::Lazy, addressing the previous performance concern. Good coverage of markdown patterns, emojis, diacritics, and punctuation normalization.


50-74: Model file definitions are adequate for progress tracking.

The expected sizes are used for progress estimation rather than validation. Minor discrepancies between actual and expected sizes will only affect progress bar accuracy, not functionality.


131-137: OOV handling correctly uses padding token.

Using 0 (padding token) for out-of-vocabulary characters is the right approach, as it avoids potential embedding lookup failures that could occur with invalid indices like -1.


304-387: Text chunking properly preserves sentence punctuation.

The sentence splitting at line 330 now correctly includes the punctuation mark (m.start() + 1). The split_by_words fallback handles long sentences gracefully. The remaining edge case (single word exceeding max_len) is acceptable since it's rare and preferable to truncating words.


691-782: Download implementation is robust with proper safeguards.

The implementation correctly addresses all previous concerns:

  • HTTP timeouts (300s overall, 30s connect) prevent indefinite blocking
  • Temp file pattern (.part extension) with atomic rename prevents partial file corruption
  • Explicit flush before rename ensures data integrity
  • Previous partial downloads are cleaned up before retry

812-855: Synthesis command has good lock handling.

Cloning style to avoid borrow conflicts and dropping the guard before CPU-intensive base64 encoding minimizes lock contention. The synthesis parameters (10 steps, 1.2× speed) align with PR objectives.


563-566: Voice selection acknowledged as future work.

The TODO comment appropriately documents that voice selection is currently hardcoded to F2. This is acceptable for the initial implementation.


866-884: Model deletion correctly unloads before removing files.

The sequence (unload from memory → delete directory) ensures no file handles are held when deletion occurs.

@AnthonyRonning AnthonyRonning force-pushed the feature/tts-supertonic branch from 903ad6d to 016e4e4 on January 8, 2026 at 01:44
@greptile-apps greptile-apps bot left a comment

10 files reviewed, 2 comments

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
frontend/src-tauri/src/tts.rs (1)

832-832: Consider making TTS parameters configurable.

The synthesis currently uses hardcoded values: 10 inference steps and 1.2× speed. Consider adding optional parameters to tts_synthesize to allow users to adjust quality (inference steps) and playback speed based on their preferences.

frontend/src/components/TTSDownloadDialog.tsx (1)

56-66: Consider refactoring nested ternary for readability.

The description text uses deeply nested ternary operators. While functional, extracting this logic into a helper function would improve readability:

♻️ Optional refactor
const getDescription = () => {
  if (isNotAvailable) return "TTS is only available in the desktop app.";
  if (isReady) return "TTS is ready! You can now listen to assistant messages.";
  if (hasError) return "There was an error setting up TTS.";
  if (isProcessing) return "Setting up TTS. Please keep this window open.";
  return `Listen to assistant messages with natural-sounding speech. This requires a one-time download of ~${Math.round(totalSizeMB)} MB.`;
};

// Then in JSX:
<DialogDescription className="text-base">
  {getDescription()}
</DialogDescription>
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 903ad6d and 016e4e4.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • docs/tts-research.md
  • frontend/src-tauri/Cargo.toml
  • frontend/src-tauri/src/lib.rs
  • frontend/src-tauri/src/tts.rs
  • frontend/src/app.tsx
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/services/tts/index.ts
  • justfile
🚧 Files skipped from review as they are similar to previous changes (4)
  • frontend/src/app.tsx
  • justfile
  • frontend/src-tauri/src/lib.rs
  • frontend/src/services/tts/index.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
🧠 Learnings (1)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
🧬 Code graph analysis (1)
frontend/src/components/TTSDownloadDialog.tsx (3)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (317-323)
frontend/src/components/ui/dialog.tsx (6)
  • Dialog (92-92)
  • DialogContent (97-97)
  • DialogHeader (98-98)
  • DialogTitle (100-100)
  • DialogDescription (101-101)
  • DialogFooter (99-99)
frontend/src/components/ui/button.tsx (1)
  • Button (62-62)
🪛 LanguageTool
docs/tts-research.md

[grammar] ~5-~5: Ensure spelling is correct
Context: ...ligible levels.The 44 Million Parameter AdvantageThe model operates with approximately 44 mi...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[uncategorized] ~5-~5: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...d) can often reside entirely within the System Level Cache (SLC) or high-speed RAM partition...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...features. The dynamic nature of its flow matching steps and custom tokenizers mak...

(QB_NEW_EN_HYPHEN)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...he Tauri Framework Context: Integration RealitiesThe user's choice of Tauri as the applicati...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...It compiles down to a native binary.The Webview (Frontend): Written in web technologies...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...).3.1 The "Sidecar Pattern" and Desktop SuccessOn desktop operating systems, Tauri suppor...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...obile Wall": Why Sidecars Fail on iOS & AndroidThe user's requirement for iOS and Android ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...fecycle.The Dependency Hell of Embedded PythonThe alternative to a sidecar is embedding t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...ource path helpers).Android Integration StrategyJNI and Shared Libraries: Android requires ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...g the models.5.1 Real-Time Factor (RTF) BenchmarksThe "Real-Time Factor" measures the speed o...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...s the integration roadmap.6.1 Rust Core ConfigurationThe integration avoids the sidecar pattern ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...hin the Rust process.Step 1: Dependency ManagementIn src-tauri/Cargo.toml:Ini, TOML[dependen...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~13-~13: Ensure spelling is correct
Context: ...back rodio = "0.19" Step 2: Model Asset BundlingThe 268MB model file must be accessible to ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~26-~26: Ensure spelling is correct
Context: ...capability.10. Conclusion and Strategic RecommendationThe comparative analysis yields a definitiv...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~162-~162: Consider using a shorter alternative to avoid wordiness.
Context: ...ox models (350-500 million parameters), making it better suited for resource-constrained devices...

(MADE_IT_JJR)

🪛 markdownlint-cli2 (0.18.1)
docs/tts-research.md

26-26: Bare URL used

(MD034, no-bare-urls)


26-26: Bare URL used

(MD034, no-bare-urls)


169-169: Bare URL used

(MD034, no-bare-urls)


175-175: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Greptile Review
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-linux
  • GitHub Check: build-android
  • GitHub Check: build-ios
🔇 Additional comments (2)
frontend/src-tauri/Cargo.toml (1)

44-44: Update ort from 2.0.0-rc.7 to a stable release or confirm pre-release acceptability.

ort 2.0.0 has not been stabilized—it remains in pre-release with the latest version at 2.0.0-rc.10. Using a pre-release version in production carries stability and support risks. Either upgrade to rc.10 if staying on pre-release is acceptable, or switch to a stable version if available.

frontend/src/components/UnifiedChat.tsx (1)

192-278: Well-implemented TTS button component.

The implementation demonstrates several good practices:

  • Proper cleanup: The useEffect at lines 208-215 correctly cleans up the long-press timer on unmount, addressing the concern from the previous review
  • Long-press handling: The pointer event handlers properly clear the timer on both pointerUp and pointerLeave, providing good UX
  • Platform awareness: Returns null for non-Tauri environments, ensuring the feature is desktop-only
  • Accessibility: Includes descriptive aria-label for screen readers
  • State management: Correctly uses the TTS context hook for shared state

The logic flow is clear and handles all the necessary states (not downloaded, loading, playing, etc.).

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR adds on-device text-to-speech functionality using Supertonic, a lightweight TTS engine that runs inference via ONNX Runtime. The implementation is desktop-only, downloading ~265MB of models from HuggingFace on first use and processing all audio locally.

Key Changes

  • Rust TTS Engine (frontend/src-tauri/src/tts.rs): Complete ONNX-based inference pipeline with text preprocessing, chunking, denoising loop, and WAV encoding
  • React Integration (TTSContext.tsx): Context provider managing download, model loading, and audio playback via Web Audio API
  • UI Components: Download dialog with progress tracking and speaker buttons on assistant messages (desktop only)
  • Platform Detection: Properly scoped to desktop environments using isTauriDesktop() checks

Issues Found

  • Race condition in tts_synthesize: sample_rate read while holding lock but used after lock released - could mismatch with audio data if TTS unloaded concurrently
  • Performance: Multiple large array clones inside 10-iteration denoising loop (line 515-520) - allocates significant memory
  • File handling: .with_extension("part") may produce unexpected results for multi-dot filenames
  • Cleanup edge case: Event listener cleanup not guaranteed in error path (line 152-161)
  • RC dependency: Using ort = "2.0.0-rc.7" release candidate in production

Architecture Notes

The implementation follows a sound architecture pattern: desktop-only feature with proper platform detection, atomic file downloads with .part extension, and clean separation between Rust inference and React playback. The 10-step denoising with 1.2x speed and 50ms chunk pause delivers high-quality output as documented.
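The 50ms inter-chunk pause amounts to something like the following sketch (function and parameter names are illustrative).

// Append ~50 ms of silence between synthesized chunks before concatenation.
fn join_with_pause(chunks: &[Vec<f32>], sample_rate: u32) -> Vec<f32> {
    let pause = vec![0.0_f32; (sample_rate as usize * 50) / 1000]; // 50 ms of silence
    let mut out = Vec::new();
    for (i, chunk) in chunks.iter().enumerate() {
        out.extend_from_slice(chunk);
        if i + 1 < chunks.len() {
            out.extend_from_slice(&pause);
        }
    }
    out
}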

Confidence Score: 3/5

  • This PR has multiple performance and correctness issues that should be addressed before merging
  • The implementation introduces a race condition in audio synthesis that could cause crashes or corrupted audio, uses an unstable RC dependency, and has performance concerns with repeated large allocations in hot paths. While the architecture is sound and the feature works, these issues warrant fixes before production deployment.
  • Pay close attention to frontend/src-tauri/src/tts.rs (race condition on line 839-843, performance issues in denoising loop) and frontend/src/services/tts/TTSContext.tsx (event listener cleanup)

Important Files Changed

File Analysis

  • frontend/src-tauri/src/tts.rs (3/5): TTS engine implementation with multiple clones in hot paths and potential race condition in audio data handling
  • frontend/src/services/tts/TTSContext.tsx (4/5): TTS React context with proper cleanup but potential memory leak in error paths
  • frontend/src-tauri/Cargo.toml (3/5): Dependencies added for TTS - using release candidate version which may have stability concerns
  • frontend/src/components/TTSDownloadDialog.tsx (5/5): Clean download UI component with proper state management
  • frontend/src/components/UnifiedChat.tsx (5/5): TTS button integration into chat UI, properly scoped to desktop only
  • frontend/src/app.tsx (5/5): TTSProvider added to React component tree

Sequence Diagram

sequenceDiagram
    participant User
    participant React as React Frontend<br/>(TTSContext)
    participant Tauri as Tauri Backend<br/>(Rust)
    participant HF as HuggingFace<br/>(Models)
    participant ONNX as ONNX Runtime<br/>(Inference)

    User->>React: Open TTS Dialog
    React->>Tauri: tts_get_status()
    Tauri->>Tauri: Check models_downloaded
    Tauri->>Tauri: Check models_loaded
    Tauri-->>React: Status response

    alt Models not downloaded
        User->>React: Click Download
        React->>Tauri: tts_download_models()
        loop For each model file
            Tauri->>HF: HTTP GET model file
            HF-->>Tauri: Stream chunks
            Tauri->>Tauri: Write to .part file
            Tauri->>React: Emit progress event
            React->>User: Update progress bar
        end
        Tauri->>Tauri: Rename .part to final
        Tauri-->>React: Download complete
        
        React->>Tauri: tts_load_models()
        Tauri->>ONNX: Load 4 ONNX sessions
        Tauri->>Tauri: Load voice style JSON
        Tauri-->>React: Models loaded
    end

    User->>React: Click speaker icon
    React->>Tauri: tts_synthesize(text)
    Tauri->>Tauri: Preprocess text
    Tauri->>Tauri: Chunk text (300 chars)
    loop For each chunk
        Tauri->>ONNX: Duration prediction
        ONNX-->>Tauri: Duration values
        Tauri->>ONNX: Text encoding
        ONNX-->>Tauri: Text embeddings
        Tauri->>Tauri: Sample noisy latent
        loop 10 denoising steps
            Tauri->>ONNX: Vector estimation
            ONNX-->>Tauri: Denoised latent
        end
        Tauri->>ONNX: Vocoder (latent→wav)
        ONNX-->>Tauri: Audio samples
    end
    Tauri->>Tauri: Concatenate chunks
    Tauri->>Tauri: Encode to base64 WAV
    Tauri-->>React: Audio base64 + metadata
    React->>React: Decode to AudioBuffer
    React->>React: Play via Web Audio API
    React-->>User: Audio playback

@greptile-apps

greptile-apps bot commented Jan 8, 2026

Additional Comments (1)

frontend/src-tauri/src/tts.rs
hardcoded HuggingFace URL without HTTPS verification configuration - ensure TLS certificate validation is enabled in reqwest::Client

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

Adds on-device text-to-speech using Supertonic, enabling assistant message audio playback on desktop with ~265MB one-time model download from HuggingFace.

Key Changes

  • Rust TTS engine (tts.rs): Complete ONNX Runtime integration with text preprocessing, model download/management, and audio synthesis
  • React TTS context: State management with Web Audio API playback, platform detection, and proper resource cleanup
  • UI integration: TTS button on assistant messages with long-press settings access, download dialog with progress tracking
  • Desktop-only: Properly scoped with #[cfg(desktop)] flags and runtime platform checks

Technical Highlights

  • Uses ONNX Runtime for cross-platform inference without Python dependencies
  • Implements streaming synthesis with 50ms silence between chunks for natural pacing
  • Handles partial download resume by cleaning up .part files
  • Web Audio API prevents hijacking system media controls
  • Comprehensive markdown/emoji stripping in text preprocessing

Minor Suggestions

  • Voice currently hardcoded to F2 - consider adding user preference API for F1/F2/M1/M2 voices
  • File validation could check size matches expected_size to catch corrupted partial downloads

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - well-architected feature addition with proper platform scoping
  • Score reflects solid implementation across Rust and TypeScript, proper resource cleanup, desktop-only scoping preventing mobile issues, comprehensive error handling, and good architectural decisions (ONNX vs PyTorch, Web Audio API)
  • No files require special attention - implementation is production-ready with only minor enhancement suggestions

Important Files Changed

File Analysis

  • frontend/src-tauri/src/tts.rs (4/5): new TTS engine with ONNX inference, text preprocessing, model download/management - solid implementation with minor suggestions for voice selection and file validation
  • frontend/src/services/tts/TTSContext.tsx (5/5): React context for TTS state management with proper cleanup, Web Audio API integration, and Tauri platform detection - well structured
  • frontend/src/components/TTSDownloadDialog.tsx (5/5): TTS setup dialog with download progress UI, model management, and clear user messaging - clean implementation
  • frontend/src/components/UnifiedChat.tsx (5/5): integrated TTS button with long-press management dialog, proper cleanup, and desktop-only rendering - well integrated
  • frontend/src-tauri/src/lib.rs (5/5): registered TTS module and commands with proper desktop-only compilation flags
  • frontend/src-tauri/Cargo.toml (5/5): added TTS dependencies (ort, ndarray, hound, etc.) as desktop-only with proper target configuration

Sequence Diagram

sequenceDiagram
    participant User
    participant UI as UnifiedChat UI
    participant TTS as TTSContext
    participant Tauri as Rust Backend
    participant Models as TTS Models

    Note over User,Models: First Time Setup Flow
    User->>UI: Click TTS button (models not downloaded)
    UI->>TTS: Check status
    TTS->>Tauri: tts_get_status()
    Tauri-->>TTS: models_downloaded: false
    TTS-->>UI: status: not_downloaded
    UI->>User: Show TTSDownloadDialog
    User->>UI: Click Download
    UI->>TTS: startDownload()
    TTS->>Tauri: tts_download_models()
    loop For each model file
        Tauri->>Models: Download from HuggingFace
        Models-->>Tauri: Chunk stream
        Tauri-->>TTS: Emit tts-download-progress event
        TTS-->>UI: Update progress bar
    end
    Tauri-->>TTS: Download complete
    TTS->>Tauri: tts_load_models()
    Tauri->>Models: Load ONNX sessions + voice style
    Models-->>Tauri: Sessions initialized
    Tauri-->>TTS: Models loaded
    TTS-->>UI: status: ready

    Note over User,Models: Speech Synthesis Flow
    User->>UI: Click TTS button (ready state)
    UI->>TTS: speak(text, messageId)
    TTS->>TTS: preprocessTextForTTS (remove think blocks)
    TTS->>Tauri: tts_synthesize(text)
    Tauri->>Tauri: chunk_text() + preprocess_text()
    loop For each chunk
        Tauri->>Tauri: UnicodeProcessor.call()
        Tauri->>Models: Duration prediction (dp_ort)
        Tauri->>Models: Text encoding (text_enc_ort)
        Tauri->>Models: Sample noisy latent
        loop Denoising steps (10 iterations)
            Tauri->>Models: Vector estimation (vector_est_ort)
        end
        Tauri->>Models: Vocoder inference
        Tauri->>Tauri: Append audio chunk + silence
    end
    Tauri->>Tauri: wav_to_base64()
    Tauri-->>TTS: audio_base64, sample_rate
    TTS->>TTS: base64ToBlob() + Web Audio API setup
    TTS->>UI: Update isPlaying state
    UI-->>User: Audio playback
    TTS-->>UI: onended: isPlaying = false

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (8)
frontend/src-tauri/src/tts.rs (8)

50-50: Consider pinning the model version instead of using "main".

The HuggingFace URL references the main branch, which means the models can change unexpectedly. This creates risks:

  • Breaking changes in model format or structure
  • Inconsistent behavior across installations
  • Difficulty reproducing issues

Consider pinning to a specific commit SHA or release tag for stability.

Example:

-const HUGGINGFACE_BASE_URL: &str = "https://huggingface.co/Supertone/supertonic/resolve/main";
+const HUGGINGFACE_BASE_URL: &str = "https://huggingface.co/Supertone/supertonic/resolve/abc123def456"; // Pin to specific commit

193-195: Consider handling "e.g." and "i.e." without trailing comma.

The current implementation only replaces "e.g.," and "i.e.," (with comma), but these abbreviations often appear without commas. Consider adding replacements for both variants:

 text = text.replace("@", " at ");
-text = text.replace("e.g.,", "for example, ");
-text = text.replace("i.e.,", "that is, ");
+text = text.replace("e.g.", "for example");
+text = text.replace("i.e.", "that is");

Note: This will match both "e.g.," and "e.g." since the comma will be preserved if present.


310-310: Consider moving static regex to module level.

The RE_PARA static is defined inside the chunk_text function. While functionally correct, it's inconsistent with the other regex patterns defined at module level (lines 19-48). Moving it to the top improves consistency and makes all regex patterns discoverable in one place.

 static RE_SENTENCE: Lazy<Regex> = Lazy::new(|| Regex::new(r"([.!?])\s+").unwrap());
+static RE_PARA: Lazy<Regex> = Lazy::new(|| Regex::new(r"\n\s*\n").unwrap());

 fn chunk_text(text: &str, max_len: usize) -> Vec<String> {
     let text = text.trim();
     if text.is_empty() {
         return vec![String::new()];
     }

-    static RE_PARA: Lazy<Regex> = Lazy::new(|| Regex::new(r"\n\s*\n").unwrap());
     let paragraphs: Vec<&str> = RE_PARA.split(text).collect();

516-541: Denoising loop clones large array on each iteration.

Line 518 clones xt (a 3D array) on every iteration of the denoising loop. For 10 steps (default), this creates 10 copies of a potentially large tensor. While necessary for ONNX Runtime's value creation, this could impact memory usage for long audio.

Consider documenting this behavior or exploring if ONNX Runtime supports views/references to reduce allocations.

Note: This is primarily a performance consideration for very long text synthesis. Current 50ms chunk approach (line 423) likely keeps memory usage reasonable.


563-564: TODO: Voice selection API is not implemented.

The comment indicates voice selection is hardcoded to F2, with F1, M1, M2 available but not selectable. Since the PR summary mentions "supports four voice styles", this limitation should be addressed or tracked.

Do you want me to:

  1. Generate a Tauri command to support runtime voice selection?
  2. Open a new issue to track this enhancement?

608-614: Add error context to ONNX session loading.

The ONNX session loading uses ? for error propagation without additional context. If model loading fails, users won't know which specific model file caused the issue. Consider adding context:

-    let dp_ort =
-        Session::builder()?.commit_from_file(models_dir.join("duration_predictor.onnx"))?;
-    let text_enc_ort =
-        Session::builder()?.commit_from_file(models_dir.join("text_encoder.onnx"))?;
-    let vector_est_ort =
-        Session::builder()?.commit_from_file(models_dir.join("vector_estimator.onnx"))?;
-    let vocoder_ort = Session::builder()?.commit_from_file(models_dir.join("vocoder.onnx"))?;
+    let dp_ort = Session::builder()?
+        .commit_from_file(models_dir.join("duration_predictor.onnx"))
+        .context("Failed to load duration_predictor.onnx")?;
+    let text_enc_ort = Session::builder()?
+        .commit_from_file(models_dir.join("text_encoder.onnx"))
+        .context("Failed to load text_encoder.onnx")?;
+    let vector_est_ort = Session::builder()?
+        .commit_from_file(models_dir.join("vector_estimator.onnx"))
+        .context("Failed to load vector_estimator.onnx")?;
+    let vocoder_ort = Session::builder()?
+        .commit_from_file(models_dir.join("vocoder.onnx"))
+        .context("Failed to load vocoder.onnx")?;

724-776: Consider adding retry logic for transient network failures.

The download implementation handles errors but doesn't retry on transient failures (network timeouts, temporary server issues). For large model downloads (~265 MB), this could frustrate users with unstable connections.

Consider implementing exponential backoff retry for network errors:

  • Retry 3-5 times with increasing delays
  • Distinguish between retryable errors (timeouts, 5xx) and non-retryable (404, authentication)
  • Preserve progress by supporting HTTP range requests for resumable downloads

Example pattern:

for attempt in 0..MAX_RETRIES {
    match download_with_timeout(&client, &url, &temp_path).await {
        Ok(_) => break,
        // retry transient failures with exponential backoff: 1s, 2s, 4s, ...
        Err(e) if is_retryable(&e) && attempt < MAX_RETRIES - 1 => {
            tokio::time::sleep(Duration::from_secs(2_u64.pow(attempt as u32))).await;
            continue;
        }
        // non-retryable error or last attempt: give up
        Err(e) => return Err(e),
    }
}

833-833: Hardcoded synthesis parameters should be configurable.

The TTS synthesis uses fixed values:

  • total_step: 10 (inference steps)
  • speed: 1.2 (playback speed multiplier)

These significantly affect output quality and duration but cannot be adjusted by users. Consider exposing these as optional parameters in the Tauri command signature or a configuration API.

 #[tauri::command]
 pub async fn tts_synthesize(
     text: String,
+    speed: Option<f32>,
+    steps: Option<usize>,
     state: tauri::State<'_, Mutex<TTSState>>,
 ) -> Result<TTSSynthesizeResponse, String> {
     // ... 
     let audio = tts
-        .synthesize(&text, &style, 10, 1.2)
+        .synthesize(&text, &style, steps.unwrap_or(10), speed.unwrap_or(1.2))
         .map_err(|e| format!("TTS synthesis failed: {}", e))?;
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 503318f and f814d89.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • frontend/src-tauri/Cargo.toml
  • frontend/src-tauri/src/tts.rs
  • frontend/src/services/tts/TTSContext.tsx
🚧 Files skipped from review as they are similar to previous changes (2)
  • frontend/src-tauri/Cargo.toml
  • frontend/src/services/tts/TTSContext.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-ios
  • GitHub Check: build-linux
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-android
🔇 Additional comments (10)
frontend/src-tauri/src/tts.rs (10)

75-111: LGTM! Data structures are well-designed.

The config and style structures properly model the TTS pipeline requirements. Good use of #[serde(rename)] for the dtype field to handle reserved keywords.
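
For illustration, a minimal sketch of the rename pattern (the field names here are hypothetical, not the actual config schema):

use serde::Deserialize;

// Hypothetical config struct: map a JSON field whose name clashes with Rust
// naming conventions onto a differently named Rust field via serde(rename).
#[derive(Debug, Deserialize)]
struct VoiceStyleConfig {
    #[serde(rename = "dtype")]
    data_type: String,
    sample_rate: u32,
}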


117-145: LGTM! Unicode processing handles OOV characters safely.

The bounds checking on line 132 prevents index-out-of-bounds errors, and the fallback to padding token (0) for unknown characters is a safe default. The masking logic correctly handles variable-length text inputs.
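
A minimal sketch of the fallback idea, assuming a HashMap-style indexer (the real processor may use a different structure):

use std::collections::HashMap;

// Map each character to its token id, falling back to the padding id (0)
// for out-of-vocabulary characters instead of panicking.
fn text_to_ids(text: &str, indexer: &HashMap<char, i64>) -> Vec<i64> {
    text.chars()
        .map(|c| indexer.get(&c).copied().unwrap_or(0))
        .collect()
}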


628-648: LGTM! WAV encoding properly handles float-to-int conversion.

The clamping on line 640 prevents overflow when converting float samples to 16-bit integers. This is essential for avoiding audio artifacts.
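
A minimal sketch of this pattern using the hound crate (not the PR's exact code):

use std::io::Cursor;

// Encode mono f32 samples as 16-bit PCM WAV in memory, clamping each sample
// to [-1.0, 1.0] before scaling so the i16 conversion cannot overflow.
fn encode_wav(samples: &[f32], sample_rate: u32) -> Result<Vec<u8>, hound::Error> {
    let spec = hound::WavSpec {
        channels: 1,
        sample_rate,
        bits_per_sample: 16,
        sample_format: hound::SampleFormat::Int,
    };
    let mut cursor = Cursor::new(Vec::new());
    {
        // Scoped writer: finalize() writes the header before we take the buffer.
        let mut writer = hound::WavWriter::new(&mut cursor, spec)?;
        for &s in samples {
            writer.write_sample((s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)?;
        }
        writer.finalize()?;
    }
    Ok(cursor.into_inner())
}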


661-680: LGTM! Status check is efficient and correct.

The implementation properly checks file existence and state without unnecessary I/O. The lock is held briefly only to check state.


698-702: Verify timeout values are appropriate for large models.

The HTTP client uses a 300-second (5-minute) timeout for the entire request and a 30-second connect timeout. For users on slow connections downloading 265 MB, 5 minutes may be insufficient.

Consider whether the timeout should be derived from a minimum expected download speed (e.g., at 1 Mbps, 265 MB takes roughly 35 minutes) or removed entirely for the download phase, keeping only the connect timeout.
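
One possible shape, sketched under the assumption that the download loop already uses reqwest's bytes_stream() (stream feature) and tokio: keep only the connect timeout on the client and bound stalls per chunk rather than per request.

use std::time::Duration;
use futures_util::StreamExt;

// Abort only when no bytes arrive for a while, not when the overall transfer
// is merely slow; a 265 MB download on a slow link can then still complete.
async fn download_streaming(url: &str) -> Result<Vec<u8>, String> {
    let client = reqwest::Client::builder()
        .connect_timeout(Duration::from_secs(30))
        .build()
        .map_err(|e| e.to_string())?;

    let resp = client.get(url).send().await.map_err(|e| e.to_string())?;
    let mut stream = resp.bytes_stream();
    let mut buf = Vec::new();

    while let Some(chunk) = tokio::time::timeout(Duration::from_secs(60), stream.next())
        .await
        .map_err(|_| "download stalled: no data for 60s".to_string())?
    {
        let chunk = chunk.map_err(|e| e.to_string())?;
        buf.extend_from_slice(&chunk);
    }
    Ok(buf)
}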


785-804: LGTM! Model loading correctly manages state.

The implementation properly loads models and updates state under the mutex. Previous model instances will be dropped automatically when replaced, ensuring proper resource cleanup.


844-844: Good practice: dropping lock before slow encoding.

Releasing the state lock before base64 encoding (line 844) is excellent for preventing lock contention during the potentially slow encoding operation. This allows other commands (like status checks) to proceed concurrently.
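
For readers unfamiliar with the pattern, a generic sketch (illustrative names, not the PR's code; assumes the base64 0.21+ Engine API):

use base64::Engine;

// Copy what we need out of the critical section, drop the guard, then do the
// slow base64 encode without holding the TTS state lock.
fn encode_locked_samples(state: &std::sync::Mutex<Vec<f32>>) -> String {
    let samples: Vec<f32> = {
        let guard = state.lock().unwrap();
        guard.clone()
    }; // guard dropped here

    let bytes: Vec<u8> = samples.iter().flat_map(|s| s.to_le_bytes()).collect();
    base64::engine::general_purpose::STANDARD.encode(bytes)
}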


858-865: LGTM! Unload correctly releases model resources.

Setting the Option fields to None will drop the ONNX sessions and free GPU/CPU resources. Simple and correct.


867-885: LGTM! Delete models safely unloads before filesystem cleanup.

The implementation correctly unloads models from memory before deleting files, preventing use-after-free scenarios. The scoped lock (lines 870-874) is good practice to minimize lock duration.


23-23: The regex analysis contains a logical error. The pattern r"_([^_\s][^_]*)_" will in fact match single-character italic text like _a_.

Breaking down the pattern:

  • [^_\s] matches exactly one character (not underscore or whitespace)
  • [^_]* matches zero or more additional characters
  • Therefore _a_ matches with capture group = "a"

The design choice to exclude whitespace in the first character ([^_\s]) is intentional—it prevents false matches on patterns like _ something_. This pattern is consistent with other format regex patterns in the file (RE_ITALIC, RE_BOLD2, etc.), all of which require a minimum of 1 character.
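
A quick check with the regex crate confirms the breakdown above:

use regex::Regex;

fn main() {
    let re = Regex::new(r"_([^_\s][^_]*)_").unwrap();
    // Single-character italics are matched...
    assert_eq!(re.captures("_a_").unwrap().get(1).unwrap().as_str(), "a");
    // ...multi-word italics are matched...
    assert_eq!(re.captures("_a b_").unwrap().get(1).unwrap().as_str(), "a b");
    // ...but an opening underscore followed by whitespace is not.
    assert!(re.captures("_ something_").is_none());
}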

No change needed.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Adds on-device text-to-speech using Supertonic ONNX models, enabling users to listen to assistant messages on desktop. The implementation is well-architected with proper separation between Rust inference backend and React UI layer.

Key Changes

  • Rust TTS engine: 905-line implementation using ONNX Runtime with text preprocessing, chunking, and streaming synthesis
  • React state management: TTSContext handles download progress, model lifecycle, and audio playback using Web Audio API
  • Desktop-only feature: Properly scoped with conditional compilation and platform detection
  • One-time download: ~265MB models downloaded from HuggingFace on first use, stored in app data directory
  • Smart preprocessing: Removes markdown formatting, emojis, and <think> blocks before synthesis

Issues Found

  • Logic issue: Potential panic in tts.rs line 459 if text_ids array is empty - needs validation before accessing text_ids[0]
  • Hardcoded values: Voice style (F2) and synthesis parameters (10 steps, 1.2x speed) are hardcoded - consider exposing as user preferences

Strengths

  • Comprehensive error handling throughout download and synthesis flows
  • Proper cleanup of audio resources and event listeners
  • Good UX with progress tracking and clear status messaging
  • Well-researched technical choice documented in tts-research.md

Confidence Score: 4/5

  • Safe to merge with minor fix recommended - the empty input validation issue is unlikely to occur in practice but should be addressed
  • Score reflects solid architecture and implementation quality, with one logical issue (empty array access) that could cause a panic. The hardcoded parameters are design choices rather than bugs. Desktop-only scoping is appropriate, error handling is thorough, and the feature is well-integrated.
  • Pay close attention to frontend/src-tauri/src/tts.rs line 459 - add validation before accessing text_ids[0] to prevent potential panic

Important Files Changed

File Analysis

Filename Score Overview
frontend/src-tauri/src/tts.rs 4/5 New TTS engine using Supertonic ONNX models. Well-structured with proper error handling, text preprocessing, and streaming synthesis. Minor issue: potential panic on empty input.
frontend/src/services/tts/TTSContext.tsx 5/5 React context managing TTS state and playback. Clean implementation with proper cleanup, Web Audio API usage, and preprocessing to remove chain-of-thought blocks.
frontend/src/components/TTSDownloadDialog.tsx 5/5 UI dialog for TTS setup with progress tracking and model management. Good UX with disabled state handling during downloads and clear messaging.
frontend/src/components/UnifiedChat.tsx 5/5 Added TTS button to assistant messages with long-press for settings. Desktop-only feature with proper platform detection and state management.
frontend/src-tauri/Cargo.toml 5/5 Added desktop-only TTS dependencies (ONNX Runtime, ndarray, reqwest, etc.). Proper conditional compilation for desktop platforms.

Sequence Diagram

sequenceDiagram
    participant User
    participant UI as UnifiedChat UI
    participant TTS as TTSContext
    participant Tauri as Rust Backend
    participant ONNX as ONNX Runtime
    participant HF as HuggingFace

    Note over User,HF: Initial Setup Flow
    User->>UI: Opens TTS setup dialog
    UI->>TTS: checkStatus()
    TTS->>Tauri: invoke("tts_get_status")
    Tauri-->>TTS: {models_downloaded: false}
    TTS-->>UI: status: "not_downloaded"
    
    User->>UI: Clicks "Download"
    UI->>TTS: startDownload()
    TTS->>Tauri: invoke("tts_download_models")
    
    loop For each model file
        Tauri->>HF: HTTP GET model file
        HF-->>Tauri: Stream chunks
        Tauri->>UI: emit("tts-download-progress")
        UI-->>User: Update progress bar
    end
    
    Tauri->>Tauri: Save models to app data dir
    Tauri-->>TTS: Download complete
    
    TTS->>Tauri: invoke("tts_load_models")
    Tauri->>ONNX: Load ONNX sessions
    ONNX-->>Tauri: Sessions ready
    Tauri-->>TTS: Models loaded
    TTS-->>UI: status: "ready"

    Note over User,HF: Text-to-Speech Flow
    User->>UI: Clicks speaker icon
    UI->>TTS: speak(text, messageId)
    TTS->>TTS: preprocessTextForTTS()
    Note right of TTS: Remove <think> blocks
    
    TTS->>Tauri: invoke("tts_synthesize", {text})
    Tauri->>Tauri: chunk_text(300 chars)
    
    loop For each chunk
        Tauri->>Tauri: preprocess_text()
        Tauri->>ONNX: Run duration_predictor
        ONNX-->>Tauri: duration
        Tauri->>ONNX: Run text_encoder
        ONNX-->>Tauri: text_emb
        Tauri->>Tauri: sample_noisy_latent()
        
        loop 10 denoising steps
            Tauri->>ONNX: Run vector_estimator
            ONNX-->>Tauri: denoised_latent
        end
        
        Tauri->>ONNX: Run vocoder
        ONNX-->>Tauri: wav_data
    end
    
    Tauri->>Tauri: wav_to_base64()
    Tauri-->>TTS: {audio_base64, sample_rate}
    
    TTS->>TTS: base64ToBlob()
    TTS->>TTS: AudioContext.decodeAudioData()
    TTS->>TTS: source.start()
    TTS-->>User: Audio playback starts
    
    User->>UI: Clicks stop icon
    UI->>TTS: stop()
    TTS->>TTS: source.stop()
    TTS-->>User: Audio stops

Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @frontend/src-tauri/src/tts.rs:
- Around line 50-71: Extend MODEL_FILES to include a SHA256 hex string for each
entry (e.g. change the tuple type used by MODEL_FILES to include the checksum
field) and update tts_download_models to compute the SHA256 of the downloaded
file bytes (using a standard crate like sha2::Sha256 and hex encoding) before
accepting the file; after writing the file (or on an in-memory buffer), compute
hex(SHA256(bytes)) and compare to the checksum from MODEL_FILES, and if they
differ delete/ignore the file and return or retry the download with an error,
otherwise proceed as before; retain HUGGINGFACE_BASE_URL and the existing
name/path/size checks but make checksum verification mandatory in
tts_download_models so corrupted or tampered downloads are rejected.
- Around line 709-729: The current skip logic only checks meta.len() > 0 and can
treat partial downloads as complete; update the check to compare the file size
to the expected_size (e.g., meta.len() == expected_size or meta.len() >=
expected_size) and only skip if the sizes match; when skipping, increment
total_downloaded by meta.len() (or expected_size if you choose ==) and emit
DownloadProgress using the actual size so progress is accurate; if the size does
not match expected_size treat the file as invalid and remove it as already
implemented.
- Around line 516-541: The loop unnecessarily clones xt when calling
Value::from_array(xt.clone()); change this to pass xt by value
(Value::from_array(xt)) to avoid the extra allocation and ensure ownership
semantics are correct, and adjust subsequent usage so xt is reassigned after
extracting "denoised_latent". Also hoist current_step_array and
current_step_value creation out of the loop and update their contents in-place
each iteration (e.g., via ndarray mapv_inplace or Zip) instead of reallocating
them each time; update references around vector_est_ort.run(...) to use the
moved/updated current_step_value.
🧹 Nitpick comments (5)
frontend/src-tauri/src/tts.rs (5)

304-387: Consider simplifying the chunking logic.

The chunk_text function implements sophisticated paragraph/sentence/word-aware chunking, but the nested conditional logic (lines 314-379) is complex with high cyclomatic complexity. While the implementation appears correct, it may be difficult to maintain.

♻️ Optional refactor: Extract helper functions

Consider extracting sub-functions:

  • chunk_paragraph(para: &str, max_len: usize) -> Vec<String>
  • chunk_sentence(sentence: &str, max_len: usize) -> Vec<String>

This would improve readability and testability while preserving the existing behavior.
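
As a rough illustration of that decomposition (simplified; the real chunk_text also preserves punctuation details not shown here):

fn chunk_text(text: &str, max_len: usize) -> Vec<String> {
    text.split("\n\n")
        .flat_map(|para| chunk_paragraph(para, max_len))
        .collect()
}

fn chunk_paragraph(para: &str, max_len: usize) -> Vec<String> {
    // Pack whole sentences into chunks, falling back to word splitting
    // when a single sentence exceeds max_len.
    let mut chunks = Vec::new();
    let mut current = String::new();
    for sentence in para.split_inclusive(|c: char| matches!(c, '.' | '!' | '?')) {
        if !current.is_empty() && current.len() + sentence.len() > max_len {
            chunks.push(current.trim().to_string());
            current.clear();
        }
        if sentence.len() > max_len {
            chunks.extend(chunk_sentence(sentence, max_len));
        } else {
            current.push_str(sentence);
        }
    }
    if !current.trim().is_empty() {
        chunks.push(current.trim().to_string());
    }
    chunks
}

fn chunk_sentence(sentence: &str, max_len: usize) -> Vec<String> {
    // Last resort: split an oversized sentence on word boundaries.
    let mut chunks = Vec::new();
    let mut current = String::new();
    for word in sentence.split_whitespace() {
        if !current.is_empty() && current.len() + word.len() + 1 > max_len {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}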


413-447: Consider making synthesis parameters configurable.

The function hardcodes max_len=300 (line 421) and silence_duration=0.05 (line 423). While these values align with the PR objectives (50ms pause), consider exposing them as configuration parameters for future flexibility (e.g., adjusting pause duration per voice style or use case).


562-565: Track the TODO for voice selection API.

The voice is currently hardcoded to F2 (line 565). The PR objectives mention four voices (F1, F2, M1, M2) are supported. This TODO should be addressed in a future PR to enable user voice selection.

Do you want me to open a new issue to track implementing the voice selection API?


778-787: Strengthen download validation with expected_size.

The code validates file_downloaded != expected_len only when content_length is Some (line 778). However, expected_size from MODEL_FILES is always available. Consider also validating against expected_size as a fallback or additional check to catch cases where the server's Content-Length header is missing or incorrect.

🔍 Suggested enhancement
         if let Some(expected_len) = expected_len {
             if file_downloaded != expected_len {
                 drop(file);
                 let _ = fs::remove_file(&temp_path);
                 return Err(format!(
                     "Incomplete download for {}: expected {} bytes, got {}",
                     file_name, expected_len, file_downloaded
                 ));
             }
         }
+        // Also validate against our known expected size
+        if file_downloaded != *expected_size {
+            drop(file);
+            let _ = fs::remove_file(&temp_path);
+            return Err(format!(
+                "Downloaded file size mismatch for {}: expected {} bytes, got {}",
+                file_name, expected_size, file_downloaded
+            ));
+        }

852-854: Consider exposing synthesis parameters to frontend.

The synthesis call hardcodes total_step=10 and speed=1.2 (line 853). While these align with the PR objectives, consider allowing the frontend to adjust these parameters (quality vs. speed trade-off) in a future enhancement.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f814d89 and b9a37ea.

📒 Files selected for processing (1)
  • frontend/src-tauri/src/tts.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-linux
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-ios
  • GitHub Check: build-android
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (12)
frontend/src-tauri/src/tts.rs (12)

113-145: LGTM: Unicode processor is well-implemented.

The bounds checking (line 132) and graceful OOV handling (line 136) prevent potential panics and ensure robust text-to-token conversion.


147-220: LGTM: Comprehensive text preprocessing for TTS.

The preprocessing pipeline appropriately handles markdown removal, Unicode normalization, symbol replacement, and punctuation formatting. The auto-punctuation logic (lines 216-218) correctly checks for existing punctuation before adding a period.
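
A rough sketch of that pipeline shape (illustrative patterns only, not the exact set in tts.rs; assumes the regex, once_cell, and unicode-normalization crates):

use once_cell::sync::Lazy;
use regex::Regex;
use unicode_normalization::UnicodeNormalization;

static RE_BOLD: Lazy<Regex> = Lazy::new(|| Regex::new(r"\*\*(.+?)\*\*").unwrap());
static RE_HEADER: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?m)^#{1,6}\s*").unwrap());

// Normalize, strip a couple of markdown markers, then make sure the text ends
// with sentence punctuation so the synthesizer gets a clean final boundary.
fn preprocess_sketch(text: &str) -> String {
    let text: String = text.nfkd().collect();
    let text = RE_BOLD.replace_all(&text, "$1");
    let text = RE_HEADER.replace_all(&text, "");
    let mut text = text.trim().to_string();
    if !text.is_empty() && !text.ends_with(|c: char| matches!(c, '.' | '!' | '?')) {
        text.push('.');
    }
    text
}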


222-302: LGTM: Helper functions are correctly implemented.

The mask generation, noisy latent sampling, and word-based text splitting logic are all sound and handle edge cases appropriately (e.g., empty text, zero lengths).
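
For the noisy-latent step specifically, the idea is just standard-normal sampling into a latent-shaped buffer; a sketch assuming the rand and rand_distr crates (the PR may sample differently):

use rand::Rng;
use rand_distr::StandardNormal;

// Fill a dim × len latent buffer with standard-normal noise as the starting
// point for the denoising loop.
fn sample_noisy_latent(dim: usize, len: usize) -> Vec<f32> {
    let mut rng = rand::thread_rng();
    (0..dim * len).map(|_| rng.sample(StandardNormal)).collect()
}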


389-411: State management design is appropriate.

The TTSState and TextToSpeech structs correctly separate public state (with Mutex for thread-safety) from internal implementation details. Returning Mutex<Self> from new() is unconventional but works well with Tauri's state management pattern.


596-626: ONNX Runtime error handling relies on proper file validation.

The ONNX session creation (lines 608-614) will propagate errors via ? if model files are corrupted or invalid. This is appropriate, but ensure that the download verification (in tts_download_models) is robust enough to prevent corrupted files from reaching this point. This relates to the earlier comment about adding checksum verification.


628-648: LGTM: WAV encoding is correctly implemented.

The sample clamping (line 640) prevents overflow when converting to 16-bit integers, and the scoped writer (lines 637-645) ensures proper WAV finalization before encoding.


661-680: LGTM: Status check is straightforward and correct.

The function appropriately checks file existence and state loading without holding locks unnecessarily.


789-796: LGTM: Atomic file finalization pattern.

The sequence of flush(), sync_all(), drop(), and rename() (lines 790-796) ensures data is fully written to disk before replacing the final file. This prevents corruption if the process is interrupted.
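
For reference, the general shape of that pattern with std only (simplified temp-file naming, not the PR's exact code):

use std::fs::{self, File};
use std::io::{BufWriter, Write};
use std::path::Path;

// Write to a temporary path, flush and sync it, drop the handle, then rename
// over the final path; rename is atomic on the same filesystem.
fn finalize_download(bytes: &[u8], final_path: &Path) -> std::io::Result<()> {
    let temp_path = final_path.with_extension("part");
    {
        let mut writer = BufWriter::new(File::create(&temp_path)?);
        writer.write_all(bytes)?;
        writer.flush()?;
        writer.get_ref().sync_all()?;
    } // file handle dropped here
    fs::rename(&temp_path, final_path)
}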


805-824: LGTM: Efficient lock usage in model loading.

The implementation correctly loads models (lines 811-814) before acquiring the state lock (line 817), minimizing lock contention and preventing blocking other operations during the expensive I/O and ONNX session initialization.


863-867: LGTM: Excellent lock release pattern.

Dropping the state lock (line 864) before the expensive base64 encoding prevents blocking other TTS operations during encoding. This is a performance best practice.


878-905: LGTM: Model cleanup commands are correctly implemented.

The tts_delete_models function correctly unloads models from memory before deleting files (lines 890-894), preventing use-after-delete issues.


23-23: The italic regex pattern analysis contains errors.

The pattern r"_([^_\s][^_]*)_" will actually match _a_ and _a b_ contrary to the review comment. The pattern requires the first character to be non-whitespace and non-underscore, then allows zero or more non-underscore characters (which includes spaces). Both test cases satisfy this: _a_ matches with a single character, and _a b_ matches with a space in the middle. The pattern correctly rejects _ test_ because the underscore cannot be followed by whitespace.

If there are legitimate concerns about the italic matching behavior (such as not matching single-character italics intentionally), those should be evaluated separately against the actual intended use cases in the codebase.

Likely an incorrect or invalid review comment.

Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @frontend/src-tauri/Cargo.toml:
- Around line 42-54: The Cargo dependency for the ONNX Runtime crate is pinned
to the unstable release candidate "ort = \"2.0.0-rc.10\"" which may introduce
breaking changes; update the dependency declaration for the desktop target in
Cargo.toml to use the stable release "ort = \"1.16.3\"" instead (modify the
entry under target.'cfg(any(target_os = "macos", target_os = "linux", target_os
= "windows"))'.dependencies where the ort line appears), run cargo update/build
to verify compatibility, and if any 2.0-specific APIs were used adjust code to
the 1.16.3 API surface or keep the RC only if those features are required.
🧹 Nitpick comments (3)
frontend/src-tauri/src/tts.rs (3)

477-479: Consider making chunk size and silence duration configurable.

The chunk size (300 characters) and inter-chunk silence (0.05 seconds) are currently hardcoded. While these values are reasonable, making them configurable parameters would provide flexibility for different use cases.


622-625: Reminder: Voice selection is currently hardcoded to F2.

The TODO comment indicates that voice selection API is needed. The PR objectives acknowledge this limitation. The implementation currently supports four voices (F1, F2, M1, M2) but defaults to F2.

Would you like me to help design a voice selection API, or should this be tracked in a separate issue?


922-965: Good mutex management with early lock release.

The implementation correctly:

  • Clones the style to avoid borrow conflicts (lines 929-934)
  • Drops the guard before expensive encoding operation (line 953), minimizing lock contention
  • Validates inputs (empty text/audio)

The hardcoded inference parameters (10 steps, 1.2× speed) match the PR objectives but could be made configurable in the future.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9a37ea and 7d53d03.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • frontend/src-tauri/Cargo.toml
  • frontend/src-tauri/src/tts.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-linux
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-android
  • GitHub Check: build-ios
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (7)
frontend/src-tauri/src/tts.rs (7)

51-53: Excellent security practice: pinned model revision.

Pinning the HuggingFace revision to a specific commit ensures reproducibility and prevents supply chain attacks through model substitution.


55-119: Strong integrity verification with size and SHA-256 checks.

Each model file includes both expected size and SHA-256 checksum, providing robust verification during download to prevent corruption or tampering.
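
The verification step itself reduces to something like the following sketch with the sha2 crate (expected_hex standing in for the pinned constant):

use sha2::{Digest, Sha256};

// Hash the downloaded bytes and compare against the pinned hex digest.
fn verify_sha256(bytes: &[u8], expected_hex: &str) -> Result<(), String> {
    let digest = Sha256::digest(bytes);
    let actual_hex: String = digest.iter().map(|b| format!("{:02x}", b)).collect();
    if actual_hex.eq_ignore_ascii_case(expected_hex) {
        Ok(())
    } else {
        Err(format!(
            "checksum mismatch: expected {}, got {}",
            expected_hex, actual_hex
        ))
    }
}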


178-201: Good defensive handling of out-of-vocabulary characters.

The implementation correctly handles Unicode characters outside the indexer range by using the padding token (0) rather than panicking. This ensures robustness when processing diverse text inputs.


203-276: Comprehensive text preprocessing for TTS quality.

The preprocessing pipeline is thorough and well-structured:

  • Unicode normalization (NFKD) for consistent character handling
  • Markdown formatting removal for clean text
  • Punctuation and spacing corrections
  • Automatic sentence termination when needed

The use of pre-compiled regex patterns (via Lazy static) ensures efficient repeated processing.


360-443: Well-designed hierarchical text chunking strategy.

The chunking logic correctly handles multiple levels:

  1. Split by paragraphs (preserving structure)
  2. Split by sentences (preserving natural boundaries)
  3. Split by words (fallback for oversized segments)

This approach maintains natural speech boundaries while respecting length constraints.


754-892: Robust model download implementation with comprehensive verification.

The download logic includes multiple layers of protection:

  • Atomic downloads via temp files with .part extension
  • Dual size verification (Content-Length header + actual bytes)
  • SHA-256 integrity verification
  • Proper file finalization (flush, sync, rename)
  • Cleanup of partial downloads on failure
  • Progress events for UI feedback

This is production-quality download handling.


977-994: Correct cleanup order in model deletion.

The implementation properly unloads models from memory before deleting files, preventing potential access violations. The existence check before deletion (line 987) handles the edge case where models were never downloaded.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR adds on-device text-to-speech functionality using Supertonic, a fast ONNX-based TTS engine that runs entirely locally without cloud API calls.

Key Changes

  • Rust backend (tts.rs): Implements Supertonic TTS with ONNX Runtime, downloading ~265MB of models from HuggingFace with SHA256 verification, text preprocessing (markdown removal, unicode normalization), and streaming synthesis with 10-step denoising
  • React integration: New TTSContext managing download progress, model loading, and Web Audio API playback; TTSDownloadDialog provides polished UX for one-time model download
  • Desktop-only feature: Properly gated with #[cfg(desktop)] in Rust and isTauriEnv checks in React; speaker buttons appear only in Tauri desktop environment
  • Smart preprocessing: Removes <think> blocks from assistant messages before synthesis to avoid reading internal reasoning aloud

Architecture Highlights

  • Models pinned to specific HuggingFace revision with hardcoded SHA256 checksums for integrity
  • Streaming download with progress events emitted to frontend
  • ONNX Runtime enables cross-platform inference (macOS/Linux/Windows) with potential for NPU acceleration
  • Voice style (F2), inference steps (10), and speed (1.2x) are currently hardcoded but functional
  • Clean separation: Rust handles compute-heavy synthesis, React handles UI/audio playback via Web Audio API

Testing Notes

Desktop builds are marked as working in the PR description. iOS/Android builds are noted as needing testing, which is appropriate since TTS is intentionally desktop-only (dependencies are platform-gated in Cargo.toml).

Confidence Score: 4/5

  • Safe to merge with minor suggestions for future enhancements
  • Code is well-structured with proper error handling, security best practices (SHA256 verification), and appropriate platform gating. The implementation is production-ready for desktop. Score is 4 (not 5) due to hardcoded configuration values that would benefit from being user-configurable in future iterations.
  • All files look solid. tts.rs has a few hardcoded values (voice, inference params) that are noted for future enhancement but don't block merging.

Important Files Changed

File Analysis

Filename Score Overview
frontend/src-tauri/src/tts.rs 4/5 Implements on-device TTS using Supertonic ONNX models with SHA256-verified downloads, text preprocessing, and streaming synthesis. Well-structured with proper error handling, though voice/parameter configuration is hardcoded.
frontend/src-tauri/src/lib.rs 5/5 Added TTS module import and registered 6 TTS Tauri commands (get_status, download_models, load_models, synthesize, unload_models, delete_models). Desktop-only via #[cfg(desktop)]. Clean integration.
frontend/src/services/tts/TTSContext.tsx 5/5 React context managing TTS lifecycle with download progress tracking, Web Audio API playback, and automatic cleanup. Properly handles Tauri environment detection and preprocesses text to remove <think> blocks before synthesis.
frontend/src/components/TTSDownloadDialog.tsx 5/5 Polished download UI with progress tracking, model deletion, and clear error states. Prevents dialog close during processing. Good UX for 265MB model download flow.
frontend/src/app.tsx 5/5 Added TTSProvider to app provider tree, positioned correctly between TooltipProvider and BillingServiceProvider. Minimal, appropriate change.
frontend/src/components/UnifiedChat.tsx 4/5 Added speaker button component to assistant messages with long-press settings gesture. Desktop-only via isTauriEnv check. Integrates cleanly with existing message rendering.
frontend/src-tauri/Cargo.toml 5/5 Added 12 TTS dependencies for desktop targets only via target.'cfg(...)'. Includes ONNX Runtime, audio encoding (hound), HTTP streaming (reqwest), and cryptography (sha2). Appropriate platform gating.
docs/tts-research.md 5/5 Comprehensive research document comparing Supertonic vs Chatterbox for Tauri integration. Details architecture, performance, and cross-platform deployment. Justifies Supertonic selection for mobile compatibility.

Sequence Diagram

sequenceDiagram
    participant User
    participant UI as React UI<br/>(UnifiedChat)
    participant TTS as TTSContext<br/>(React)
    participant Dialog as TTSDownloadDialog
    participant Tauri as Tauri IPC
    participant Rust as Rust Backend<br/>(tts.rs)
    participant HF as HuggingFace
    participant ONNX as ONNX Runtime
    participant Audio as Web Audio API

    Note over User,Audio: Initial Setup Flow
    User->>UI: Opens app
    UI->>TTS: Auto-check status on mount
    TTS->>Tauri: invoke("tts_get_status")
    Tauri->>Rust: tts_get_status()
    Rust->>Rust: Check if models exist<br/>and have correct size
    Rust-->>Tauri: TTSStatusResponse
    Tauri-->>TTS: {models_downloaded, models_loaded}
    
    alt Models not downloaded
        TTS->>TTS: Set status="not_downloaded"
        User->>UI: Clicks speaker button
        UI->>Dialog: Show download dialog
        User->>Dialog: Clicks "Download"
        Dialog->>TTS: startDownload()
        TTS->>Tauri: invoke("tts_download_models")
        Tauri->>Rust: tts_download_models()
        
        loop For each model file
            Rust->>HF: HTTP GET model file
            HF-->>Rust: Stream chunks
            Rust->>Rust: SHA256 hash verification
            Rust->>Tauri: emit("tts-download-progress")
            Tauri->>TTS: Download progress event
            TTS->>Dialog: Update progress bar
        end
        
        Rust-->>Tauri: Download complete
        TTS->>Tauri: invoke("tts_load_models")
        Tauri->>Rust: tts_load_models()
        Rust->>ONNX: Load 4 ONNX models<br/>(text_encoder, duration_predictor,<br/>vector_estimator, vocoder)
        Rust->>Rust: Load F2 voice style
        Rust->>Rust: Store in TTSState Mutex
        Rust-->>Tauri: Success
        Tauri-->>TTS: Models loaded
        TTS->>TTS: Set status="ready"
    else Models downloaded but not loaded
        TTS->>Tauri: invoke("tts_load_models")
        Tauri->>Rust: tts_load_models()
        Rust->>ONNX: Load models into memory
        Rust-->>Tauri: Success
        TTS->>TTS: Set status="ready"
    end

    Note over User,Audio: Speech Synthesis Flow
    User->>UI: Clicks speaker button<br/>on assistant message
    UI->>TTS: speak(text, messageId)
    TTS->>TTS: preprocessTextForTTS()<br/>(remove <think> blocks)
    TTS->>Tauri: invoke("tts_synthesize", {text})
    Tauri->>Rust: tts_synthesize(text)
    
    Rust->>Rust: preprocess_text()<br/>(remove markdown, normalize)
    Rust->>Rust: chunk_text(text, 300)<br/>(split into sentences)
    
    loop For each chunk
        Rust->>Rust: UnicodeProcessor.call()<br/>(tokenize text)
        Rust->>ONNX: duration_predictor.run()<br/>(predict phoneme duration)
        Rust->>ONNX: text_encoder.run()<br/>(encode text with F2 style)
        Rust->>Rust: sample_noisy_latent()<br/>(initialize with gaussian noise)
        
        loop 10 denoising steps
            Rust->>ONNX: vector_estimator.run()<br/>(denoise latent)
        end
        
        Rust->>ONNX: vocoder.run()<br/>(latent → waveform)
        Rust->>Rust: Append 50ms silence
    end
    
    Rust->>Rust: wav_to_base64()<br/>(encode as 16-bit WAV)
    Rust-->>Tauri: TTSSynthesizeResponse<br/>{audio_base64, sample_rate}
    Tauri-->>TTS: Audio data
    TTS->>TTS: base64ToBlob()
    TTS->>Audio: Create AudioContext
    TTS->>Audio: decodeAudioData()
    TTS->>Audio: createBufferSource()
    TTS->>Audio: source.start()
    Audio-->>User: Plays audio
    
    User->>UI: Clicks stop button
    UI->>TTS: stop()
    TTS->>Audio: source.stop()
    TTS->>Audio: audioContext.close()
    TTS->>TTS: Cleanup resources

AnthonyRonning and others added 14 commits January 8, 2026 11:27
- Add Rust TTS module with Supertonic ONNX inference
- Download models (~265MB) from HuggingFace on first use
- Add TTSContext for React state management
- Add TTSDownloadDialog for model setup
- Add speaker button to assistant messages (desktop only)
- Settings: F2 voice, 10 inference steps, 1.2x speed

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Preserve sentence punctuation when chunking text
- Add download timeout (5min) and partial file cleanup with .part temp files
- Fix audio URL memory leaks in stop/unmount/error handlers
- Use isTauriDesktop instead of isTauri (TTS is desktop-only)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Long-press speaker button to open TTS management dialog
- Add delete option to free up storage (~264 MB)
- Add tts_delete_models Rust command
- Show storage info when TTS is ready

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Add TypeScript preprocessing to remove <think>...</think> blocks
- Add Rust preprocessing to strip markdown formatting (**bold**, *italic*, etc.)
- Remove code blocks, headers, emojis from TTS input
- Handle unclosed think tags (streaming edge case)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Add #[cfg(desktop)] to tts module declaration
- Move TTS dependencies to desktop-only target in Cargo.toml
- Prevents TTS code from being compiled for iOS/Android

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Add useEffect cleanup for longPressTimer on unmount
- Use 0 (padding token) instead of -1 for OOV characters
- Make copy/speaker buttons always visible (not hover-only)
- Fix progress bar colors (use primary/20 background)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Clear mediaSession metadata and action handlers so TTS audio
doesn't intercept system play/pause and track skip controls.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
- Use lazy_static regexes (once_cell) for one-time compilation
- Replace while loops with single-pass regex for duplicate quotes
- Add split_by_words() to handle text longer than max_len in chunks
- Add TODO for voice selection (currently hardcoded to F2)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Replace HTMLAudioElement with Web Audio API (AudioContext + AudioBufferSourceNode)
which doesn't register with the Media Session API.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@AnthonyRonning AnthonyRonning force-pushed the feature/tts-supertonic branch from 7d53d03 to 6509a68 Compare January 8, 2026 17:28
@AnthonyRonning
Copy link
Contributor Author

@TestFlight build

@github-actions
Copy link
Contributor

github-actions bot commented Jan 8, 2026

🚀 TestFlight deployment triggered! Check the Actions tab for progress.

Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @docs/tts-research.md:
- Line 26: Replace the bare URLs at the referenced locations (lines 26, 169,
175) with markdown links by wrapping each URL in [descriptive text](https://...)
or at minimum using the URL itself as link text like
[https://huggingface.co/...](https://huggingface.co/...), ensuring any
occurrences of raw "https://..." in the content are converted to proper markdown
link syntax for improved rendering and accessibility; search for the raw URLs in
the doc and update them accordingly.
- Around line 159-247: The "Grok" section duplicates content already covered in
the "Claude" analysis; remove or consolidate it into a single authoritative
analysis to avoid redundancy. Find the "Grok" header and its full subsection
(the duplicated comparisons of Supertonic vs Chatterbox, tech stack,
performance, platform insights, recommendations, and citations) and either
delete that entire block or merge its unique points into the existing "Claude"
section (preserve any unique facts such as Supertonic 2 release notes or
specific citations), then update references/citation list accordingly so only
one consolidated TTS analysis remains.
- Around line 5-26: The doc has missing blank lines and inconsistent header
markup—ensure each section header (e.g., "The 44 Million Parameter Advantage",
"The ONNX-Native Runtime", "The Sidecar Pattern", "The Mobile Wall", etc.) is
followed by a single blank line before its paragraph content and convert nested
section labels (e.g., subsection titles like "Supertonic 2: The Principles of
Architectural Distillation", "Chatterbox Turbo: The Llama-Based Heavyweight",
"3.1 The "Sidecar Pattern" and Desktop Success") into proper Markdown headers
(use ##/###/#### to reflect hierarchy) so formatting is consistent and readable
throughout the document.
🧹 Nitpick comments (9)
docs/tts-research.md (2)

29-156: The "Claude" section is better structured but may still be redundant.

This analysis is significantly more readable and concise than the "Gemini" section. However, since the PR implements only Supertonic and not Chatterbox, roughly 40% of this content (comparisons, Chatterbox recommendations) is surplus and adds maintenance burden.

If this document is intended as decision justification, keep it; if it's permanent developer reference documentation, consider refactoring to focus primarily on Supertonic integration and relegating Chatterbox to a brief "considered alternatives" section.


5-26: Correct grammar and spelling issues flagged by LanguageTool.

Static analysis identified several grammar issues:

  • Line 5: Missing line break between "levels.The" (should be "levels. The")
  • Line 5: "System Level Cache" should be "system-level cache" (hyphenated compound modifier)
  • Various other section headers have similar spacing/punctuation issues throughout the Gemini section

These are primarily whitespace and hyphenation issues that make the document appear less polished.

frontend/src/app.tsx (1)

17-17: Consider using path alias for consistency.

The import uses a relative path "./services/tts/TTSContext". Per coding guidelines, TypeScript/React files should use path aliases (@/* maps to ./src/*) for imports.

♻️ Suggested change
-import { TTSProvider } from "./services/tts/TTSContext";
+import { TTSProvider } from "@/services/tts";

Note: Import from the index file (@/services/tts) rather than directly from TTSContext for cleaner imports.

frontend/src/services/tts/TTSContext.tsx (1)

208-272: Consider reusing AudioContext for better performance.

The speak function creates a new AudioContext on each invocation (line 235). While this works correctly and contexts are properly closed, reusing a single context would be more efficient and avoid potential issues with browser limits on the number of AudioContext instances.

♻️ Potential optimization

Consider maintaining a single AudioContext instance and reusing it:

// At the provider level
const audioContextRef = useRef<AudioContext | null>(null);

// In speak function
const getAudioContext = () => {
  if (!audioContextRef.current || audioContextRef.current.state === 'closed') {
    audioContextRef.current = new AudioContext();
  }
  return audioContextRef.current;
};

const audioContext = getAudioContext();
// ... rest of the code

This is a minor optimization and not critical for functionality.

frontend/src/components/UnifiedChat.tsx (1)

65-65: Use centralized TTS import path.

The import uses @/services/tts/TTSContext directly. For consistency, consider importing from the barrel export @/services/tts instead, which re-exports the same functions.

♻️ Suggested change
-import { useTTS } from "@/services/tts/TTSContext";
+import { useTTS } from "@/services/tts";
frontend/src-tauri/src/tts.rs (4)

121-129: Consider using the hex crate for encoding.

This custom hex encoding implementation works correctly, but the hex crate is the idiomatic choice in the Rust ecosystem and would simplify maintenance.

♻️ Alternative using the hex crate

Add to Cargo.toml:

hex = "0.4"

Replace the function:

-fn bytes_to_hex(bytes: &[u8]) -> String {
-    const HEX: &[u8; 16] = b"0123456789abcdef";
-    let mut out = String::with_capacity(bytes.len() * 2);
-    for &b in bytes {
-        out.push(HEX[(b >> 4) as usize] as char);
-        out.push(HEX[(b & 0x0f) as usize] as char);
-    }
-    out
-}
+use hex;
+
+fn bytes_to_hex(bytes: &[u8]) -> String {
+    hex::encode(bytes)
+}

203-276: Consider decomposing this function for better maintainability.

The preprocess_text function handles multiple text transformations (normalization, markdown removal, symbol replacement, punctuation fixes) in a single 73-line function. Breaking it into smaller, focused helper functions would improve readability and testability.

For example:

fn preprocess_text(text: &str) -> String {
    let text = normalize_unicode(text);
    let text = strip_markdown(&text);
    let text = normalize_symbols(&text);
    let text = fix_punctuation_spacing(&text);
    let text = cleanup_whitespace(&text);
    ensure_ending_punctuation(&text)
}

477-479: Hardcoded synthesis parameters limit configurability.

The chunk size (max_len = 300) and silence duration (0.05) are hardcoded. While these initial values are reasonable, making them configurable would improve flexibility for future tuning.

This aligns with the hardcoded voice selection (F2) and synthesis parameters (10 steps, 1.2× speed) mentioned in the PR objectives—all reasonable choices for the initial implementation but good candidates for future parameterization.


505-541: Add validation for the speed parameter.

Line 539 divides duration by speed without validation. If speed is zero or negative, the computed duration becomes infinite or negative, producing nonsensical output downstream. While the value is currently hardcoded to 1.2, adding defensive validation would prevent future issues.

♻️ Suggested validation
 fn infer(
     &mut self,
     text_list: &[String],
     style: &Style,
     total_step: usize,
     speed: f32,
 ) -> Result<(Vec<f32>, Vec<f32>)> {
+    if speed <= 0.0 {
+        return Err(anyhow::anyhow!("Speed must be positive, got {}", speed));
+    }
+
     let bsz = text_list.len();
     let (text_ids, text_mask) = self.text_processor.call(text_list);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d53d03 and 6509a68.

⛔ Files ignored due to path filters (1)
  • frontend/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • docs/tts-research.md
  • frontend/src-tauri/Cargo.toml
  • frontend/src-tauri/src/lib.rs
  • frontend/src-tauri/src/tts.rs
  • frontend/src-tauri/tauri.conf.json
  • frontend/src/app.tsx
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/services/tts/index.ts
  • justfile
🚧 Files skipped from review as they are similar to previous changes (4)
  • frontend/src/components/TTSDownloadDialog.tsx
  • frontend/src-tauri/Cargo.toml
  • justfile
  • frontend/src-tauri/tauri.conf.json
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use path aliases (@/* maps to ./src/*) for imports in TypeScript/React files
Use 2-space indentation, double quotes, and enforce 100-character line limit in TypeScript/React code
Maintain strict TypeScript and avoid using any type
Use PascalCase for component names and camelCase for variables and function names
Use functional components with React hooks instead of class components
Use React context for global state management and TanStack Query for server state management
Run just format, just lint, and just build after making TypeScript/React changes to ensure code quality and compilation

Files:

  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/services/tts/index.ts
  • frontend/src/components/UnifiedChat.tsx
  • frontend/src/app.tsx
🧠 Learnings (4)
📚 Learning: 2025-12-08T19:55:33.330Z
Learnt from: CR
Repo: OpenSecretCloud/Maple PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-08T19:55:33.330Z
Learning: Applies to **/*.{ts,tsx} : Use React context for global state management and TanStack Query for server state management

Applied to files:

  • frontend/src/services/tts/TTSContext.tsx
  • frontend/src/components/UnifiedChat.tsx
📚 Learning: 2025-09-21T01:46:39.084Z
Learnt from: AnthonyRonning
Repo: OpenSecretCloud/Maple PR: 246
File: frontend/src/utils/platform/index.ts:28-44
Timestamp: 2025-09-21T01:46:39.084Z
Learning: In TypeScript/JavaScript projects, when creating a convenience re-export index file (e.g., `src/utils/platform/index.ts`) that aggregates exports from a sibling module (e.g., `src/utils/platform.ts`), the relative import path `../platform` is correct and necessary. Using an absolute path like `@/utils/platform` from the index file would create a circular dependency since the absolute path resolves to the index file itself due to Node.js module resolution preferring index files in directories.

Applied to files:

  • frontend/src/services/tts/index.ts
📚 Learning: 2025-09-21T01:46:39.084Z
Learnt from: AnthonyRonning
Repo: OpenSecretCloud/Maple PR: 246
File: frontend/src/utils/platform/index.ts:28-44
Timestamp: 2025-09-21T01:46:39.084Z
Learning: In TypeScript/JavaScript projects, when creating a convenience re-export index file (e.g., `src/utils/platform/index.ts`) that aggregates exports from a sibling module (e.g., `src/utils/platform.ts`), the relative import path `../platform` is correct and necessary. Using an absolute path like `@/utils/platform` from the index file would create a circular dependency since the absolute path resolves to the index file itself due to Node.js module resolution preferring index files in directories. This is a standard and proper pattern for module organization.

Applied to files:

  • frontend/src/services/tts/index.ts
📚 Learning: 2025-08-30T22:07:39.291Z
Learnt from: AnthonyRonning
Repo: OpenSecretCloud/Maple PR: 212
File: frontend/src/billing/billingApi.ts:652-674
Timestamp: 2025-08-30T22:07:39.291Z
Learning: In frontend/src/billing/billingApi.ts and similar TypeScript files, the team prefers to rely on TypeScript's type inference through function return type declarations rather than adding explicit type casts to response.json() calls. This reduces code verbosity while maintaining type safety.

Applied to files:

  • frontend/src/app.tsx
🧬 Code graph analysis (4)
frontend/src-tauri/src/lib.rs (1)
frontend/src-tauri/src/tts.rs (8)
  • new (174-176)
  • new (451-456)
  • tts_get_status (722-744)
  • tts_download_models (755-892)
  • tts_load_models (895-913)
  • tts_synthesize (923-965)
  • tts_unload_models (968-974)
  • tts_delete_models (977-994)
frontend/src/services/tts/TTSContext.tsx (1)
frontend/src/services/tts/index.ts (3)
  • TTSStatus (2-2)
  • TTSProvider (1-1)
  • useTTS (1-1)
frontend/src/components/UnifiedChat.tsx (2)
frontend/src/services/tts/TTSContext.tsx (1)
  • useTTS (322-328)
frontend/src/components/TTSDownloadDialog.tsx (1)
  • TTSDownloadDialog (18-203)
frontend/src/app.tsx (5)
frontend/src/services/tts/TTSContext.tsx (1)
  • TTSProvider (62-320)
frontend/src/components/BillingServiceProvider.tsx (1)
  • BillingServiceProvider (5-14)
frontend/src/components/ProxyEventListener.tsx (1)
  • ProxyEventListener (7-58)
frontend/src/components/UpdateEventListener.tsx (1)
  • UpdateEventListener (11-65)
frontend/src/components/DeepLinkHandler.tsx (1)
  • DeepLinkHandler (9-109)
🪛 LanguageTool
docs/tts-research.md

[grammar] ~5-~5: Ensure spelling is correct
Context: ...ligible levels.The 44 Million Parameter AdvantageThe model operates with approximately 44 mi...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[uncategorized] ~5-~5: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...d) can often reside entirely within the System Level Cache (SLC) or high-speed RAM partition...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[typographical] ~5-~5: Consider using an en dash here instead of a hyphen.
Context: ... to be under a certain threshold, often 200MB-4GB depending on OS version) and the Google...

(QB_NEW_EN_DASH_RULE_EN)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...features. The dynamic nature of its flow matching steps and custom tokenizers mak...

(QB_NEW_EN_HYPHEN)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...he Tauri Framework Context: Integration RealitiesThe user's choice of Tauri as the applicati...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...It compiles down to a native binary.The Webview (Frontend): Written in web technologies...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...).3.1 The "Sidecar Pattern" and Desktop SuccessOn desktop operating systems, Tauri suppor...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...obile Wall": Why Sidecars Fail on iOS & AndroidThe user's requirement for iOS and Android ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...fecycle.The Dependency Hell of Embedded PythonThe alternative to a sidecar is embedding t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...ource path helpers).Android Integration StrategyJNI and Shared Libraries: Android requires ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...g the models.5.1 Real-Time Factor (RTF) BenchmarksThe "Real-Time Factor" measures the speed o...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...s the integration roadmap.6.1 Rust Core ConfigurationThe integration avoids the sidecar pattern ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Ensure spelling is correct
Context: ...hin the Rust process.Step 1: Dependency ManagementIn src-tauri/Cargo.toml:Ini, TOML[dependen...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~13-~13: Ensure spelling is correct
Context: ...back rodio = "0.19" Step 2: Model Asset BundlingThe 268MB model file must be accessible to ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~26-~26: Ensure spelling is correct
Context: ...capability.10. Conclusion and Strategic RecommendationThe comparative analysis yields a definitiv...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~162-~162: Consider using a shorter alternative to avoid wordiness.
Context: ...ox models (350-500 million parameters), making it better suited for resource-constrained devices...

(MADE_IT_JJR)

🪛 markdownlint-cli2 (0.18.1)
docs/tts-research.md

26-26: Bare URL used

(MD034, no-bare-urls)


26-26: Bare URL used

(MD034, no-bare-urls)


169-169: Bare URL used

(MD034, no-bare-urls)


175-175: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-android
  • GitHub Check: build-macos (universal-apple-darwin)
  • GitHub Check: build-linux
  • GitHub Check: build-ios
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (14)
docs/tts-research.md (1)

97-102: Correct voice count: Supertonic 2 includes 10 preset voices (5 male, 5 female), not 11.

The multilingual support claim (English, Korean, Spanish, Portuguese, and French) and January 6, 2026 release date are accurate according to the HuggingFace model card. However, the voice count is 10 total voices—5 female (F1–F5) and 5 male (M1–M5)—not 11 as stated.

Likely an incorrect or invalid review comment.

frontend/src/app.tsx (1)

102-109: LGTM! TTS context properly integrated.

The TTSProvider correctly wraps the application tree, making TTS functionality available to all child components. The nesting order is appropriate with TTSProvider providing context to BillingServiceProvider and downstream components.

frontend/src/services/tts/index.ts (1)

1-2: LGTM! Proper re-export pattern.

The re-export pattern correctly uses relative imports from the sibling TTSContext module. This barrel export enables cleaner imports throughout the codebase (e.g., import { useTTS } from "@/services/tts").

Based on learnings, this relative import pattern is correct and necessary to avoid circular dependencies.

frontend/src-tauri/src/lib.rs (1)

6-7: LGTM! TTS backend properly integrated for desktop builds.

The TTS module is correctly:

  • Conditionally compiled for desktop only via #[cfg(desktop)]
  • Managed in application state with TTSState::new()
  • Exposed through six Tauri commands (get_status, download_models, load_models, synthesize, unload_models, delete_models)

The non-desktop build path correctly excludes TTS, aligning with the desktop-only requirement mentioned in PR objectives.

Also applies to: 39-39, 49-54
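
As a rough sketch of what that registration typically looks like (illustrative helper, not the repo's exact lib.rs):

// Manage the TTS state and expose the desktop-only commands to the frontend.
#[cfg(desktop)]
fn register_tts(builder: tauri::Builder<tauri::Wry>) -> tauri::Builder<tauri::Wry> {
    builder
        .manage(tts::TTSState::new()) // managed state behind a Mutex
        .invoke_handler(tauri::generate_handler![
            tts::tts_get_status,
            tts::tts_download_models,
            tts::tts_load_models,
            tts::tts_synthesize,
            tts::tts_unload_models,
            tts::tts_delete_models
        ])
}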

frontend/src/services/tts/TTSContext.tsx (4)

1-12: LGTM! Imports are clean and properly organized.

All necessary React hooks and Tauri APIs are imported. The imports follow the coding guidelines with proper path aliases.


62-126: LGTM! Provider initialization and status checking are well-implemented.

The TTSProvider correctly:

  • Checks Tauri desktop environment once at mount (line 64)
  • Initializes status based on environment (line 67)
  • Auto-checks status on mount for Tauri environments (lines 122-126)
  • Handles model loading automatically when models are downloaded but not loaded (lines 100-110)

Error handling and state transitions are clean and well-structured.


333-343: LGTM! Text preprocessing correctly handles think blocks.

The preprocessTextForTTS function appropriately:

  • Removes completed <think>...</think> blocks (line 337)
  • Handles unclosed <think> tags for streaming edge cases (line 340)
  • Returns trimmed text suitable for speech synthesis

This ensures users don't hear the model's internal reasoning when using TTS.


275-297: LGTM! Cleanup effect is comprehensive and correct.

The unmount cleanup properly handles:

  • Event listener cleanup (unlistenRef)
  • Audio source node cleanup (sourceNodeRef)
  • AudioContext closure (audioContextRef)
  • Object URL revocation (audioUrlRef)

All potential resource leaks are prevented.

frontend/src/components/UnifiedChat.tsx (4)

192-278: LGTM! TTSButton implementation is solid.

The TTSButton component is well-implemented with:

  • Proper TTS state management via useTTS hook
  • Click handler for speak/stop (lines 222-235)
  • Long press (500ms) to open manage dialog (lines 237-248)
  • Timer cleanup in useEffect (lines 209-215)
  • Desktop-only rendering via isTauriEnv check (lines 218-220)
  • Appropriate disabled states during processing (lines 250-254)
  • Accessible aria-labels (line 267)
  • Visual feedback with loading spinner and play/stop icons (lines 269-275)

The pointer event handling (onPointerDown, onPointerUp, onPointerLeave) provides good cross-platform support.


790-800: Consider reviewer feedback on button visibility.

Based on the PR objectives, a reviewer (marksftw) noted that the copy and speaker buttons are only visible on hover and recommended making them always visible for better discoverability, citing ChatGPT as an example.

Currently, the buttons are rendered in a flex container (line 792), but I don't see explicit hover-only classes in this code. The visibility may be controlled by the Button component's default styling. Consider verifying whether these controls should be made always visible instead of hover-only.

Could you verify the current visibility behavior of the copy and TTS buttons in the assistant messages? If they're currently hover-only, consider making them always visible as suggested by the reviewer for improved discoverability.


468-478: LGTM! MessageList TTS integration is clean.

The TTS callbacks are properly typed and passed through to MessageList:

  • onTTSSetupOpen triggers the TTS setup dialog
  • onTTSManage also triggers the setup dialog (for management)

The implementation correctly wires the dialog state to these callbacks.

Also applies to: 2721-2722


3382-3383: LGTM! TTS dialog properly integrated.

The TTSDownloadDialog is correctly rendered with controlled open state, enabling users to download and manage TTS models.

frontend/src-tauri/src/tts.rs (2)

754-892: Excellent download integrity verification! 🔒

This implementation demonstrates strong security practices:

  • Pinned revision (line 52) ensures reproducible downloads
  • SHA-256 checksums (lines 868-876) prevent tampering or corruption
  • Dual size validation catches both incomplete downloads and mismatches with expected constants
  • Atomic file operations (flush → sync → rename from .part, lines 879-885) prevent partial writes
  • Partial download cleanup (line 796) maintains clean state on retry

The layered verification approach (HTTP Content-Length → expected size → SHA-256) provides defense in depth. This addresses the "fix: pin and verify TTS model downloads" commit mentioned in the PR objectives.


922-965: Good concurrency pattern with early lock release.

The implementation correctly:

  • Clones the style reference to avoid borrow conflicts (lines 929-934)
  • Releases the Mutex guard before encoding (line 953), minimizing lock hold time during the expensive base64 encoding operation

The hardcoded synthesis parameters (total_step=10, speed=1.2) match the PR objectives and are appropriate for the initial implementation. These could be exposed as API parameters in future iterations for greater flexibility.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

Implements on-device TTS using Supertonic with ONNX Runtime inference, including secure model downloads with pinned HuggingFace revisions and SHA256 verification. The implementation is well-architected with proper separation of concerns between Rust backend and React frontend.

Key Changes:

  • Rust backend (tts.rs): 994 lines implementing text preprocessing, ONNX inference pipeline (duration prediction, text encoding, denoising, vocoding), and Tauri commands for model lifecycle management
  • React context (TTSContext.tsx): State management with download progress tracking, Web Audio API playback, and automatic model loading
  • UI components: Download dialog with progress indicators and speaker buttons on assistant messages (desktop-only)
  • Security: Pinned model revision and SHA256 checksum verification for all 10 model files (~265MB total)
  • Platform: Desktop-only feature with conditional compilation, proper Linux AppImage media framework bundling

Issues Found:

  • Critical: Missing AudioContext.resume() handling for the browser autoplay policy (line 233 in TTSContext.tsx); a sketch of the fix follows this list
  • Minor: Long-press timer cleanup issue in UnifiedChat.tsx that could leak on early unmount (line 206)
  • Enhancement opportunities: Hardcoded voice (F2), inference parameters (10 steps, 1.2x speed), and chunk silence (50ms) with no configuration API
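
A minimal sketch of the requested fix, assuming playback goes through a decoded AudioBuffer (names are illustrative, not taken from TTSContext.tsx):

```typescript
// Resume a suspended AudioContext before starting playback; autoplay policies
// can leave a freshly created context in the "suspended" state.
async function playBuffer(ctx: AudioContext, buffer: AudioBuffer): Promise<AudioBufferSourceNode> {
  if (ctx.state === "suspended") {
    await ctx.resume();
  }
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
  return source;
}
```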

Confidence Score: 4/5

  • Safe to merge with one critical fix needed for AudioContext autoplay handling
  • The implementation is solid with excellent security practices (pinned models, SHA256 verification), robust error handling, and proper resource management. However, the missing AudioContext.resume() handling is a critical issue that will cause silent failures on many browsers due to autoplay policies. The timer cleanup issue is minor but should be addressed. Once the AudioContext issue is fixed, this is production-ready.
  • Pay close attention to frontend/src/services/tts/TTSContext.tsx for the AudioContext autoplay fix and frontend/src/components/UnifiedChat.tsx for the timer cleanup

Important Files Changed

File Analysis

  • frontend/src-tauri/src/tts.rs (4/5): Comprehensive TTS implementation using ONNX Runtime with robust security (pinned models, SHA256 verification), but hardcoded voice and inference parameters. Found minor issues with error handling.
  • frontend/src/services/tts/TTSContext.tsx (3/5): React context for TTS state management with good error handling, but missing AudioContext.resume() handling for the browser autoplay policy. Clean resource management.
  • frontend/src/components/TTSDownloadDialog.tsx (5/5): Well-designed UI component for TTS setup with clear status states, progress indicators, and error handling. No issues found.
  • frontend/src/components/UnifiedChat.tsx (3/5): Added TTS button component to chat messages. Minor issue with long-press timer cleanup logic that could leak if the component unmounts during the timer.
  • frontend/src-tauri/src/lib.rs (5/5): Proper conditional compilation for the desktop-only TTS feature with all commands registered correctly. No issues.
  • frontend/src-tauri/Cargo.toml (5/5): TTS dependencies properly scoped to desktop platforms only with appropriate feature flags. No issues.

Sequence Diagram

sequenceDiagram
    participant User
    participant UI as TTSDownloadDialog
    participant Context as TTSContext
    participant Tauri as Tauri IPC
    participant Rust as TTS Module (Rust)
    participant HF as HuggingFace

    Note over User,HF: Initial Setup Flow
    User->>UI: Opens TTS Dialog
    UI->>Context: checkStatus()
    Context->>Tauri: invoke("tts_get_status")
    Tauri->>Rust: tts_get_status()
    Rust-->>Tauri: {models_downloaded, models_loaded}
    Tauri-->>Context: Status response
    Context-->>UI: status="not_downloaded"

    User->>UI: Clicks Download
    UI->>Context: startDownload()
    Context->>Context: Set status="downloading"
    Context->>Tauri: listen("tts-download-progress")
    Context->>Tauri: invoke("tts_download_models")
    
    loop For each model file
        Rust->>HF: Download model file
        HF-->>Rust: Stream chunks
        Rust->>Rust: Verify SHA256 checksum
        Rust->>Tauri: emit("tts-download-progress")
        Tauri-->>Context: Progress event
        Context-->>UI: Update progress bar
    end

    Rust-->>Tauri: Download complete
    Context->>Context: Set status="loading"
    Context->>Tauri: invoke("tts_load_models")
    Tauri->>Rust: tts_load_models()
    Rust->>Rust: Load ONNX models into memory
    Rust->>Rust: Load voice style (F2)
    Rust-->>Tauri: Models loaded
    Tauri-->>Context: Success
    Context-->>UI: status="ready"

    Note over User,HF: TTS Synthesis Flow
    User->>UI: Clicks speaker button on message
    UI->>Context: speak(text, messageId)
    Context->>Context: Preprocess text (remove think blocks)
    Context->>Tauri: invoke("tts_synthesize", {text})
    Tauri->>Rust: tts_synthesize(text)
    
    Rust->>Rust: preprocess_text() - remove markdown
    Rust->>Rust: chunk_text() - split by sentences
    
    loop For each chunk
        Rust->>Rust: Unicode indexing
        Rust->>Rust: Duration prediction (ONNX)
        Rust->>Rust: Text encoding (ONNX)
        Rust->>Rust: Sample noisy latent
        loop Denoising steps (10 iterations)
            Rust->>Rust: Vector estimation (ONNX)
        end
        Rust->>Rust: Vocoder synthesis (ONNX)
        Rust->>Rust: Append 50ms silence
    end
    
    Rust->>Rust: Encode WAV to base64
    Rust-->>Tauri: {audio_base64, sample_rate}
    Tauri-->>Context: Synthesis response
    Context->>Context: Decode base64 to Blob
    Context->>Context: Create AudioContext
    Context->>Context: Decode audio data
    Context->>Context: Play via Web Audio API
    Context-->>UI: Update isPlaying state
    
    Note over Context: Audio playback complete
    Context->>Context: Cleanup AudioContext
    Context-->>UI: isPlaying=false

@github-actions
Copy link
Contributor

github-actions bot commented Jan 8, 2026

✅ TestFlight deployment completed successfully!

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

Adds on-device text-to-speech functionality using Supertonic TTS engine with ONNX Runtime for desktop platforms. The implementation downloads ~265MB of models from HuggingFace on first use, performs all synthesis locally, and integrates cleanly into the chat UI.

Key Changes

  • Backend (Rust): Comprehensive TTS engine with ONNX model loading, text preprocessing (markdown stripping), chunking, and WAV synthesis with SHA256 verification for downloads
  • Frontend (React): TTSContext for state management, TTSDownloadDialog for setup flow, and speaker buttons in chat messages with play/stop controls
  • Architecture: Desktop-only feature using conditional compilation, with proper cleanup of audio resources and download progress tracking

Issues Found

  • State management race condition: isPlaying is set before the async synthesis completes, which can leave the state inconsistent on error (a sketch of the fix follows this list)
  • Mutex contention: TTS synthesis holds the lock during expensive ONNX inference, blocking concurrent status checks
  • Minor UX issues: long-press timing can conflict with the click handler, AudioContext instances can accumulate, and the F2 voice style is hardcoded
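
A sketch of one way to address the race (it also folds in the autoplay resume from the earlier review): only flip isPlaying after playback has actually started, and always reset it on failure. The setter, payload shape, and import path are assumptions, not the exact TTSContext.tsx code:

```typescript
import { invoke } from "@tauri-apps/api/core"; // "@tauri-apps/api/tauri" on Tauri v1

// Illustrative only: defer the isPlaying=true transition until audio is playing.
async function speakSafely(text: string, setIsPlaying: (v: boolean) => void): Promise<void> {
  try {
    const { audio_base64 } = await invoke<{ audio_base64: string; sample_rate: number }>(
      "tts_synthesize",
      { text }
    );
    const binary = atob(audio_base64);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);

    const ctx = new AudioContext();
    if (ctx.state === "suspended") await ctx.resume(); // autoplay policy

    const buffer = await ctx.decodeAudioData(bytes.buffer as ArrayBuffer);
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    source.onended = () => setIsPlaying(false);
    source.start();
    setIsPlaying(true); // "playing" becomes true only once playback has started
  } catch (err) {
    setIsPlaying(false); // never leave the UI stuck in a playing state
    throw err;
  }
}
```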

Security & Quality

  • Strong security: SHA256 checksums verify all downloads, pinned HuggingFace revision prevents supply chain attacks
  • Good error handling and cleanup throughout
  • Desktop-only compilation prevents mobile build issues

Confidence Score: 4/5

  • Safe to merge with minor race condition in state management that should be addressed
  • The implementation is well-structured with strong security practices (SHA256 verification, pinned revisions) and comprehensive error handling. However, there's a state management race condition in TTSContext where isPlaying is set before synthesis completes, and the mutex locking strategy in Rust could cause UX lag during synthesis. These are non-critical issues that don't affect core functionality but should be improved for production quality.
  • Pay close attention to frontend/src/services/tts/TTSContext.tsx (state race condition at line 222) and frontend/src-tauri/src/tts.rs (mutex contention at line 927)

Important Files Changed

File Analysis

  • frontend/src-tauri/src/tts.rs (4/5): New TTS engine using ONNX Runtime; comprehensive implementation with model download, SHA256 verification, text preprocessing, and WAV synthesis
  • frontend/src/services/tts/TTSContext.tsx (4/5): React context for TTS state management; handles model lifecycle, download progress, and audio playback with the Web Audio API
  • frontend/src/components/TTSDownloadDialog.tsx (5/5): Clean UI component for the TTS setup dialog with download progress, status indicators, and model management
  • frontend/src/components/UnifiedChat.tsx (4/5): Integrated TTS button into chat messages with play/stop controls and a long-press management gesture
  • frontend/src-tauri/Cargo.toml (5/5): Added TTS dependencies (ONNX, ndarray, hound, etc.) with desktop-only target configuration

Sequence Diagram

sequenceDiagram
    participant User
    participant UI as UnifiedChat
    participant TTS as TTSContext
    participant Rust as tts.rs (Tauri)
    participant HF as HuggingFace
    participant ONNX as ONNX Runtime

    Note over User,ONNX: First-Time Setup Flow
    User->>UI: Click speaker icon
    UI->>TTS: Check status
    TTS->>Rust: tts_get_status()
    Rust-->>TTS: models_downloaded: false
    TTS->>UI: Show download dialog
    User->>UI: Confirm download
    UI->>TTS: startDownload()
    TTS->>Rust: tts_download_models()
    loop For each model file
        Rust->>HF: Download model file
        HF-->>Rust: Stream file chunks
        Rust->>Rust: Verify SHA256 checksum
        Rust->>TTS: Emit progress event
        TTS->>UI: Update progress bar
    end
    Rust-->>TTS: Download complete
    TTS->>Rust: tts_load_models()
    Rust->>ONNX: Load 4 ONNX models
    ONNX-->>Rust: Models ready
    Rust->>Rust: Load voice style (F2)
    Rust-->>TTS: Status: ready
    TTS->>UI: Show ready state

    Note over User,ONNX: Speech Synthesis Flow
    User->>UI: Click speaker icon
    UI->>TTS: speak(text, messageId)
    TTS->>TTS: preprocessTextForTTS()
    TTS->>Rust: tts_synthesize(text)
    Rust->>Rust: preprocess_text() - strip markdown
    Rust->>Rust: chunk_text() - split into sentences
    loop For each text chunk
        Rust->>ONNX: duration_predictor.run()
        Rust->>ONNX: text_encoder.run()
        Rust->>Rust: sample_noisy_latent()
        loop Denoising steps (10)
            Rust->>ONNX: vector_estimator.run()
        end
        Rust->>ONNX: vocoder.run()
        Rust->>Rust: Append WAV chunk
    end
    Rust->>Rust: wav_to_base64()
    Rust-->>TTS: { audio_base64, sample_rate }
    TTS->>TTS: base64ToBlob()
    TTS->>TTS: Create AudioContext
    TTS->>TTS: source.start()
    TTS-->>UI: Playing
    UI-->>User: Audio plays

@AnthonyRonning AnthonyRonning merged commit cf2b890 into master Jan 8, 2026
15 of 16 checks passed
@AnthonyRonning AnthonyRonning deleted the feature/tts-supertonic branch January 8, 2026 20:27

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Text-to-speech voice responses

3 participants