title: Introduction
description: Local audio AI for Apple devices, covering speech-to-text, speaker diarization, voice activity detection, and text-to-speech on the Neural Engine.

FluidAudio is a Swift SDK for fully local, low-latency audio AI on Apple devices. All inference runs on the Apple Neural Engine (ANE), keeping CPU and GPU free for your app.

At a Glance

| Capability | Model | Speed | Accuracy | Languages |
| --- | --- | --- | --- | --- |
| Transcription | Parakeet TDT 0.6B | 210x RTFx | 2.5% WER (en), 14.7% avg (25 lang) | 25 European |
| Streaming ASR | Parakeet EOU 120M | 12x RTFx | 4.9% WER (en) | English |
| Speaker Diarization | Pyannote CoreML | 122x RTFx | 15% DER (offline) | Language-agnostic |
| Streaming Diarization | Sortformer | 127x RTFx | 31.7% DER | Language-agnostic |
| Voice Activity | Silero VAD v6 | 1230x RTFx | 96% accuracy | Language-agnostic |
| Text-to-Speech | Kokoro 82M | 23x RTFx | 48 voices | English |
| Text-to-Speech | PocketTTS 155M | Streaming | ~80ms first audio | English |

All benchmarks were run on an M4 Pro: ASR on LibriSpeech / FLEURS, diarization on VoxConverse / AMI, and VAD on VOiCES / MUSAN. See the full benchmarks for per-language breakdowns and device comparisons.
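To make the Speed column concrete: assuming RTFx means audio duration divided by processing time (the usual inverse real-time factor; this page does not define it), a 210x figure implies that an hour of audio takes roughly 17 seconds to process. The helper below is purely illustrative and is not part of FluidAudio.

```swift
import Foundation

/// Estimated wall-clock processing time for a clip, assuming
/// RTFx = audio duration / processing time (an assumption; not defined on this page).
func estimatedProcessingSeconds(audioSeconds: Double, rtfx: Double) -> Double {
    precondition(rtfx > 0, "RTFx must be positive")
    return audioSeconds / rtfx
}

let oneHour = 3600.0
let tdt = estimatedProcessingSeconds(audioSeconds: oneHour, rtfx: 210)  // Parakeet TDT figure
let eou = estimatedProcessingSeconds(audioSeconds: oneHour, rtfx: 12)   // Parakeet EOU figure
print(String(format: "Parakeet TDT: %.1f s, Parakeet EOU: %.1f s", tdt, eou))
// prints "Parakeet TDT: 17.1 s, Parakeet EOU: 300.0 s"
```

These are back-of-envelope numbers from the M4 Pro figures above; other devices will differ.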

When to Use Which

Transcription

| Need | Use | Why |
| --- | --- | --- |
| Transcribe recordings/files | Parakeet TDT v3 | Fastest, 25 languages, 210x real-time |
| English-only, best accuracy | Parakeet TDT v2 | 2.1% WER vs 2.5% for v3 on LibriSpeech |
| Live captions as user speaks | Parakeet EOU | 160ms chunks, end-of-utterance detection |
| Domain-specific terms (names, jargon) | TDT + CTC vocabulary boosting | 99.3% precision, 85.2% recall on earnings calls |
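As a rough illustration of what 160ms chunks mean in samples, the sketch below splits a buffer into fixed-size chunks. The 16 kHz sample rate is an assumption (it is common for ASR models but is not stated on this page), and the code does not use any FluidAudio API.

```swift
// Assumed input format: 16 kHz mono Float samples (not stated on this page).
let sampleRate = 16_000
let chunkMilliseconds = 160
let samplesPerChunk = sampleRate * chunkMilliseconds / 1000

// One second of silence stands in for microphone audio.
let audio = [Float](repeating: 0, count: sampleRate)

// Slice the buffer into consecutive chunks; the last one may be shorter.
let chunks = stride(from: 0, to: audio.count, by: samplesPerChunk).map { start in
    Array(audio[start..<min(start + samplesPerChunk, audio.count)])
}
print(samplesPerChunk, chunks.count, chunks.last?.count ?? 0)
// prints "2560 7 640"
```

A real streaming pipeline would typically buffer the short trailing chunk until more audio arrives rather than process it as-is.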

Speaker Diarization

| Need | Use | Why |
| --- | --- | --- |
| Best accuracy (post-recording) | Offline pipeline (VBx) | 15% DER, full pyannote-compatible pipeline |
| Real-time "who's speaking now" | Streaming pipeline | 26% DER at 5s chunks, speaker tracking across chunks |
| Simple 2-4 speaker meetings | Sortformer | Single model, no clustering, 32% DER |

Voice Activity Detection

| Need | Use | Why |
| --- | --- | --- |
| Segment audio before ASR | Offline segmentation | Clean segments with min/max duration controls |
| Real-time speech detection | Streaming VAD | Per-chunk events with hysteresis |
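Hysteresis in this context generally means using two thresholds, so the speech/non-speech decision does not flicker when the per-chunk probability hovers around a single cutoff. The sketch below shows that general technique only; the type names, thresholds, and event cases are hypothetical and are not FluidAudio's API.

```swift
/// Events emitted when the speech state changes (hypothetical names).
enum VadEvent: Equatable {
    case speechStart(chunk: Int)
    case speechEnd(chunk: Int)
}

/// Two-threshold (hysteresis) gate over per-chunk speech probabilities.
struct HysteresisGate {
    let onThreshold: Double   // enter speech when probability reaches this value
    let offThreshold: Double  // leave speech when probability drops below this value
    private(set) var isSpeaking = false
    private var chunkIndex = 0

    // Default thresholds are illustrative, not tuned values.
    init(onThreshold: Double = 0.6, offThreshold: Double = 0.4) {
        precondition(offThreshold < onThreshold, "off threshold must be below on threshold")
        self.onThreshold = onThreshold
        self.offThreshold = offThreshold
    }

    /// Feed one chunk's probability; returns an event only when the state changes.
    mutating func process(_ probability: Double) -> VadEvent? {
        defer { chunkIndex += 1 }
        if !isSpeaking && probability >= onThreshold {
            isSpeaking = true
            return .speechStart(chunk: chunkIndex)
        }
        if isSpeaking && probability < offThreshold {
            isSpeaking = false
            return .speechEnd(chunk: chunkIndex)
        }
        return nil
    }
}

var gate = HysteresisGate()
let probabilities = [0.1, 0.7, 0.5, 0.45, 0.3, 0.5, 0.65]
let events = probabilities.compactMap { gate.process($0) }
print(events)
```

Note that the values 0.5 and 0.45 do not end the speech segment, and the later 0.5 does not start one: only crossings of the outer thresholds change state.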

Text-to-Speech

| Need | Use | Why |
| --- | --- | --- |
| Highest quality, full generation | Kokoro | 48 voices, SSML support, flow matching |
| Streaming audio (start playing fast) | PocketTTS | ~80ms to first audio, no espeak dependency |

Platform Support

| Platform | Package |
| --- | --- |
| Swift (iOS / macOS) | FluidAudio |
| React Native / Expo | @fluidinference/react-native-fluidaudio |
| Rust / Tauri | fluidaudio-rs |

Showcase

40+ apps use FluidAudio for local speech recognition, speaker diarization, and text-to-speech.

| App | Description |
| --- | --- |
| Voice Ink | Local AI for instant, private transcription with near-perfect accuracy. Uses Parakeet ASR. |
| Spokenly | Mac dictation app for fast, accurate voice-to-text; supports real-time dictation and file transcription. Uses Parakeet ASR and speaker diarization. |
| Slipbox | Privacy-first meeting assistant for real-time conversation intelligence. Uses Parakeet ASR (iOS) and speaker diarization across platforms. |
| Talat | Privacy-focused AI meeting notes app. Featured in TechCrunch. Uses Parakeet ASR. |
| Paraspeech | AI-powered voice-to-text. Fully offline. No subscriptions. |
| OpenOats | Open-source meeting note-taker that transcribes conversations in real time and surfaces relevant notes from your knowledge base. |
| Senko | A very fast and accurate speaker diarization pipeline. A good example of Python integration. |
| macos-speech-server | OpenAI-compatible STT/transcription and TTS/speech API server. |
| Whisper Mate | Transcribes movies and audio locally; records and transcribes in real time from speakers or system apps. Uses speaker diarization. |
| BoltAI | Write content 10x faster using Parakeet models. |
| Voxeoflow | Mac dictation app with real-time translation. Lightning-fast transcription in over 100 languages. |
| WhisKey | Privacy-first voice dictation keyboard for iOS and macOS. On-device transcription with 12+ languages, AI meeting summaries, and mindmap generation. |
| Summit AI Notes | Local meeting transcription and summarization with speaker identification. Supports 100+ languages. |
| Snaply | Free, fast, 100% local AI dictation for Mac. |
| Enconvo | AI agent launcher for macOS with voice input, live captions, and text-to-speech. |
| Speakmac | Mac app that lets you type anywhere on your Mac using your voice. Fully local, private dictation built on FluidAudio. |
| Starling | Open-source, fully local voice-to-text transcription with auto-paste at your cursor. |
| Altic/Fluid Voice | Lightweight, fully free, open-source voice-to-text dictation for macOS. |
| SamScribe | Open-source macOS app that captures and transcribes audio from your microphone and meeting apps in real time. |
| Dictate Anywhere | Native macOS dictation app with global Fn key activation. Dictate into any app with support for 25 languages. |
| Hex | macOS app that lets you press and hold a hotkey to record your voice, transcribe it, and paste into any application. |
| Super Voice Assistant | Open-source macOS voice assistant with local transcription. |
| VoiceTypr | Open-source voice-to-text dictation for macOS and Windows. |
| Ora | Local voice assistant for macOS with speech recognition and text-to-speech. |
| Flowstay | Easy text-to-speech, local post-processing, and Claude Code integration for macOS. Free forever. |
| Meeting Transcriber | macOS menu bar app that auto-detects, records, and transcribes meetings with dual-track speaker diarization. |
| Hitoku Draft | A local, private voice writing assistant in your macOS menu bar. |
| Audite | macOS menu bar app that records meetings and transcribes them locally into Markdown notes for Obsidian. |
| Muesli | Native macOS dictation and meeting transcription with ~0.13s latency. Automatic speaker diarization. |
| NanoVoice | Free iOS voice keyboard for fast, private dictation in any app. |
| MiniWhisper | Open-source macOS menu bar app for quick local voice-to-text with minimal setup. |
| Volocal | Fully local voice AI on iOS. Uses streaming Parakeet EOU ASR and streaming PocketTTS. |
| VivaDicta | Open-source iOS voice-to-text app with system-wide AI voice keyboard. 15+ AI providers, 40+ AI presets. |
| hongbomiao.com | A personal R&D lab that facilitates knowledge sharing. |
| mac-whisper-speedtest | Comparison of different local ASR models, including one of the first versions of FluidAudio's ASR models. |

Requirements

  • macOS 14+ / iOS 17+
  • Swift 5.10+
  • Apple Silicon recommended
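
The requirements above map onto a Swift Package Manager manifest roughly as follows. This is a sketch under assumptions: the repository URL and the `main` branch reference are not given on this page and should be checked against the project's installation instructions (a released version pin is usually preferable), and `MyApp` is a placeholder name.

```swift
// swift-tools-version:5.10
import PackageDescription

let package = Package(
    name: "MyApp", // placeholder
    platforms: [
        // Minimum OS versions from the requirements list.
        .macOS(.v14),
        .iOS(.v17),
    ],
    dependencies: [
        // Assumed URL and branch; confirm against the FluidAudio installation docs.
        .package(url: "https://github.com/FluidInference/FluidAudio.git", branch: "main"),
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // Product name assumed to match the package name listed under Platform Support.
                .product(name: "FluidAudio", package: "FluidAudio"),
            ]
        ),
    ]
)
```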

Model Conversion

All FluidAudio models are converted through möbius, our open-source model conversion framework. It handles export, numerical validation, and quantization for CoreML and other edge runtimes. See the möbius docs to convert your own models.