Text-to-speech using Voxtral-4B-TTS, powered by MLX for efficient inference on Apple Silicon.
It features real-time streaming audio, multiple languages and voice presets, and both interactive and command-line interfaces.
- macOS with Apple Silicon (M1, M2, M3, etc.)
- uv - Python package manager (brew install uv)
- FFmpeg - for audio playback (brew install ffmpeg)
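The requirements above can be checked before installing. The following is a minimal sketch, assuming the tools are discoverable on PATH under the names uv and ffmpeg; the helper names missing_tools and is_apple_silicon are hypothetical and not part of vox.

```python
import platform
import shutil


def missing_tools(tools=("uv", "ffmpeg")):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def is_apple_silicon():
    """True when the interpreter reports macOS on an arm64 CPU."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


if __name__ == "__main__":
    print("Apple Silicon macOS:", is_apple_silicon())
    print("Missing tools:", missing_tools() or "none")
```

A Python interpreter running under Rosetta reports x86_64, so a False result from is_apple_silicon does not always mean the hardware is unsupported.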
Clone this repository and sync:
git clone https://github.com/tetsuo/vox.git
cd vox
uv sync
Note that mlx-audio is installed from the source repository, which contains the latest Voxtral TTS support.
Type uv run vox --help to see all options:
usage: vox [options]

options:
  -h, --help        show this help message and exit
  --voice VOICE     voice preset (default: casual_male)
  --text TEXT       text to speak; use - to read from stdin
  --save PATH       save generated audio to a WAV file
  --save-dir DIR    auto-save each utterance to DIR/ (interactive mode)
  --no-play         generate audio but do not play it
  --list-voices     print available voices and exit
  --chunk-frames N  streaming chunk size in LM frames (default: 25, ~2s per chunk)
  --model MODEL     HuggingFace repo ID or local path
                    (default: mlx-community/Voxtral-4B-TTS-2603-mlx-6bit)
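The --chunk-frames help text implies a rough frame rate: 25 frames for about 2 seconds is about 12.5 frames per second. The sketch below estimates the chunk duration for other values, assuming the duration scales linearly with the frame count. That assumption comes only from the approximation in the help text, and chunk_seconds is a hypothetical helper, not part of vox.

```python
# Rate implied by the help text: 25 LM frames is roughly 2 s of audio.
FRAMES_PER_SECOND = 25 / 2.0  # 12.5, an approximation


def chunk_seconds(chunk_frames):
    """Approximate audio duration of one streaming chunk, in seconds."""
    if chunk_frames <= 0:
        raise ValueError("chunk_frames must be positive")
    return chunk_frames / FRAMES_PER_SECOND


for n in (10, 25, 50):
    print(f"--chunk-frames {n}: ~{chunk_seconds(n):.1f}s per chunk")
```

Smaller values should let playback start sooner at the cost of more chunks. This is the usual streaming trade-off and has not been measured here.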
Speak a single phrase and exit:
uv run vox --text "hello world"
uv run vox --voice fr_female --text "bonjour le monde"
Read from STDIN:
echo "Hello world" | uv run vox
uv run vox --text=-
Generate audio without playback and save to WAV:
uv run vox --text "hello" --save output.wav --no-play
Auto-save every utterance in interactive mode:
uv run vox --save-dir ./takes
Start the interactive shell:
uv run vox
Then type text to speak.
Built-in commands:
- :voice <name> - Switch voice (e.g., :voice fr_female)
- :voices - List all available voices
- :help - Show help
- :quit or :q - Exit
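The one-shot flags shown earlier (--voice, --text, --save, --no-play) can be combined to script several takes without the interactive shell. This is a minimal sketch that only builds and prints the commands; build_command, the takes directory, and the file naming are illustrative assumptions, not part of vox.

```python
import shlex
from pathlib import Path


def build_command(text, out_path, voice="casual_male"):
    """Build an argv list for one non-interactive vox run that saves a WAV."""
    return [
        "uv", "run", "vox",
        "--voice", voice,
        "--text", text,
        "--save", str(out_path),
        "--no-play",
    ]


lines = ["hello world", "bonjour le monde"]
voices = ["casual_male", "fr_female"]
out_dir = Path("takes")  # hypothetical output directory

commands = [
    build_command(text, out_dir / f"take_{i:02d}.wav", voice)
    for i, (text, voice) in enumerate(zip(lines, voices), start=1)
]

for cmd in commands:
    # Print a copy-pasteable shell line. To actually generate audio, pass
    # `cmd` to subprocess.run(cmd, check=True) from the repository directory.
    print(shlex.join(cmd))
```

Whether vox creates a missing output directory is not documented here, so it is safer to create takes/ before running the commands.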
The model supports 20 voices across 9 languages:
| Language | Voices |
|---|---|
| English | casual_male, casual_female, cheerful_female, neutral_male, neutral_female |
| French | fr_male, fr_female |
| Spanish | es_male, es_female |
| German | de_male, de_female |
| Italian | it_male, it_female |
| Portuguese | pt_male, pt_female |
| Dutch | nl_male, nl_female |
| Arabic | ar_male |
| Hindi | hi_male, hi_female |
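The table above can be kept as data, for example to validate a voice name before passing it to --voice. This sketch copies the presets from the table as written; VOICES and language_of are hypothetical names, and the output of uv run vox --list-voices remains the authoritative list.

```python
# Voice presets copied from the table above, grouped by language.
VOICES = {
    "English": ["casual_male", "casual_female", "cheerful_female",
                "neutral_male", "neutral_female"],
    "French": ["fr_male", "fr_female"],
    "Spanish": ["es_male", "es_female"],
    "German": ["de_male", "de_female"],
    "Italian": ["it_male", "it_female"],
    "Portuguese": ["pt_male", "pt_female"],
    "Dutch": ["nl_male", "nl_female"],
    "Arabic": ["ar_male"],
    "Hindi": ["hi_male", "hi_female"],
}


def language_of(voice):
    """Return the language a voice preset belongs to, or raise KeyError."""
    for language, names in VOICES.items():
        if voice in names:
            return language
    raise KeyError(f"unknown voice preset: {voice}")


print(language_of("fr_female"))
```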
- Voxtral-4B-TTS Model: https://huggingface.co/mistralai/Voxtral-4B-TTS-2603
- MLX Quantized Model: https://huggingface.co/mlx-community/Voxtral-4B-TTS-2603-mlx-6bit
- Mistral Announcement: https://mistral.ai/news/voxtral-tts
- MLX Framework: https://github.com/ml-explore/mlx