vox

Text-to-speech using Voxtral-4B-TTS, powered by MLX for efficient inference on Apple Silicon.

Features real-time streaming audio, multiple languages and voice presets, and both interactive and command-line interfaces.

Prerequisites

  • macOS with Apple Silicon (M1, M2, M3, etc.)
  • uv - Python package manager (brew install uv)
  • FFmpeg - for audio playback (brew install ffmpeg)

Quickstart

Clone this repository and sync its dependencies:

git clone https://github.com/tetsuo/vox.git
cd vox
uv sync

Note that mlx-audio is installed from its source repository, which contains the latest Voxtral TTS support.

Usage

Type uv run vox --help to see all options:

usage: vox [options]

options:
  -h, --help        show this help message and exit
  --voice VOICE     voice preset (default: casual_male)
  --text TEXT       text to speak; use - to read from stdin
  --save PATH       save generated audio to a WAV file
  --save-dir DIR    auto-save each utterance to DIR/ (interactive mode)
  --no-play         generate audio but do not play it
  --list-voices     print available voices and exit
  --chunk-frames N  streaming chunk size in LM frames (default: 25, ~2s per chunk)
  --model MODEL     HuggingFace repo ID or local path
                    (default: mlx-community/Voxtral-4B-TTS-2603-mlx-6bit)
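The --chunk-frames help text implies a rough conversion between LM frames and audio time. The sketch below works that arithmetic out. The rate of 12.5 frames per second is only inferred from "25 frames, ~2s per chunk"; it is an approximation, not a constant documented by vox or Voxtral, and the function names are hypothetical.

```python
import math

# Assumption: derived from the help text ("default: 25, ~2s per chunk").
# This is an approximation, not a documented constant.
APPROX_FRAMES_PER_SECOND = 25 / 2.0  # 12.5


def chunk_seconds(chunk_frames: int) -> float:
    """Approximate audio duration of one streaming chunk, in seconds."""
    if chunk_frames <= 0:
        raise ValueError("chunk_frames must be positive")
    return chunk_frames / APPROX_FRAMES_PER_SECOND


def chunks_needed(total_seconds: float, chunk_frames: int) -> int:
    """Approximate number of chunks needed to cover total_seconds of audio."""
    return math.ceil(total_seconds / chunk_seconds(chunk_frames))
```

Under this estimate, a smaller value such as --chunk-frames 10 gives chunks of roughly 0.8 s each, while a larger value yields fewer, longer chunks.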

Examples

Speak a single phrase and exit:

uv run vox --text "hello world"
uv run vox --voice fr_female --text "bonjour le monde"

Read from STDIN:

echo "Hello world" | uv run vox
uv run vox --text=-
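The "-" convention for --text is a common CLI pattern. As a minimal sketch of how a tool might resolve it (this is not taken from the vox source, and resolve_text is a hypothetical helper):

```python
import io
import sys
from typing import Optional, TextIO


def resolve_text(text_arg: Optional[str], stdin: Optional[TextIO] = None) -> str:
    """Return the text to speak, reading from stdin when the argument is "-".

    Hypothetical helper; the real vox implementation may differ.
    """
    stream = stdin if stdin is not None else sys.stdin
    if text_arg == "-":
        # Drop the trailing newline that echo and similar tools append.
        return stream.read().strip()
    return text_arg or ""


# Simulate piped input with an in-memory stream instead of a real pipe.
piped = io.StringIO("Hello world\n")
spoken = resolve_text("-", stdin=piped)
```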

Generate audio without playback and save to WAV:

uv run vox --text "hello" --save output.wav --no-play
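To render several lines to separate WAV files, the same flags can be driven from a short Python script. This is a sketch that assumes it is run from inside the cloned vox directory, so that uv run vox resolves. The helper names and the line_001.wav naming scheme are illustrative, not part of vox.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_command(text: str, out_path: Path, voice: str = "casual_male") -> List[str]:
    """Build a vox invocation using only the flags listed in the help output."""
    return [
        "uv", "run", "vox",
        "--voice", voice,
        "--text", text,
        "--save", str(out_path),
        "--no-play",
    ]


def render_all(lines: Iterable[str], out_dir: Path, voice: str = "casual_male") -> List[Path]:
    """Render each line to out_dir/line_NNN.wav, one vox process per line."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(lines, start=1):
        path = out_dir / f"line_{index:03d}.wav"
        # check=True raises CalledProcessError if vox exits with an error.
        subprocess.run(build_command(text, path, voice), check=True)
        paths.append(path)
    return paths
```

Calling render_all(["hello", "goodbye"], Path("takes")) would start one process per line, so the model is loaded each time. For many short utterances, interactive mode with --save-dir is likely the faster route.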

Auto-save every utterance in interactive mode:

uv run vox --save-dir ./takes

Interactive Mode

Start the interactive shell:

uv run vox

Then type text to speak.

Built-in commands:

  • :voice <name> - Switch voice (e.g., :voice fr_female)
  • :voices - List all available voices
  • :help - Show help
  • :quit or :q - Exit
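The colon-prefixed commands above follow a simple pattern: a line starting with ":" is a command, and anything else is text to speak. A minimal sketch of that dispatch is shown here. It is not the vox implementation; parse_line and its action names are hypothetical.

```python
from typing import Optional, Tuple


def parse_line(line: str) -> Tuple[str, Optional[str]]:
    """Classify one line of interactive input as (action, argument)."""
    stripped = line.strip()
    if not stripped:
        return ("noop", None)
    if not stripped.startswith(":"):
        # Plain text is spoken as-is.
        return ("speak", stripped)
    name, _, arg = stripped[1:].partition(" ")
    arg = arg.strip()
    if name == "voice" and arg:
        return ("voice", arg)
    if name == "voices":
        return ("voices", None)
    if name == "help":
        return ("help", None)
    if name in ("quit", "q"):
        return ("quit", None)
    # Anything else, including ":voice" with no name, is reported as unknown.
    return ("unknown", stripped)
```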

Voices

The model supports 20 voices across 9 languages:

Language     Voices
English      casual_male, casual_female, cheerful_female, neutral_male, neutral_female
French       fr_male, fr_female
Spanish      es_male, es_female
German       de_male, de_female
Italian      it_male, it_female
Portuguese   pt_male, pt_female
Dutch        nl_male, nl_female
Arabic       ar_male
Hindi        hi_male, hi_female
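For scripts that need to pick or validate a voice, the table can be mirrored as a small lookup. The mapping below is hand-copied from the table above, so it is an assumption that it stays in sync with the model. The authoritative list comes from uv run vox --list-voices. The helper name language_of is hypothetical.

```python
from typing import Dict, List, Optional

# Hand-copied from the voices table; run `uv run vox --list-voices`
# for the authoritative list.
VOICES: Dict[str, List[str]] = {
    "English": ["casual_male", "casual_female", "cheerful_female",
                "neutral_male", "neutral_female"],
    "French": ["fr_male", "fr_female"],
    "Spanish": ["es_male", "es_female"],
    "German": ["de_male", "de_female"],
    "Italian": ["it_male", "it_female"],
    "Portuguese": ["pt_male", "pt_female"],
    "Dutch": ["nl_male", "nl_female"],
    "Arabic": ["ar_male"],
    "Hindi": ["hi_male", "hi_female"],
}


def language_of(voice: str) -> Optional[str]:
    """Return the language a voice preset belongs to, or None if unknown."""
    for language, names in VOICES.items():
        if voice in names:
            return language
    return None


total_voices = sum(len(names) for names in VOICES.values())
```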
