PocketTTS sessions #471

Open
danielrothmann wants to merge 6 commits into FluidInference:main from 42futures:feature/pocket-tts-sessions

@danielrothmann commented Mar 30, 2026

This PR implements a session API for PocketTTS. Closes #465

The goal was to improve reliability of long-running sessions with streaming text input. Previously, each call to synthesizeStreaming() paid the full voice prefill cost (~125 sequential CoreML predictions) and reset Mimi decoder state, causing latency and audio discontinuity between utterances.

PocketTtsSession is a new actor that performs voice prefill once at creation, then accepts streamed text via enqueue(). Each utterance only pays the text prefill cost. Mimi decoder state persists across utterances for audio continuity.
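
To make the saving concrete, here is a back-of-the-envelope sketch of the prefill arithmetic. The ~125 voice prefill predictions come from the description above; the per-utterance text prefill count (20), the utterance count, and both function names are hypothetical placeholders, and the generation steps themselves are ignored.

```swift
// Sketch only: counts prefill predictions, ignoring the generation steps themselves.
// `textPrefill` is an assumed placeholder value, not a measured number.

/// Before: every synthesizeStreaming() call repeats the voice prefill.
func prefillCostPerCall(utterances: Int, voicePrefill: Int, textPrefill: Int) -> Int {
    return utterances * (voicePrefill + textPrefill)
}

/// After: the session pays the voice prefill once, then only text prefill per utterance.
func prefillCostWithSession(utterances: Int, voicePrefill: Int, textPrefill: Int) -> Int {
    return voicePrefill + utterances * textPrefill
}

let before = prefillCostPerCall(utterances: 10, voicePrefill: 125, textPrefill: 20)
let after = prefillCostWithSession(utterances: 10, voicePrefill: 125, textPrefill: 20)
print("per-call: \(before), session: \(after)")  // prints "per-call: 1450, session: 325"
```

Under these assumed numbers the fixed voice prefill dominates the per-call cost, which is why moving it to session creation matters most for many short utterances.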

Cancellation is awaitable: await session.cancel() blocks until the generation task has fully stopped and the Neural Engine is free, preventing multiple inference loops from stacking up. If the consumer drops the frames stream, generation is cancelled automatically.
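
This description does not show how cancel() is implemented. One common way to get awaitable cancellation in Swift concurrency is to keep a handle to the generation Task, cancel it, and then await its value. The sketch below illustrates that pattern with a hypothetical GenerationRunner actor, where a counter stands in for inference steps; it is not the PR's code.

```swift
// Hypothetical sketch of awaitable cancellation; GenerationRunner is not part of the PR.
actor GenerationRunner {
    private var task: Task<Void, Never>?
    private(set) var stepsCompleted = 0

    /// Starts a cooperative loop standing in for the inference loop.
    func start(totalSteps: Int) {
        task = Task {
            for _ in 0..<totalSteps {
                // Check for cancellation before each simulated prediction.
                if Task.isCancelled { break }
                await self.recordStep()
                await Task.yield()
            }
        }
    }

    private func recordStep() {
        stepsCompleted += 1
    }

    /// Requests cancellation, then suspends until the loop has actually exited.
    func cancel() async {
        task?.cancel()
        _ = await task?.value
        task = nil
    }
}
```

Because the actor is free while cancel() is suspended, the loop can finish its current step and exit; once cancel() returns, no further steps run, so a new loop can be started without overlapping the old one.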

AudioFrame now includes an utteranceIndex field for text synchronisation on the consumer side.
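
On the consumer side, the index can be used to look up which text a frame belongs to. The sketch below uses a hypothetical SketchFrame type and assumes utteranceIndex is zero-based and follows enqueue order; neither the real AudioFrame layout nor the indexing convention is stated in this description.

```swift
// Hypothetical stand-in for AudioFrame; only utteranceIndex is taken from the PR description.
struct SketchFrame {
    let utteranceIndex: Int
    let samples: [Float]
}

/// Returns the text a frame belongs to, or nil if the index is out of range.
/// Assumes indices are zero-based and follow the order of enqueue() calls.
func text(for frame: SketchFrame, from enqueued: [String]) -> String? {
    guard enqueued.indices.contains(frame.utteranceIndex) else { return nil }
    return enqueued[frame.utteranceIndex]
}

let enqueued = ["Hello there.", "How can I help?"]
let frame = SketchFrame(utteranceIndex: 1, samples: [0.0, 0.1])
print(text(for: frame, from: enqueued) ?? "unknown utterance")  // prints "How can I help?"
```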



devin-ai-integration[bot]

This comment was marked as resolved.

@danielrothmann (Author)

@Alex-Wengg the merge checks seem to pass but always stop on "test-tts" - I'm struggling to find more information about what that is or why it's not completing. Any idea?

@Alex-Wengg (Member)

@danielrothmann that's fine, we can ignore test-tts.

Development

Successfully merging this pull request may close these issues.

Session API for PocketTTS
