interrupt speech with button click#9

Open
RomanLut wants to merge 2 commits into
akdeb:mainfrom
RomanLut:submit-stop-talking

Conversation

@RomanLut
Contributor

@RomanLut RomanLut commented Jun 5, 2025

This PR should allow speech to be interrupted via a button click.
Unfortunately, I haven't been able to verify it, as I haven't compiled the server.

@akdeb
Owner

akdeb commented Jun 6, 2025

Thanks for submitting the PR @RomanLut! This is one of the items on the roadmap.

Let me try this. Previously, the problem I faced when implementing interrupt was that the audio bytes (from OpenAI) were still on their way to the ESP32 when the user pressed the button. So it doesn't switch back to listening cleanly, and the audio it hears (from the user) is murky/sped up.

Example case:

Server sends: "Hey! How can I help [interrupt] you today?" [User only hears "Hey! How can I help"]
User says: "What's the weather like in Baku, Azerbaijan?"
OpenAI hears: gibberish ..... like in Baku, Azerbaijan?

So there was a processing conflict: the "you today?" was still on its way to the ESP32 over the websocket while the device was already registering the overlapping audio from the user ("What's the weather").
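
One common way to deal with audio that is still in flight is to tag each chunk with a generation number and discard chunks whose tag is stale. The sketch below only illustrates that idea; `PlaybackGate`, `AudioChunk`, and the notion that the wire format can carry a tag are assumptions, not something this PR or the project implements.

```typescript
// Hypothetical names throughout; not taken from the project's code.
interface AudioChunk {
  generation: number; // which response this chunk belongs to
  bytes: Uint8Array;
}

class PlaybackGate {
  private current = 0;

  // Call when a new response begins; returns the tag for its chunks.
  startResponse(): number {
    this.current += 1;
    return this.current;
  }

  // Call on button press: invalidates every chunk tagged so far.
  interrupt(): void {
    this.current += 1;
  }

  // True only for chunks that belong to the response that is still live.
  accepts(chunk: AudioChunk): boolean {
    return chunk.generation === this.current;
  }
}
```

With this scheme, the trailing "you today?" chunks would arrive carrying an old tag and could be dropped instead of being played over the user's speech.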

@akdeb akdeb self-requested a review June 6, 2025 10:40
@RomanLut
Contributor Author

RomanLut commented Jun 6, 2025

In this PR, the behavior is as follows:

When the user clicks the button, the client sends the following JSON to the server:
{"type": "instruction", "msg": "INTERRUPT", "audio_end_ms": 1000}
The server already contains partial support for handling the interruption.
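
For illustration, a server could validate that message before acting on it. This is a minimal sketch: `parseInterrupt` and `InterruptInstruction` are hypothetical names, and treating `audio_end_ms` as a non-negative number of milliseconds is an assumption based only on the JSON shown above.

```typescript
// Shape of the interrupt message shown above; the type name is hypothetical.
interface InterruptInstruction {
  type: "instruction";
  msg: "INTERRUPT";
  audio_end_ms: number;
}

// Returns the parsed message, or null if the input is not a valid interrupt.
function parseInterrupt(raw: string): InterruptInstruction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all (e.g. a binary audio frame decoded as text)
  }
  if (typeof data !== "object" || data === null) return null;
  const m = data as Record<string, unknown>;
  if (m.type !== "instruction" || m.msg !== "INTERRUPT") return null;
  // Assumption: audio_end_ms is a non-negative millisecond count.
  if (typeof m.audio_end_ms !== "number" || m.audio_end_ms < 0) return null;
  return { type: "instruction", msg: "INTERRUPT", audio_end_ms: m.audio_end_ms };
}
```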

I also added the following command on the server:

client.realtime.send("response.cancel", {
    type: "response.cancel",
    event_id: RealtimeUtils.generateId("evt_")
});

This should instruct the LLM to stop audio generation.

As a result, I expect the speaking to stop after a brief delay, and the client should return to listening mode as usual.
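
A possible refinement, sketched under assumptions: only send `response.cancel` while a response is actually being generated, by tracking the Realtime API's `response.created` and `response.done` server events. `ResponseTracker` and the `RealtimeSender` interface are hypothetical stand-ins for whatever the server uses; this is not the PR's code.

```typescript
// Minimal shape of the sender used here; an assumption, not the real client type.
interface RealtimeSender {
  send(eventName: string, payload: Record<string, unknown>): void;
}

class ResponseTracker {
  private active = false;

  // Feed the type of every server event into this method.
  onServerEvent(type: string): void {
    if (type === "response.created") this.active = true;
    if (type === "response.done") this.active = false;
  }

  // Sends response.cancel only while a response is in progress.
  // Returns true if a cancel was actually sent.
  cancelIfActive(sender: RealtimeSender): boolean {
    if (!this.active) return false;
    sender.send("response.cancel", { type: "response.cancel" });
    this.active = false; // avoid sending a second cancel for the same response
    return true;
  }
}
```

The guard avoids issuing a cancel when the button is pressed while nothing is being generated, for example just after the model has finished speaking.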

@akdeb
Owner

akdeb commented Aug 22, 2025

@RomanLut I tested this locally, but it did not capture the interrupt. Have you tried testing it since? Since I have added the Gemini Live API as well, this PR will only capture interrupts on OpenAI. I will leave it open for now in case someone wants to pick it up.

@RomanLut
Contributor Author

RomanLut commented Nov 4, 2025

Sorry, I haven’t had much time lately for my hobby projects.
Interruption works with ChatGPT, but after an interruption, the device plays a short fragment of the previous sound before the next response. I suspect some audio buffers need to be cleared on the server side.
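
If the leftover fragment does come from audio queued on the server, one way to clear it might look like the sketch below. `OutboundAudioQueue` is a hypothetical class; whether the server buffers outgoing audio like this is an assumption, not something confirmed in this thread.

```typescript
// Hypothetical server-side buffer of audio waiting to be sent to the device.
class OutboundAudioQueue {
  private chunks: Uint8Array[] = [];
  private muted = false;

  // Queue a chunk, unless we are discarding audio from a cancelled response.
  push(chunk: Uint8Array): void {
    if (!this.muted) this.chunks.push(chunk);
  }

  // On interrupt: drop everything queued and ignore late chunks.
  // Returns how many chunks were discarded.
  interrupt(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    this.muted = true;
    return dropped;
  }

  // Call when the next response starts so its audio is queued again.
  resume(): void {
    this.muted = false;
  }

  // Next chunk to send to the device, or undefined if the queue is empty.
  next(): Uint8Array | undefined {
    return this.chunks.shift();
  }
}
```

Calling `interrupt()` when the button message arrives and `resume()` when the next response begins would keep audio from the cancelled response out of the next playback.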


2 participants