interrupt speech with button click #9
|
Thanks for submitting the PR, @RomanLut! This is one of the items on the roadmap, so let me try it out. The problem I ran into when implementing interrupts earlier was that audio bytes from OpenAI were still in transit to the ESP32 when the user pressed the button. As a result, the device did not switch back to listening cleanly, and the audio it captured from the user came out murky and sped up. Example case: the server sends "Hey! How can I help [interrupt] you today?" and the user only hears "Hey! How can I help". The remaining "you today?" was still on its way to the ESP32 over the websocket while the device was already registering the overlapping audio from the user ("What's the weather"), which caused a processing conflict. |
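One possible way to deal with chunks that are still in flight is to tag every audio chunk with a generation number and drop anything that belongs to an interrupted response. This is only a sketch of that idea, not what this PR or the firmware does: `PlaybackQueue`, `on_audio_chunk`, and the generation field are hypothetical names, and it assumes the server can attach such a number to each chunk it sends.

```python
from collections import deque


class PlaybackQueue:
    """Buffers audio chunks and discards those from an interrupted response."""

    def __init__(self) -> None:
        self.current_generation = 0  # bumped on every interrupt
        self.chunks = deque()        # pending audio chunks (bytes)

    def on_audio_chunk(self, generation: int, chunk: bytes) -> bool:
        """Queue a chunk unless it belongs to an interrupted response."""
        if generation != self.current_generation:
            # Stale: this chunk was already in flight when the button was pressed.
            return False
        self.chunks.append(chunk)
        return True

    def interrupt(self) -> int:
        """Button press: flush buffered audio and invalidate in-flight chunks."""
        self.chunks.clear()
        self.current_generation += 1
        return self.current_generation
```

With this scheme the tail of the old response ("you today?") would be rejected on arrival instead of being mixed into the listening phase, provided the server starts tagging the next response with the new generation number.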
|
In this PR, the behavior is as follows: when the user clicks the button, the client sends a JSON message to the server. I also added a command on the server that should instruct the LLM to stop audio generation. As a result, I expect the speaking to stop after a brief delay, and the client should then return to listening mode as usual. |
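The exact JSON and server command are not shown in this thread, so the following is only a rough sketch of the flow described above. The client message shape `{"type": "interrupt"}` and the helper `build_cancel_event` are hypothetical. The only piece taken from the OpenAI Realtime API is the `response.cancel` client event, which asks the model to stop an in-progress response.

```python
import json
from typing import Optional


def build_cancel_event(client_message: str) -> Optional[str]:
    """Return a Realtime API cancel event if the client asked to interrupt."""
    try:
        payload = json.loads(client_message)
    except json.JSONDecodeError:
        return None  # not JSON (e.g. malformed text), nothing to do

    # Hypothetical message shape; the PR's actual JSON is not shown here.
    if not isinstance(payload, dict) or payload.get("type") != "interrupt":
        return None

    # "response.cancel" cancels the response currently being generated.
    return json.dumps({"type": "response.cancel"})
```

The server would forward the returned string over its OpenAI websocket whenever it is not `None`. Other providers, such as the Gemini Live API, would need their own equivalent.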
|
@RomanLut I tested this locally, but it did not capture the interrupt. Have you had a chance to test it since? Since I have also added the Gemini Live API, this PR will only capture interrupts on OpenAI. I will leave it open for now in case someone wants to pick it up. |
|
Sorry, I haven’t had much time lately for my hobby projects. |
This PR should allow speech to be interrupted via a button click.
Unfortunately, I haven't been able to verify it, as I haven't compiled the server.