fix(web): embed agent text in voice ready event for readback by heavygee · Pull Request #682 · tiann/hapi

heavygee · 2026-05-24T20:31:51Z

Summary

Sorry if this is the wrong seam; I might be missing context from #640.

When a coding agent finishes (ready event), the ElevenLabs voice assistant often did not read back the agent's answer. The user had to ask repeatedly; ConvAI sometimes hallucinated partial summaries.

Root cause (I think):

- formatReadyEvent told ConvAI the summary was in "previous messages" instead of embedding the assistant text inline.
- formatMessage did not format Codex/Cursor stream-json payloads, so the onMessages and session history context were empty for those sessions.

This PR embeds the last speakable assistant message in the ready inject (<text>…</text>) and teaches formatMessage and extractLastAssistantSpeakable to handle the same Codex stream-json payloads.
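A minimal TypeScript sketch of the readback path described above. The names extractLastAssistantSpeakable and formatReadyEvent come from this PR, but their signatures, the SessionMessage shape, and the inject wording are assumptions for illustration, not the merged code. It walks history newest-first for the last agent message with speakable text (a plain string or a Codex envelope) and embeds that text inline in <text>…</text>; Claude content arrays are left out to keep it short.

```typescript
// Hypothetical message shape; the real HAPI session types are not shown in this PR.
type SessionMessage = {
    role: 'user' | 'agent'
    content: unknown
}

function isObject(value: unknown): value is Record<string, unknown> {
    return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Speakable text from a plain string or a Codex stream-json envelope:
// { type: 'codex', data: { type: 'message', message: '...' } }
function extractSpeakable(content: unknown): string | null {
    if (typeof content === 'string') {
        const text = content.trim()
        return text.length > 0 ? text : null
    }
    if (isObject(content) && content.type === 'codex') {
        const data = content.data
        if (isObject(data) && data.type === 'message' && typeof data.message === 'string') {
            const text = data.message.trim()
            return text.length > 0 ? text : null
        }
    }
    // Other Codex event types and unknown shapes are not speakable.
    return null
}

// Walk history newest-first and return the first agent message with speakable text.
function extractLastAssistantSpeakable(messages: SessionMessage[]): string | null {
    for (const message of [...messages].reverse()) {
        if (message.role !== 'agent') continue
        const text = extractSpeakable(message.content)
        if (text !== null) return text
    }
    return null
}

// Ready inject: embed the answer inline instead of pointing ConvAI at "previous messages".
// The wording is an assumption, not the merged prompt text.
function formatReadyEvent(lastText: string | null): string {
    if (lastText === null) {
        return 'The coding agent is done and ready for the next instruction.'
    }
    return `The coding agent is done. Summarize its final answer for the user:\n<text>${lastText}</text>`
}
```

Walking newest-first means a trailing non-message event does not mask the answer that precedes it.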

Test plan

- bun test web/src/realtime/hooks/contextFormatters.test.ts (12 tests)
- Manual dogfood: ElevenLabs conv_4501ksdt0athfhfr189tq3jehkcq (399s, Cursor session). At ~329s, the ready inject included embedded <text>…</text> with the agent's last message; at ~330s voice summarized it accurately (e.g. DB 2.43 GiB → 842 MiB) without the user re-prompting.
- Regression: Claude session voice still works

Issues

Fixes #681

Notes

- Does not address bug(voice): context formatters hardcode "Claude Code" for all agent flavors #680 (the hardcoded "Claude Code" label in formatPlainText). Happy to follow up separately if preferred.
- Related to fix(web): relay Codex responses to voice sessions #640 (Codex voice context), but this PR targets ready readback specifically.

Voice onReady now extracts the last speakable assistant message and embeds it in the ready inject so ConvAI can summarize without the user re-prompting. Also formats Codex/Cursor stream-json messages for live context updates and session history.

Co-authored-by: Cursor <cursoragent@cursor.com>

Replace jellybot/subtitle dogfood strings in tests with generic examples.

Co-authored-by: Cursor <cursoragent@cursor.com>

heavygee · 2026-05-24T20:43:13Z

Follow-up for #681 — thanks to HAPI Bot for pointing at the exact call sites.

This is intentionally small: four files in web/src/realtime/hooks/ only. Test plan includes a Cursor-session ElevenLabs dogfood conv where ready inject carried embedded <text>…</text> and voice read back without a re-prompt.

If anything should move into #640 instead, happy to split or close — just let me know.

github-actions

Findings

[Major] Preserve tool-call context for mixed Claude content arrays: formatMessage now calls extractSpeakableFromContent(content) before the existing content-array formatter. Since that helper also returns joined text for arrays, a Claude assistant payload with both text and tool_use blocks returns before reaching the loop that emits the "Claude Code is using ..." lines, so voice context drops tool-call updates that were previously sent. Evidence: web/src/realtime/hooks/contextFormatters.ts:103.
Suggested fix:
```
const speakable = !isContentArray(content)
    ? extractSpeakableFromContent(content)
    : null
if (speakable) {
    const roleForFormat = normalizedRole === 'user' ? 'user' : 'assistant'
    return formatPlainText(roleForFormat, speakable)
}
```
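To see why the guard matters, here is a reduced, hypothetical model of formatMessage in TypeScript. Only the guard line mirrors the suggested fix above; the formatPlainText labels, the wording after "Claude Code is using", and the simplified extractSpeakableFromContent (strings and text arrays only, Codex branch omitted) are assumptions. Without the !isContentArray(content) check, the array branch of extractSpeakableFromContent would return the joined text first and the tool_use item would never reach the loop.

```typescript
function isObject(value: unknown): value is Record<string, unknown> {
    return typeof value === 'object' && value !== null && !Array.isArray(value)
}

function isContentArray(content: unknown): content is Array<Record<string, unknown>> {
    return Array.isArray(content) && content.every(isObject)
}

// Hypothetical stand-in; the "Claude Code:" label follows the PR discussion (see #680).
function formatPlainText(role: 'user' | 'assistant', text: string): string {
    return role === 'user' ? `User: ${text}` : `Claude Code: ${text}`
}

// Simplified: plain strings plus the array branch that joins text items.
// That array branch is what made the unguarded call return early.
function extractSpeakableFromContent(content: unknown): string | null {
    if (typeof content === 'string') {
        const text = content.trim()
        return text.length > 0 ? text : null
    }
    if (isContentArray(content)) {
        const texts = content
            .map((item) => (item.type === 'text' && typeof item.text === 'string' ? item.text.trim() : ''))
            .filter((text) => text.length > 0)
        return texts.length > 0 ? texts.join('\n') : null
    }
    return null
}

function formatMessage(role: string, content: unknown): string | null {
    const normalizedRole = role === 'user' ? 'user' : 'assistant'
    // Guard from the suggested fix: arrays skip the early return and fall through to the loop.
    const speakable = !isContentArray(content) ? extractSpeakableFromContent(content) : null
    if (speakable) {
        return formatPlainText(normalizedRole, speakable)
    }
    if (isContentArray(content)) {
        const lines: string[] = []
        for (const item of content) {
            if (item.type === 'text' && typeof item.text === 'string' && item.text.trim()) {
                lines.push(formatPlainText(normalizedRole, item.text.trim()))
            } else if (item.type === 'tool_use') {
                // Tool-call line that the unguarded version dropped.
                lines.push(`Claude Code is using ${String(item.name ?? 'a tool')}`)
            }
        }
        return lines.length > 0 ? lines.join('\n') : null
    }
    return null
}
```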

Summary

Review mode: initial
One regression found in voice context formatting for mixed assistant text/tool-use arrays.

Testing

Not run (automation): bun is not installed in this runner (bun: command not found).

github-actions · 2026-05-24T20:48:18Z

```diff
+        return null
+    }
+
+    const speakable = extractSpeakableFromContent(content)
```


[MAJOR] Preserve tool-call context for mixed Claude content arrays

formatMessage now calls extractSpeakableFromContent(content) before the existing content-array loop. Because that helper also returns joined text for arrays, a Claude assistant payload like [{ type: 'text', ... }, { type: 'tool_use', ... }] returns at line 106 and never formats the tool_use item. That regresses voice context by dropping tool-call updates that were previously emitted by the loop.

Suggested fix:

```
const speakable = !isContentArray(content)
    ? extractSpeakableFromContent(content)
    : null
if (speakable) {
    const roleForFormat = normalizedRole === 'user' ? 'user' : 'assistant'
    return formatPlainText(roleForFormat, speakable)
}
```

Add a regression test with a mixed text + tool_use array asserting both the text and tool-call lines are present.

Fixed in 818bf7c. Guarded the call with !isContentArray(content) so arrays fall through to the existing loop as before. Added a regression test with a mixed text+tool_use payload; it was red without the guard and green after.

…essage extractSpeakableFromContent also handles content arrays (joins text items), so calling it unconditionally before the existing array loop caused mixed text+tool_use payloads to return early without formatting the tool_use item. Guard with !isContentArray so the loop handles arrays as before. Adds regression test: mixed text+tool_use array must produce both the text and the tool-call line (was red before this fix).

via [HAPI](https://hapi.run)

Co-Authored-By: HAPI <noreply@hapi.run>

github-actions

Findings

[Major] Status events can be spoken as the last assistant answer — extractSpeakableFromContent now accepts any { data: { type: 'message', message } } envelope, not just Codex payloads. Existing sendSessionEvent({ type: 'message', message }) emits { role: 'agent', content: { type: 'event', data: ... } }, so abort/retry/failure status events can enter voice context as Claude Code: and can be selected by extractLastAssistantSpeakable for the ready readback instead of the actual final assistant response. Evidence: web/src/realtime/hooks/contextFormatters.ts:152, related producer cli/src/api/apiSession.ts:541.
Suggested fix:
```
if (isObject(content) && content.type === 'codex' && isObject(content.data)) {
    const data = content.data
    if (data.type === 'message' && typeof data.message === 'string' && data.message.trim()) {
        return data.message.trim()
    }
}
```
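A small TypeScript sketch contrasting the broad match with the narrowed Codex check. The function names extractBroad, extractCodexOnly, and readMessage are hypothetical, and the status-event message text in the usage is made up; the two envelope shapes are the ones quoted in this finding.

```typescript
function isObject(value: unknown): value is Record<string, unknown> {
    return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Shared tail: read a non-empty message from a { type: 'message', message } payload.
function readMessage(data: unknown): string | null {
    if (isObject(data) && data.type === 'message' && typeof data.message === 'string') {
        const text = data.message.trim()
        return text.length > 0 ? text : null
    }
    return null
}

// Before the fix: any envelope with a string type matched, so status events shaped as
// { type: 'event', data: { type: 'message', message } } were treated as assistant text.
function extractBroad(content: unknown): string | null {
    if (isObject(content) && typeof content.type === 'string') {
        return readMessage(content.data)
    }
    return null
}

// After the fix: only the Codex stream-json envelope counts as speakable assistant text.
function extractCodexOnly(content: unknown): string | null {
    if (isObject(content) && content.type === 'codex') {
        return readMessage(content.data)
    }
    return null
}
```

With the broad check, a session status event's message would come back as speakable text and could win the ready readback; the narrowed check returns null for it while still returning the Codex message.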

Questions

None.

Summary

Review mode: initial
One issue found: Codex message extraction is too broad and can turn agent status events into spoken assistant answers.

Testing

Not run: bun test web/src/realtime/hooks/contextFormatters.test.ts failed because bun is not available in this runner.

HAPI Bot

github-actions · 2026-05-24T22:53:32Z

```diff
+    }
+
+    // Codex / stream-json agent messages: { type: 'codex', data: { type: 'message', message: '...' } }
+    if (isObject(content) && typeof content.type === 'string' && isObject(content.data)) {
```


[Major] extractSpeakableFromContent now accepts any object whose data.type is 'message', but existing session status events use { type: 'event', data: { type: 'message', message } } via sendSessionEvent. That means abort/retry/failure status messages can be formatted as Claude Code: and can become the ready readback text instead of the actual assistant answer.

Suggested fix:

```
if (isObject(content) && content.type === 'codex' && isObject(content.data)) {
    const data = content.data
    if (data.type === 'message' && typeof data.message === 'string' && data.message.trim()) {
        return data.message.trim()
    }
}
```

Fixed in 09e9b55. Narrowed to content.type === 'codex' — as the comment already said it should be. Added regression test: { type: 'event', data: { type: 'message', message: '...' } } must return null from formatMessage (was red before the narrowing, green after).

The helper matched any object with a string type and a data property, so sendSessionEvent({ type: 'message', message }) events (which arrive as { type: 'event', data: { type: 'message', message } }) were falsely formatted as speakable assistant text and could be selected as the ready readback. Narrow the Codex path to content.type === 'codex' as the comment already states. Adds regression test: session status event must return null from formatMessage.

via [HAPI](https://hapi.run)

Co-Authored-By: HAPI <noreply@hapi.run>

github-actions

Findings

None.

Questions

None.

Summary

Review mode: follow-up after new commits
No high-confidence issues found in the latest diff. The follow-up change narrows Codex message extraction to type: 'codex' and adds coverage for session status events not being spoken as assistant text. Residual risk: tests could not be run in this runner because bun is unavailable.

Testing

Not run (automation): bun test web/src/realtime/hooks/contextFormatters.test.ts failed with bun: command not found.

HAPI Bot

heavygee · 2026-05-25T09:12:36Z

Woohoo! It's a spectacular tool, so glad I could help 🙏

On Mon, 25 May 2026, 06:31 weishu wrote: Merged #682 into main.

heavygee mentioned this pull request May 24, 2026

bug(voice): agent completion readback unreliable (all agents; worst on stream-json flavors) #681

Closed

test(web): use domain-neutral voice formatter fixtures

63c3732

heavygee force-pushed the fix/voice-readback branch from cfbb3b7 to 63c3732 May 24, 2026 20:40

heavygee marked this pull request as ready for review May 24, 2026 20:43

github-actions Bot reviewed May 24, 2026

heavygee mentioned this pull request May 24, 2026

fix(web): use session flavor label in voice context formatters heavygee/hapi#1

Draft

3 tasks

tiann merged commit 3258c52 into tiann:main May 25, 2026
3 checks passed

tiann merged 4 commits into tiann:main from heavygee:fix/voice-readback
