
Add live transcription WebSocket client with microphone support #20

Merged
arnavmehta7 merged 6 commits into main from feat/live-transcription-mic
May 26, 2026
Conversation

@arnavmehta7

Collaborator

Summary

Adds camb.live_transcription — a typed, async WebSocket client wrapping the /apis/transcription/listen endpoint. End-to-end verified against the dev backend.

What ships

import asyncio

from camb.client import CambAI
from camb.live_transcription import Microphone, ServerMessageType


async def main():
    client = CambAI(api_key=...)
    # connect() returns a session that is already open.
    session = await client.live_transcription.connect(
        model="boli-v5",
        language="en-us",
        sample_rate=16000,
    )

    @session.on(ServerMessageType.RESULTS)
    def _(msg):
        print(msg.transcript)   # cumulative interim: replace, don't concat

    # async with handles cleanup when streaming ends.
    async with session:
        await session.stream_audio(Microphone(sample_rate=16000, chunk_size=1600))


asyncio.run(main())
  • Typed events: Ready, Results, Final (reserved), Error, Closed. All defined via a single ServerMessageType enum + Pydantic models + PARSER_REGISTRY. Adding a new event takes three edits in one file.
  • Microphone helper: sounddevice-based capture; sounddevice is now a regular dependency.
  • File source: FileAudioSource paces a WAV at real-time for tests / CI.
  • Wildcard handler: session.on_any catches event types the SDK hasn't been updated for yet, so apps stay forward-compatible.
  • Custom transport: pluggable Transport protocol so tests can inject mocks; default uses websockets.
  • Error hierarchy: LiveTranscriptionError + …ConnectError / …ProtocolError subclasses.
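
For reviewers who want a picture of the enum + model + registry pattern and the wildcard handler before opening events.py, here is a minimal sketch. It is not the SDK's code: it swaps Pydantic models for stdlib dataclasses so it runs without dependencies, assumes the server sends JSON objects with a "type" field, and the Dispatcher class and the code / reason field names are hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List


class ServerMessageType(str, Enum):
    # Edit 1 of 3 when adding an event: a new enum member.
    READY = "Ready"
    RESULTS = "Results"
    CLOSED = "Closed"


# Edit 2 of 3: a payload model (dataclasses stand in for Pydantic here).
@dataclass
class Ready:
    type: str


@dataclass
class Results:
    type: str
    transcript: str


@dataclass
class Closed:
    type: str
    code: int          # hypothetical field name
    reason: str = ""   # hypothetical field name


# Edit 3 of 3: a registry entry mapping the wire name to its model.
PARSER_REGISTRY: Dict[str, type] = {
    ServerMessageType.READY.value: Ready,
    ServerMessageType.RESULTS.value: Results,
    ServerMessageType.CLOSED.value: Closed,
}


class Dispatcher:
    """Hypothetical stand-in for the session's on / on_any machinery."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Any], None]]] = {}
        self._any_handlers: List[Callable[[Any], None]] = []

    def on(self, message_type: ServerMessageType):
        def register(fn):
            self._handlers.setdefault(message_type.value, []).append(fn)
            return fn
        return register

    def on_any(self, fn):
        self._any_handlers.append(fn)
        return fn

    def dispatch(self, raw: str) -> None:
        data = json.loads(raw)
        model = PARSER_REGISTRY.get(data.get("type"))
        # Unknown event types are passed through as plain dicts so that
        # wildcard handlers still see them.
        msg = model(**data) if model else data
        for fn in self._handlers.get(data.get("type"), []):
            fn(msg)
        for fn in self._any_handlers:
            fn(msg)
```

In this sketch, a message whose type is missing from the registry never reaches a typed handler but is still delivered to every on_any handler. That is the forward-compatibility property the wildcard bullet describes.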

Bugs found and fixed during integration testing

  1. WebSocket race: the server's Ready was dropped when the handshake completed before the read loop attached.
  2. wait_until_ready() hang on auth failure: a close before Ready left awaiters hanging; the close path now sets the ready event so they wake.
  3. Double-open via async with: connect() already opens the session, so async with session: re-ran _open(), spawning a second reader task. Symptom: every event fired twice and the session wedged after a few frames. _open() is now idempotent.
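
The auth-failure hang (item 2) comes down to an asyncio.Event that only the Ready path ever set. A minimal sketch of the fix, assuming a single ready event shared by both paths. The ReadyGate class is hypothetical, and raising ConnectionError is a choice made for this sketch; the SDK may surface the Closed event differently.

```python
import asyncio
from typing import Optional, Tuple


class ReadyGate:
    """Hypothetical reduction of the session's ready / close bookkeeping."""

    def __init__(self) -> None:
        self._ready = asyncio.Event()
        self._got_ready = False
        self.closed: Optional[Tuple[int, str]] = None

    def _emit_ready(self) -> None:
        self._got_ready = True
        self._ready.set()

    def _emit_close(self, code: int, reason: str) -> None:
        self.closed = (code, reason)
        # The fix: also set the ready event on close, so a task blocked in
        # wait_until_ready() wakes up instead of waiting forever.
        self._ready.set()

    async def wait_until_ready(self) -> None:
        await self._ready.wait()
        # Woken by a close that arrived before Ready: report the real reason.
        if not self._got_ready and self.closed is not None:
            code, reason = self.closed
            raise ConnectionError(f"closed before Ready: {code} {reason}")
```

With this shape, a Closed(1008) that arrives before Ready turns into an immediate, descriptive failure for the caller instead of a hang.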

Files

camb/live_transcription/
├── __init__.py          # public exports
├── client.py            # LiveTranscriptionClient (camb.live_transcription)
├── session.py           # LiveTranscriptionSession + module-level connect()
├── transport.py         # Transport protocol + WebsocketsTransport
├── events.py            # ServerMessageType + Pydantic models + PARSER_REGISTRY
├── options.py           # Encoding + ConnectOptions
├── audio_source.py      # AudioSource protocol + FileAudioSource
├── microphone.py        # Microphone (sounddevice)
└── errors.py            # LiveTranscriptionError hierarchy

examples/
├── live_transcription_microphone.py   # mic → session
└── live_transcription_file.py         # WAV → session at real-time pace
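
To illustrate what "real-time pace" means for the file example, here is a minimal sketch using only the stdlib wave module. It is not FileAudioSource itself: the paced_wav_chunks name and its parameters are hypothetical, and the injectable sleep argument exists only so the pacing can be tested without waiting.

```python
import asyncio
import wave
from typing import AsyncIterator, Awaitable, Callable


async def paced_wav_chunks(
    path: str,
    chunk_frames: int = 1600,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> AsyncIterator[bytes]:
    """Yield raw PCM chunks from a WAV file, sleeping each chunk's duration."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bytes_per_frame = wav.getsampwidth() * wav.getnchannels()
        while True:
            chunk = wav.readframes(chunk_frames)
            if not chunk:
                break
            yield chunk
            # Sleep for as long as this chunk takes to play back, so the
            # consumer receives audio at roughly the rate a microphone would.
            await sleep(len(chunk) / bytes_per_frame / rate)
```

At 16000 Hz, the default of 1600 frames per chunk works out to 0.1 s of audio per chunk. Sleeping a fixed duration per chunk ignores time spent sending, so a long stream drifts slightly behind real time; pacing against a monotonic clock would correct that.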

Test plan

  • Smoke test of imports + model parsing in py3125 conda env.
  • End-to-end: stream WAV → dev endpoint → received Ready + 9 cumulative Results + clean Closed(1000). Final transcript: " Welcome to our service. We're glad to have".
  • Negative path: invalid key → Closed(1008, "Invalid/Expired API Key") surfaced cleanly, no hang.
  • Post-fix verification: Ready fires exactly once, no late wedge under async with.
  • Reviewer: run examples/live_transcription_file.py public_docs/audio/demo-accent-en-us.wav against staging once the route is healthy.

Related

  • Docs PR: Camb-ai/public_docs#117
  • Companion TS SDK PR: Camb-ai/cambai-typescript-sdk (filed in this same batch)

🤖 Generated with Claude Code

Arnav Mehta and others added 6 commits May 26, 2026 04:10
Wraps /apis/transcription/listen behind an async session with a typed
event dispatcher (Ready / Results / Final / Error / Closed). New events
are added in three edits: enum member, payload model, registry entry.

Microphone helper backed by sounddevice as an optional extra:
pip install 'camb-sdk[microphone]'.

Without this, an auth failure (server emits Closed(1008) immediately)
left wait_until_ready() hanging forever. Set the ready event in
_emit_close so awaiters wake up and observe the Closed event with the
real reason.

- examples/live_transcription_file.py streams any local WAV to the
  WebSocket at real-time pace (no audio device required).
- README now lists Live Transcription in the feature bullets, the new
  optional [microphone] extra under Installation, an end-to-end example
  block (Example 5), and a pointer to the file example for environments
  without a mic.

The [microphone] extra was a single line; gating sounddevice behind it
meant users got a runtime error on the first Microphone() instead of an
install-time failure. Now sounddevice is a regular dep:

- Drop the [project.optional-dependencies] block in pyproject.toml.
- Add sounddevice>=0.4.6 to dependencies and requirements.txt.
- Microphone.__init__ no longer guards the import or raises
  MicrophoneUnavailableError; the import is top-level.
- Remove MicrophoneUnavailableError from errors.py and __init__.py
  exports. (Hard dep makes the error unreachable.)
- Remove the [microphone] install hint from README and example
  docstrings.

The resource client's connect() already opens the session before
returning it; users then wrap the returned session in
'async with session:' for cleanup. Without an idempotency guard,
__aenter__ re-ran _open, spawning a second reader task on the same
WebSocket.

Symptoms this fixes:
- Every event (Ready, Results, Closed) was fanned out twice — the Ready
  log line printed twice on connect.
- After a few seconds of streaming, the dual consumers raced on the
  underlying websockets async iterator and the session wedged silently.

Now _open early-returns if the reader task is already running, so the
standard 'connect() + async with' pattern in the examples no longer
double-opens.
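
The guard described above can be sketched as follows. This is an illustration under assumptions, not the code in session.py: IdempotentSession, its attribute names, and the opens counter are hypothetical, and the reader loop is a placeholder for iterating the WebSocket.

```python
import asyncio
from typing import Optional


class IdempotentSession:
    """Hypothetical reduction of the session's open / close lifecycle."""

    def __init__(self) -> None:
        self._reader_task: Optional[asyncio.Task] = None
        self.opens = 0  # counts real opens, for demonstration only

    async def _read_loop(self) -> None:
        # Placeholder for the WebSocket read loop; blocks until cancelled.
        await asyncio.Event().wait()

    async def _open(self) -> None:
        # Early return if a reader is already running, so connect()
        # followed by 'async with' never starts a second consumer.
        if self._reader_task is not None and not self._reader_task.done():
            return
        self.opens += 1
        self._reader_task = asyncio.create_task(self._read_loop())

    async def close(self) -> None:
        if self._reader_task is not None:
            self._reader_task.cancel()
            try:
                await self._reader_task
            except asyncio.CancelledError:
                pass
            self._reader_task = None

    async def __aenter__(self) -> "IdempotentSession":
        await self._open()
        return self

    async def __aexit__(self, *exc_info) -> None:
        await self.close()


async def connect() -> IdempotentSession:
    # Mirrors the pattern in this PR: connect() returns an opened session.
    session = IdempotentSession()
    await session._open()
    return session
```

Keeping __aenter__ as a call to _open rather than a no-op means a session constructed without connect() still opens correctly under async with.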
@arnavmehta7 arnavmehta7 merged commit fd88096 into main May 26, 2026
10 checks passed