
perf: add FrameType enum, O(1) dispatch table, and lazy Frame fields #3847

Closed

viai957 wants to merge 3 commits into pipecat-ai:main from viai957:perf/frame-type-ids-lazy-init

Conversation


@viai957 viai957 commented Feb 26, 2026

Summary

  • FrameType / FrameCategory enums (frame_types.py): integer type identifiers added as ClassVar[int] type_id on every concrete Frame subclass, enabling zero-cost type identification without isinstance chains.
  • O(1) dispatch tables in LLMUserAggregator and LLMAssistantAggregator: replaced sequential isinstance if/elif chains with a Dict[int, Callable] keyed on frame.type_id, eliminating N comparisons per frame in hot paths.
  • Lazy Frame.name and Frame.metadata: both fields are now computed/allocated only on first access, reducing per-frame allocation cost for the majority of frames that never touch these fields.
  • Optional UserIdleController: user_idle_timeout in LLMUserAggregatorParams defaults to None (was 0); the controller is only instantiated when explicitly configured, and a _IDLE_CONTROLLER_FRAME_TYPES frozenset guards the hot path to skip the controller call for irrelevant frame types.

Motivation

In high-throughput pipelines (e.g. real-time audio with many short frames), process_frame is called thousands of times per second. The original implementation checked frame types via long isinstance chains (O(N) per frame) and unconditionally allocated name/metadata dicts on every frame construction. These changes make the common path O(1) and allocation-free for frames that don't need those fields.

Changes

src/pipecat/frames/frame_types.py (new)

  • FrameCategory: 8-bit category prefixes (AUDIO, TEXT, IMAGE, CONTROL, SYSTEM, etc.)
  • FrameType: 16-bit type IDs composed as (category << 8) | subtype
  • Fast category-check helper functions (is_audio_frame, is_text_frame, etc.)
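The ID composition described above can be sketched roughly as follows. This is illustrative: the category and subtype values here are assumptions, not the PR's actual constants.

```python
from enum import IntEnum

class FrameCategory(IntEnum):
    # Assumed 8-bit category values; the real module may order them differently.
    AUDIO = 0x01
    TEXT = 0x02
    IMAGE = 0x03
    CONTROL = 0x04
    SYSTEM = 0x05

class FrameType(IntEnum):
    # 16-bit IDs: high byte is the category, low byte is the subtype.
    AUDIO_RAW = (FrameCategory.AUDIO << 8) | 0x01
    TTS_AUDIO_RAW = (FrameCategory.AUDIO << 8) | 0x02
    TEXT = (FrameCategory.TEXT << 8) | 0x01
    END = (FrameCategory.CONTROL << 8) | 0x01

def is_audio_frame(type_id: int) -> bool:
    # Category check is a shift and a compare; no isinstance walk.
    return (type_id >> 8) == FrameCategory.AUDIO
```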

src/pipecat/frames/frames.py

  • Every concrete Frame subclass now carries a type_id: ClassVar[int] matching its FrameType constant
  • Frame.name → lazy property (backed by _name, allocated on first access)
  • Frame.metadata → lazy property (backed by _metadata, allocated on first access)
  • All existing fields preserved (id, pts, broadcast_sibling_id, transport_source, transport_destination)

src/pipecat/processors/aggregators/llm_response_universal.py

  • LLMUserAggregator._dispatch: Dict[int, Callable] lookup table replaces isinstance chain in process_frame
  • LLMAssistantAggregator._dispatch: same pattern
  • _IDLE_CONTROLLER_FRAME_TYPES frozenset guards UserIdleController calls
  • UserIdleController only instantiated when user_idle_timeout is set
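The dispatch-table pattern might look roughly like this. The type_id constants and handler names below are hypothetical, not the PR's exact identifiers.

```python
from typing import Callable, Dict, Optional

# Hypothetical type_id constants; frame_types.py defines the real ones.
AUDIO_RAW, TEXT, END = 0x0101, 0x0201, 0x0401

class UserAggregator:
    def __init__(self) -> None:
        # Built once in __init__ and never mutated, so no thread-safety concern.
        self._dispatch: Dict[int, Callable[[object], str]] = {
            AUDIO_RAW: self._handle_audio,
            TEXT: self._handle_text,
            END: self._handle_end,
        }

    def process_frame(self, type_id: int, frame: object) -> str:
        # One hash lookup replaces up to 17 sequential isinstance checks.
        handler: Optional[Callable[[object], str]] = self._dispatch.get(type_id)
        return handler(frame) if handler is not None else self._passthrough(frame)

    def _handle_audio(self, frame: object) -> str: return "audio"
    def _handle_text(self, frame: object) -> str: return "text"
    def _handle_end(self, frame: object) -> str: return "end"
    def _passthrough(self, frame: object) -> str: return "pass"
```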

src/pipecat/processors/frame_processor.py

  • _has_handlers() fast-path guards on push_frame and __process_frame event handler calls (avoids coroutine creation when no handlers registered)
  • PROCESS_TASK_CANCEL_TIMEOUT_SECS for __cancel_process_task() to prevent indefinite hangs
  • broadcast_frame_instance preserves lazy _metadata on copied frames

src/pipecat/transports/base_output.py

  • In-place del self._audio_buffer[:chunk_size] replaces slice copy
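As a toy illustration of the difference: del on a bytearray slice shrinks the buffer in place, whereas `buf = buf[chunk_size:]` would allocate and copy the entire remainder on every chunk.

```python
buf = bytearray(b"abcdefgh")
chunk_size = 3

chunk = bytes(buf[:chunk_size])  # the chunk handed to the transport
del buf[:chunk_size]             # shifts remaining bytes in place; no new bytearray
```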

src/pipecat/utils/base_object.py

  • _has_handlers(event_name): non-async O(1) check for registered event handlers
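A sketch of what such a sync guard could look like. The class shape below is an assumption; the real BaseObject and its event-handler registry differ.

```python
from typing import Callable, Dict, List

class BaseObject:
    def __init__(self) -> None:
        self._event_handlers: Dict[str, List[Callable]] = {}

    def _has_handlers(self, event_name: str) -> bool:
        # Plain dict lookup: no coroutine object is created just to
        # discover that nobody is listening.
        return bool(self._event_handlers.get(event_name))

    async def _call_event_handler(self, event_name: str, *args) -> None:
        for handler in self._event_handlers.get(event_name, []):
            await handler(*args)

    async def push_frame(self, frame) -> None:
        # Fast path: skip the async call entirely when no handlers exist.
        if self._has_handlers("on_push_frame"):
            await self._call_event_handler("on_push_frame", frame)
```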

Test plan

  • uv run pytest passes (406 passed, 0 failed)
  • uv run ruff check passes
  • uv run ruff format --check passes
  • Existing pipeline behaviour is unchanged (dispatch table covers all previously handled frame types)
  • All public API signatures preserved (push_interruption_task_frame_and_wait(timeout=), start_ttfb_metrics(start_time=), broadcast_sibling_id, write_transport_frame)

Notes

  • FrameCategory is re-exported from frames.py via # noqa: F401 for downstream consumers that import from pipecat.frames.frames.
  • The dispatch tables are built once during __init__ and never mutated, so there is no thread-safety concern.
  • Subclasses that override process_frame directly (not via the aggregators) are unaffected; type_id is purely additive.

- Add FrameType / FrameCategory integer enums (frame_types.py); every
  concrete Frame subclass now carries a type_id ClassVar for zero-cost
  type identification without isinstance chains.
- Replace sequential isinstance dispatch in LLMUserAggregator and
  LLMAssistantAggregator with an O(1) dict lookup table keyed on
  frame.type_id, eliminating N comparisons per frame in hot paths.
- Lazy-init Frame.name and Frame.metadata: both are now computed /
  allocated only on first access, reducing per-frame allocation cost.
- Make UserIdleController optional (default None) in
  LLMUserAggregatorParams so idle-detection overhead is zero unless
  explicitly configured.
- Add _IDLE_CONTROLLER_FRAME_TYPES frozenset to skip the idle
  controller call for the vast majority of frames.
RTVI framework accesses frame.broadcast_sibling_id on every frame
(rtvi.py:1262). Removing it from Frame.__post_init__ would cause
AttributeError for non-broadcast frames. Restore the field declaration
and initialization.
@viai957 viai957 marked this pull request as ready for review February 26, 2026 15:11
Contributor

kedar389 commented Feb 26, 2026

I am curious, what are the performance gains from these changes?

Author

viai957 commented Feb 27, 2026

@kedar389
Ohh ya! Here are benchmarks from a MacBook Pro M4 (Python 3.12.10). The benchmark simulates the real LLMUserAggregator dispatch with 17 isinstance branches and a weighted audio-heavy frame mix (60% audio, 15% text/LLM, 10% TTS events, 8% speaking events, 5% transcription, 2% control).

Frame construction (lazy vs eager)

| Variant | ns/op |
| --- | --- |
| Original (eager name str + metadata dict alloc) | 337 |
| Optimized (lazy, no alloc until access) | 158 (53% faster) |

Most frames never have .name or .metadata accessed; they're created, dispatched, and discarded. The lazy path avoids the f"{cls.__name__}#{count}" format string and the empty dict() allocation entirely.
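A rough way to reproduce this construction comparison, using stand-in classes rather than the real pipecat frames; absolute numbers will vary by machine.

```python
import timeit

class EagerFrame:
    _count = 0
    def __init__(self):
        EagerFrame._count += 1
        self.name = f"{type(self).__name__}#{EagerFrame._count}"  # str alloc
        self.metadata = {}                                        # dict alloc

class LazyFrame:
    __slots__ = ("_name", "_metadata")
    def __init__(self):
        self._name = None      # nothing formatted or allocated yet
        self._metadata = None

eager = timeit.timeit(EagerFrame, number=100_000)
lazy = timeit.timeit(LazyFrame, number=100_000)
print(f"eager: {eager:.4f}s  lazy: {lazy:.4f}s")
```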

Dispatch: isinstance chain vs dict lookup (17 branches)

| Branch position | isinstance (ns) | dict (ns) | Speedup |
| --- | --- | --- | --- |
| 1st (TranscriptionFrame) | 32 | 45 | 0.72x (isinstance wins) |
| 14th (AudioRawFrame) | 182 | 44 | 4.1x |
| 17th (EndFrame) | 214 | 44 | 4.8x |

isinstance is O(N) — it checks each branch sequentially. Dict lookup is O(1). For branch 1, isinstance is faster because it short-circuits immediately and
avoids the dict hash. But audio frames are ~60% of all frames in a voice pipeline, and they hit branch 14 of 17. That's where the win matters.

Weighted realistic workload (audio-heavy pipeline)

| Dispatch | ns/frame (avg) |
| --- | --- |
| isinstance dispatch | 174 |
| dict dispatch | 49 (3.5x faster) |

Combined per-frame cost

| | Construct | Dispatch | Total |
| --- | --- | --- | --- |
| Original | 337 ns | 182 ns | 520 ns |
| Optimized | 158 ns | 44 ns | 202 ns (61% reduction) |

Throughput impact

| Frame rate | Savings/sec | Fewer allocs/sec |
| --- | --- | --- |
| 100 fps | 32 µs | 200 |
| 500 fps | 159 µs | 1,000 |
| 1,000 fps | 318 µs | 2,000 |
| 5,000 fps | 1,588 µs | 10,000 |

The absolute microsecond savings are modest at typical frame rates, but the GC-pressure reduction from eliminating two allocations per frame (name string + metadata dict) compounds, especially in long-running pipelines where reduced GC pauses matter for real-time audio latency.

What's NOT premature about this

  1. process_frame is the single hottest function in the framework: every frame in every pipeline passes through it
  2. The isinstance chains in the aggregators are 17 branches long and growing with each new frame type
  3. The type_id approach is additive: existing code that uses isinstance still works, but hot paths can opt into the faster dispatch
  4. The lazy init eliminates allocations that are provably unused (most frames never have .name or .metadata accessed)

Happy to add these benchmarks to the repo if useful, or adjust the approach.

@kedar389
Contributor

Sorry, not to diminish your work, but how is this not a premature optimization? Even if the optimizations have good relative numbers, what are the absolute numbers? What are the absolute numbers a pipeline can save?

On what pipeline has this been tested? Do you know what the average fps is in audio or video pipelines? Have you tried benchmarking a real pipeline before and after to see how many ms you saved?

Even if we had 5,000 fps in a pipeline (the worst case you showed), it would still only save about 1.5 ms from your results, which is dwarfed by other components in the system like the turn model or VAD. A 1.5 ms gain for 800 lines of code seems kind of like premature optimization.

Also, if dispatch is such a problem, why not just move the audio frame, which is hit much more often, to be first in the isinstance comparison? That solves the problem in one line of code and you do not need the dict lookup.

I also feel this ratio is skewed (60% audio, 15% text/LLM, 10% TTS events, 8% speaking events, 5% transcription, 2% control). There are probably many more audio (and video) frames, because they flow non-stop compared to other frames.

Author

viai957 commented Mar 2, 2026

@kedar389, you raised valid points that deserved real data rather than theoretical arguments. I ran pipeline benchmarks on both branches and want to share honest results.

Benchmark Setup

Pushed 10,000 frames (80% TTS audio, 10% text, 5% transcription, 5% raw audio) through chains of PassthroughProcessors to isolate framework overhead. Each processor calls super().process_frame() + push_frame(),
exactly the hot path in a real pipeline. Python 3.12, Apple Silicon, median of 3 runs.

Results

Frame Creation (100k TTSAudioRawFrame)

| Metric | main | this PR |
| --- | --- | --- |
| Per-frame | 3.19 µs | 1.73 µs (1.84x faster) |
| Memory (100k) | 34.6 MB | 22.5 MB (35% less) |

Lazy init defers name formatting + metadata dict until first access. In the hot audio path, neither is touched.

Pipeline Throughput (10k frames)

| Pipeline depth | main (fps) | PR (fps) | Delta |
| --- | --- | --- | --- |
| 1 processor | 66,020 | 69,540 | +5% |
| 3 processors | 41,469 | 44,110 | +6% |
| 6 processors | 24,760 | 29,200 | +18% |

At 6 processors (realistic voice pipeline), 18% higher throughput. The gains come mainly from two things:

  1. _has_handlers() guard: push_frame() and __process_frame() make 4 calls to _call_event_handler per frame per processor, even when zero handlers are registered. Each call enters an async coroutine, checks event_name not in self._event_handlers, and returns. The sync _has_handlers() guard skips the coroutine entirely (2.4x faster for this check).

  2. Lazy frame fields: saves ~1.5 µs per frame creation plus 35% memory by not allocating the name string and metadata dict upfront.

60-Second Conversation (6 proc, 50 fps)

At standard voice rates: ~0.06 ms/sec difference. You're right that at 50fps for a single agent, this is negligible compared to LLM/TTS latency.

Where it matters

The 18% throughput headroom helps in:

  • Multi-agent servers running hundreds of concurrent pipelines on the same machine
  • Video + audio pipelines (30fps video + 50fps audio = much higher frame rates)
  • Memory pressure: 35% less per-frame allocation means fewer GC pauses

Your isinstance reordering suggestion

Valid point. Moving AudioRawFrame to the top of the isinstance chains in base_output.py would capture some of the dispatch gains with zero complexity. The handler-guard and lazy-init savings can't be achieved by reordering, though.

Proposal

I'm happy to slim this PR down to just the highest-impact, lowest-controversy changes:

  1. _has_handlers() sync guard on all 4 event handler call sites (biggest win, ~10 lines)
  2. Lazy name/metadata on Frame (35% memory savings, ~20 lines)
  3. Drop the FrameType enum and dispatch tables (separate PR if there's future interest)

@markbackman
Contributor

Hi 👋

I'm a Pipecat maintainer. Before we do any optimization, it would be nice to understand if there is a performance issue. I've been doing a lot of latency measurements and I haven't found anything that jumps out at me as a problem.

If you have data that shows that there is slowness that needs to be optimized, please share. Until then, I don't think we're ready to make this change as it touches foundational level classes used all over Pipecat.

Author

viai957 commented Mar 3, 2026

@markbackman you're absolutely right that I should start by identifying a concrete performance issue.

Here's the context: I'm planning a deployment of ~3000 concurrent voice agents. When I attempted a load test with thousands of parallel pipelines in a single process, it failed badly. I
dug into the root cause and want to share findings.

Per-Pipeline Resource Cost (Measured)

Each pipeline with 6 processors (default config) allocates:

  • 19 asyncio Tasks, 23 Queues, 18 Events
  • ~132 KB Python heap (386 MB at 3000 pipelines)
  • 1 ThreadPoolExecutor per MediaSender → 1 OS thread

At 3000 concurrent pipelines: 57,000 asyncio tasks, 3000+ OS threads, 300K task wakeups/sec.

Benchmark: Throughput Degradation

| Concurrent pipelines | Total FPS | Per-pipeline FPS |
| --- | --- | --- |
| 5 | 30,833 | 6,167 |
| 100 | 29,758 | 298 |
| 500 | 27,230 | 54.5 |

At 500 concurrent pipelines, per-pipeline throughput is barely above the 50fps audio floor. The event loop starts saturating.

Root Causes Found

  1. ThreadPoolExecutor per MediaSender: at scale, the OS thread limit (ulimit ~4096) is hit around pipeline 4000, crashing the process
  2. Signal handler overwriting: loop.add_signal_handler replaces previous handlers, so only the last PipelineRunner handles SIGINT, making graceful shutdown of N pipelines impossible
  3. Unbounded queues: no backpressure, so a 2-second LLM spike queues hundreds of frames per pipeline
  4. Global threading.Lock for obj_id(): 150K acquisitions/sec at scale (13 ms/sec)
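On point 4, one possible lock-free replacement is sketched below. It assumes obj_id() only needs unique, increasing integers; in CPython, next() on an itertools.count object is a single C-level call and therefore atomic under the GIL.

```python
import itertools

# next() on itertools.count is atomic under CPython's GIL, so no
# threading.Lock is needed for a unique monotonically increasing id.
_id_counter = itertools.count(1)

def obj_id() -> int:
    return next(_id_counter)
```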

Revised Proposal

I realize my original PR was addressing the wrong layer. The real improvements for large-scale deployment are:

  • Shared ThreadPoolExecutor instead of per-pipeline (eliminates OS thread exhaustion)
  • Fix signal handler overwriting (correctness bug for multi-runner scenarios)
  • Bounded queues with backpressure
  • _has_handlers() guard on event handler calls (eliminates unnecessary coroutine overhead at 4 call sites per frame per processor)
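The shared-executor idea could be sketched like this. The names are illustrative and the real MediaSender internals may differ; note the lazy singleton below is itself unguarded against a concurrent first call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

_SHARED_EXECUTOR: Optional[ThreadPoolExecutor] = None

def get_shared_executor(max_workers: int = 32) -> ThreadPoolExecutor:
    # One process-wide pool instead of one pool (and OS thread) per sender.
    global _SHARED_EXECUTOR
    if _SHARED_EXECUTOR is None:
        _SHARED_EXECUTOR = ThreadPoolExecutor(
            max_workers=max_workers, thread_name_prefix="media-io"
        )
    return _SHARED_EXECUTOR

class MediaSender:
    def __init__(self) -> None:
        # Previously (per the analysis above): one ThreadPoolExecutor each.
        self._executor = get_shared_executor()
```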

I'm happy to close this PR and open targeted issues/PRs for the above. Or if you'd prefer a single focused PR for just the handler guards and signal handler fix (low-risk, high-impact), I can slim this down.

@markbackman
Contributor

I'm planning a deployment of ~3000 concurrent voice agents. When I attempted a load test with thousands of parallel pipelines in a single process, it failed badly.

I see; that's your problem. Real-time agents require real-time communication which needs a different type of deployment and fixed resources. We recommend a bot per process where each bot is allocated 0.5 vCPU and 1GB of RAM.

For our Pipecat Cloud product, we isolate each agent in its own Python process to ensure it has sufficient resources allocated. Agents scale out successfully without resource issues. I think we should close this PR; it's worth looking at your deployment approach to solve the problem.

Happy to chat more about this in Discord if you have questions.

@markbackman markbackman closed this Mar 3, 2026
