[BUG] copy.deepcopy(agent) fails on second gateway channel — Agent RLock not picklable
Labels: bug, gateway, telegram, multi-channel
Overview
PraisonAI's gateway is designed to run multiple messaging channels from a single process — for example, three Telegram bots (CFO, Ops, Content) each routed to a different agent. This is the standard Hermes-style workforce pattern: one VPS, one gateway, many bots.
When the gateway starts the second channel bot, startup fails with a Python pickling error. The first channel may initialize successfully; every subsequent channel is skipped or crashes during _create_bot(). Multi-bot deployments therefore require a userland monkey-patch today.
What the user sees
Console / log output
When starting a gateway with 2+ Telegram channels (each bound to its own agent), the gateway logs something like:
Failed to create bot for 'telegram_ops': cannot pickle '_thread.RLock' object
Or, depending on Python version and call stack:
TypeError: cannot pickle '_thread.RLock' object
In a three-channel workforce setup (telegram_cfo, telegram_ops, telegram_content), typical startup looks like:
Channel 'telegram_cfo' (telegram) initialized
Failed to create bot for 'telegram_ops': cannot pickle '_thread.RLock' object
Failed to create bot for 'telegram_content': cannot pickle '_thread.RLock' object
Started 1 channel bot(s)
The operator believes all three bots are live. Only one actually polls Telegram.
Workaround in the wild
Deployments that need a multi-channel gateway today patch WebSocketGateway._create_bot to skip copy.deepcopy(agent) and pass each channel its pre-built dedicated agent instance instead. That unblocks startup but should not be required for a supported multi-channel path.
Architecture — how multi-channel gateway is supposed to work
Intended design: Each channel gets its own agent instance so channel-specific settings (tools, memory, session) do not leak between bots. The gateway achieves isolation by calling copy.deepcopy(agent) inside _create_bot() before wrapping the agent in a TelegramBot.
What breaks: Modern Agent objects hold a threading.RLock (used for cache/thread safety). RLocks are not picklable and not deep-copyable. The moment the gateway tries to clone the second agent, Python raises TypeError: cannot pickle '_thread.RLock' object.
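The failure can be reproduced outside the gateway. The sketch below is a minimal illustration, not PraisonAI code: StandInAgent and try_clone are hypothetical names, and the only assumption carried over from the issue is that the agent stores a threading.RLock as an instance attribute.

```python
import copy
import threading


class StandInAgent:
    """Hypothetical stand-in for an agent that guards a cache with an RLock."""

    def __init__(self, name):
        self.name = name
        self._cache = {}
        # The attribute that makes the whole object impossible to deep-copy.
        self._cache_lock = threading.RLock()


def try_clone(agent):
    """Return (clone, None) on success or (None, error message) on failure."""
    try:
        return copy.deepcopy(agent), None
    except TypeError as exc:
        return None, str(exc)


clone, error = try_clone(StandInAgent("ops"))
print("clone:", clone)
print("error:", error)
```

The error surfaces while deepcopy walks the instance's attributes, so any object that merely contains such a lock is affected, not only the lock itself.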
Step-by-step failure sequence
Operator defines gateway.yaml with multiple channels and multiple agents (Hermes workforce pattern).
Gateway loads all agents into self._agents.
Gateway iterates channels and calls _create_bot() for each.
Inside _create_bot():
import copy
agent = copy.deepcopy(agent)  # ← fails here on 2nd+ channel
Agent.__deepcopy__ or the default deepcopy walks the object graph and hits self.__cache_lock = threading.RLock().
Python cannot serialize RLock → exception → channel skipped → "Started N channel bot(s)" with N < expected.
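The sequence above can be sketched as a small simulation. Everything here is a hypothetical stand-in rather than the gateway's real code: StandInAgent, start_channels, and the with_lock flag are invented for illustration. The first agent is built without a lock purely to mimic the log shown earlier; why the real first channel succeeds is not established by this sketch.

```python
import copy
import threading


class StandInAgent:
    """Hypothetical agent; with_lock=False exists only to mimic the log."""

    def __init__(self, name, with_lock=True):
        self.name = name
        self._cache_lock = threading.RLock() if with_lock else None


def start_channels(channel_to_agent, agents):
    """Clone one agent per channel, swallowing per-channel failures."""
    bots, failures = {}, {}
    for channel, agent_id in channel_to_agent.items():
        try:
            bots[channel] = copy.deepcopy(agents[agent_id])
        except TypeError as exc:
            # The channel is skipped; startup carries on with the rest.
            failures[channel] = str(exc)
    return bots, failures


agents = {
    "cfo": StandInAgent("cfo", with_lock=False),
    "ops": StandInAgent("ops"),
    "content": StandInAgent("content"),
}
channels = {
    "telegram_cfo": "cfo",
    "telegram_ops": "ops",
    "telegram_content": "content",
}

bots, failures = start_channels(channels, agents)
print(f"Started {len(bots)} channel bot(s), {len(failures)} failed")
```

Because each failure is caught inside the loop, the process keeps running with fewer bots than configured. Comparing len(bots) against len(channels) after the loop is one way such a partial start could be made visible.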
Why this matters
| Impact | Detail |
| --- | --- |
| Blocks Hermes parity | Hermes runs multiple Telegram bots on one gateway. PraisonAI cannot do this out of the box. |
| Silent partial failure | Gateway may report "Started 1 channel bot(s)" while 2 channels failed, which is easy to miss in logs. |
| Forces unsafe workarounds | Monkey-patching _create_bot bypasses intended isolation; operators may share agent state unintentionally. |
| Affects any multi-channel setup | Not Telegram-specific: any second channel that deep-copies an Agent with memory/tools enabled will hit this. |
Root cause (technical)
The gateway assumes Agent is deep-copy safe. It is not, because:
praisonaiagents.agent.agent.Agent eagerly initializes self.__cache_lock = threading.RLock() (for thread-safe cache access).
copy.deepcopy() falls back to the pickling protocol (__reduce_ex__) for objects that have no dedicated copier and define no __deepcopy__ of their own.
threading.RLock is an OS-level synchronization primitive with no meaningful duplicate, so Python refuses to copy it.
Why the first channel often succeeds has not been pinned down: it may deep-copy the first agent before any concurrent access complicates state, or the failure may simply be deterministic from the second call onward. Either way, multi-channel startup is broken.
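The link between deepcopy and pickling can be checked directly with the standard library. This short sketch uses only stdlib calls; the helper name raises_type_error is introduced here for illustration.

```python
import copy
import pickle
import threading

lock = threading.RLock()


def raises_type_error(operation):
    """Report whether applying `operation` to the lock raises TypeError."""
    try:
        operation(lock)
    except TypeError:
        return True
    return False


results = {
    "pickle.dumps": raises_type_error(pickle.dumps),
    "copy.deepcopy": raises_type_error(copy.deepcopy),
    # The pickling hook that deepcopy falls back to for unknown types.
    "__reduce_ex__": raises_type_error(lambda obj: obj.__reduce_ex__(4)),
}
print(results)
```

All three operations fail in the same way, which is why the deepcopy error message talks about pickling even though nothing is being serialized to disk.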
Expected behavior
Gateway with 3 Telegram channels and 3 agents starts all 3 bots without error.
Each channel receives an isolated agent instance (no shared mutable session/tools state).
No userland monkey-patch required.
Proposed fix
Option A — Agent.clone_for_channel() (recommended)
Add a first-class SDK method that produces a channel-safe clone:
Re-create a fresh RLock instead of copying the old one.
Reset or fork session store handle per channel.
Gateway calls agent.clone_for_channel() instead of copy.deepcopy(agent).
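A rough sketch of what Option A could look like follows. It is an assumption-laden illustration, not the SDK's API: StandInAgent, its attribute names (tools, session, _cache, _cache_lock), and the choice to reset the session to an empty dict are all hypothetical.

```python
import copy
import threading


class StandInAgent:
    """Hypothetical agent with a lock, a tool list, and per-channel session."""

    def __init__(self, name, tools=None):
        self.name = name
        self.tools = list(tools or [])
        self.session = {}
        self._cache = {}
        self._cache_lock = threading.RLock()

    def clone_for_channel(self):
        """Return an isolated copy with a fresh lock and an empty session."""
        clone = self.__class__.__new__(self.__class__)
        for key, value in self.__dict__.items():
            if key in ("_cache_lock", "session"):
                continue  # rebuilt below instead of being copied
            setattr(clone, key, copy.deepcopy(value))
        clone._cache_lock = threading.RLock()  # never copy the old lock
        clone.session = {}  # assumption: each channel starts a new session
        return clone


original = StandInAgent("ops", tools=["search"])
original.session["chat"] = "existing"
clone = original.clone_for_channel()
clone.tools.append("calendar")
print(original.tools, clone.tools, clone.session)
```

Copying attribute by attribute keeps the isolation that deepcopy was meant to provide while letting the method decide, per field, what should be duplicated, recreated, or reset.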
Option B — Skip deepcopy when agents are already dedicated
If gateway.yaml maps each channel to a distinct agent ID (no sharing), skip clone entirely and document that shared-agent multi-channel requires explicit clone support.
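Option B depends on knowing whether any agent is shared between channels. The check below is a minimal sketch under the assumption that the channel configuration can be reduced to a plain mapping of channel name to agent ID; that mapping shape and the function name agents_needing_clone are hypothetical.

```python
from collections import Counter


def agents_needing_clone(channel_to_agent):
    """Return the agent IDs referenced by more than one channel."""
    counts = Counter(channel_to_agent.values())
    return {agent_id for agent_id, uses in counts.items() if uses > 1}


dedicated = {
    "telegram_cfo": "cfo",
    "telegram_ops": "ops",
    "telegram_content": "content",
}
shared = {
    "telegram_ops": "ops",
    "telegram_ops_backup": "ops",
    "telegram_cfo": "cfo",
}

print(agents_needing_clone(dedicated))
print(agents_needing_clone(shared))
```

With a dedicated mapping the result is empty and the clone step could be skipped; any ID in the result marks an agent that still needs explicit clone support.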
Tests
Unit test: gateway config with 2+ channels → all _channel_bots populated.
Regression test: assert no copy.deepcopy(agent) on paths where RLock exists.
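The two tests could take roughly the following shape. This is a sketch against hypothetical stand-ins (StandInAgent, create_channel_bots), not the real gateway classes; real tests would target WebSocketGateway and _channel_bots instead. The spy relies on unittest.mock.patch with wraps, so deepcopy still runs normally while its calls are recorded.

```python
import copy
import threading
from unittest import mock


class StandInAgent:
    """Hypothetical agent exposing a channel-safe clone method."""

    def __init__(self, name):
        self.name = name
        self.tools = []
        self._cache_lock = threading.RLock()

    def clone_for_channel(self):
        clone = StandInAgent(self.name)  # fresh lock via __init__
        clone.tools = copy.deepcopy(self.tools)
        return clone


def create_channel_bots(channel_to_agent, agents):
    """Stand-in for the gateway loop: one cloned agent per channel."""
    return {
        channel: agents[agent_id].clone_for_channel()
        for channel, agent_id in channel_to_agent.items()
    }


def test_all_channels_populated():
    agents = {"cfo": StandInAgent("cfo"), "ops": StandInAgent("ops")}
    channels = {"telegram_cfo": "cfo", "telegram_ops": "ops"}
    bots = create_channel_bots(channels, agents)
    assert set(bots) == set(channels)


def test_whole_agent_is_never_deepcopied():
    agents = {"ops": StandInAgent("ops")}
    with mock.patch("copy.deepcopy", wraps=copy.deepcopy) as spy:
        create_channel_bots({"telegram_ops": "ops"}, agents)
    copied = [call.args[0] for call in spy.call_args_list]
    assert copied  # the spy did observe the tools copy
    assert not any(isinstance(obj, StandInAgent) for obj in copied)
```

Both functions follow the plain test_ naming that pytest collects, and they can also be called directly.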
Architecture diagram (Mermaid flowchart)
flowchart TB
  subgraph Gateway["WebSocketGateway (single process, port 8765)"]
    YAML["gateway.yaml"]
    Agents["agents: { cfo, ops, content }"]
    Channels["channels: { telegram_cfo, telegram_ops, telegram_content }"]
  end
  YAML --> Agents
  YAML --> Channels
  Channels --> C1["_create_bot(telegram_cfo)"]
  Channels --> C2["_create_bot(telegram_ops)"]
  Channels --> C3["_create_bot(telegram_content)"]
  C1 --> B1["TelegramBot → @cfo_bot → cfo agent"]
  C2 --> B2["TelegramBot → @ops_bot → ops agent"]
  C3 --> B3["TelegramBot → @content_bot → content agent"]
  Agents --> A1["Agent(cfo)"]
  Agents --> A2["Agent(ops)"]
  Agents --> A3["Agent(content)"]
  C1 -.->|"deepcopy(agent)"| A1
  C2 -.->|"deepcopy(agent) 💥 RLock"| A2
  C3 -.->|"deepcopy(agent) 💥 RLock"| A3
Related issues
allowed_users config plumbing; unrelated to deepcopy.
Acceptance criteria
Multi-channel gateway starts with no cannot pickle '_thread.RLock' object error.
Channel bots are created through _create_bot without monkey-patch.