PraisonAI exposes GET /health on the gateway (default http://127.0.0.1:8765/health). Operators use it to verify the workforce is running. Today it reports each channel as "running": true based on a simple boolean flag, with no last error, no Telegram conflict detection, and no distinction between "polling successfully" and "stuck in a retry loop after a 409 Conflict".
When two gateway processes poll the same Telegram bot token, one wins and the other fails with a 409 Conflict from Telegram's getUpdates API. The loser may retry silently forever. Health still shows green.
This was observed in a Hermes workforce deployment: CFO bot stopped replying after a duplicate gateway restart; /health reported all channels healthy.
All channels show "running": true even when Telegram polling is broken.
Log output (buried; operator may not see it)
When a second process steals the Telegram poll:
ERROR Bot 'telegram_cfo' crashed: Conflict: terminated by other getUpdates request;
make sure that only one bot instance is running
INFO Reconnecting 'telegram_cfo' in 5s...
Or from python-telegram-bot:
telegram.error.Conflict: Conflict: terminated by other getUpdates request
Gateway retries up to 5 times with exponential backoff, then gives up — but /health never reflects this.
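As a minimal sketch of how a retry loop like this could leave a trace for /health to read, the code below records each failure on a small state object. ChannelState, run_with_retries, and the 5-second base delay are hypothetical names and values chosen for illustration; they are not taken from the PraisonAI code base.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class ChannelState:
    """Hypothetical per-channel record that a health handler could read."""
    running: bool = False
    last_error: Optional[str] = None
    last_error_at: Optional[str] = None
    retry_count: int = 0


def run_with_retries(start: Callable[[], None], state: ChannelState,
                     max_retries: int = 5, base_delay: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Call start() until it returns cleanly or the retries are exhausted."""
    for attempt in range(max_retries + 1):
        state.running = True
        try:
            start()  # assumed to block while polling and return on clean shutdown
            state.running = False
            return
        except Exception as exc:
            # Record the failure instead of only logging it.
            state.running = False
            state.last_error = f"{type(exc).__name__}: {exc}"
            state.last_error_at = datetime.now(timezone.utc).isoformat()
            if attempt == max_retries:
                return  # give up; the recorded error stays visible
            state.retry_count = attempt + 1
            sleep(base_delay * 2 ** attempt)  # exponential backoff
```

The sleep parameter is injectable only so the loop can be exercised without waiting; a health handler would read the state object rather than a bare boolean.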
Operator experience
User messages @mervincfo_bot — no reply.
Operator curls /health — everything looks fine.
Operator assumes OpenAI/API issue, burns time debugging wrong layer.
Actual fix: kill duplicate python start_gateway.py processes.
Architecture — health vs reality gap
flowchart TB
subgraph HealthEndpoint["GET /health"]
H1["For each channel bot"]
H2["running = bot.is_running"]
H3["Return JSON"]
H1 --> H2 --> H3
end
subgraph BotRuntime["Channel bot runtime"]
R1["_run_bot_safe() retry loop"]
R2["_start_telegram_bot_polling()"]
R3["PTB updater.start_polling()"]
R1 --> R2 --> R3
end
subgraph TelegramAPI["Telegram Bot API"]
T1["getUpdates long poll"]
T2["409 Conflict if two pollers same token"]
end
R3 --> T1
T1 --> T2
T2 -.->|"error logged only"| R1
H2 -.->|"does not read last_error"| H3
style H2 fill:#FFB6C1
style T2 fill:#FF6347
The gap: Runtime captures exceptions in _run_bot_safe and logs them. Health reads bot.is_running — which may stay True briefly or not reflect polling failure at all. Last error is discarded.
Telegram 409 Conflict explained
Telegram allows exactly one active getUpdates connection per bot token. If Process A and Process B both call getUpdates:
One receives updates normally.
The other gets HTTP 409 Conflict with message: "terminated by other getUpdates request".
python-telegram-bot surfaces this as telegram.error.Conflict.
This is the most common cause of "bot started but silent" in multi-process or duplicate-gateway scenarios — more common than API quota or code bugs.
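One way the conflict could be recognised in an exception handler is sketched below. The function name classify_polling_error is hypothetical. The sketch matches on the exception class name and on the message substring so that it runs without python-telegram-bot installed; code that already imports the library could use isinstance(exc, telegram.error.Conflict) instead.

```python
from typing import Optional

# Substring of the message Telegram returns with the 409 response.
CONFLICT_MARKER = "terminated by other getUpdates"


def classify_polling_error(exc: BaseException) -> Optional[str]:
    """Return "telegram_conflict" for a 409 polling conflict, else None."""
    # Class-name match avoids importing python-telegram-bot in this sketch.
    if type(exc).__name__ == "Conflict":
        return "telegram_conflict"
    # Fallback for wrappers that re-raise with only the message preserved.
    if CONFLICT_MARKER in str(exc):
        return "telegram_conflict"
    return None
```

Returning None for everything else keeps the conflict code reserved for the duplicate-poller case, so other failures are not mislabelled.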
[Enhancement] /health should expose Telegram polling conflicts (409) and last bot error
Labels: enhancement, gateway, observability, telegram
What the user sees
Health endpoint today (misleading)
{
  "status": "healthy",
  "uptime": 64.76,
  "agents": 3,
  "sessions": 0,
  "clients": 0,
  "channels": {
    "telegram_cfo": { "platform": "telegram", "running": true },
    "telegram_ops": { "platform": "telegram", "running": true },
    "telegram_content": { "platform": "telegram", "running": true }
  }
}
Proposed health response shape
{
  "status": "degraded",
  "channels": {
    "telegram_cfo": {
      "platform": "telegram",
      "running": false,
      "last_error": "telegram_conflict: terminated by other getUpdates request",
      "last_error_at": "2026-05-26T06:10:06Z",
      "retry_count": 3
    }
  }
}
Gateway overall status should be "degraded" when any channel has a fatal polling error, not "healthy".
Proposed fix
1. Track last error on each channel bot
Add to bot instance or gateway channel registry:
last_error: str | null
last_error_at: ISO timestamp
retry_count: int
Update in _run_bot_safe exception handler.
2. Parse Telegram-specific failures
Detect telegram.error.Conflict or message substring "terminated by other getUpdates" → set error_code: "telegram_conflict" with operator-facing hint:
3. Surface in CLI
praisonai gateway status should print channel errors, not just up/down.
4. Doctor check
praisonai doctor gateway:
telegram_conflict in last 5 minutes?
Acceptance criteria
/health includes last_error and last_error_at per channel when polling fails
running: false and error_code: telegram_conflict
status is degraded when any channel is failed
praisonai gateway status displays channel errors in human-readable form
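The response shape and the "degraded" rule described above could be assembled roughly as follows. This is a sketch under stated assumptions: build_health and the plain-dict channel records are hypothetical, and a channel is treated as failed when it is not running and has a recorded last_error.

```python
from typing import Any, Dict

# Optional per-channel fields, emitted only when an error has been recorded.
ERROR_FIELDS = ("last_error", "last_error_at", "error_code", "retry_count")


def build_health(channels: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build a /health-style payload from per-channel state dicts."""
    report: Dict[str, Dict[str, Any]] = {}
    degraded = False
    for name, ch in channels.items():
        entry: Dict[str, Any] = {
            "platform": ch["platform"],
            "running": bool(ch["running"]),
        }
        if ch.get("last_error"):
            for key in ERROR_FIELDS:
                if ch.get(key) is not None:
                    entry[key] = ch[key]
            # Assumption: stopped plus a recorded error counts as failed.
            if not entry["running"]:
                degraded = True
        report[name] = entry
    return {"status": "degraded" if degraded else "healthy", "channels": report}
```

Healthy channels keep the existing two-field shape, so consumers that only read "running" would continue to work unchanged.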