
fix(session): close zombie active session when tmux backing session i…#98

Merged
deepcoldy merged 3 commits into deepcoldy:master from Lotu527:fix/zombie-session-missing-tmux
Jun 6, 2026

Conversation

@Lotu527
Contributor

@Lotu527 Lotu527 commented Jun 3, 2026

…s missing on restore

When restoreActiveSessions() iterates persistent-backend sessions and finds the backing tmux/zellij/herdr session is gone, it previously just continued, leaving the DaemonSession registered as active forever. Any incoming message for that chat would match the orphaned entry and be silently dropped (no worker to handle it, no error surfaced to the user).

Fix: call closeSession(sessionId) before continuing so the session is removed from both the runtime activeSessions Map and the sessionStore (status → closed), and a warn log is emitted to make the cleanup visible.
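
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `DaemonSession` shape, the two `Map`s standing in for the runtime index and the session store, and the injected `hasSession` callback are all assumptions made for the example.

```typescript
type SessionStatus = "active" | "closed";

// Hypothetical session shape; the real DaemonSession has more fields.
interface DaemonSession {
  sessionId: string;
  backendSessionName: string;
}

// Hypothetical stand-ins for the runtime activeSessions Map and the sessionStore.
const activeSessions = new Map<string, DaemonSession>();
const sessionStore = new Map<string, SessionStatus>();

// Removes the session from the runtime map and marks it closed in the store.
function closeSession(sessionId: string): void {
  activeSessions.delete(sessionId);
  sessionStore.set(sessionId, "closed");
}

// Walks all active sessions and closes those whose backing session is gone.
// Returns the ids that were closed so the cleanup is observable.
function restoreActiveSessions(hasSession: (name: string) => boolean): string[] {
  const closed: string[] = [];
  // Copy the values first so deleting from the Map during the loop is safe.
  for (const session of [...activeSessions.values()]) {
    if (!hasSession(session.backendSessionName)) {
      console.warn(`backing session missing, closing zombie session ${session.sessionId}`);
      closeSession(session.sessionId);
      closed.push(session.sessionId);
      continue;
    }
    // Backing session exists: reattach a worker here (omitted in this sketch).
  }
  return closed;
}
```

Without the `closeSession` call, the entry for a vanished backing session would stay in `activeSessions` and keep matching incoming messages for that chat.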

@Lotu527 Lotu527 requested a review from deepcoldy as a code owner June 3, 2026 06:07
@deepcoldy
Owner

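@Lotu527 Hi 👋 PR #98 has passed dual review (reviewer + Codex merge-gate ✅). During review the gate found and fixed 2 blockers; the fixes are stacked as 2 commits on top of your f73d8f0. The team plans to merge with both fixes included and would like you to evaluate and confirm them first (especially the semantic trade-offs of the tri-state). Raise any objection at any time:

① False negatives in the restore probe can wrongly close live sessions → upgrade to tri-state (core)
The original PR upgraded "backing session probed as absent" from the old non-destructive continue to a destructive closeSession. But hasSession() in all three backends collapses "probe command failed/timed out" and "really does not exist" into the same false. If, on daemon restart, the herdr server starts slowly or a zellij/tmux probe fails transiently, a session that is still alive is closed permanently (active index entry deleted + store marked closed): the live pane leaks, the next message goes through auto-create and loses context, and because the store is already closed there is no lazy recovery. restore iterates over all active sessions, so a single transient failure can wrongly close them in bulk.
Fix: the probe is upgraded to exists | missing | unknown. Only "command succeeded and confirmed absent" = missing triggers close; failure/timeout/parse error = unknown → warn, then keep the active record for lazy recovery (equivalent to the old continue); exists → auto-fork. All three backends gain probeSession(); hasSession() is kept as a thin wrapper, probeSession()==='exists' (zero behavior change for existing boolean callers).

② tmux probe via shell execSync misclassifies an unavailable command as missing
When tmux has-session is run as a shell string, a tmux that is not on PATH or not executable returns with a clean exit code 127/126, which is classified as missing → triggers the destructive close. Switched to execFileSync('tmux', ['has-session','-t',name], …) to run the binary directly: ENOENT/EACCES → unknown; truly absent = clean exit 1 → missing.

Tests: added test/restore-zombie-close.test.ts (tri-state restore decision assertions), test/tmux-probe.test.ts (real tmux probe classification), and four-state herdr probe cases.
Verification: merges cleanly into current master (including #99), tsc + build pass, full unit suite 3493 passed (only card-integration Scenario4 is red, which is pre-existing on master and unrelated to this change).

If you need a line-by-line diff, I can push the review branch for you to compare. Please take a look; if there are no objections we will merge with this set. 🙏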

gaozhikun and others added 3 commits June 6, 2026 23:40
…s missing on restore

When restoreActiveSessions() iterates persistent-backend sessions and finds
the backing tmux/zellij/herdr session is gone, it previously just `continue`d,
leaving the DaemonSession registered as active forever. Any incoming message for
that chat would match the orphaned entry and be silently dropped (no worker to
handle it, no error surfaced to the user).

Fix: call closeSession(sessionId) before continuing so the session is removed
from both the runtime activeSessions Map and the sessionStore (status → closed),
and a warn log is emitted to make the cleanup visible.

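Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

PR deepcoldy#98 upgraded the restore-phase handling of "backing session does not
exist" from a non-destructive continue to a destructive closeSession, but
hasSession() in all three backends collapses "probe command failed/timed out"
and "really does not exist" into the same false. On daemon restart, a
slow-starting herdr server or a transient zellij/tmux probe failure causes a
still-live session to be closed permanently (active index entry deleted + store
marked closed): the live pane leaks, the next message goes through auto-create
and loses context, and a store already marked closed is no longer lazily
recovered. The restore loop iterates over all active sessions, so a single
transient failure can wrongly close them in bulk.

Changes:
- Add SessionProbe = 'exists' | 'missing' | 'unknown' (backend/types.ts).
- Add probeSession() to all three backends: only "command succeeded and
  confirmed absent" = missing; failure/timeout/parse error = unknown.
  hasSession() becomes a thin wrapper around probeSession()==='exists', with
  unchanged behavior for all existing boolean callers (unknown/missing→false,
  exists→true).
- restoreActiveSessions: missing→closeSession (a true zombie); unknown→warn and
  keep the active record for lazy recovery (equivalent to the old continue; the
  recovery window is no longer closed off prematurely); exists→auto-fork to
  reconnect.
- The non-destructive read path ensureTerminalWorkerPort keeps its original
  semantics: the worker is woken only on exists.

Tests:
- herdr-backend: the four probeSession cases exists / missing /
  present-but-not-running→missing / list failure or timeout→unknown, plus a
  check that hasSession is still false on unknown.
- restore-zombie-close (new): missing→closeSession + removed from Map + store
  closed + no fork; unknown→no close + kept in Map + no fork; exists→fork + no
  close.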
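
The tri-state decision in this commit message can be illustrated with a small sketch. `SessionProbe` and its three values come from the commit message; the `RestoreAction` type and the function names `decideRestoreAction` and `hasSessionFromProbe` are hypothetical names chosen for the example, not identifiers from the repository.

```typescript
// Probe result as named in the commit message (backend/types.ts).
type SessionProbe = "exists" | "missing" | "unknown";

// Hypothetical label for what restore does with each probe result.
type RestoreAction = "fork" | "close" | "keep";

// Maps a probe result to the restore action described in the commit message.
function decideRestoreAction(probe: SessionProbe): RestoreAction {
  switch (probe) {
    case "exists":
      return "fork"; // backing session is alive: auto-fork and reconnect
    case "missing":
      return "close"; // confirmed absent: a true zombie, safe to close
    case "unknown":
      return "keep"; // probe failed or timed out: warn and leave it for lazy recovery
  }
}

// Boolean view for existing callers: only a confirmed "exists" counts as true.
function hasSessionFromProbe(probe: SessionProbe): boolean {
  return probe === "exists";
}
```

The point of the split is that only a confirmed `missing` leads to a destructive action, while `unknown` behaves like the old `continue`.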
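The Codex re-gate caught this: the original tmux probeSession ran a shell
string via execSync. When tmux is not on PATH or not executable, the shell
returns with a clean exit code 127 (command not found) / 126 (not executable),
which `typeof e.status==='number' && !e.signal` misclassifies as missing →
restore takes the destructive closeSession path, exactly the "probe failure
drives a permanent close" that the tri-state is meant to prevent. It triggers
whenever, on daemon restart, backend=tmux, the historical session is still in
the tmux server, but the new daemon's runtime environment temporarily cannot
find tmux.

Switch to execFileSync('tmux', ['has-session','-t',name], …) to execute the
binary directly: binary missing = ENOENT, not executable = EACCES (neither has
a numeric status) → unknown; session truly absent = clean exit 1 → missing;
timeout = signal → unknown; exists = exit 0 → exists. The classification logic
is unchanged (a Node reproduction confirmed that execFileSync does not mix in
the shell's 126/127).

Add test/tmux-probe.test.ts, which directly tests the command-failure
classification of the real TmuxBackend.probeSession: absent→missing;
command-not-found (ENOENT) / not-executable (EACCES)→unknown; timeout
(signal)→unknown; exit0→exists; and verifies that hasSession is still false on
unknown. (This closes the gap in real tmux classification that
restore-zombie-close left uncovered because it mocks probeSession.)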
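
A rough sketch of the probe classification described in this commit message follows. It reuses the condition quoted above (`typeof e.status==='number' && !e.signal`) and the `execFileSync('tmux', ['has-session','-t',name], …)` call; the function names `classifyProbeFailure` and `probeTmuxSession`, the `ExecFailure` shape, and the timeout and `stdio` options are assumptions for illustration, since the commit message elides the actual options.

```typescript
import { execFileSync } from "node:child_process";

type SessionProbe = "exists" | "missing" | "unknown";

// Fields that Node attaches to errors thrown by execFileSync (subset used here).
interface ExecFailure {
  status?: number | null;
  signal?: string | null;
  code?: string;
}

// Classifies a thrown execFileSync error using the condition from the commit message:
// a numeric exit status with no signal means the command ran and reported absence.
function classifyProbeFailure(err: unknown): SessionProbe {
  const e = (err ?? {}) as ExecFailure;
  if (typeof e.status === "number" && !e.signal) {
    return "missing";
  }
  // Spawn errors (ENOENT, EACCES) carry no numeric status; timeouts carry a signal.
  return "unknown";
}

// Runs the tmux binary directly (no shell), so a missing binary surfaces as a
// spawn error instead of a shell exit code 127/126.
function probeTmuxSession(name: string, timeoutMs = 2000): SessionProbe {
  try {
    // timeoutMs and stdio are assumed values for this sketch.
    execFileSync("tmux", ["has-session", "-t", name], {
      stdio: "ignore",
      timeout: timeoutMs,
    });
    return "exists";
  } catch (err) {
    return classifyProbeFailure(err);
  }
}
```

Keeping the classification in a separate pure function makes it possible to test the ENOENT, EACCES, and timeout cases with synthetic error objects, without needing tmux installed.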
@deepcoldy deepcoldy force-pushed the fix/zombie-session-missing-tmux branch from f73d8f0 to d7a88fc June 6, 2026 15:43
@deepcoldy deepcoldy merged commit 8a0b0d4 into deepcoldy:master Jun 6, 2026