Summary
When routing picks a winner from another vendor (switch-harness, or any model the source harness can't run natively), watchmen today only writes an advisory sentence — it makes the decision but never executes it (route_rewrite.py:129 cross_runtime guard, PR #92). This issue tracks turning that advice into actual cross-vendor delegation: run the chosen model, return the result — invisibly and safely.
Today
- In-vendor routing writes a runnable artifact (CC subagent
model:, Codex profile model =) and the host self-delegates.
- Cross-vendor falls back to text: "For the
X skill, run it on Codex — Claude Code can't run gpt-5.5 natively." (_advisory_sentence, route_rewrite.py:190). The hands stop at a sentence.
Proposed design — a DelegationExecutor seam
One interface, pluggable backends: (task_context, target={harness, model}) -> result + cost + signals.
- Headless backend (build first; no AX dependency). A router subagent / dispatch block shells
codex exec --json or claude -p --output-format stream-json as a worker and returns structured output. The AX spike already proved the cross-vendor handoff (CC↔Codex with structured history) works using exactly these CLIs — no external runtime required.
- AX backend (future / optional). Same interface, dispatch via AX. Adopt only when AX clears a concrete bar: a real
HarnessService protocol (currently a literal "Hello world" placeholder), controller2 dispatch-to-named-agent + resume (both TODO today), observable trace (it currently drops tool calls), and detached execution. AX is also still entirely cost-agnostic, and upstream contributions are paused — so it's a backend we plug in later, not a dependency. Keep the ax_dispatch.py experiment as the stub behind this interface.
Guardrails (must-have before any auto-execution)
- Consent + allowlist (per project). Silently sending proprietary code to a "cheaper" third-party vendor is a data-governance problem. Cross-vendor delegation must be explicit opt-in per project.
- Budget caps. Token + delegation-depth limits (cf. opencode's
task_budget / depth caps) to prevent runaway cost.
- Execution-grounded validation. Prefer signals like tests-pass / diff-applies over judge opinion when accepting delegated work.
Note on AX's real fit
AX's strongest near-term value for us is not live cross-vendor delegation — it's the offline fork-and-race compare harness (same starting state → two divergent trajectories, both captured), which is the natural substrate for watchmen's blind-compare. Worth tracking separately; it still needs the trace fix to harvest tool-call signals.
Dependencies
Acceptance (phase 1, headless)
Summary
When routing picks a winner from another vendor (
switch-harness, or any model the source harness can't run natively), watchmen today only writes an advisory sentence — it makes the decision but never executes it (route_rewrite.py:129cross_runtimeguard, PR #92). This issue tracks turning that advice into actual cross-vendor delegation: run the chosen model, return the result — invisibly and safely.Today
model:, Codex profilemodel =) and the host self-delegates.Xskill, run it on Codex — Claude Code can't rungpt-5.5natively." (_advisory_sentence,route_rewrite.py:190). The hands stop at a sentence.Proposed design — a
DelegationExecutorseamOne interface, pluggable backends:
(task_context, target={harness, model}) -> result + cost + signals.codex exec --jsonorclaude -p --output-format stream-jsonas a worker and returns structured output. The AX spike already proved the cross-vendor handoff (CC↔Codex with structured history) works using exactly these CLIs — no external runtime required.HarnessServiceprotocol (currently a literal"Hello world"placeholder),controller2dispatch-to-named-agent + resume (bothTODOtoday), observable trace (it currently drops tool calls), and detached execution. AX is also still entirely cost-agnostic, and upstream contributions are paused — so it's a backend we plug in later, not a dependency. Keep theax_dispatch.pyexperiment as the stub behind this interface.Guardrails (must-have before any auto-execution)
task_budget/ depth caps) to prevent runaway cost.Note on AX's real fit
AX's strongest near-term value for us is not live cross-vendor delegation — it's the offline fork-and-race compare harness (same starting state → two divergent trajectories, both captured), which is the natural substrate for watchmen's blind-compare. Worth tracking separately; it still needs the trace fix to harvest tool-call signals.
Dependencies
Acceptance (phase 1, headless)
DelegationExecutorinterface with a headless backend (codex exec/claude -p).switch-harnessdecision can, when the project has opted in, actually run the foreign model and return the result — instead of only advising.