cross-vendor delegation: execute the routed winner, not just advise (#96)

@aktasbatuhan

Summary

When routing picks a winner from another vendor (switch-harness, or any model the source harness can't run natively), watchmen currently writes only an advisory sentence: it makes the decision but never executes it (see the cross_runtime guard at route_rewrite.py:129, PR #92). This issue tracks turning that advice into actual cross-vendor delegation: run the chosen model and return the result, invisibly and safely.

Today

  • In-vendor routing writes a runnable artifact (CC subagent model:, Codex profile model =) and the host self-delegates.
  • Cross-vendor falls back to text: "For the X skill, run it on Codex — Claude Code can't run gpt-5.5 natively." (_advisory_sentence, route_rewrite.py:190). The hands stop at a sentence.

Proposed design — a DelegationExecutor seam

One interface, pluggable backends: (task_context, target={harness, model}) -> result + cost + signals.

  • Headless backend (build first; no AX dependency). A router subagent / dispatch block shells out to codex exec --json or claude -p --output-format stream-json as a worker and returns structured output. The AX spike already proved that the cross-vendor handoff (CC↔Codex with structured history) works using exactly these CLIs, with no external runtime required.
  • AX backend (future / optional). Same interface, dispatch via AX. Adopt only when AX clears a concrete bar: a real HarnessService protocol (currently a literal "Hello world" placeholder), controller2 dispatch-to-named-agent + resume (both TODO today), observable trace (it currently drops tool calls), and detached execution. AX is also still entirely cost-agnostic, and upstream contributions are paused — so it's a backend we plug in later, not a dependency. Keep the ax_dispatch.py experiment as the stub behind this interface.
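
To make the seam concrete, here is a minimal sketch of the interface with a headless backend behind it. Every name in it (Target, DelegationResult, HeadlessExecutor, HEADLESS_COMMANDS, the injectable runner) is hypothetical and not taken from the watchmen codebase. The base commands are the ones quoted above; passing the model via --model and the prompt as the final positional argument are assumptions to verify against the installed CLI versions.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass, field
from typing import Callable, Protocol, Sequence


@dataclass(frozen=True)
class Target:
    harness: str  # e.g. "codex" or "claude-code" (hypothetical identifiers)
    model: str


@dataclass
class DelegationResult:
    output: str
    cost_usd: float | None = None  # None when the backend cannot report cost
    signals: dict[str, object] = field(default_factory=dict)


class DelegationExecutor(Protocol):
    """(task_context, target={harness, model}) -> result + cost + signals."""

    def execute(self, task_context: str, target: Target) -> DelegationResult: ...


# Base commands as quoted in this issue, keyed by a hypothetical harness name.
HEADLESS_COMMANDS: dict[str, list[str]] = {
    "codex": ["codex", "exec", "--json"],
    "claude-code": ["claude", "-p", "--output-format", "stream-json"],
}

Runner = Callable[[Sequence[str]], "subprocess.CompletedProcess[str]"]


def _default_runner(argv: Sequence[str]) -> subprocess.CompletedProcess[str]:
    # check=False: a non-zero exit is reported as a signal, not raised.
    return subprocess.run(list(argv), capture_output=True, text=True, check=False)


class HeadlessExecutor:
    """Shells out to a vendor CLI as a worker and returns its raw output."""

    def __init__(self, runner: Runner = _default_runner) -> None:
        self._runner = runner  # injectable so tests never spawn a real CLI

    def execute(self, task_context: str, target: Target) -> DelegationResult:
        base = HEADLESS_COMMANDS.get(target.harness)
        if base is None:
            raise ValueError(f"no headless backend for harness {target.harness!r}")
        # Assumption: both CLIs accept --model and a trailing prompt argument.
        argv = [*base, "--model", target.model, task_context]
        proc = self._runner(argv)
        return DelegationResult(
            output=proc.stdout,
            signals={"exit_code": proc.returncode, "argv": argv},
        )
```

An AX backend would be a second class with the same execute signature, so callers would not need to change when it is plugged in. Parsing the JSON stream into cost and richer signals is left out of this sketch.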

Guardrails (must-have before any auto-execution)

  • Consent + allowlist (per project). Silently sending proprietary code to a "cheaper" third-party vendor is a data-governance problem. Cross-vendor delegation must be explicit opt-in per project.
  • Budget caps. Token + delegation-depth limits (cf. opencode's task_budget / depth caps) to prevent runaway cost.
  • Execution-grounded validation. Prefer signals like tests-pass / diff-applies over judge opinion when accepting delegated work.
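
The three guardrails above could be checked with something like the following sketch. ProjectPolicy, Budget, check_guardrails, accept_result, and the signal keys (diff_applies, tests_pass, judge_approves) are hypothetical names chosen for illustration; the actual config format and signal schema are not specified in this issue.

```python
from __future__ import annotations

from dataclasses import dataclass


class DelegationDenied(Exception):
    """Raised when a guardrail blocks cross-vendor delegation."""


@dataclass(frozen=True)
class ProjectPolicy:
    # Everything defaults to "off": a project must opt in explicitly.
    cross_vendor_enabled: bool = False
    allowed_vendors: frozenset[str] = frozenset()
    max_tokens: int = 0
    max_depth: int = 0


@dataclass
class Budget:
    tokens_used: int = 0
    depth: int = 0  # current delegation depth; 0 means the top-level host


def check_guardrails(
    policy: ProjectPolicy, budget: Budget, vendor: str, requested_tokens: int
) -> None:
    """Raise DelegationDenied unless consent, allowlist, and budget all pass."""
    if not policy.cross_vendor_enabled:
        raise DelegationDenied("project has not opted in to cross-vendor delegation")
    if vendor not in policy.allowed_vendors:
        raise DelegationDenied(f"vendor {vendor!r} is not on the project allowlist")
    if budget.depth + 1 > policy.max_depth:
        raise DelegationDenied("delegation depth cap reached")
    if budget.tokens_used + requested_tokens > policy.max_tokens:
        raise DelegationDenied("token budget would be exceeded")


def accept_result(signals: dict[str, bool]) -> bool:
    """Prefer execution-grounded signals; use judge opinion only as a fallback."""
    executed = [signals[k] for k in ("diff_applies", "tests_pass") if k in signals]
    if executed:
        return all(executed)
    return signals.get("judge_approves", False)
```

In this sketch a judge's approval cannot override a failing test: the judge is consulted only when no execution signal is available at all.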

Note on AX's real fit

AX's strongest near-term value for us is not live cross-vendor delegation — it's the offline fork-and-race compare harness (same starting state → two divergent trajectories, both captured), which is the natural substrate for watchmen's blind-compare. Worth tracking separately; it still needs the trace fix to harvest tool-call signals.

Dependencies

Acceptance (phase 1, headless)

  • DelegationExecutor interface with a headless backend (codex exec / claude -p).
  • A switch-harness decision can, when the project has opted in, actually run the foreign model and return the result — instead of only advising.
  • Per-project opt-in + token/depth budget enforced; off by default.
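
As a rough illustration of the phase 1 behavior, the sketch below shows a switch-harness decision falling back to advice when the project has not opted in and executing otherwise. RouteDecision, advisory, and resolve are hypothetical names; advisory is a stand-in, not the existing _advisory_sentence helper, and the execute callable represents whatever the headless backend ends up exposing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RouteDecision:
    skill: str
    harness: str
    model: str


def advisory(decision: RouteDecision) -> str:
    # Stand-in wording for the text-only fallback.
    return f"For the {decision.skill} skill, run it on {decision.harness} with {decision.model}."


def resolve(
    decision: RouteDecision,
    opted_in: bool,
    execute: Callable[[RouteDecision], str],
) -> tuple[str, str]:
    """Return ("executed", result) when opted in, else ("advised", sentence)."""
    if not opted_in:
        # Off by default: without consent nothing leaves the source harness.
        return ("advised", advisory(decision))
    return ("executed", execute(decision))
```

Budget enforcement would sit inside or just before the execute callable, so a denied delegation could also degrade to the advisory path.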

Labels

enhancement
