Skip to content

fix(windows): make tightvnc startup deterministic#689

Closed
fcoury-oai wants to merge 1 commit into
openclaw:mainfrom
fcoury-oai:fcoury/tightvnc-deterministic-startup
Closed

fix(windows): make tightvnc startup deterministic#689
fcoury-oai wants to merge 1 commit into
openclaw:mainfrom
fcoury-oai:fcoury/tightvnc-deterministic-startup

Conversation

@fcoury-oai

@fcoury-oai fcoury-oai commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Why

Azure Windows desktop leases can show the Windows Firewall prompt for TightVNC Server the first time the per-user VNC process starts. That modal blocks the visible desktop and can sit in front of automation that is trying to inspect or drive the target application.

The VNC bootstrap should make the desktop transport ready without depending on a human click or on best-effort registry timing.

Example
image

What Changed

  • Add a Crabbox-TightVNC-Loopback Windows Firewall allow rule for tvnserver.exe on TCP port 5900 before the user-session VNC process is scheduled or started.
  • Make the per-user TightVNC startup fail loudly if the HKCU password values cannot be copied from the machine-wide TightVNC configuration.
  • Wait for both Password and ControlPassword registry values instead of assuming they are immediately available after install.
  • Use ScheduledTasks cmdlets for the interactive CrabboxUserVNC task so the task definition is explicit about interactive logon and highest run level.
  • Mirror the same bootstrap behavior in the worker-side Windows bootstrap generator.

How to Test

  1. Build the local CLI from this branch:

    go build -trimpath -o bin/crabbox ./cmd/crabbox
  2. Acquire a Windows desktop lease using the Azure provider.

  3. Connect to the lease through crabbox vnc.

  4. Confirm that the desktop is reachable through VNC without a Windows Security firewall prompt for TightVNC Server.

  5. Confirm that the VNC server still requires VNC authentication and only listens on the expected loopback-forwarded path.

Targeted tests:

  • go test ./internal/cli -run TestAWSUserDataWindowsProfile -count=1
  • npm test -- bootstrap.test.ts from worker/

Documentation

No developers.openai.com/codex documentation update is needed.

@clawsweeper

clawsweeper Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 25, 2026, 9:17 AM ET / 13:17 UTC.

Summary
The PR updates the Go and Worker Windows desktop bootstrap generators and tests to add a TightVNC firewall rule, retry HKCU password seeding, and create the CrabboxUserVNC task with ScheduledTasks cmdlets.

Reproducibility: no. high-confidence live reproduction was established in this review. The PR body shows the firewall prompt and current source starts user-session TightVNC after installing with firewall exceptions disabled, but I did not run an Azure Windows lease.

Review metrics: 2 noteworthy metrics.

  • Changed surface: 4 files, +162/-28. The PR changes both bootstrap generators and their tests, so any security-boundary fix must stay mirrored across CLI and Worker paths.
  • Generated firewall rules: 2 TCP/5900 allow rules added. Both generated Windows bootstrap paths add an inbound VNC firewall rule that changes the desktop access boundary.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🧂 unranked krab
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Constrain or otherwise prove the TightVNC listener/firewall path remains tunnel-only in both generators.
  • [P1] Add after-fix Azure Windows proof showing no firewall prompt, successful authenticated VNC, and no direct VNC exposure with private details redacted.
  • Update the PR body after adding proof so ClawSweeper re-reviews automatically; if it does not, ask a maintainer to comment @clawsweeper re-review.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: The PR body includes a screenshot of the pre-fix prompt, but it needs after-fix Azure Windows proof showing no prompt, authenticated VNC, and no direct VNC exposure; screenshots, terminal output, recordings, or redacted logs are acceptable. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Mantis proof suggestion
A real Windows desktop proof would materially help because the changed behavior is a visible firewall prompt plus VNC reachability/security validation. A maintainer can ask Mantis to capture proof by posting this exact PR comment:

@openclaw-mantis visual task: verify an Azure Windows desktop lease reaches VNC without a TightVNC firewall prompt and cannot reach VNC directly outside the SSH tunnel.

Risk before merge

  • [P1] Merging as-is can weaken the documented tunnel-only VNC boundary because the Windows Firewall allow rule is scoped to tvnserver.exe and TCP/5900 but not to loopback or an equivalent narrow source/destination constraint.
  • [P1] The PR has a pre-fix screenshot and test plan, but still needs after-fix Azure Windows proof showing no firewall prompt, successful authenticated VNC, and no direct VNC exposure.

Maintainer options:

  1. Constrain VNC Before Allowing Firewall (recommended)
    Change both generators so the firewall/listener path cannot expose TightVNC beyond the intended loopback SSH tunnel, then prove that behavior on a live Windows lease.
  2. Accept A Broader Network Assumption
    Maintainers could intentionally accept the rule only after documenting the provider/network assumptions and proving supported managed leases still reject direct VNC access.
  3. Hold Draft For Live Proof
    Keep the PR draft until after-fix Azure Windows proof demonstrates the prompt is gone and the VNC boundary still holds.

Next step before merge

  • [P1] Human review is needed because proof is insufficient and the firewall/security-boundary behavior must be constrained or validated in a live Windows lease.

Security
Needs attention: The diff adds unscoped Windows Firewall allow rules for TightVNC on TCP/5900 in both bootstrap generators, which can weaken the documented tunnel-only VNC boundary.

Review findings

  • [P1] Constrain the TightVNC firewall rule — internal/cli/bootstrap.go:486
  • [P1] Mirror the VNC boundary fix in the Worker generator — worker/src/bootstrap.ts:498
Review details

Best possible solution:

Keep the deterministic startup and registry retry work, but preserve tunnel-only VNC in both bootstrap generators and require live Windows proof for prompt suppression, authentication, and direct-VNC rejection before merge.

Do we have a high-confidence way to reproduce the issue?

No high-confidence live reproduction was established in this review. The PR body shows the firewall prompt and current source starts user-session TightVNC after installing with firewall exceptions disabled, but I did not run an Azure Windows lease.

Is this the best way to solve the issue?

No. Deterministic startup is the right direction, but the current firewall rule is not the narrowest maintainable fix because it can violate the tunnel-only VNC boundary.

Full review comments:

  • [P1] Constrain the TightVNC firewall rule — internal/cli/bootstrap.go:486
    Current docs require managed VNC to stay reachable only through the SSH tunnel. This new rule allows inbound TCP/5900 for tvnserver.exe on any profile with no address scope, so a VM whose network permits 5900 can expose the VNC server outside Crabbox's tunnel. Please either prove the server is loopback-only before adding the allow rule or constrain/remove the rule while still suppressing the prompt.
    Confidence: 0.88
  • [P1] Mirror the VNC boundary fix in the Worker generator — worker/src/bootstrap.ts:498
    The Worker bootstrap generator emits the same unscoped TCP/5900 allow rule, so brokered/generated Windows leases would carry the same direct-VNC exposure risk even if the Go path were fixed alone. Keep the Worker generator aligned with whatever loopback-only or scoped firewall behavior lands in the CLI bootstrap.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 0ec69d642764.

Label changes

Label justifications:

  • P2: This is a normal-priority Windows desktop reliability fix with a concrete merge blocker but limited blast radius while unmerged.
  • merge-risk: 🚨 security-boundary: Merging as-is could weaken Crabbox's documented tunnel-only VNC boundary by allowing inbound TightVNC traffic on TCP/5900.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🧂 unranked krab.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body includes a screenshot of the pre-fix prompt, but it needs after-fix Azure Windows proof showing no prompt, authenticated VNC, and no direct VNC exposure; screenshots, terminal output, recordings, or redacted logs are acceptable. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • proof: 📸 screenshot: Contributor real behavior proof includes screenshot evidence. The PR body includes a screenshot of the pre-fix prompt, but it needs after-fix Azure Windows proof showing no prompt, authenticated VNC, and no direct VNC exposure; screenshots, terminal output, recordings, or redacted logs are acceptable.
Evidence reviewed

Security concerns:

  • [medium] Broad TightVNC firewall allow rule — internal/cli/bootstrap.go:486
    New-NetFirewallRule is added for tvnserver.exe on TCP/5900 with -Profile Any and no address restriction; current docs require VNC to be reachable only through SSH/WebVNC tunnel paths.
    Confidence: 0.88
  • [medium] Worker emits the same VNC exposure — worker/src/bootstrap.ts:498
    The Worker bootstrap generator mirrors the same unscoped firewall rule, so the security-boundary issue is not limited to local CLI-generated bootstrap scripts.
    Confidence: 0.86

What I checked:

  • PR adds broad local firewall rule: The PR head adds New-NetFirewallRule for tvnserver.exe on TCP/5900 with -Profile Any and no local or remote address scope in the Go bootstrap generator. (internal/cli/bootstrap.go:486, bd70b17b6d4a)
  • Worker generator mirrors the same rule: The Worker bootstrap generator emits the same unscoped TCP/5900 allow rule, so brokered/generated Windows leases would carry the same boundary change. (worker/src/bootstrap.ts:498, bd70b17b6d4a)
  • Current docs require tunnel-only VNC: The VNC plan says VNC/noVNC is never exposed publicly and no provider firewall/security-group ingress is added for VNC. (docs/plan/vnc.md:105, 0ec69d642764)
  • Windows docs promise SSH-tunnel reachability: Managed Azure Windows VNC is documented as listening on port 5900 and being reachable only through the SSH tunnel. (docs/features/vnc-windows.md:71, 0ec69d642764)
  • Current main disables the MSI firewall exception: Current main installs TightVNC with SERVER_ADD_FIREWALL_EXCEPTION=0, matching the documented tunnel-only boundary the PR needs to preserve. (internal/cli/bootstrap.go:468, 0ec69d642764)
  • Provider ingress remains SSH-scoped: Current Azure and AWS ingress code builds provider rules for SSH candidate ports; the Windows readiness probe checks VNC via 127.0.0.1:5900. (internal/cli/azure.go:697, 0ec69d642764)

Likely related people:

  • fcoury-oai: Prior merged PR fix(windows): make azure desktop leases reliable #497 changed Azure Windows desktop reliability and touched the same Go and Worker bootstrap generators shortly before this PR. (role: recent adjacent contributor; confidence: high; commits: 3fecdef99268, 9af8f2a78eca; files: internal/cli/bootstrap.go, internal/cli/bootstrap_test.go, worker/src/bootstrap.ts)
  • steipete: PR history shows the initial interactive desktop/VNC implementation and recent Windows bootstrap artifact verification in the same docs and bootstrap area. (role: desktop/VNC feature owner; confidence: high; commits: 8caabdbaa944, af4fc5224763, 0b19d9ddf87f; files: docs/plan/vnc.md, docs/features/vnc-windows.md, internal/cli/bootstrap.go)
  • jwmoss: PR feat(azure): support Windows desktop and WSL2 leases #81 introduced Azure Windows desktop and WSL2 support across the same Windows VNC docs and bootstrap generators. (role: Windows/Azure desktop bootstrap introducer; confidence: high; commits: ab38b0caeb83, 5aa3df8dfb27; files: docs/features/vnc-windows.md, internal/cli/bootstrap.go, worker/src/bootstrap.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: 📸 screenshot Contributor real behavior proof includes screenshot evidence. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. labels Jun 25, 2026
@steipete

Copy link
Copy Markdown
Contributor

Superseded by 184979d.

Current main removes the per-user TightVNC startup path entirely and runs the authenticated TightVNC service instead. That addresses the reported first-login firewall modal and registry timing race without adding TCP/5900 firewall ingress.

Verified on current main:

  • go test ./internal/cli -run 'TestAWSUserDataWindowsProfile|Test.*Windows.*VNC|Test.*WebVNC' -count=1
  • npm test --prefix worker -- bootstrap.test.ts (17 tests)
  • Azure Windows desktop acquisition was attempted with Standard_D4s_v5, but the coordinator returned no VM/host before its 30-minute timeout; lease cbx_7538f819aace was released, so no live desktop claim is being made.

Thanks @fcoury-oai for isolating the modal and startup race. The report was useful; the safer service-mode fix landed independently while this PR was open.

@steipete steipete closed this Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. P2 Normal priority bug or improvement with limited blast radius. proof: 📸 screenshot Contributor real behavior proof includes screenshot evidence. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants