fix(cor-572): CLI client timeout vs draining/black-holed core host#1501
Draft
dvydra wants to merge 2 commits into
Draft
fix(cor-572): CLI client timeout vs draining/black-holed core host#1501dvydra wants to merge 2 commits into
dvydra wants to merge 2 commits into
Conversation
newCrossJurisHTTPClient had no overall Timeout, so a draining/black-holed core that accepts the connection but never sends response headers made `entire` commands hang on the OS TCP timeout. Add a 30s overall timeout at that single chokepoint; every caller (New, NewForCluster, NewWithBearer, the ENTIRE_TOKEN bypass) inherits it. The dial stays bounded by httpclient.DefaultDialTimeout. Trail: https://entire.io/gh/entireio/cli/trails/644 Closes COR-572 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Author
Review gauntlet — cleanRan
🚧 Draft — NO MERGE / NO DEPLOY. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
https://entire.io/gh/entireio/cli/trails/644
COR-572 —
entireCLI hangs against a draining / black-holed core hostProblem. Control-plane commands build their HTTP client in
internal/coreapi/cross_juris_transport.go:newCrossJurisHTTPClient()(reached via bothcoreapi.New()→clientForTargetandcoreapi.NewWithBearer). That client sets no overallTimeout. The shared transport already bounds the TCP connect (httpclient.DefaultDialTimeout = 4s, env-overridable viaENTIRE_CONNECT_TIMEOUT_SECONDS), so a host that refuses/drops packets fails fast — but a draining host that accepts the connection and then never sends response headers (exactly the COR-562 cutover scenario) makesclient.Doblock until the OS TCP timeout. No CLI-level deadline applies.Fix. Give the control-plane
http.Clientan overall request timeout (30s) at that single chokepoint. Every caller (New,NewForCluster,NewWithBearer, theENTIRE_TOKENbypass) inherits it. A per-commandcontextdeadline incorecmd.gowas considered but the client-level timeout is the single point that covers all callers, includingentire auth status, which builds its own/meclient.Note on the issue's "2s-dial". The repo standardized on a 4s dial timeout (
httpclient.DefaultDialTimeout); I did not introduce a conflicting 2s value. Only the missing overall timeout is added.Files touched
internal/coreapi/cross_juris_transport.go— addcontrolPlaneRequestTimeoutconst + setTimeouton the client (via anewCrossJurisHTTPClientWithTimeouthelper so tests can use a short budget).internal/coreapi/client_timeout_test.go— test: a silent/draining host returns a timeout error promptly instead of blocking.Closes COR-572
🚧 Draft — NO MERGE / NO DEPLOY. Part of the COR-562 cutover tooling batch; scrapbook: entirehq/entiredb#2121.
Note
Low Risk
Narrow networking guardrail at one client factory; may surface timeouts on legitimately slow control-plane responses over 30s.
Overview
Fixes CLI hangs when the control-plane core accepts TCP but never returns HTTP headers (e.g. draining/black-holed host during cutover). Connect already fails fast via the shared 4s dial budget; the missing piece was an overall request deadline on
client.Do.Adds a 30s
http.Client.Timeoutat the single chokepoint innewCrossJurisHTTPClient(via a test-friendly helper so tests can use a shorter budget), so every control-plane path—coreapi.New,NewForCluster,NewWithBearer, andENTIRE_TOKEN—inherits the same cap, includingentire auth status’s/meclient.New test in
client_timeout_test.goasserts a silent/draining host returns a timeout error instead of blocking indefinitely.Reviewed by Cursor Bugbot for commit 38a6bc5. Configure here.