Conversation
Redocly previews |
There was a problem hiding this comment.
Pull request overview
Adjusts Sync Gateway’s GoCB retry behavior so readiness checks and index lookups no longer use fail-fast retry by default, with an unsupported.use_gocb_fast_fail_retry switch to re-enable fail-fast behavior when desired.
Changes:
- Introduces
unsupported.use_gocb_fast_fail_retry(config + flag + OpenAPI) and threads it through bootstrap/per-db connections. - Switches cluster/bucket
WaitUntilReadyand index lookup retry strategies between fail-fast and best-effort based on the new setting. - Updates affected tests to cover both modes and to reflect differing HTTP error classifications.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
rest/server_context.go |
Propagates startup unsupported.use_gocb_fast_fail_retry into per-db BucketSpec. |
rest/main.go |
Adds bootstrap option plumbing and passes retry-mode into CouchbaseClusterSpec. |
rest/database_init_manager.go |
Passes retry-mode into NewClusterOnlyN1QLStore for index initialization. |
rest/config_startup.go |
Adds UseGOCBFastFailRetry to UnsupportedConfig. |
rest/config_flags.go |
Registers --unsupported.use_gocb_fast_fail_retry startup flag. |
rest/adminapitest/admin_api_test.go |
Updates admin API tests to validate behavior/status differences in both retry modes. |
docs/api/components/schemas.yaml |
Documents the new startup unsupported.use_gocb_fast_fail_retry property in OpenAPI schema. |
db/indextest/util.go |
Updates N1QL store construction to include retry-mode parameter. |
db/indextest/indextest_test.go |
Updates N1QL store construction to include retry-mode parameter. |
db/indextest/indextest_dual_metadata_test.go |
Updates N1QL store construction to include retry-mode parameter. |
base/gocb_utils.go |
Adds goCBRetryStrategy helper returning fail-fast vs best-effort strategy. |
base/gocb_utils_test.go |
Adds unit test coverage for goCBRetryStrategy. |
base/collection.go |
Makes cluster/bucket readiness retry strategy configurable via BucketSpec.UseGOCBFastFailRetry. |
base/collection_n1ql.go |
Threads retry-mode into N1QL index manager creation. |
base/collection_n1ql_common.go |
Uses configurable retry strategy for index enumeration (GetAllIndexes). |
base/cluster_n1ql.go |
Extends NewClusterOnlyN1QLStore to store retry-mode and pass into index manager. |
base/bucket.go |
Adds UseGOCBFastFailRetry to BucketSpec. |
base/bucket_gocb_test.go |
Updates incorrect-login test expectations for both retry modes (fast-fail vs timeout). |
base/bootstrap.go |
Adds retry-mode to CouchbaseClusterSpec/CouchbaseCluster, applies configurable readiness retry, and maps non-auth bucket readiness failures to 502. |
torcolvin
left a comment
There was a problem hiding this comment.
To discuss with team, but my preference is to not make this configurable since I can't imagine people opting into this - the reason to opt in would be if you expect to make typos or credential problems and this is mostly useful when initially setting up Sync Gateway. In a production environment, you would want retry behavior to make sure that connections are robust to things like:
- one bad node in a connection string or in SRV records.
- If there is another issue like https://jira.issues.couchbase.com/browse/GOCBC-1812 where operator can surface nodes to connect to that aren't ideal.
- One node that is temporarily offline or flickering.
| ViewQueryTimeoutSecs *uint32 // the view query timeout in seconds (default: 75 seconds) | ||
| MaxConcurrentQueryOps *int // maximum number of concurrent query operations (default: DefaultMaxConcurrentQueryOps) | ||
| BucketOpTimeout *time.Duration // How long bucket ops should block returning "operation timed out". If nil, uses GoCB default. GoCB buckets only. | ||
| UseGOCBFastFailRetry bool // When true, gocb readiness checks and index lookups fail fast instead of using the best-effort retry strategy |
There was a problem hiding this comment.
What do you think about doing something like: InitialConnectionRetryStrategy *gocb.RetryStrategy with something like "If unset, use the default retry strategy for sync gateway.
The other idea if this is confusing is to create an enum for types of strategies:
- DefaultRetryStrategy
- FastFailOnInitialConnectRetryStrategy
CBG-5357
Make best effort retry default for bucket readiness and index lookups
Adds config option to make it fast fail
Parameterise relevant tests
Pre-review checklist
fmt.Print,log.Print, ...)base.UD(docID),base.MD(dbName))docs/apiDependencies (if applicable)
Integration Tests