Skip to content

CBG-5357: removal of fast fail retry by default#8308

Merged
gregns1 merged 6 commits into
mainfrom
CBG-5357
May 29, 2026
Merged

CBG-5357: removal of fast fail retry by default#8308
gregns1 merged 6 commits into
mainfrom
CBG-5357

Conversation

@gregns1

@gregns1 gregns1 commented May 28, 2026

Copy link
Copy Markdown
Contributor

CBG-5357

Make best effort retry default for bucket readiness and index lookups
Adds config option to make it fast fail
Parameterise relevant tests

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Dependencies (if applicable)

  • Link upstream PRs
  • Update Go module dependencies when merged

Integration Tests

Copilot AI review requested due to automatic review settings May 28, 2026 11:00
@github-actions

github-actions Bot commented May 28, 2026

Copy link
Copy Markdown

Redocly previews

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts Sync Gateway’s GoCB retry behavior so readiness checks and index lookups no longer use fail-fast retry by default, with an unsupported.use_gocb_fast_fail_retry switch to re-enable fail-fast behavior when desired.

Changes:

  • Introduces unsupported.use_gocb_fast_fail_retry (config + flag + OpenAPI) and threads it through bootstrap/per-db connections.
  • Switches cluster/bucket WaitUntilReady and index lookup retry strategies between fail-fast and best-effort based on the new setting.
  • Updates affected tests to cover both modes and to reflect differing HTTP error classifications.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
rest/server_context.go Propagates startup unsupported.use_gocb_fast_fail_retry into per-db BucketSpec.
rest/main.go Adds bootstrap option plumbing and passes retry-mode into CouchbaseClusterSpec.
rest/database_init_manager.go Passes retry-mode into NewClusterOnlyN1QLStore for index initialization.
rest/config_startup.go Adds UseGOCBFastFailRetry to UnsupportedConfig.
rest/config_flags.go Registers --unsupported.use_gocb_fast_fail_retry startup flag.
rest/adminapitest/admin_api_test.go Updates admin API tests to validate behavior/status differences in both retry modes.
docs/api/components/schemas.yaml Documents the new startup unsupported.use_gocb_fast_fail_retry property in OpenAPI schema.
db/indextest/util.go Updates N1QL store construction to include retry-mode parameter.
db/indextest/indextest_test.go Updates N1QL store construction to include retry-mode parameter.
db/indextest/indextest_dual_metadata_test.go Updates N1QL store construction to include retry-mode parameter.
base/gocb_utils.go Adds goCBRetryStrategy helper returning fail-fast vs best-effort strategy.
base/gocb_utils_test.go Adds unit test coverage for goCBRetryStrategy.
base/collection.go Makes cluster/bucket readiness retry strategy configurable via BucketSpec.UseGOCBFastFailRetry.
base/collection_n1ql.go Threads retry-mode into N1QL index manager creation.
base/collection_n1ql_common.go Uses configurable retry strategy for index enumeration (GetAllIndexes).
base/cluster_n1ql.go Extends NewClusterOnlyN1QLStore to store retry-mode and pass into index manager.
base/bucket.go Adds UseGOCBFastFailRetry to BucketSpec.
base/bucket_gocb_test.go Updates incorrect-login test expectations for both retry modes (fast-fail vs timeout).
base/bootstrap.go Adds retry-mode to CouchbaseClusterSpec/CouchbaseCluster, applies configurable readiness retry, and maps non-auth bucket readiness failures to 502.

Comment thread base/bootstrap.go Outdated

@torcolvin torcolvin left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To discuss with team, but my preference is to not make this configurable since I can't imagine people opting into this - the reason to opt in would be if you expect to make typos or credential problems and this is mostly useful when initially setting up Sync Gateway. In a production environment, you would want retry behavior to make sure that connections are robust to things like:

  1. one bad node in a connection string or in SRV records.
  2. If there is another issue like https://jira.issues.couchbase.com/browse/GOCBC-1812 where operator can surface nodes to connect to that aren't ideal.
  3. One node that is temporarily offline or flickering.

Comment thread base/bootstrap.go Outdated
Comment thread base/bucket.go Outdated
ViewQueryTimeoutSecs *uint32 // the view query timeout in seconds (default: 75 seconds)
MaxConcurrentQueryOps *int // maximum number of concurrent query operations (default: DefaultMaxConcurrentQueryOps)
BucketOpTimeout *time.Duration // How long bucket ops should block returning "operation timed out". If nil, uses GoCB default. GoCB buckets only.
UseGOCBFastFailRetry bool // When true, gocb readiness checks and index lookups fail fast instead of using the best-effort retry strategy

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about doing something like: InitialConnectionRetryStrategy *gocb.RetryStrategy with something like "If unset, use the default retry strategy for sync gateway.

The other idea if this is confusing is to create an enum for types of strategies:

  1. DefaultRetryStrategy
  2. FastFailOnInitialConnectRetryStrategy

Comment thread base/cluster_n1ql.go Outdated
Comment thread base/collection_n1ql.go Outdated
Comment thread base/collection_n1ql_common.go Outdated
Comment thread docs/api/components/schemas.yaml Outdated
torcolvin
torcolvin previously approved these changes May 29, 2026
@gregns1 gregns1 enabled auto-merge (squash) May 29, 2026 16:44
@gregns1 gregns1 merged commit 3698360 into main May 29, 2026
51 checks passed
@gregns1 gregns1 deleted the CBG-5357 branch May 29, 2026 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants