…sure versions are generated differently from independent HLCs
Pull request overview
This PR aims to make HLV-based topology tests deterministic on Windows by spacing peer writes to avoid ties caused by Windows’ lower wall-clock precision (independent HLCs producing identical versions).
Changes:
- Add Windows-only time.Sleep delays before peer document mutations (create/update/delete) to help ensure unique HLV versions across peers.
- Introduce runtime/time usage in the topology HLV test helpers to gate this behavior by OS.
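The Windows-only spacing described in the first bullet can be sketched as a small helper gated on runtime.GOOS. The helper names (spacingFor, sleepBeforePeerWrite) and the 16 ms delay are hypothetical: the overview does not state the value used, and 16 ms is picked here only because it exceeds the common ~15.6 ms default Windows timer tick.

```go
package main

import (
	"runtime"
	"time"
)

// writeSpacing is a hypothetical delay; the PR does not state the value used.
// Assumption: anything longer than one coarse Windows timer tick (~15.6 ms by default).
const writeSpacing = 16 * time.Millisecond

// spacingFor returns the delay to apply before a peer mutation on the given OS.
// Only Windows gets a delay, because only its wall clock is coarse enough to
// give two back-to-back writes the same HLC version.
func spacingFor(goos string) time.Duration {
	if goos == "windows" {
		return writeSpacing
	}
	return 0
}

// sleepBeforePeerWrite would be called by a test helper before each
// create/update/delete issued on a peer.
func sleepBeforePeerWrite() {
	if d := spacingFor(runtime.GOOS); d > 0 {
		time.Sleep(d)
	}
}

func main() {
	sleepBeforePeerWrite() // no-op on Linux/macOS, 16 ms pause on Windows
}
```

Gating on an OS string parameter keeps the decision testable on any platform, while the real call site passes runtime.GOOS.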
As discussed offline, see if we can use
CBG-5460
Changes Windows builds to obtain HLC time from GetSystemTimePreciseAsFileTime, which reads the high-resolution system clock live on every call.
The Hybrid Logical Clock derives a version's physical component from the wall clock, then clears the low 16 logical bits, making each physical "slot" ~65 µs (2^16 ns) wide. On Linux/macOS, time.Now() resolves to ~1 ns, so this is a non-issue. On Windows, time.Now() is backed by the coarse system timer (~0.5–15 ms). In topology tests, peers writing in quick succession produced identical HLC versions because many writes fell inside a single coarse tick.
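The collision described above can be reproduced with plain arithmetic. This sketch assumes the HLC value is nanoseconds since the Unix epoch with the low 16 bits reserved for the logical counter, and models the coarse Windows clock as one that only advances every 15.625 ms (a common default tick, assumed here). Two writes 100 µs apart land in the same slot when read through the coarse clock, but in different slots with a nanosecond-precise clock, since 100 µs exceeds the 65,536 ns slot width.

```go
package main

import "fmt"

const (
	logicalBits = 16
	// Assumption: the coarse Windows timer tick is the common 15.625 ms default.
	coarseTick = 15_625_000 // nanoseconds
)

// physicalSlot clears the low 16 logical bits of a nanosecond timestamp,
// leaving only the physical component (one slot = 1<<16 ns, about 65.5 µs).
func physicalSlot(nanos uint64) uint64 {
	return nanos &^ (1<<logicalBits - 1)
}

// coarseRead models a wall clock that only advances once per coarse tick.
func coarseRead(nanos uint64) uint64 {
	return nanos - nanos%coarseTick
}

func main() {
	t0 := uint64(1_700_000_000_000_000_000) // an instant that falls on a tick boundary
	w1, w2 := t0+10_000, t0+110_000          // two writes 100 µs apart

	fmt.Println("coarse slots equal:", physicalSlot(coarseRead(w1)) == physicalSlot(coarseRead(w2)))   // true
	fmt.Println("precise slots equal:", physicalSlot(w1) == physicalSlot(w2))                          // false
}
```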
The initial fix used QueryPerformanceCounter (QPC): snapshot a (QPC ticks, time.Now()) anchor pair once at init(), then compute anchorNanos plus the elapsed ticks converted to nanoseconds. This resolves to ~100 ns, which fixed the resolution problem.
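The anchor arithmetic can be sketched without calling Windows APIs. The type and field names here are hypothetical; in the real code the tick values and frequency would come from QueryPerformanceCounter and QueryPerformanceFrequency, and wallNanos from time.Now() at init(). Splitting the elapsed ticks into whole seconds and a remainder avoids overflowing int64, which a naive elapsed*1e9/freq would do after roughly 15 minutes with a 10 MHz counter. The comments also show why the approach drifted: any error in the coarse wallNanos snapshot is added to every later reading.

```go
package main

import "fmt"

// qpcAnchor models the snapshot taken once at init(). Names are hypothetical.
type qpcAnchor struct {
	ticks     int64 // counter value at init (QueryPerformanceCounter)
	wallNanos int64 // time.Now().UnixNano() at init; only as precise as the coarse timer
	freq      int64 // counter ticks per second (QueryPerformanceFrequency)
}

// nowNanos converts a later counter reading into wall-clock nanoseconds.
// Whatever offset wallNanos had at init is carried into every result and
// never corrects, because the wall clock is never consulted again.
func (a qpcAnchor) nowNanos(ticks int64) int64 {
	elapsed := ticks - a.ticks
	secs := elapsed / a.freq // whole seconds since the anchor
	rem := elapsed % a.freq  // leftover ticks, always < freq, so rem*1e9 cannot overflow
	return a.wallNanos + secs*1_000_000_000 + rem*1_000_000_000/a.freq
}

func main() {
	// Assumption: a 10 MHz counter, so one tick is 100 ns.
	a := qpcAnchor{ticks: 1_000, wallNanos: 1_700_000_000_000_000_000, freq: 10_000_000}
	fmt.Println(a.nowNanos(1_001) - a.wallNanos) // 100
}
```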
However, the QPC-derived HLC time and the system clock ended up with a fixed, never-correcting offset of up to ~15 ms between them, because the anchor's time.Now() reading was itself only as precise as the coarse timer. Since the offset depends on startup timing, it was random per process run. It constantly tripped the cv.ver <= cas re-stamp path, producing intermittent topology-test failures.
Switching to GetSystemTimePreciseAsFileTime fixes both issues: it offers the same precision, and base and rosmar now read the same system clock.
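GetSystemTimePreciseAsFileTime fills a FILETIME: a 64-bit count of 100 ns intervals since 1601-01-01 UTC, split into two 32-bit halves. Feeding it into an HLC that works in Unix nanoseconds needs the conversion sketched below. The helper name is hypothetical, and the PR may instead rely on a library conversion (on Windows, Go's syscall.Filetime has a Nanoseconds method that does the equivalent); the arithmetic is written out by hand so the sketch runs on any OS.

```go
package main

import "fmt"

// filetimeEpochOffset is the number of 100 ns intervals between the FILETIME
// epoch (1601-01-01 UTC) and the Unix epoch (1970-01-01 UTC).
const filetimeEpochOffset = 116_444_736_000_000_000

// filetimeToUnixNanos converts the high/low halves of a FILETIME, as filled in
// by GetSystemTimePreciseAsFileTime, into nanoseconds since the Unix epoch.
func filetimeToUnixNanos(high, low uint32) int64 {
	ft := int64(high)<<32 | int64(low) // reassemble the 64-bit 100 ns count
	return (ft - filetimeEpochOffset) * 100
}

func main() {
	ft := uint64(filetimeEpochOffset + 1) // one 100 ns interval after the Unix epoch
	fmt.Println(filetimeToUnixNanos(uint32(ft>>32), uint32(ft))) // 100
}
```

Because the result is derived from the same system clock on every call, there is no init-time anchor whose error could persist.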
Sync Gateway's HLC and Couchbase Server's CAS are genuinely independent clocks on separate nodes, and the system is designed to converge under that skew; the CAS re-stamp exists precisely for it. When the topology test failed, the data still converged correctly (every peer agreed on the same cv and pv); only the test harness's predicted version was wrong.
Pre-review checklist
- fmt.Print, log.Print, ...
- base.UD(docID), base.MD(dbName)
- docs/api
- Dependencies (if applicable)
Integration Tests