forked from git/git
-
Notifications
You must be signed in to change notification settings - Fork 164
fsck: snapshot default refs before object walk #2026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
newren
wants to merge
1
commit into
gitgitgadget:master
Choose a base branch
from
newren:fsck-snapshot
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+70
−4
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fsck has a race when operating on live repositories; consider the
following simple script that writes new commits as fsck runs:
#!/bin/bash
git fsck &
PID=$!
while ps -p $PID >/dev/null; do
sleep 3
git commit -q --allow-empty -m "Another commit"
done
Since fsck reads refs at the beginning, walks those for connectivity,
and then reads the refs again at the end to check, this can cause fsck
to get confused and think that the new refs refer to missing commits and
that new reflog entries are invalid. Running the above script in a
clone of git.git results in the following (output ellipsized to remove
additional errors of the same type):
$ ./fsck-while-writing.sh
Checking ref database: 100% (1/1), done.
Checking object directories: 100% (256/256), done.
warning in tag d6602ec: missingTaggerEntry: invalid format - expected 'tagger' line
Checking objects: 100% (835091/835091), done.
error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
[...]
error: HEAD: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: HEAD: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
error: refs/heads/mybranch invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: refs/heads/mybranch: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
[...]
error: refs/heads/mybranch: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: refs/heads/mybranch: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
Checking connectivity: 833846, done.
missing commit 6724f2dfede88bfa9445a333e06e78536c0c6c0d
Verifying commits in commit graph: 100% (242243/242243), done.
This problem doesn't occur when refs are specified on the command line
for us to check, since we use those specified refs for both walking and
checking. Using the same refs for walking and checking seems to just
make sense, so modify the existing code to do the same when refs aren't
specified. Snapshot the refs at the beginning, and also ignore all
reflog entries since the time of our snapshot (while this technically
means we could ignore a reflog entry created before the fsck process
if the local clock is weird, since reflogs are local-only there are not
concerns about differences between clocks on different machines). This
combination of changes modifies the output of running the above script
to:
$ ./fsck-while-writing.sh
Checking ref database: 100% (1/1), done.
Checking object directories: 100% (256/256), done.
warning in tag d6602ec: missingTaggerEntry: invalid format - expected 'tagger' line
Checking objects: 100% (835091/835091), done.
Checking connectivity: 833846, done.
Verifying commits in commit graph: 100% (242243/242243), done.
While worries about live updates while running fsck is likely of most
interest for forge operators, it will likely also benefit those with
automated jobs (such as git maintenance) or even casual users who want
to do other work in their clone while fsck is running.
Signed-off-by: Matthew John Cheetham <[email protected]>
Co-authored-by: Elijah Newren <[email protected]>
[en: several changes:
* adjusted for upstream refactorings to refs callback call signatures
* handle reflogs as well
* free recorded snapshot of refs when done
* default to snapshotting instead of making it a non-default option
* provide reproducible testcase in commit message and rewrite commit
message around it
]
Signed-off-by: Elijah Newren <[email protected]>
Author
|
/submit |
|
Submitted as [email protected] To fetch this version into To fetch this version to local tag |
|
On the Git mailing list, Junio C Hamano wrote (reply to this): "Elijah Newren via GitGitGadget" <[email protected]> writes:
> This problem doesn't occur when refs are specified on the command line
> for us to check, since we use those specified refs for both walking and
> checking. Using the same refs for walking and checking seems to just
> make sense, so modify the existing code to do the same when refs aren't
> specified.
Excellent analysis and good approach.
> Snapshot the refs at the beginning, and also ignore all
> reflog entries since the time of our snapshot (while this technically
> means we could ignore a reflog entry created before the fsck process
> if the local clock is weird, since reflogs are local-only there are not
> concerns about differences between clocks on different machines).
Repository on a network filesystem being accessed by hosts with
broken clock?
I do not think our reflog API has (1) give me some token to mark
your current state (2) here is the token you gave me earlier, now
iterate and yield entries but ignore entries added after you gave me
that token, so going by the reflog timestamp is probably the best we
could do. Any approach may get confused when the user tries to be
cute and issues "reflog delete" or "reflog expire" in the middle
anyway, I suspect ;-)
> While worries about live updates while running fsck is likely of most
> interest for forge operators, it will likely also benefit those with
> automated jobs (such as git maintenance) or even casual users who want
> to do other work in their clone while fsck is running.
Great. Will queue. Thanks.
> @@ -509,6 +510,9 @@ static int fsck_handle_reflog_ent(const char *refname,
> timestamp_t timestamp, int tz UNUSED,
> const char *message UNUSED, void *cb_data UNUSED)
> {
> + if (now && timestamp > now)
> + return 0;
> +
> if (verbose)
> fprintf_ln(stderr, _("Checking reflog %s->%s"),
> oid_to_hex(ooid), oid_to_hex(noid));
> @@ -567,14 +571,53 @@ static int fsck_head_link(const char *head_ref_name,
> const char **head_points_at,
> struct object_id *head_oid);
>
> -static void get_default_heads(void)
> +struct ref_snapshot {
> + size_t nr;
> + size_t name_alloc;
> + size_t oid_alloc;
> + char **refname;
> + struct object_id *oid;
> +};
This data structure is somewhat unexpected. Instead of a struct
that holds two arrays, I would have rather expected an array of
"struct { refname, oid }", with the possiblity to add a "token to
mark the latest reflog entry" to the mix I alluded to earlier when
such an API function materializes.
[Footnote]
We could call refs_for_each_reflog_ent_reverse(), grab the
parameters that each_reflog_ent_fn receives as that "token" for the
latest reflog entry and stop. That way, we will learn the value of
<old,new,committer,timestamp,tz,msg>, which should be a robust
enough unique key.
After that when iterating over the reflog, we know we should stop
after processing the reflog entry that holds the recorded value. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.