test(but-core): demonstrate data loss in safe_checkout_from_head deterministically#13090
test(but-core): demonstrate data loss in safe_checkout_from_head deterministically#13090
Conversation
…rministically Add a single-threaded unit test that reproduces the data-loss bug without any concurrency. The key insight is that safe_checkout_from_head reads repo.head_tree_id_or_empty() *at call time*. If a concurrent process has already updated HEAD (H0 → H_A), the next caller computes the diff H_A → H_B where H_B was derived from the stale H0. That diff marks A's file as DELETED and the checkout removes it from disk. The test sets up two workspace commits derived from the same parent H0: H_A = H0 + auth/session.ts H_B = H0 + api/pagination.ts (no session.ts — stale derivation) 1. safe_checkout_from_head(H_A) → HEAD = H_A, session.ts created ✓ 2. safe_checkout_from_head(H_B) → reads HEAD = H_A, diff H_A→H_B marks session.ts as DELETED → physically removes it ← DATA LOSS The test does not propagate errors immediately on the second checkout; instead it checks whether session.ts still exists on disk and only then emits the detailed failure message. This ensures the data-loss assertion fires even if the checkout also returns an error. The test is #[ignore] because it currently FAILS (demonstrating the bug). Remove the ignore and flip the assertion once the fix lands.
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
There was a problem hiding this comment.
Pull request overview
Adds an (ignored) regression test in but-core to deterministically reproduce and document a data-loss scenario in safe_checkout_from_head when two workspace trees are derived from the same stale base HEAD.
Changes:
- Import
safe_checkout_from_headinto the existing worktree checkout test suite. - Add a new single-threaded
#[ignore]unit test that demonstrates how a stale-derived checkout can delete files created by a prior checkout.
| /// deletion). Remove the `#[ignore]` annotation once the bug is fixed and flip the | ||
| /// final assertion to `assert!(session_path.exists(), …)`. |
There was a problem hiding this comment.
The doc comment says to “flip the final assertion to assert!(session_path.exists(), …) once the bug is fixed”, but the test doesn’t currently have a final assertion to flip (it relies on the match arms/panic behavior). Consider rewording this to match the actual intended post-fix behavior (e.g., whether the second checkout must succeed, or only must not delete the file).
| /// deletion). Remove the `#[ignore]` annotation once the bug is fixed and flip the | |
| /// final assertion to `assert!(session_path.exists(), …)`. | |
| /// deletion). Once the bug is fixed, remove the `#[ignore]` annotation and update the | |
| /// final expectations to require that the second checkout succeeds and that | |
| /// `session_path` still exists on disk (for example, by asserting `session_path.exists()`). |
|
|
||
| // Classify what happened and emit the most informative failure message possible. | ||
| match (checkout_b_result, session_deleted) { | ||
| (_, true) => { |
There was a problem hiding this comment.
In the data-loss case you match (_, true) and then panic, which discards any error returned by the second safe_checkout_from_head call. Splitting this into (Err(e), true) vs (Ok(_), true) (or including the error in the panic message when present) would make failures much easier to diagnose while still prioritizing the on-disk data-loss assertion.
| (_, true) => { | |
| (Err(e), true) => { | |
| // Data was lost and the checkout call also returned an error — include both in the failure. | |
| panic!( | |
| "DATA LOSS BUG: auth/session.ts was deleted from disk by \ | |
| safe_checkout_from_head(H_B).\n\n\ | |
| Cause: H_B was computed from stale H₀ (before process A committed), so it \ | |
| does not contain auth/session.ts. However safe_checkout_from_head reads the \ | |
| CURRENT HEAD (= H_A, which does contain auth/session.ts) as the source tree. \ | |
| The resulting diff H_A → H_B marks auth/session.ts as DELETED and the \ | |
| checkout physically removes the file.\n\n\ | |
| Additionally, safe_checkout_from_head(H_B) returned an error:\n\ | |
| {e:?}\n\n\ | |
| Fix: serialise workspace checkouts with inter-process locking \ | |
| (try_exclusive_inter_process_access) or capture the current HEAD before the \ | |
| workspace is recomputed and pass it explicitly to safe_checkout.", | |
| e | |
| ); | |
| } | |
| (Ok(_), true) => { |
Byron
left a comment
There was a problem hiding this comment.
@krlvi I took a look and don't think that the PR should not be merged, for two reasons:
- it doesn't show a race, it shows conflicting instructions. Checkout A then checkout B, and it does as it's told. Safe checkout is safe because it won't nuke unstaged or untracked files, but it's still allowed to set the target tree to match the given commit.
- safe_checkout is a plumbing function which expects to have a repository lock on the caller side. Once these are inter-process safe, safe-checkout will not easily be interleaved with conflicting calls.
Something that safe_checkout could do on top is to immediately grab a lock on the .git/index to prevent other checkouts, for instance by Git itself, from happening. This is definitely a possible improvement and one that I think should be done.
For example, right now, it would be possible for one but commit to wait for a commit-message, and a Git commit to come in in the meantime that will happily be overwritten once the commit message clears.
Will see to following this up.
I see, this makes sense. So do i understand correctly that we mainly wanna address this at a "higher level"/app logic, like this test targets #13079 ? |
|
@krlvi Yes, the higher level would be the way. |
Add a single-threaded unit test that reproduces the data-loss bug without
any concurrency. The key insight is that safe_checkout_from_head reads
repo.head_tree_id_or_empty() at call time. If a concurrent process has
already updated HEAD (H0 → H_A), the next caller computes the diff H_A → H_B
where H_B was derived from the stale H0. That diff marks A's file as DELETED
and the checkout removes it from disk.
The test sets up two workspace commits derived from the same parent H0:
H_A = H0 + auth/session.ts
H_B = H0 + api/pagination.ts (no session.ts — stale derivation)
session.ts as DELETED → physically removes it ← DATA LOSS
The test does not propagate errors immediately on the second checkout;
instead it checks whether session.ts still exists on disk and only then
emits the detailed failure message. This ensures the data-loss assertion
fires even if the checkout also returns an error.
The test is #[ignore] because it currently FAILS (demonstrating the bug).
Remove the ignore and flip the assertion once the fix lands.