fix: coalesce split Myers hunks to prevent false merge conflicts#2476
fix: coalesce split Myers hunks to prevent false merge conflicts#2476Mattias Granlund (mtsgrd) wants to merge 1 commit intoGitoxideLabs:mainfrom
Conversation
When imara-diff's Myers algorithm diffs two files, it sometimes splits what is logically one change into a non-empty deletion hunk and a separate empty insertion hunk, with one unchanged base line between them. This is a valid minimal edit script, but it differs from the alignment that git's xdiff (also Myers-based) would choose. When the empty insertion lands at a base position that the other side of a 3-way merge also touches, `take_intersecting` reports an overlap and the merge produces a conflict — even though `git merge-file` resolves the same inputs cleanly. Fix this by adding a pre-processing step after sorting hunks: for each empty-range insertion hunk, look backwards for the nearest same-side non-empty hunk within a gap of at most one unchanged base line. If found, extend that hunk to cover the gap and the insertion point. This re-joins the split hunk, making the merge robust to different diff algorithm alignment choices. The coalescing is conservative: it only applies when (a) the insertion hunk has an empty before-range, (b) there is a same-side non-empty hunk nearby (gap ≤ 1), and (c) that hunk is the nearest same-side hunk. This avoids affecting cases like zdiff3-interesting where empty insertions are standalone and represent genuinely different changes. Closes GitoxideLabs#2475 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Sebastian Thiel (@Byron) I think the journey test failure is flaky? Also, as far as I can tell this analysis doesn't contain any contradictions, and the .svelte based test cases pass correctly with this patch applied. |
Simon Larsén (slarse)
left a comment
There was a problem hiding this comment.
Hello! I took the liberty of analyzing the test case and problem. See way too long comment :)
| let base = b"alpha_x\n\nbravo_x\ncharlie_x\n\n"; | ||
| let ours = b"\n\nbravo_x\ncharlie_x\n"; | ||
| let theirs = b"alpha_x\n\ncharlie_x\n\n"; |
There was a problem hiding this comment.
tl;dr: I think this should be fixed on the diff-algo side rather than merge-algo side.
If we just add another blank line to each of these revisions, you still get a conflict.
| let base = b"alpha_x\n\nbravo_x\ncharlie_x\n\n"; | |
| let ours = b"\n\nbravo_x\ncharlie_x\n"; | |
| let theirs = b"alpha_x\n\ncharlie_x\n\n"; | |
| let base = b"alpha_x\n\n\nbravo_x\ncharlie_x\n\n"; | |
| let ours = b"\n\n\nbravo_x\ncharlie_x\n"; | |
| let theirs = b"alpha_x\n\n\ncharlie_x\n\n"; |
This is down to the naive Myer's diff algorithm doing a greedy longest substring match between revisions s.t. the matching intermediate lines get matched. The fix seems to just try to look "one line back" from the hunk, but that just happens to work because there was precisely one line separating these hunks. My edit here separates the hunks by 2 lines instead, making the conflict reappear.
The base-ours diff here is with base Myers this:
-alpha_x
+
bravo_x
charlie_x
-This is natural in Myers as the intermediate matching section is greedily matched. I've validated this output both with an old Myers implementation I wrote myself a few years ago, and with gix. It doesn't matter how many blank lines separate the hunks, they'll be matched all the same, as evidenced by my above tweak to the test case causing it to fail again. So checking just one line back is an edge case fix for an edge case.
This will always conflict with the removal of bravo_x, which looks like this.
alpha_x
-bravo_x
charlie_xNow, Git's Myers implementation is doing something to prioritize hunk cohesion, because its diff output is different:
-alpha_x
+
bravo_x
charlie_x
-This does not conflict with the removal av bravo_x because there's a buffering matched line between the two hunks. I don't know exactly what optimization Git has here, but the point I'm making is that I think this should be a diff-algorithm fix rather than merge-algorithm fix.
As a final note, I think the test case is a bit misleading as it makes this out to be some issue with blank lines in particular, but it's not. It's an issue with matching lines. Replacing the blank lines with bla has the exact same effect.
| let base = b"alpha_x\n\nbravo_x\ncharlie_x\n\n"; | |
| let ours = b"\n\nbravo_x\ncharlie_x\n"; | |
| let theirs = b"alpha_x\n\ncharlie_x\n\n"; | |
| let base = b"alpha_x\nbla\nbravo_x\ncharlie_x\n\n"; | |
| let ours = b"bla\nbla\nbravo_x\ncharlie_x\n"; | |
| let theirs = b"alpha_x\nbla\ncharlie_x\n\n"; |
There was a problem hiding this comment.
Thanks a lot for looking into this Mattias Granlund (@mtsgrd), and particularly to Simon Larsén (@slarse) for the eager review!
Without having even looked at the details, this makes me think that Git might not run into it because it makes diff-slider adjustments, maybe even to the point where it can avoid conflicts, which would also be rather puzzling to me.
Maybe I try to port the diff over to using imara-diff v0.2 which ships with 95% Git-diff-slider compatibility from what we could tell.
Summary
coalesce_empty_insertions_with_nearest_same_side_hunk()as a pre-processing step after sorting hunks, before the intersection checkfalse_conflict::myers_false_conflict_with_blank_line_ambiguity) — 5/4/4 lines of generic textContext
Reported in #2475. When GitButler amends a commit via
create_tree()→merge_trees(), certain edits produce aCherryPickMergeConflictthatgit merge-fileresolves cleanly.Root cause
Three-way merge of:
base
base → ours (delete
alpha_x, add a blank, remove trailing blank):base → theirs (delete
bravo_x):alpha_x (blank) - bravo_x charlie_x (blank)Myers (imara-diff) diffs base→ours as three hunks:
alpha_xHistogram diffs it as two hunks:
alpha_x→ blankTheirs (both algorithms):
before=2..3 after=2..2(deletebravo_x).The empty insertion at position 2 (Myers hunk #2) collides with theirs' deletion at 2..3 via the empty-range special case in
left_overlaps_right, producing a false conflict.Fix
After sorting hunks by ancestor position, a new
coalesce_empty_insertions_with_nearest_same_side_hunk()pass absorbs empty insertions into their nearest preceding same-side non-empty hunk (gap ≤ 1). This turns Myers hunks #1 + #2 into a single{before=0..2, after=0..2}hunk that no longer overlaps with theirs at position 2.The coalescing is conservative — it only fires when (a) the hunk has an empty before-range, (b) a same-side non-empty hunk exists within 1 base line, and (c) that hunk is the nearest same-side hunk. This avoids affecting cases like
zdiff3-interestingwhere standalone empty insertions represent genuinely different changes.Test plan
myers_false_conflict_with_blank_line_ambiguitypassesgix-mergetest suite passes (22/22), includingrun_baselineandtree::run_baselinegit merge-file -pconfirms clean merge on the same inputsCloses #2475
🤖 Generated with Claude Code