-
-
Notifications
You must be signed in to change notification settings - Fork 939
feat: Add support for Git Blame information #993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I've chosen showBlame because trying to follow the existing naming convetions for the config options, as it's behaviour it's similar to showLineNumbers
output.git.showBlame configuration option
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughThis PR adds support for displaying Git blame information in Repomix output. It introduces a new CLI flag Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI as CLI Parser
participant Config as Config Builder
participant FileProcessor as File Processor
participant GitOps as Git Operations
participant GitCmd as Git Command
User->>CLI: --output-show-git-blame flag
CLI->>CLI: Parse option to outputShowGitBlame
CLI->>Config: buildCliConfig(options)
Config->>Config: Merge outputShowGitBlame → config.output.git.showBlame
rect rgb(200, 220, 240)
Note over FileProcessor,GitOps: File Processing Phase
FileProcessor->>FileProcessor: Check config.output.git.showBlame
alt showBlame enabled
FileProcessor->>GitOps: getGitBlame(cwd, filePath)
GitOps->>GitOps: isGitRepository(cwd)
alt Is git repo
GitOps->>GitCmd: execGitBlame(cwd, filePath)
GitCmd->>GitCmd: Execute: git blame --date=short -w
GitCmd-->>GitOps: Raw blame output (stdout)
GitOps->>GitOps: formatGitBlame(output) → [Author Date] Code
GitOps-->>FileProcessor: Formatted blame content
FileProcessor->>FileProcessor: Replace processedContent with blame
else Not a git repo
GitOps-->>FileProcessor: Return null
FileProcessor->>FileProcessor: Use original file content
end
else showBlame disabled
FileProcessor->>FileProcessor: Use original file content
end
end
FileProcessor-->>User: Output with optional blame lines
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @Lautron, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a useful feature to display git blame information in the output. The implementation is well-structured, with new logic for fetching and formatting blame data, along with corresponding configuration options, CLI flags, and tests.
However, I've identified a significant issue in src/core/file/fileProcessContent.ts where enabling showBlame conflicts with other output formatting options like --compress and --output-show-line-numbers. The current processing order causes these features to either fail silently or produce incorrect/redundant output. I've left a detailed comment explaining the problem. Addressing this conflict is crucial for the feature to work correctly with other options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
README.md (1)
621-621: Git-blame docs align with implementation; optional perf noteThe new CLI flag
--output-show-git-blameand config optionoutput.git.showBlameare clearly documented and match the wiring inCliOptionsandbuildCliConfig. You might optionally add a short note that enabling blame can be slower on large repos (since it runsgit blameper file), but the current wording is already accurate and complete.Also applies to: 1211-1211
tests/cli/actions/defaultAction.test.ts (1)
307-314: Test correctly verifies CLI → config mapping for git blameThis test cleanly asserts that
outputShowGitBlameresults inconfig.output.git.showBlame === true, matching the logic inbuildCliConfig. Consider adding a follow-up test that combines this flag with other git options (e.g.,includeDiffs/includeLogs) to ensure all git settings are preserved when merged, but not required for correctness here.src/cli/actions/defaultAction.ts (1)
283-291: Git blame flag wiring is consistent and preserves existing git optionsThe
outputShowGitBlamehandling correctly setsoutput.git.showBlame = truewhile spreadingcliConfig.output?.git, so other git-related flags (includeDiffs,includeLogs, etc.) remain intact. This mirrors the existing patterns for git options and looks solid. If this area grows further, you might consider consolidating all git-related CLI handling into a small helper to keepbuildCliConfigeasier to scan, but it’s not necessary right now.src/core/file/fileProcessContent.ts (1)
28-33: Clarify howshowBlameshould interact withcompressand other transformsRight now, when
config.output.git?.showBlameis true you replaceprocessedContentwith blame output, then still run truncate/removeComments/removeEmptyLines and, crucially, thecompress/parseFilestep on that blame-formatted text. If a user enables both--compressand--output-show-git-blame,parseFileis likely to see non‑valid source and throw, which makes the combination surprising or unusable.It would be good to define and enforce explicit semantics here, for example:
- Either treat blame as final content and skip the
compressbranch (and possibly other structural transforms) whenshowBlameis on, or- Prefer structural compression and skip blame when
compressis enabled, or- Make the two options mutually exclusive at the CLI/config layer.
Whichever behavior you choose, I’d recommend adding a small test to lock it in.
tests/core/git/gitCommand.test.ts (1)
3-4:execGitBlametests exercise both success and failure paths; consider flattening describe hierarchyThe new tests correctly verify the git arguments, returned stdout, and the trace log on error for
execGitBlame, which tightly matches the implementation.For readability, you might move
describe('execGitBlame', ...)to be a top‑level sibling of the other command describes instead of nesting it insidedescribe('execLsRemote', ...), but that’s purely structural.Also applies to: 408-432
src/core/git/gitBlameHandle.ts (1)
20-20: Consider using non-greedy matching for the author capture group.The regex uses greedy
(.+)to capture the author name. While this works due to backtracking, using non-greedy(.+?)would more explicitly convey the intent to match "up to" the date pattern. This improves regex clarity and predictability.Additionally,
[\^]?can be simplified to\^?(the character class is unnecessary for a single escaped character).Apply this diff for improved regex clarity:
- const blameLineRegex = /^[\^]?[a-f0-9]+\s*\((.+)\s+(\d{4}-\d{2}-\d{2})\s+\d+\)(.*)$/i; + const blameLineRegex = /^\^?[a-f0-9]+\s*\((.+?)\s+(\d{4}-\d{2}-\d{2})\s+\d+\)(.*)$/i;
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
README.md(2 hunks)src/cli/actions/defaultAction.ts(1 hunks)src/cli/cliRun.ts(1 hunks)src/cli/types.ts(1 hunks)src/config/configSchema.ts(2 hunks)src/core/file/fileProcessContent.ts(2 hunks)src/core/git/gitBlameHandle.ts(1 hunks)src/core/git/gitCommand.ts(1 hunks)tests/cli/actions/defaultAction.test.ts(1 hunks)tests/cli/actions/workers/defaultActionWorker.test.ts(1 hunks)tests/config/configSchema.test.ts(2 hunks)tests/core/file/fileProcessContent.test.ts(3 hunks)tests/core/git/gitBlameHandle.test.ts(1 hunks)tests/core/git/gitCommand.test.ts(2 hunks)tests/core/metrics/calculateGitDiffMetrics.test.ts(1 hunks)tests/core/metrics/calculateGitLogMetrics.test.ts(1 hunks)tests/core/output/flagFullDirectoryStructure.test.ts(1 hunks)tests/core/output/outputStyles/jsonStyle.test.ts(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
tests/cli/actions/defaultAction.test.ts (1)
src/cli/actions/defaultAction.ts (1)
buildCliConfig(114-306)
src/core/git/gitCommand.ts (1)
src/shared/logger.ts (1)
error(34-38)
src/core/file/fileProcessContent.ts (1)
src/core/git/gitBlameHandle.ts (1)
getGitBlame(48-73)
tests/core/git/gitBlameHandle.test.ts (1)
src/core/git/gitBlameHandle.ts (1)
getGitBlame(48-73)
src/core/git/gitBlameHandle.ts (1)
src/shared/logger.ts (1)
error(34-38)
tests/core/file/fileProcessContent.test.ts (3)
src/core/git/gitBlameHandle.ts (1)
getGitBlame(48-73)src/config/configSchema.ts (1)
RepomixConfigMerged(159-159)src/core/file/fileProcessContent.ts (1)
processContent(22-73)
tests/core/git/gitCommand.test.ts (1)
src/core/git/gitCommand.ts (1)
execGitBlame(213-228)
🔇 Additional comments (15)
tests/core/output/flagFullDirectoryStructure.test.ts (1)
34-35: Mock config updated to new git.showBlame fieldAdding
showBlame: falseto the mockgitconfig keeps this fixture aligned with the expandedRepomixConfigMergedschema without affecting test behavior.tests/core/metrics/calculateGitDiffMetrics.test.ts (1)
52-53: Config fixture correctly extended with git.showBlameThe
mockConfig.output.gitobject now includesshowBlame: false, matching the updated config schema while leaving the diff-metrics behavior unchanged.src/cli/types.ts (1)
30-31: CliOptions extended cleanly with outputShowGitBlameThe new
outputShowGitBlame?: booleanfield matches the CLI flag name and is consumed correctly inbuildCliConfig; no issues here.tests/cli/actions/workers/defaultActionWorker.test.ts (1)
62-63: Worker config fixture aligned with new git.showBlame fieldIncluding
showBlame: falsein the mockgitconfig keeps this test’sRepomixConfigMergedshape up to date with the schema while not affecting the worker behavior being asserted.tests/config/configSchema.test.ts (1)
122-125: Schema tests correctly cover new git.showBlame fieldAdding
showBlame: falseto the valid configs for bothrepomixConfigDefaultSchemaandrepomixConfigMergedSchemaensures the updated git config shape is explicitly exercised and preserved through parsing. This is a good, minimal extension of the existing tests.Also applies to: 228-231
tests/core/file/fileProcessContent.test.ts (2)
6-25: Git blame mocking setup is consistent and isolatedImporting and mocking
getGitBlame, then defaulting it to resolvenullinbeforeEach, keeps existing tests unaffected while allowing per-test overrides. This is a clean pattern for the new dependency.
182-225: New git blame tests cover happy path and fallback correctlyBoth cases—blame present and blame
null—are exercised with propercwdandoutput.git.showBlamesetup, and you assert bothgetGitBlameusage and the fallback to original content.tests/core/output/outputStyles/jsonStyle.test.ts (1)
28-35: Mock config update forshowBlamematches schema expectationsAdding
showBlame: falseto thegitblock keeps this test fixture in sync with the config schema defaults without altering existing assertions.src/cli/cliRun.ts (1)
140-140: New--output-show-git-blameoption is clearly named and placedThe flag name and description are consistent with existing output options, and its placement under "Repomix Output Options" matches its purpose.
src/core/git/gitCommand.ts (1)
206-228:execGitBlameimplementation is safe and matches optional-blame semanticsExecuting
git blameviaexecFileAsyncwith argument arrays avoids shell injection, and logging then returning''on failure aligns with treating blame as best‑effort data that shouldn’t break processing.src/config/configSchema.ts (1)
44-52:showBlameis wired consistently into base and default git config schemasAdding
showBlameas optional in the base schema and defaulting it tofalsein the default schema keeps configuration parsing consistent and ensures existing configs remain valid.Also applies to: 105-112
tests/core/metrics/calculateGitLogMetrics.test.ts (1)
46-53: Metrics test git config kept in sync viashowBlame: falseIncluding
showBlame: falsein the mock git config mirrors the real schema and avoids drift between tests and configuration without altering test logic.tests/core/git/gitBlameHandle.test.ts (1)
1-71: LGTM! Comprehensive test coverage.The test suite thoroughly covers all main scenarios:
- Happy path with proper blame output formatting
- Non-git repository early exit optimization
- Empty blame output handling (untracked files)
- Error handling with appropriate logging
The use of dependency injection via the
depsparameter enables clean mocking and all assertions are appropriate.src/core/git/gitBlameHandle.ts (2)
48-73: LGTM! Well-structured error handling and dependency injection.The function correctly:
- Uses dependency injection for testability
- Checks if the directory is a git repository before executing blame
- Handles errors gracefully by returning
nulland logging at trace level- Maintains consistent behavior across different failure scenarios
20-20: Verify git blame date format compatibility between regex pattern and execGitBlame output.The regex pattern expects
YYYY-MM-DDwithout time and timezone, but standardgit blameoutputsYYYY-MM-DD HH:MM:SS +ZONE. Confirm thatexecGitBlameingitCommand.tsapplies date formatting flags (e.g.,--date=short) to match the expected pattern, or update the regex if the command produces full datetime output.
removeEmptyLines. If the user uses them simultaneously it shows a warning.
|
I've fixed Gemini comment about interaction with /gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a valuable feature for displaying Git Blame information in the output. The implementation is well-rounded, including updates to the CLI, configuration, core logic, and comprehensive tests. I appreciate the thoughtful handling of incompatible options, which improves the user experience by providing clear warnings.
My primary feedback focuses on enhancing the robustness of the Git Blame output parsing. The current approach uses regular expressions on human-readable output, which can be brittle. I've provided suggestions to switch to the machine-readable --porcelain format. This change will make the feature more reliable and resilient to edge cases like unconventional author names.
| function formatGitBlame(blameOutput: string): string { | ||
| // Regex breakdown: | ||
| // ^[a-f0-9]+ - Matches the commit hash at the start of the line (e.g., a2d3c5fed) | ||
| // \s* - Matches optional whitespace | ||
| // \((.+) - Group 1: Matches the AUTHOR part (e.g., yamadashy, Kazuki Yamada) | ||
| // \s+ - Matches one or more spaces | ||
| // (\d{4}-\d{2}-\d{2}) - Group 2: Matches the DATE part (e.g., 2025-02-17) | ||
| // \s+ - Matches one or more spaces | ||
| // \d+\) - Matches the line number inside the parenthesis (e.g., 1) | ||
| // (.*) - Group 3: Matches the CODE LINE part, including leading space | ||
| const blameLineRegex = /^[\^]?[a-f0-9]+\s*\((.+)\s+(\d{4}-\d{2}-\d{2})\s+\d+\)(.*)$/i; | ||
|
|
||
| const lines = blameOutput.split('\n'); | ||
| const formattedLines: string[] = []; | ||
|
|
||
| for (const line of lines) { | ||
| const match = line.match(blameLineRegex); | ||
| if (match) { | ||
| const [, rawAuthor, date, codeLineRaw] = match; | ||
| const author = rawAuthor.trim(); | ||
| const codeLine = codeLineRaw.startsWith(' ') ? codeLineRaw.substring(1) : codeLineRaw; | ||
|
|
||
| formattedLines.push(`[${author} ${date}] ${codeLine}`); | ||
| } else if (line.trim().length > 0) { | ||
| formattedLines.push(`[N/A N/A] ${line.trim()}`); | ||
| } | ||
| } | ||
|
|
||
| return formattedLines.join('\n'); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation uses a regular expression to parse the human-readable git blame output. This approach is fragile and may fail if author names contain unexpected characters or formats that conflict with the regex pattern.
To make parsing more robust and reliable, it's best practice to use the --porcelain format, which is specifically designed for machine consumption. I've provided a new implementation of formatGitBlame below that parses this format. It's more resilient and correctly handles various edge cases, such as multi-line blame groups.
This change should be paired with an update to execGitBlame to use the --porcelain flag instead of --date=short.
function formatGitBlame(blameOutput: string): string {
const lines = blameOutput.split('\n');
const formattedLines: string[] = [];
let currentAuthor = 'N/A';
let currentDate = 'N/A';
for (const line of lines) {
if (!line) continue;
// Porcelain format provides metadata blocks.
// A line with a 40-char hash is a header. Metadata may follow.
if (/^[a-f0-9]{40}/.test(line)) {
// This is a header line. We don't need to parse it, but it signals a new block might start.
// The state (author/date) will be updated if new metadata is found.
continue;
}
if (line.startsWith('author ')) {
currentAuthor = line.substring('author '.length);
continue;
}
if (line.startsWith('author-time ')) {
const timestamp = parseInt(line.substring('author-time '.length), 10);
if (!isNaN(timestamp)) {
currentDate = new Date(timestamp * 1000).toISOString().split('T')[0];
}
continue;
}
// A line starting with a tab is the actual code content.
if (line.startsWith('\t')) {
const codeLine = line.substring(1);
// Avoid a trailing space for empty lines.
const formattedLine = codeLine ? `[${currentAuthor} ${currentDate}] ${codeLine}` : `[${currentAuthor} ${currentDate}]`;
formattedLines.push(formattedLine);
}
// Any other metadata lines (committer, summary, etc.) are ignored.
}
return formattedLines.join('\n');
}| }, | ||
| ): Promise<string> => { | ||
| try { | ||
| const result = await deps.execFileAsync('git', ['-C', directory, 'blame', '--date=short', '-w', filePath]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ensure robust parsing of the git blame output, it's better to use the --porcelain format, which is designed for machine consumption. This format is more stable and less ambiguous than the human-readable default, especially with complex author names.
The --date=short option is not compatible with --porcelain and is no longer needed, as the porcelain format provides a UNIX timestamp that can be formatted as required.
| const result = await deps.execFileAsync('git', ['-C', directory, 'blame', '--date=short', '-w', filePath]); | |
| const result = await deps.execFileAsync('git', ['-C', directory, 'blame', '--porcelain', '-w', filePath]); |
Why did I implement this feature?
I recently realized how useful it would be to see Git Blame information right in the repomix output when reviewing my own contributions across different repositories. I figured others might find this helpful too, so I created this PR to share it!
Description
This change simply adds the ability to include Git Blame details (author and date) for every line in the packed output. This makes it much easier to understand the history and context of the code directly within the generated file.
Changes:
--output-show-git-blameCLI flag.output.git.showBlameconfiguration option.getGitBlamecore logic to format output as[<Author> <Date>] <Code>.fileProcessContentto integrate blame data when enabled.README.mdwith new flag and config optionTests added
tests/core/git/gitBlameHandle.test.tsgetGitBlamereturns correctly formatted author and date strings for valid git repositories.getGitBlamereturnsnulland skips processing when the directory is not a git repository.getGitBlamereturnsnullfor files producing empty blame output, such as untracked files.getGitBlamehandles execution errors gracefully by returningnulland logging the error.tests/core/git/gitCommand.test.tsexecGitBlamecorrectly invokes the git binary with arguments for short dates and ignoring whitespace.execGitBlamereturns an empty string and logs a trace if the git command fails.tests/core/file/fileProcessContent.test.tsshowBlameis enabled.How to try it
npm run repomix -- --output-show-git-blame --include "src/index.ts"Example output
Checklist
npm run testnpm run lint