GitHub author grouping (#2022)#2081

Open
daveoconnor wants to merge 2 commits into develop from doc/2022-github-author-grouping

@daveoconnor
Collaborator

This PR is related to ticket #2022.

Adds a new contribution app which records git commits, PR contributions, and issue contributions. The result ended up looking different from the design because email addresses were only available for commits.

For now there's an open question around repos with multiple libraries. For commits, we duplicate them, which matches what the existing commit table does. PR and issue contributions are recorded on a per-repo basis, because there was no constraint issue there and no preexisting data.

With a GitHub personal access token, the management command takes up to an hour to run: about 5 minutes of actual processing and 55 minutes of GraphQL rate limit sleep. That may be reduced with tokens which have higher rate limits. This should be considered when setting up the Celery task.
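As a rough illustration of where the sleep time comes from, here is a minimal sketch of computing the wait from the `rateLimit` object that GitHub's GraphQL API can return (`remaining` and `resetAt` are real fields on that object). The function name `seconds_until_reset` and the `min_remaining` threshold are hypothetical and are not taken from this PR's code.

```python
from datetime import datetime, timezone


def seconds_until_reset(rate_limit: dict, now: datetime, min_remaining: int = 0) -> float:
    """Return how long to sleep before the next GraphQL call.

    `rate_limit` is assumed to be the parsed `rateLimit` object from a
    GraphQL response, e.g. {"remaining": 0, "resetAt": "2026-02-20T20:55:00Z"}.
    `min_remaining` is a hypothetical safety margin, not part of the API.
    """
    # Still have budget left: no need to sleep.
    if rate_limit["remaining"] > min_remaining:
        return 0.0
    # resetAt is an ISO 8601 UTC timestamp; normalise the trailing "Z" so
    # datetime.fromisoformat accepts it on older Python versions too.
    reset_at = datetime.fromisoformat(rate_limit["resetAt"].replace("Z", "+00:00"))
    # Never return a negative sleep if the reset time has already passed.
    return max(0.0, (reset_at - now).total_seconds())
```

A caller would pass `datetime.now(timezone.utc)` as `now` and hand the result to `time.sleep`. Taking `now` as a parameter keeps the calculation easy to test.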

I've tried to optimize the calls as much as possible, using ETags for commits and tweaking the GraphQL queries. Notes about this are documented with the queries.
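For readers unfamiliar with the ETag approach, the general pattern is a conditional request: send the stored ETag in `If-None-Match`, and reuse the cached data when the server answers `304 Not Modified`. The sketch below shows that pattern only; `fetch_with_etag`, the `send` callable, and the in-memory `cache` dict are hypothetical stand-ins and do not reflect how this PR stores or sends anything.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical transport signature: (url, request_headers) ->
# (status_code, response_headers, parsed_body_or_None).
Send = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], Optional[Any]]]


def fetch_with_etag(url: str, send: Send, cache: Dict[str, Tuple[str, Any]]) -> Any:
    """Fetch `url`, reusing the cached body when the server returns 304.

    `cache` maps URL -> (etag, body). A real implementation would likely
    persist this in the database rather than in a dict (assumption).
    """
    headers: Dict[str, str] = {}
    cached = cache.get(url)
    if cached is not None:
        # Ask the server to send the body only if it has changed.
        headers["If-None-Match"] = cached[0]

    status, response_headers, body = send(url, headers)

    if status == 304 and cached is not None:
        # Not modified: the cached body is still current.
        return cached[1]

    # New or changed resource: remember its ETag for the next run.
    etag = response_headers.get("ETag")
    if etag:
        cache[url] = (etag, body)
    return body
```

Injecting `send` keeps the sketch independent of any particular HTTP library; in practice it would wrap whatever client the project already uses.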

Once the identity merging work is completed there'll need to be a change to allow users to claim identities.

This PR also fixes some issues with coverage tests.

Collaborator

@gregjkal gregjkal left a comment

Approved, with a handful of minor comments. I have some broader thoughts on efficiency, but will take those as future optimization opportunities.

@daveoconnor daveoconnor force-pushed the doc/2022-github-author-grouping branch from 850a4f4 to 7590a6d on February 20, 2026 20:36
@daveoconnor
Collaborator Author

@gregjkal PR review responded to and adjusted. I'll leave this one to you to merge; it'll need to be run manually on the server anyway with ./manage.py update_contributions, with a task added at a later date.
