@HimasreeKolathur24
Contributor

Description

  • This PR adds an initial implementation of the CHAOSS Bot Activity metric API.
  • The metric measures the volume of automated activity by identifying bot-authored commits and comparing them with human commits.
  • The implementation follows existing metric patterns in repo_meta.py and is scoped to commit activity for this first version.
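A minimal sketch of the computation described above. It assumes a hypothetical, simplified two-table schema (contributors and commits joined on a made-up author_cntrb_id column), not Augur's real tables. It applies the same LIKE '%bot%' heuristic and reports bot commits, human commits, and the bot share for one repo. SQLite stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3

# Hypothetical, simplified tables for illustration only; Augur's real
# commits/contributors tables have more columns and may use other join keys.
SCHEMA = """
CREATE TABLE contributors (cntrb_id INTEGER PRIMARY KEY, cntrb_login TEXT);
CREATE TABLE commits (cmt_id INTEGER PRIMARY KEY, repo_id INTEGER,
                      author_cntrb_id INTEGER);
"""

# Same heuristic as the PR: case-insensitive substring match on "bot".
# A NULL login never matches, so such commits count as human.
BOT_ACTIVITY_SQL = """
SELECT
    SUM(CASE WHEN LOWER(cn.cntrb_login) LIKE '%bot%' THEN 1 ELSE 0 END),
    COUNT(*)
FROM commits c
JOIN contributors cn ON cn.cntrb_id = c.author_cntrb_id
WHERE c.repo_id = :repo_id
"""


def bot_activity(conn, repo_id):
    """Return bot vs. human commit counts and the bot share for one repo."""
    bot, total = conn.execute(BOT_ACTIVITY_SQL, {"repo_id": repo_id}).fetchone()
    bot = bot or 0  # SUM() over zero rows is NULL
    return {
        "bot_commits": bot,
        "human_commits": total - bot,
        "bot_share": bot / total if total else 0.0,
    }


def make_demo_db():
    """Build an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO contributors VALUES (?, ?)",
                     [(1, "alice"), (2, "dependabot[bot]"), (3, "Renovate-Bot")])
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?)",
                     [(1, 10, 1), (2, 10, 1), (3, 10, 2), (4, 10, 3), (5, 11, 1)])
    return conn


if __name__ == "__main__":
    print(bot_activity(make_demo_db(), 10))
    # {'bot_commits': 2, 'human_commits': 2, 'bot_share': 0.5}
```

Returning the share alongside the raw counts keeps the output usable both for a single headline number and for later extension to issues or pull requests.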

This PR fixes #2594

Notes for Reviewers

  • This is an initial (MVP) implementation focused on commit-based bot activity.
  • Bot identification is based on contributor login patterns (a case-insensitive substring match on "bot"), consistent with existing Augur practices.
  • The metric was checked with static analysis and a local Docker startup. The Augur service did not expose HTTP endpoints in the local environment, so the endpoint itself was not exercised end to end, but no runtime or import errors were observed.
  • The metric is designed to be easily extended in the future to include issues, pull requests, or time-series outputs.

Signed commits

  • Yes, I signed my commits.

Signed-off-by: HimasreeKolathur24 <[email protected]>
SELECT
SUM(
CASE
WHEN LOWER(cn.cntrb_login) LIKE '%bot%' THEN 1
Contributor

I'd be curious whether this is the best way to do bot detection. I know 8Knot has filters for this, so I'd want to compare how that project does bot detection.

Contributor Author

For this initial implementation, I used a simple heuristic (LOWER(cntrb_login) LIKE '%bot%') to keep the MVP minimal and consistent with patterns I’ve seen in other Augur metrics.

I agree this may not be the most robust approach. I’m not yet familiar with 8Knot’s bot filtering logic, but I’d be happy to review how it handles bot detection and adjust this metric to align with it if you think that would be preferable.

Would you recommend:

  • reusing a similar filter list / logic from 8Knot, or
  • keeping this as a simple first pass and iterating in a follow-up?

Happy to update based on your guidance.
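One way to make the comparison concrete is a sketch like the following. It is not 8Knot's logic (which this thread has not examined yet); KNOWN_BOTS and the stricter rules are hypothetical. It shows that the substring heuristic also flags human logins such as "abbott" or "talbot", while the stricter rule only accepts a "[bot]" suffix, a known bot name, or "bot" as a trailing token.

```python
import re

# Hypothetical allowlist of automation accounts; a real list would need to be
# curated (for example, after comparing with the filters 8Knot applies).
KNOWN_BOTS = {"dependabot", "renovate", "github-actions", "codecov"}

# "bot" only as a whole trailing token: "bot", "renovate-bot", "ci_bot".
BOT_TOKEN = re.compile(r"(?:^|[-_])bot$")


def naive_is_bot(login):
    """The PR's current heuristic: case-insensitive substring match on 'bot'."""
    return login is not None and "bot" in login.lower()


def stricter_is_bot(login):
    """A more conservative classifier (hypothetical, for comparison only)."""
    if not login:
        return False
    name = login.lower()
    # GitHub App accounts appear with a "[bot]" suffix, e.g. "dependabot[bot]".
    if name.endswith("[bot]"):
        return True
    if name in KNOWN_BOTS:
        return True
    return BOT_TOKEN.search(name) is not None


if __name__ == "__main__":
    for login in ["dependabot[bot]", "Renovate-Bot", "abbott", "talbot", "robotics-fan"]:
        print(f"{login:16} naive={naive_is_bot(login)!s:5} "
              f"stricter={stricter_is_bot(login)}")
```

The stricter rule trades recall for precision: an automation account named, say, "ci-helper" is missed by both versions, so a maintained allowlist may still be needed.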

""")
params = {"repo_id": repo_id}

else:
Contributor

Are these queries mostly the same except for the final join and WHERE statements?

Can we maybe build these queries with SQLAlchemy or string concatenation so the parts that are the same between these two codepaths can be reused?

Contributor Author

You’re right. The two queries share most of their structure, with the main differences being the final join and the WHERE clause depending on whether repo_id is provided.

Refactoring the shared parts makes sense. I can:

  • extract the common SELECT / FROM portion into a base query, and
  • append the repo-specific filtering conditionally (either via string composition or SQLAlchemy constructs).

I initially kept them separate for clarity, but I’m happy to refactor them to reduce duplication if that’s preferred here. Let me know if you have a style preference between SQLAlchemy-based composition and string concatenation.
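A minimal sketch of the string-composition option. It assumes a hypothetical signature build_bot_activity_query(repo_group_id, repo_id=None) and that the else branch filters by repo group through an extra join on a repo table; the diff excerpt does not show that branch. Only fixed SQL fragments are concatenated and values stay in bound parameters, so the result could presumably be handed to SQLAlchemy's text() the same way the existing params dict is. SQLite is used only to make the sketch runnable.

```python
import sqlite3

# Shared SELECT/FROM portion used by both code paths. Tables are a
# hypothetical, simplified stand-in for Augur's schema.
BASE_QUERY = """
SELECT
    SUM(CASE WHEN LOWER(cn.cntrb_login) LIKE '%bot%' THEN 1 ELSE 0 END) AS bot_commits,
    COUNT(*) AS total_commits
FROM commits c
JOIN contributors cn ON cn.cntrb_id = c.author_cntrb_id
"""


def build_bot_activity_query(repo_group_id, repo_id=None):
    """Append the scope-specific join/filter to the shared base query.

    Only fixed SQL fragments are concatenated; values always travel as
    bound parameters, never via string interpolation.
    """
    if repo_id is not None:
        sql = BASE_QUERY + "WHERE c.repo_id = :repo_id"
        params = {"repo_id": repo_id}
    else:
        # Hypothetical repo-group path: one extra join plus a different WHERE.
        sql = BASE_QUERY + (
            "JOIN repo r ON r.repo_id = c.repo_id\n"
            "WHERE r.repo_group_id = :repo_group_id"
        )
        params = {"repo_group_id": repo_group_id}
    return sql, params


def make_demo_db():
    """In-memory database with made-up rows, for trying the builder."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contributors (cntrb_id INTEGER PRIMARY KEY, cntrb_login TEXT);
        CREATE TABLE repo (repo_id INTEGER PRIMARY KEY, repo_group_id INTEGER);
        CREATE TABLE commits (cmt_id INTEGER PRIMARY KEY, repo_id INTEGER,
                              author_cntrb_id INTEGER);
        INSERT INTO contributors VALUES (1, 'alice'), (2, 'dependabot[bot]');
        INSERT INTO repo VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO commits VALUES (1, 10, 1), (2, 10, 2), (3, 11, 1), (4, 12, 2);
    """)
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    for kwargs in ({"repo_group_id": 1, "repo_id": 10}, {"repo_group_id": 1}):
        sql, params = build_bot_activity_query(**kwargs)
        print(kwargs, conn.execute(sql, params).fetchone())
```

The SQLAlchemy alternative would build a select() once and call .join() and .where() on it only in the branch that needs them. Either way, the shared SELECT/FROM lives in one place.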

Development

Successfully merging this pull request may close these issues.

Bot Activity metric API

2 participants