A Python script that gathers metadata for all repositories in a GitHub organization and automatically exports the data into a desired Google Sheet (using a Google Cloud Console Service Account) for easy viewing and analysis.
- Features
- Usage
- Set up your own GitHub Actions workflow
- Run repo exporter locally
- Important Notes
- Testing
- Fetches all repositories in an organization
- Collects key details:
  - Repo visibility, name and description
  - Date created and last updated
  - Creator and top 4 contributors (an N/A creator means the repository was either transferred or forked, and None (<GitHub Username>) means there was no full name attached to the contributor's GitHub account)
  - Number of stars and number of branches
  - Presence of README, license, .gitignore, package requirements (requirements.txt, environment.yaml, etc.), CITATION.cff, .zenodo.json and CONTRIBUTING.md files
  - Primary Programming Language
  - Presence of Website Reference, Dataset, Model, Paper Association, and DOI for GitHub Repo
- Exports everything to a given Google Sheet (the service account must be listed as an Editor in the sheet's sharing settings)
- Highlights cells whose value is No: red for Standard Files columns, orange for Recommended Files and Filters columns
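The highlighting rule above can be sketched as follows. This is a minimal illustration, not the script's actual code: the column groupings (STANDARD_FILES, RECOMMENDED), the helper names, and the exact color values are assumptions made for the example. The request shape follows the Google Sheets API repeatCell request used with batchUpdate.

```python
# Google Sheets API Color objects use channel values in [0, 1].
RED = {"red": 1.0, "green": 0.0, "blue": 0.0}
ORANGE = {"red": 1.0, "green": 0.6, "blue": 0.0}

# Hypothetical column groups; the real script may group columns differently.
STANDARD_FILES = {"README", "License", ".gitignore"}
RECOMMENDED = {"CITATION.cff", ".zenodo.json", "CONTRIBUTING.md"}


def highlight_color(column, value):
    """Return the background color for a cell, or None to leave it unstyled."""
    if value != "No":
        return None
    if column in STANDARD_FILES:
        return RED
    if column in RECOMMENDED:
        return ORANGE
    return None


def repeat_cell_request(sheet_id, row, col, color):
    """Build one repeatCell request that sets a single cell's background.

    row and col are zero-based; sheet_id is the numeric tab ID (the gid).
    """
    return {
        "repeatCell": {
            "range": {
                "sheetId": sheet_id,
                "startRowIndex": row,
                "endRowIndex": row + 1,
                "startColumnIndex": col,
                "endColumnIndex": col + 1,
            },
            "cell": {"userEnteredFormat": {"backgroundColor": color}},
            # Only touch the background color, leaving other formatting alone.
            "fields": "userEnteredFormat.backgroundColor",
        }
    }
```

A list of such requests could then be sent in one batchUpdate call, which keeps the number of API round trips small.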
The workflow runs automatically each week (9am UTC on Mondays); however, you can also run the GitHub Actions workflow manually:
- Go to the Actions tab
- Click Update Metadata for GitHub Repository Sheet
- Click Run workflow, leave the branch set to Branch: main and the selection set to all, then press Run workflow
To use this script within your own GitHub organization, first fork this repo, then follow the setup steps below to ensure proper access.
To create a GitHub personal access token with permissions for both private and public repositories (read-only access to public repositories is enabled by default without administrator approval):
- Go to github.com/settings/personal-access-tokens
- Click Generate new token → Fine-grained token
- Under Resource owner, select the organization you want to access.
- Under Repository access, choose All repositories.
- Under Permissions select Repositories and set:
  - Metadata -> Read-only
  - Contents -> Read-only
  - Administration -> Read-only
- Click Generate token and copy it (make sure to store it somewhere safe for future use).
- Navigate to https://github.com/<gh-org-name>/repo-exporter/settings/secrets/actions, click New repository secret, name it GH_TOKEN, paste the token into the Secret field, and click Add secret

Note: The token must be approved by the organization administrator before it can access private repositories.
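As a rough sketch of how a script might use the token once it is available as GH_TOKEN, the helpers below build GitHub REST API request headers and the URL for listing an organization's repositories. The function names are hypothetical and the actual script may use a client library instead; the header names and the /orgs/{org}/repos endpoint are part of the public GitHub REST API.

```python
def github_headers(token):
    """Build GitHub REST API headers from a personal access token."""
    if not token:
        raise ValueError("GH_TOKEN is empty or not set")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }


def org_repos_url(org, page=1, per_page=100):
    """URL for one page of an organization's repositories.

    type=all asks for every repository the token is allowed to see;
    per_page is capped at 100 by the API, so larger orgs need several pages.
    """
    return (
        f"https://api.github.com/orgs/{org}/repos"
        f"?type=all&per_page={per_page}&page={page}"
    )
```

If the token has not yet been approved for private repositories, requests made this way would be expected to return only the public ones.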
To create a Google Cloud Console Service Account and give it access to both the repository and the Google Sheet:
- Go to https://console.cloud.google.com/
- Under "IAM & Admin", create a new project and name it inventory
- Go to https://console.cloud.google.com/iam-admin/serviceaccounts. If you have multiple projects, select the project you just made if it isn't already selected
- Create a service account named Imageomics with the description "Repo checklist automation account", then press Done (you do not need to add any Permissions or Principals with access)
- Click on the service account email -> Keys -> Add key -> Create new key and select JSON then finally click Create
- Go to https://github.com/<gh-org-name>/repo-exporter/settings/secrets/actions, click New repository secret, name it GOOGLE_SERVICE_ACCOUNT_JSON, paste the entire contents of the JSON file into the Secret field, and click Add secret
- Go to https://console.cloud.google.com/apis/library/sheets.googleapis.com and enable the Google Sheets API for the project you made
- Go to your chosen Google Sheet and go to Share settings and add the new Service Account email you made and set it as an Editor
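Before pasting the key into the secret, it can help to confirm the JSON is the right kind of file and to read off the email address that needs Editor access. The sketch below is one possible check, not part of the script itself; parse_service_account is a hypothetical helper, and it only inspects fields that service account key files normally contain (type, client_email, private_key).

```python
import json

REQUIRED_KEYS = ("type", "client_email", "private_key")


def parse_service_account(raw_json):
    """Parse a service account key and return its fields, raising if it looks wrong."""
    info = json.loads(raw_json)
    if not isinstance(info, dict):
        raise ValueError("service account key must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError(f"service account key is missing: {', '.join(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"unexpected key type: {info['type']!r}")
    return info
```

The client_email value of the returned dictionary is the address to add under the sheet's Share settings.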
Now update the script with your GitHub organization name and the desired spreadsheet ID. The script can then be run through the GitHub Actions workflow by following the Usage instructions for your repository.
- Clone this repository:
    git clone https://github.com/Imageomics/repo-exporter.git
    cd repo-exporter
- Create and activate a Conda environment:
    conda create -n repo-exporter python -y
    conda activate repo-exporter
- Add the required environment variables to your Conda environment and reload it:
    conda env config vars set GH_TOKEN="<your-token-here>"
    conda env config vars set GOOGLE_CREDENTIALS_PATH="/path/to/service_account.json"
    conda deactivate
    conda activate repo-exporter
- Install Python dependencies:
    pip install -r requirements.txt
- Run the program:
    python export_repos.py
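If the program fails on startup, a quick check that both environment variables from the steps above are visible to Python can narrow things down. This is a small standalone sketch (the preflight_problems helper is hypothetical and not part of export_repos.py); it only checks that the variables are set and that the credentials path points to an existing file.

```python
import os

REQUIRED_VARS = ("GH_TOKEN", "GOOGLE_CREDENTIALS_PATH")


def preflight_problems(env):
    """Return a list of human-readable problems with the local configuration."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    path = env.get("GOOGLE_CREDENTIALS_PATH")
    if path and not os.path.isfile(path):
        problems.append(f"GOOGLE_CREDENTIALS_PATH does not point to a file: {path}")
    return problems


if __name__ == "__main__":
    # Print each problem found; no output means both variables look usable.
    for problem in preflight_problems(os.environ):
        print(problem)
```

An empty result does not prove the token or key is valid, only that the script will be able to find them.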
Key edits to ensure the script functions properly for your organization:
- You must enter your specific GitHub Organization Name under Config settings at the top of the Python script file (for example, Imageomics)
- You must enter your specific Google Sheet ID under Config settings at the top of the Python script file (for example, if the URL is https://docs.google.com/spreadsheets/d/15BQimTjaOyo-jeaJRcg1Hia-9ORcilj3Jx-ks-uGyoc/edit?gid=0#gid=0, then 15BQimTjaOyo-jeaJRcg1Hia-9ORcilj3Jx-ks-uGyoc is the Google Sheet ID)
- You must enter your specific Google Sheet Section Name. This can be found at the bottom of your Google Sheet (for example, Sheet1)
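The Sheet ID is the path segment that follows /spreadsheets/d/ in the URL. As a convenience, it could be pulled out with a small helper like the one below; sheet_id_from_url is a hypothetical name used only for this illustration and is not defined in the script.

```python
import re

# The ID is the run of letters, digits, hyphens and underscores after /d/.
_SHEET_ID = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def sheet_id_from_url(url):
    """Extract the Google Sheet ID from a full spreadsheet URL."""
    match = _SHEET_ID.search(url)
    if match is None:
        raise ValueError(f"no Google Sheet ID found in: {url}")
    return match.group(1)


url = "https://docs.google.com/spreadsheets/d/15BQimTjaOyo-jeaJRcg1Hia-9ORcilj3Jx-ks-uGyoc/edit?gid=0#gid=0"
print(sheet_id_from_url(url))  # 15BQimTjaOyo-jeaJRcg1Hia-9ORcilj3Jx-ks-uGyoc
```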
Follow the local install instructions, then run the following in your repo-exporter environment:
python -m pytest -q