Skip to content

feature: allow for agnostic git backends#187

Open
derekbarbosa wants to merge 7 commits into
sashiko-dev:mainfrom
derekbarbosa:feature/custom-forge
Open

feature: allow for agnostic git backends#187
derekbarbosa wants to merge 7 commits into
sashiko-dev:mainfrom
derekbarbosa:feature/custom-forge

Conversation

@derekbarbosa
Copy link
Copy Markdown
Collaborator

Multi-Forge Integration via ForgeProvider Trait

Patches are currently ingested exclusively via NNTP (mailing list).
This patch adds a ForgeProvider trait that abstracts webhook-based
ingestion from code forges, with built-in implementations for GitHub and
GitLab:

  • ForgeProvider trait: validate_event(), parse_payload() — any forge
  • that speaks webhooks can implement this interface.
  • ForgeRegistry: runtime registry of providers, pre-loaded with GitHub
  • and GitLab. Third-party providers can be registered at startup.
  • ForgeMetadata: normalized struct (repo_url, base_sha, head_sha,
  • pr_number, pr_title, pr_url) — backend-agnostic representation of
  • a PR/MR event.
  • ForgeSettings in Settings.toml: enabled, provider, webhook_secret,
  • api_token.
  • Webhook endpoint (/api/webhook/:provider): receives forge events, validates
  • via the registered provider, extracts commit range, and feeds patches into
  • the existing review pipeline.
  • DB schema additions: mr_url, mr_title, mr_number, slug columns on
  • patchsets table for forge metadata tracking.

The forge layer is entirely opt-in. When [forge] enabled = false (default),
no webhook routes are registered and the NNTP pipeline is unchanged.

@rgushchin
Copy link
Copy Markdown
Member

Hey! I asked Gemini to review this PR and it seems to be very concerned :) It makes me too. Db safety is top priority. Please, take a look. Overall stability > velocity at this point, Sashiko is actively used by the kernel community and just keep it afloat is p0.

🚨 Critical Regression Risks

  1. Database Migration Crash on Startup (Existing Installs)
    There is a major issue with how the new slug column is handled in existing databases:
  • In src/schema.sql, the PR adds CREATE UNIQUE INDEX IF NOT EXISTS idx_patchsets_slug ON patchsets(slug) WHERE slug IS NOT NULL;.
  • In src/db.rs, Database::migrate() executes schema.sql before it runs manual migrations (try_add_column).
  • Because of this order, existing databases will crash on startup: SQLite will attempt to create an index on the slug column before it exists, throwing a no such column: slug error.
  • Furthermore, the PR completely missed adding self.try_add_column("patchsets", "slug", "TEXT").await; to db.rs. So even if schema.sql didn't crash, subsequent SELECT queries that now
    expect the slug column will fail.
  1. Mutual Exclusion of Ingestors (Loss of Functionality)
    In src/main.rs, the PR modifies the startup logic to disable the Lore/NNTP ingestor completely if forge integration is enabled:

1 let ingestor_handle = if !settings.forge.enabled {
2 // Starts Lore/NNTP Ingestor
3 } else {
4 info!("Forge integration is enabled. Lore/NNTP ingestor is disabled.");
5 // Spawns a pending future
6 };
This means a user cannot simultaneously monitor mailing lists and accept GitHub/GitLab webhooks. If a user enables forge features, their mailing list integration will silently stop
working.

⚠️ Moderate Risks & Behavior Changes

  1. Subsystem Mapping Fallback Logic
    In src/lib.rs -> identify_subsystems(), the logic was updated to use custom mapping rules from Settings.toml. However, the code uses:

1 if !matched && mapping.is_empty() {
2 // Fallback to default linux-kernel, netdev, etc. checks
3 }
If a user specifies even one custom subsystem mapping in their configuration, mapping.is_empty() evaluates to false, and the default fallback logic is bypassed completely. Users adding a
custom rule might be surprised when Sashiko stops identifying standard lists like LKML.

@derekbarbosa
Copy link
Copy Markdown
Collaborator Author

Hey!

Because of this order, existing databases will crash on startup: SQLite will attempt to create an index on the slug column before it exists, throwing a no such column: slug error.
Furthermore, the PR completely missed adding self.try_add_column("patchsets", "slug", "TEXT").await; to db.rs. So even if schema.sql didn't crash, subsequent SELECT queries that now
expect the slug column will fail.

Interesting that the pr-review didn't catch this bug in the first few iterations of the RFC! I will run some reviews of my own and attempt to test this behavior.

Would you mind following up on #128? It would allow me to test this with a bit more certainty 😅

Otherwise I can just implement the proposed fixes and re-run the reviews.

Mutual Exclusion of Ingestors (Loss of Functionality)
In src/main.rs, the PR modifies the startup logic to disable the Lore/NNTP ingestor completely if forge integration is enabled
This means a user cannot simultaneously monitor mailing lists and accept GitHub/GitLab webhooks. If a user enables forge features, their mailing list integration will silently stop
working.

Is this actually critical? I would imagine a user spinning up a sashiko instance to track a github repository isn't particularly interested in also tracking NNTP. If it is to be desired though, it is an easy enough fix.

If a user specifies even one custom subsystem mapping in their configuration, mapping.is_empty() evaluates to false, and the default fallback logic is bypassed completely. Users adding a
custom rule might be surprised when Sashiko stops identifying standard lists like LKML.

At first, this was intended behavior to prevent any collisions from happening with custom subsystem mappings. I realize now that it needs to be reworked a bit.

Thanks for the review! v2 coming soon :-)

Add slug, mr_url, mr_title, and mr_number columns to the patchsets
table to support forge-based (GitHub/GitLab) merge request tracking.
Extend PatchsetRow, all SELECT queries, and create_fetching_patchset
with the new fields. Add get_patchset_details_by_slug and
get_patchset_summary_by_slug lookup methods. Add mr_url, mr_title,
and mr_number fields to Event::PatchSubmitted and ParsedArticle.

Signed-off-by: derekbarbosa <derekasobrab@gmail.com>
Add configuration structs for multi-forge integration and subsystem
mapping. Add ProjectSettings, ForgeSettings, and SubsystemsSettings
types with defaults. Add [project], [forge], and [subsystems] sections
to Settings.toml.

Signed-off-by: derekbarbosa <derekasobrab@gmail.com>
Add ForgeProvider trait with GitHubForge and GitLabForge
implementations for webhook handling. Add ForgeRegistry for
provider lookup. Add extract_repo_name_from_url and
extract_repo_name_from_mr_url utility functions. Add bytes
dependency for webhook body parsing.

Signed-off-by: derekbarbosa <derekasobrab@gmail.com>
Extend AppState with Settings and ForgeRegistry. Change run_server
to accept Arc<Settings> instead of ServerSettings. Add GET /api/config
endpoint returning project metadata. Add POST /api/webhook/{provider}
for forge-agnostic webhook handling via ForgeRegistry trait dispatch.
Add slug-based patchset lookup in GET /api/patch handler. Add MR
metadata fields to FetchRequest for forge-sourced patches.

Signed-off-by: derekbarbosa <derekasobrab@gmail.com>
Add gitlab_token and mr_metadata fields to FetchAgent. Thread
mr_url, mr_title, mr_number through extract_patch into
Event::PatchSubmitted. Add commit range splitting for base..head
ranges from forge webhooks. Inject GitLab auth tokens via oauth2
URL format in ensure_remote. Add safe.bareRepository=all to all
git subprocess calls for bare repo compatibility.

Signed-off-by: derekbarbosa <derekasobrab@gmail.com>
Add conditional ingestor startup (disabled when forge is enabled).
Thread mr_url, mr_title, mr_number through Event::PatchSubmitted
into ParsedArticle and process_parsed_article. Add configurable
subsystem mapping via regex patterns. Add identify_subsystems_from_paths
for file-path-based subsystem detection. Use MR title as patchset
subject for forge-sourced patches. Move email policy loading
outside the DB worker loop.

Signed-off-by: derekbarbosa <derekasobrab@gmail.com>
@derekbarbosa derekbarbosa force-pushed the feature/custom-forge branch from 0bfe206 to 96d7854 Compare May 15, 2026 01:27
Add setup guides for GitHub and GitLab forge integration. Add webhook
testing and configuration validation scripts. Update README with forge
integration features and configuration sections. Update service
skeleton routing and landing page for webhook endpoints.

Signed-off-by: derekbarbosa <derekasobrab@gmail.com>
@derekbarbosa derekbarbosa force-pushed the feature/custom-forge branch from 96d7854 to 3376367 Compare May 15, 2026 01:32
@derekbarbosa
Copy link
Copy Markdown
Collaborator Author

V1->V2:

  • Addressed review comments
  • Additional tests around nntp disable/enable, custom subsystem identification, forge settings, etc
  • Refactored regex::Regex::new compilation for performance.
  • Replaced .unwrap() calls with if_let patterns in the mr_title extraction path for git forge
  • Rebased onto latest main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants