[Bug][Config-UI] No warning when the same repository is added under multiple connections, causing duplicate domain-layer records #8947

Description

@mfrancisc

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

When the same physical repository is added to DevLake under more than one connection - which the UI currently allows without any warning - every entity collected from that repository (pull requests, issues, board associations, repo-commit links) is stored as a separate record for each connection.
For example, if a repository is connected via four connections, every pull request appears four times in the pull_requests table, with four different primary keys. Any metric computed over these tables - PR count, cycle time, throughput, DORA lead time - is inflated by the number of connections pointing at the same repo.
The UI gives no indication that this configuration will produce duplicate data. Users following the suggested multi-connection workaround (issue #7684) are silently corrupting their metrics.

Verification: All duplicate records share the same url field (e.g., https://github.com/owner/repo/pull/123). Running the following query confirms the problem:

SELECT url, COUNT(*) as copies
FROM pull_requests
GROUP BY url
HAVING COUNT(*) > 1
ORDER BY copies DESC;
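
To make the effect concrete, here is a minimal, self-contained sketch that runs the same query against an in-memory SQLite database. The two-column schema and the `connN:pr:N` primary keys are simplified placeholders for illustration, not DevLake's real schema or ID format.

```python
import sqlite3

# Simplified stand-in for the domain-layer table (assumed schema, illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pull_requests (id TEXT PRIMARY KEY, url TEXT)")

# The same repository collected through two connections, plus one unaffected repo.
rows = [
    ("conn1:pr:123", "https://github.com/owner/repo/pull/123"),
    ("conn2:pr:123", "https://github.com/owner/repo/pull/123"),
    ("conn1:pr:124", "https://github.com/owner/repo/pull/124"),
    ("conn2:pr:124", "https://github.com/owner/repo/pull/124"),
    ("conn1:pr:7", "https://github.com/owner/other/pull/7"),
]
conn.executemany("INSERT INTO pull_requests VALUES (?, ?)", rows)

# The verification query from this issue (secondary sort added for stable output).
duplicates = conn.execute(
    """
    SELECT url, COUNT(*) AS copies
    FROM pull_requests
    GROUP BY url
    HAVING COUNT(*) > 1
    ORDER BY copies DESC, url
    """
).fetchall()

# A naive PR count versus the number of distinct PRs.
reported, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT url) FROM pull_requests"
).fetchone()

print(duplicates)
print(f"reported PRs: {reported}, distinct PRs: {distinct}")
```

In this toy data set a naive `COUNT(*)` reports 5 pull requests where only 3 distinct ones exist.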

What do you expect to happen

When a user adds a repository scope that is already registered under a different connection (detected by matching html_url / clone_url across connections), the UI should display a clear warning before the user saves, for example:
"This repository is already connected via Connection 'GitHub Production'. Collecting it here will create duplicate pull request and issue records, which will inflate all metrics for this repository."

The warning should not block the action - there are legitimate reasons to have the same repository under multiple connections (different scope configs, different team tokens). But the user should be able to make an informed choice.
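
The matching step could look roughly like the sketch below. The normalization rules (case-insensitive comparison, ignoring a trailing slash or `.git` suffix, https-style URLs only) and both function names are assumptions made for illustration; the real check would need to follow whatever each plugin stores in html_url / clone_url.

```python
from typing import Optional
from urllib.parse import urlsplit


def normalize_repo_url(url: str) -> str:
    """Reduce an https html_url or clone_url to a comparable 'host/owner/repo' key.

    Assumption: URLs that differ only in case, a trailing slash, or a '.git'
    suffix refer to the same repository. SSH-style URLs are not handled here.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{parts.hostname}{path}".lower()


def duplicate_warning(selected_url: str, other_connections: dict) -> Optional[str]:
    """Return a warning if selected_url is already registered elsewhere.

    other_connections maps a connection name to the repo URLs it already
    contains (hypothetical input shape).
    """
    key = normalize_repo_url(selected_url)
    for name, urls in other_connections.items():
        if any(normalize_repo_url(u) == key for u in urls):
            return (
                f"This repository is already connected via Connection '{name}'. "
                "Collecting it here will create duplicate pull request and issue "
                "records, which will inflate all metrics for this repository."
            )
    return None  # no overlap, nothing to show


registered = {"GitHub Production": ["https://github.com/owner/repo.git"]}
print(duplicate_warning("https://GitHub.com/Owner/Repo/", registered))
print(duplicate_warning("https://github.com/owner/other", registered))
```

Returning a message rather than raising an error keeps the check non-blocking, in line with the requirement above.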

Additionally, a backend diagnostics endpoint would help existing installations detect the problem:

GET /api/scope-duplicates

Returns a list of repository URLs that appear under more than one connection, along with the affected connection IDs, so administrators can audit and clean up existing configurations.

How to reproduce

  1. Add the same GitHub repository to DevLake under two different connections.
  2. Run blueprints for both connections.
  3. Query pull_requests grouped by url - every PR will appear twice.
  4. Note that at no point during configuration does the UI warn about this.

Anything else

Proposed implementation

Backend - one new API handler that queries _tool_github_repos (and equivalent tables for other plugins) grouped by html_url, returning repos that appear under more than one connection:

GET /api/plugins/github/scope-duplicates
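
As a rough illustration of the query behind such a handler, the sketch below uses Python's sqlite3 module rather than DevLake's own backend stack. The reduced `_tool_github_repos` schema, the `scope_duplicates` function, and the `url` / `connectionIds` response fields are all assumptions, not an agreed API contract.

```python
import sqlite3
from itertools import groupby

# Reduced stand-in for the tool-layer table (assumed columns, illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE _tool_github_repos ("
    "connection_id INTEGER, github_id INTEGER, html_url TEXT, "
    "PRIMARY KEY (connection_id, github_id))"
)
conn.executemany(
    "INSERT INTO _tool_github_repos VALUES (?, ?, ?)",
    [
        (1, 100, "https://github.com/owner/repo"),
        (2, 100, "https://github.com/owner/repo"),
        (4, 100, "https://github.com/owner/repo"),
        (1, 200, "https://github.com/owner/other"),
    ],
)

# Every (html_url, connection_id) pair whose URL occurs under 2+ connections.
QUERY = """
SELECT html_url, connection_id
FROM _tool_github_repos
WHERE html_url IN (
    SELECT html_url
    FROM _tool_github_repos
    GROUP BY html_url
    HAVING COUNT(DISTINCT connection_id) > 1
)
ORDER BY html_url, connection_id
"""


def scope_duplicates(db: sqlite3.Connection) -> list:
    """Group the sorted rows into one entry per duplicated repository URL."""
    rows = db.execute(QUERY).fetchall()
    return [
        {"url": url, "connectionIds": [cid for _, cid in group]}
        for url, group in groupby(rows, key=lambda row: row[0])
    ]


print(scope_duplicates(conn))
```

Grouping on the raw html_url keeps the query simple; if URLs can differ in case or suffix between connections, a normalized column would be needed instead.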

Config-UI - when a user selects a repository scope in the blueprint or connection wizard, call the endpoint and render a dismissible warning banner if the selected repo URL is already registered elsewhere.

Additional context
This issue affects all data-source plugins that support multiple connections to the same platform instance (GitHub, GitLab, Bitbucket, etc.).
A related workaround exists: deduplicating views over the domain tables, using url as a natural key. We are willing to contribute that as a stopgap alongside the UI fix if it would be useful to the project (see konflux-ci#106).
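
One possible shape for such a view is sketched below on an in-memory SQLite database. The view name `pull_requests_dedup`, the reduced schema, and the rule of keeping the row with the smallest id per url are assumptions for illustration and do not necessarily match the referenced change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for the domain table (assumed columns, illustration only).
conn.execute("CREATE TABLE pull_requests (id TEXT PRIMARY KEY, url TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO pull_requests VALUES (?, ?, ?)",
    [
        ("conn1:pr:123", "https://github.com/owner/repo/pull/123", "Fix login"),
        ("conn2:pr:123", "https://github.com/owner/repo/pull/123", "Fix login"),
        ("conn1:pr:124", "https://github.com/owner/repo/pull/124", "Add cache"),
        ("conn2:pr:124", "https://github.com/owner/repo/pull/124", "Add cache"),
    ],
)

# Keep exactly one row per url: the one with the smallest primary key.
conn.execute(
    """
    CREATE VIEW pull_requests_dedup AS
    SELECT pr.*
    FROM pull_requests AS pr
    WHERE pr.id = (
        SELECT MIN(p2.id) FROM pull_requests AS p2 WHERE p2.url = pr.url
    )
    """
)

raw_count = conn.execute("SELECT COUNT(*) FROM pull_requests").fetchone()[0]
dedup_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM pull_requests_dedup ORDER BY id")
]
print(raw_count, dedup_ids)
```

Dashboards would then query the view instead of the base table. Note that this sketch drops rows whose url is NULL, so a real view would need to handle that case explicitly.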

Version

main

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Labels

type/bug: This issue is a bug
