Skip to content

Refactor: stable per-org locality lookup shared across all three resolution entry paths #165

Description

@coderabbitai

Background

In PR #164 (feat/ingestion-pipeline), the seed script and on-demand resolver were updated to pass townCity/county through to resolveOneSponsor for locality tiebreaking. Both currently use asc(hmrcSkilledWorkers.id) as the terminal tiebreak in their DISTINCT ON / representative-row queries.

Because ingest-hmrc-csv.ts recreates hmrc_skilled_workers on every ingest and reassigns id from current CSV order, a reorder-only feed change can select a different townCity/county for the same organisation and influence resolveOneSponsor outcomes. The measured exposure is narrow (≈40 of 126,528 orgs, 0.03%) and the resolver is fail-closed (locality only breaks ties between same-score candidates; a flip cannot select a wrong entity), but the instability should be fixed properly.

Proposed fix

Extract a single shared locality-lookup helper in apps/web/src/lib/hmrc-ch/ (e.g. get-org-locality.ts) used by all three sites:

  • apps/web/src/lib/phase5/sql.tsmakeLookupSponsor
  • apps/web/src/api/companiesHouse.ts — on-demand resolver
  • apps/web/scripts/seed-companies-house.ts — seed DISTINCT ON query

The ordering clause must use a stable, ingest-invariant tiebreak with a unique terminal key:

ORDER BY town_city ASC NULLS LAST, county ASC NULLS LAST, hash ASC

(hash is content-derived from org|rating|route, unique per row, and ingest-stable.)

All three sites must be updated together — changing only a subset desynchronizes locality selection across entry paths.

Acceptance criteria

  • Shared locality helper extracted to apps/web/src/lib/hmrc-ch/
  • makeLookupSponsor uses the shared helper / stable ordering
  • On-demand resolver (companiesHouse.ts) uses the shared helper / stable ordering
  • Seed script DISTINCT ON ordering mirrors the stable clause
  • No id column used as tiebreak in any locality query

References

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions