Multi-country AED import pipeline with NDJSON cache and data source management#115

Merged
vgpastor merged 14 commits into main from
claude/research-defibrillator-data-spain-UWU1W
Mar 15, 2026

@vgpastor vgpastor commented Mar 13, 2026

Summary

Full multi-country AED import infrastructure: REST API adapter with configurable pagination,
ISO 3166-2 region codes, multi-country parsers (FR/DE/AT), NDJSON cache for chunked sync
resume, AED access points model, and data source management UI.

Key Changes

Import Pipeline

  • RestApiAdapter: Generic REST API adapter with 4 pagination strategies (offset, page, cursor, none)
  • Multi-country parsers: France (GeoJSON), Germany (HTML schedules), Austria (Vienna addresses)
  • NDJSON cache: Compressed cache in PostgreSQL — avoids re-downloading all records on every chunk resume (171k FR dataset: from ~29 GB network I/O to ~4 GB)
  • Sync bugfixes: syncStartTime preserved on resume (disappearance detection), cache-on-miss for legacy jobs
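
The four pagination strategies listed above could be modelled roughly as follows. This is a minimal sketch, not the PR's RestApiAdapter: the `Pagination` type, the `PageState` shape, the parameter names, and the `nextPageUrl` helper are all hypothetical, and 1-based page numbering is an assumption.

```typescript
// Hypothetical config shape; the real RestApiAdapter config is not shown in this PR.
type Pagination =
  | { strategy: "offset"; limitParam: string; offsetParam: string; pageSize: number }
  | { strategy: "page"; pageParam: string; sizeParam: string; pageSize: number }
  | { strategy: "cursor"; cursorParam: string }
  | { strategy: "none" };

interface PageState {
  pageIndex: number; // zero-based count of pages already fetched
  nextCursor?: string; // cursor returned by the previous response, if any
}

// Appends query parameters to an endpoint, keeping insertion order.
function withQuery(endpoint: string, params: Record<string, string>): string {
  const query = Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  if (query === "") return endpoint;
  return endpoint + (endpoint.includes("?") ? "&" : "?") + query;
}

// Returns the URL of the next request, or null when the strategy itself
// knows there is nothing more to fetch.
function nextPageUrl(endpoint: string, p: Pagination, s: PageState): string | null {
  switch (p.strategy) {
    case "offset":
      return withQuery(endpoint, {
        [p.limitParam]: String(p.pageSize),
        [p.offsetParam]: String(s.pageIndex * p.pageSize),
      });
    case "page":
      return withQuery(endpoint, {
        [p.sizeParam]: String(p.pageSize),
        [p.pageParam]: String(s.pageIndex + 1), // assumes 1-based page numbers
      });
    case "cursor":
      // After the first page, a missing cursor means the last page was reached.
      if (s.pageIndex > 0 && !s.nextCursor) return null;
      return s.nextCursor ? withQuery(endpoint, { [p.cursorParam]: s.nextCursor }) : endpoint;
    case "none":
      return s.pageIndex === 0 ? endpoint : null; // single request only
  }
}
```

For the offset and page strategies the caller would still have to decide when to stop, for example when a response returns fewer rows than `pageSize`.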

Data Model

  • Country/Region support: country_code on Aed + ExternalDataSource, ISO 3166-2 region codes
  • AedAccessPoint: Structured emergency access data (door codes, floor, landmarks)
  • Map optimization: Clustering support for 3M+ point datasets

Data Source Management

  • Full CRUD edit form with template loading from existing sources
  • Type validation on PUT endpoint
  • Seed script for 14 data sources (ES/FR/DE/AT/CH/BE/UY)
  • Scheduled sync support with cron processing

Infrastructure

  • Docker services: libpostal (address parsing) + nominatim (geocoding)
  • 9 safe database migrations (no destructive operations)

Documentation

  • European AED open data sources research (pan-European + country-by-country)
  • @batchactions feature request for source data persistence in StateStore

Test Plan

  • Unit tests pass (npm run test:unit)
  • France sync resumes from NDJSON cache (no re-download)
  • Data source CRUD works in admin UI
  • New country imports (FR/DE/AT) process correctly

🤖 Generated with Claude Code

claude added 4 commits March 12, 2026 15:43
Comprehensive inventory of publicly available defibrillator datasets
organized by autonomous community, including:
- 6 confirmed downloadable sources (Madrid, Castilla y León, Cataluña,
  Euskadi, Castellón, Sant Boi)
- OpenStreetMap as complementary source with Overpass API query
- 13 CCAA without open data but with mandatory registries
- Recommended 3-phase import strategy
- Minimum field mapping for import compatibility

https://claude.ai/code/session_01B3ye6vQimVkFS5uqqe1wJD
Comprehensive inventory of AED/defibrillator open data sources across
Europe covering 18+ countries, including:

- 7 pan-European initiatives (OSM, OpenAEDMap, EENA, DEFIBMAP, etc.)
- Country-by-country analysis: France (Géo'DAE, best-in-class),
  UK (The Circuit), Germany, Italy, Netherlands, Belgium, Denmark,
  Sweden, Norway, Finland, Austria, Switzerland (Defikarte.ch),
  Portugal, Slovenia (gov API), Czech Republic, Ireland, Poland
- 3 data standards reviewed (French schema-dae, EENA, Hinterzarten)
- Accessibility ranking table and 4-phase import strategy
- Extended minimum field mapping for European scope
- Cross-reference added to the Spain-specific document

https://claude.ai/code/session_01B3ye6vQimVkFS5uqqe1wJD
… adapter, scheduled syncs

Phase 1 (Foundation):
- Add country_code field to aeds and external_data_sources tables (default 'ES')
- Implement generic RestApiAdapter with configurable pagination (offset/page/cursor)
- Extend DataSourceConfig with pagination, method, requestBody, responseDataPath
- Activate scheduled sync check in CRON handler (DAILY/WEEKLY/MONTHLY auto-trigger)
- Expand admin form with 14 EU countries + all 17 Spanish CCAA regions
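
The scheduled sync check in the CRON handler might reduce to a due-date comparison like the one below. This is a sketch under assumptions: `isSyncDue` and `INTERVAL_MS` are hypothetical names, and treating MONTHLY as 30 days is a guess, since the commit does not say how the interval is computed.

```typescript
type SyncFrequency = "DAILY" | "WEEKLY" | "MONTHLY";

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumption: MONTHLY is approximated as 30 days.
const INTERVAL_MS: Record<SyncFrequency, number> = {
  DAILY: DAY_MS,
  WEEKLY: 7 * DAY_MS,
  MONTHLY: 30 * DAY_MS,
};

// A source is due when it has never synced, or when at least one full
// interval has elapsed since the last sync.
function isSyncDue(frequency: SyncFrequency, lastSyncAt: Date | null, now: Date): boolean {
  if (lastSyncAt === null) return true;
  return now.getTime() - lastSyncAt.getTime() >= INTERVAL_MS[frequency];
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the check deterministic in unit tests.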

Phase 2 (Cleanup):
- Seed Madrid data source config as DB record via UPSERT migration
- Remove hardcoded MADRID_FIELD_MAPPINGS, createMadridConfig, MADRID_DEA_RESOURCE_ID
- Make CkanApiAdapter fully generic (uses config.fieldMappings, auto-detects ID field)
- Add "copy from existing source" template support in admin new data source form

https://claude.ai/code/session_01B3ye6vQimVkFS5uqqe1wJD
Replace custom region codes (MAD, CAT, CYL, etc.) with ISO 3166-2
standard codes (ES-MD, ES-CT, ES-CL, etc.) across the entire system.

- Update REGION_CODES_BY_COUNTRY with full ISO 3166-2 codes for ES, FR,
  IT, DE, PT, CH, AT, BE, SI, GB, NL, DK, SE, NO
- Add data migration to convert existing region_code values in
  external_data_sources table from custom to ISO 3166-2
- Update Madrid seed migration to use ES-MD
- Update schema comments and interface docs

Using the format {country}-{subdivision} (e.g. ES-MD) makes codes
self-descriptive and internationally recognized, which is important
as we add European data sources.
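
As an illustration of the conversion, the sketch below maps the three legacy codes named in this commit to their ISO 3166-2 equivalents and passes through values that already have the ISO shape. The helper name `toIsoRegionCode` is hypothetical, the mapping table is deliberately limited to the pairs mentioned here, and the regex checks only the format (two-letter country code, hyphen, one to three alphanumerics), not whether the subdivision exists.

```typescript
// Only the pairs named in the commit message; the real migration covers more.
const LEGACY_TO_ISO: Record<string, string> = {
  MAD: "ES-MD",
  CAT: "ES-CT",
  CYL: "ES-CL",
};

// Shape check for ISO 3166-2 codes: "ES-MD", "FR-75", "GB-ENG", ...
const ISO_3166_2_SHAPE = /^[A-Z]{2}-[A-Z0-9]{1,3}$/;

// Returns the ISO 3166-2 code, or null when the input is neither
// ISO-shaped nor a known legacy code.
function toIsoRegionCode(code: string): string | null {
  const upper = code.trim().toUpperCase();
  if (ISO_3166_2_SHAPE.test(upper)) return upper;
  return LEGACY_TO_ISO[upper] ?? null;
}
```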

https://claude.ai/code/session_01B3ye6vQimVkFS5uqqe1wJD
@vercel


vercel bot commented Mar 13, 2026

Project | Deployment | Actions | Updated (UTC)
dea-map | Ready | Preview, Comment | Mar 15, 2026 2:58am

@vgpastor vgpastor added the "enhancement" (New feature or request) and "help wanted" (Extra attention is needed) labels Mar 13, 2026
- Replace Madrid-only seed with unified migration covering 4 CCAA:
  Madrid (JSON_FILE), Cataluña (REST_API/Socrata), Castilla y León
  (REST_API/OpenDataSoft), Euskadi (JSON_FILE/GeoJSON)
- Make country_code and region_code nullable on external_data_sources
  to support country-wide and global data sources (e.g., OSM, France)
- Change aed_external_identifiers FK from CASCADE to SET NULL to
  prevent losing identifiers when a data source is deleted/recreated
- Fix Prettier formatting on research docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
buildDataSourceConfig only handled JSON_FILE and CKAN_API types,
falling through to a base config that stripped apiEndpoint, pagination,
and responseDataPath for REST_API sources. This caused 400 errors
when previewing Cataluña and Castilla y León data sources.
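
The shape of the fix can be sketched as a per-type switch that keeps the REST_API fields instead of falling through to a stripped base config. Everything here is hypothetical: the PR does not show `buildDataSourceConfig`, so the `StoredSource` type, the `pick` helper, and the exact key lists are assumptions pieced together from field names mentioned in this conversation.

```typescript
type SourceType = "JSON_FILE" | "CKAN_API" | "REST_API";

interface StoredSource {
  type: SourceType;
  config: Record<string, unknown>; // raw stored config; shape is an assumption
}

// Copies only the listed keys that are present in the stored config.
function pick(config: Record<string, unknown>, keys: string[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of keys) {
    if (key in config) out[key] = config[key];
  }
  return out;
}

function buildConfigSketch(source: StoredSource): Record<string, unknown> {
  const common = ["fieldMappings"];
  switch (source.type) {
    case "JSON_FILE":
      return { type: source.type, ...pick(source.config, [...common, "fileUrl", "jsonPath"]) };
    case "CKAN_API":
      return { type: source.type, ...pick(source.config, [...common, "apiEndpoint", "resourceId", "pageSize"]) };
    case "REST_API":
      // The branch that was missing: without it these fields were dropped.
      return {
        type: source.type,
        ...pick(source.config, [...common, "apiEndpoint", "method", "pagination", "responseDataPath"]),
      };
  }
}
```

Because the switch is exhaustive over `SourceType`, adding a new source type without a matching branch would surface as a compile-time error rather than a silent fall-through.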

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vgpastor and others added 2 commits March 13, 2026 19:47
- Replace basic 5-field edit modal with comprehensive tabbed panel
- Tab "General": name, description, sync frequency, publication mode, active toggle
- Tab "Connection": type-specific fields (JSON_FILE: fileUrl/jsonPath, CKAN_API: endpoint/resourceId/pageSize, REST_API: endpoint/method/auth/pagination config)
- Tab "Field Mappings": visual key-value editor with add/remove
- Tab "Advanced": matching strategy + threshold slider, auto-deactivate missing, auto-update field checkboxes
- Update config display section to show REST_API fields (endpoint, method, responseDataPath, pagination)
- Show matching strategy, threshold, origin and region in config display
- Fix field mapping display to read from config.fieldMappings/fieldMapping
- Fix CONTRIBUTING.md Prettier formatting for CI compliance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vgpastor and others added 2 commits March 14, 2026 18:43
Adds a new AedAccessPoint model that separates "how to reach a DEA"
from "where the DEA is installed". In emergencies (cardiac arrest),
responders need actionable info in seconds: which door, what floor,
is there a code, how long to walk.

Schema & migration (purely additive):
- New `aed_access_points` table with 0..N points per AED
- Enums: AccessPointType (PEDESTRIAN/VEHICLE/EMERGENCY/WHEELCHAIR/UNIVERSAL)
         AccessRestrictionType (NONE/CODE/KEY/CARD/INTERCOM/SECURITY_GUARD/LOCKED_HOURS)
- Fields: coordinates, restriction details, unlock code, contact info,
  availability (24h / schedule), indoor route (floor diff, elevator,
  estimated minutes, step-by-step JSON), emergency phone,
  can_deliver_to_entrance flag, curator traceability (created_by, verified)
- Optional FK on AedImage → AedAccessPoint (onDelete: SetNull)

Backend API:
- GET/POST /api/admin/deas/[id]/access-points (list + create)
- PATCH/DELETE /api/admin/deas/[id]/access-points/[apId]
- is_primary uniqueness enforced via transaction
- Access points included in GET /api/aeds/[id] and GET /api/admin/deas/[id]

Admin UI:
- New "Accesos" ("Access points") tab in DEA detail page
- AccessPointsPanel component: collapsible cards with map, restrictions,
  indoor route, photos, contacts; create form with LocationPickerMap;
  verify/delete actions

Mobile app:
- AedAccessPoint domain model + types
- DeaDetailPage: AccessPointCard with restriction badges, indoor steps,
  navigate/call buttons
- AedDetailSheet: compact access summary with restriction indicator
  and unlock code display
- Smart navigation: "Cómo llegar" ("Get directions") uses primary access point coordinates

Fully backward compatible: DEAs without access points return
access_points: [] and behave exactly as before.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ny types

- fix(syncRecordProcessor): parse Spanish booleans (Sí/si/S/1/verdadero)
  instead of strict "true" comparison — was causing all has_24h_surveillance
  to be false for Spanish data sources
- fix(syncRecordProcessor): default imported AEDs to PENDING_REVIEW instead
  of PUBLISHED
- fix(aedRecordProcessor): fix mojibake in Spanish comments and parseBoolean
- fix(SpanishScheduleParser): Saturday-only schedules no longer propagate to
  Sunday — only "SÁBADOS Y DOMINGOS" or "FINES DE SEMANA" set both days
- security(adapters): add SSRF protection via validateExternalUrl() blocking
  private IPs, localhost, metadata endpoints, .local/.internal TLDs
- fix(JsonFileAdapter): replace TTL-based cache with per-operation cache
  cleared after each fetchRecords to prevent stale data in serverless
- fix(ExternalSyncService): add 256MB NDJSON buffer cap to prevent OOM
- fix(aeds/[id]/route): use shared recordStatusChange() audit helper instead
  of inline aedStatusChange.create()
- refactor: eliminate 38 `any` lint warnings across PR files — use
  Record<string, unknown>, Prisma enum types, and specific interfaces
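
To illustrate the Spanish boolean fix, here is a minimal sketch of a tolerant parser that accepts the tokens listed above (Sí/si/S/1/verdadero) in addition to "true". The function name `parseSpanishBoolean` and the accent-stripping approach are assumptions; the PR's own parseBoolean is not shown.

```typescript
// Tokens treated as true after lowercasing and accent removal.
const TRUE_TOKENS = new Set(["si", "s", "1", "true", "verdadero"]);

function parseSpanishBoolean(value: unknown): boolean {
  if (typeof value === "boolean") return value;
  if (value === null || value === undefined) return false;
  const token = String(value)
    .trim()
    .toLowerCase()
    .normalize("NFD") // split "í" into "i" plus a combining accent
    .replace(/[\u0300-\u036f]/g, ""); // drop the combining accents
  return TRUE_TOKENS.has(token);
}
```

Anything outside the token set, including empty strings and "No", is treated as false, which matches the strict behaviour for unknown input while fixing the Spanish cases.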

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vgpastor and others added 2 commits March 14, 2026 23:28
…uite

Add adapters, transformers, and parsers for French (GeoDAE), German, and
Austrian data sources. Includes normalizeRecord for nested-object flattening
and WKT POINT parsing, field transformers (schedule parsers, address
splitters, HTML stripping), CSV encoding support, and enrichRecord pipeline.
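
WKT POINT parsing of the kind mentioned above can be sketched as follows. The `parseWktPoint` name and return shape are hypothetical, and the sketch handles only plain two-dimensional `POINT (x y)` values with decimal coordinates; SRID prefixes, Z/M variants, and exponent notation are out of scope. WKT lists x before y, so the longitude comes first.

```typescript
interface LatLon {
  latitude: number;
  longitude: number;
}

const WKT_POINT = /^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$/i;

// Parses "POINT (lon lat)" and returns null for anything malformed
// or outside valid geographic ranges.
function parseWktPoint(wkt: string): LatLon | null {
  const match = WKT_POINT.exec(wkt);
  if (match === null) return null;
  const longitude = Number(match[1]);
  const latitude = Number(match[2]);
  if (Math.abs(longitude) > 180 || Math.abs(latitude) > 90) return null;
  return { latitude, longitude };
}
```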

Fix bug in syncRecordProcessor where updateAed path was missing
has_restricted_access and is_pmr_accessible in schedule data.

Add 636 unit tests covering ImportRecord value object, normalizeRecord,
aedRecordProcessor, syncRecordProcessor, deviceHelpers, and all
field transformers — with corner cases for coordinates, booleans,
dates, cross-source conflicts, and verified-AED protection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add sync_ndjson_cache table (gzip-compressed) to avoid re-downloading
  all records from external APIs on every chunk resume iteration
- Cache-on-miss: resumeSync writes cache when it was absent (jobs started
  before this feature), so subsequent resumes hit cache immediately
- Fix syncStartTime not extracted on resume — disappearance detection was
  silently skipped for all multi-chunk syncs
- Add type validation to PUT /api/admin/data-sources/[id] endpoint
- Add libpostal + nominatim Docker services for address parsing/geocoding
- Document @batchactions source cache feature request
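
The gzip-compressed NDJSON cache could be encoded and read back along these lines. This is a sketch only: the helper names and the `skip`/`limit` parameters are hypothetical, and the PostgreSQL read and write of the resulting buffer into sync_ndjson_cache is left out. It relies on Node's built-in `node:zlib` module.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

type SourceRecord = Record<string, unknown>;

// One JSON document per line. JSON.stringify escapes newlines inside
// strings, so splitting on "\n" when reading is safe.
function encodeNdjsonGzip(records: SourceRecord[]): Buffer {
  const ndjson = records.map((r) => JSON.stringify(r)).join("\n");
  return gzipSync(Buffer.from(ndjson, "utf8"));
}

// Decompresses the cache and returns the slice a resumed chunk needs,
// so the external API does not have to be downloaded again.
function decodeNdjsonGzip(blob: Buffer, skip = 0, limit = Infinity): SourceRecord[] {
  const lines = gunzipSync(blob)
    .toString("utf8")
    .split("\n")
    .filter((line) => line.length > 0);
  return lines.slice(skip, skip + limit).map((line) => JSON.parse(line) as SourceRecord);
}
```

A resumed chunk would call something like `decodeNdjsonGzip(blob, alreadyProcessed, chunkSize)`; on a cache miss, the records would be fetched once and written with `encodeNdjsonGzip` so that later resumes hit the cache.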

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vgpastor vgpastor changed the title from "Add REST API adapter and European country/region support" to "Multi-country AED import pipeline with NDJSON cache and data source management" Mar 15, 2026
Loop `.replace(/<[^>]+>/g, "")` until no tags remain, preventing
incomplete sanitization of nested patterns like `<scr<script>ipt>`.
Resolves GitHub Advanced Security alerts on 3 transformers.
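
The looped replacement described here amounts to applying the same regex until the string stops changing. A minimal sketch, with `stripHtmlTags` as a hypothetical name for whatever the three transformers call it:

```typescript
// Removes anything that looks like an HTML tag, repeating until a pass
// makes no further change (a fixed point).
function stripHtmlTags(input: string): string {
  const TAG = /<[^>]+>/g;
  let previous: string;
  let current = input;
  do {
    previous = current;
    current = current.replace(TAG, "");
  } while (current !== previous);
  return current;
}
```

This is tag removal for plain-text fields, not a general HTML sanitizer; output that is later rendered as HTML would still need proper escaping.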

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vgpastor vgpastor merged commit 8349170 into main Mar 15, 2026
9 checks passed
@vgpastor vgpastor deleted the claude/research-defibrillator-data-spain-UWU1W branch March 15, 2026 11:33