
Upgrade UMLS data from 2021 to latest release #56

@AlexMikhalev

Description

Summary

The current words_cui.tsv dates from April 2021. Download a fresh 2025 or 2026 UMLS release using the new UMLS account credentials, then rebuild the umls_automata.bin.zst artifact.
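
A minimal sketch of the first rebuild step, assuming (hypothetically) that words_cui.tsv holds one term<TAB>CUI pair per line. The real column layout should be checked against the existing file, as should whether a "pattern" means a distinct term or a term/CUI pair. load_words_cui is a hypothetical helper, not part of the current build, and the lower-casing is an illustrative choice.

```python
import csv
import io
from collections import defaultdict


def load_words_cui(stream):
    """Build a term -> set-of-CUIs map from a words_cui.tsv-style stream.

    ASSUMPTION (hypothetical format): each line is `term<TAB>CUI`.
    Blank lines and rows without exactly two fields are skipped.
    For a real file, open it with newline="" and (assumed) encoding="utf-8".
    """
    patterns = defaultdict(set)
    for row in csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        term, cui = (field.strip() for field in row)
        if term and cui:
            # Lower-case terms so later matching is case-insensitive.
            patterns[term.lower()].add(cui)
    return dict(patterns)


if __name__ == "__main__":
    # Placeholder CUIs for illustration, not real UMLS identifiers.
    sample = io.StringIO("Pembrolizumab\tC0000001\nEGFR\tC0000002\negfr\tC0000003\n")
    patterns = load_words_cui(sample)
    print(len(patterns))  # prints 2: "EGFR" and "egfr" collapse to one key
```

Under this assumption, len(patterns) gives a distinct-term count that can be compared with the ~1.4M figure for the 2021 automaton.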

Details

  • The current automaton has ~1.4M patterns from the 2021 UMLS release
  • The new release will include updated concepts, retired CUIs, and new terms
  • Validate the pattern count and entity-extraction quality against the 18 evaluation cases
  • Ensure no regression in safety-gate behavior (e.g., Pembrolizumab/EGFR blocking)
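
Assuming both releases are loaded into term-to-CUI maps, a set diff can summarize the changes listed above before the automaton is rebuilt. This is a sketch: diff_releases and its critical_terms parameter are hypothetical, keys are assumed to be lower-cased terms, and the data below is placeholder. A CUI that disappears from the word list is not necessarily retired in UMLS. The authoritative retirement record is the release's MRCUI.RRF file. Flagging gate-relevant terms that lost every mapping is only a cheap pre-check, not a substitute for running the 3-gate harness.

```python
def diff_releases(old, new, critical_terms=()):
    """Summarize changes between two term -> set-of-CUIs maps.

    old/new are assumed to come from the 2021 and new words_cui.tsv files.
    critical_terms is a hypothetical list of terms the safety gates rely on.
    """
    old_terms, new_terms = set(old), set(new)
    # Union of every CUI referenced by each release (set() if the map is empty).
    old_cuis = set().union(*old.values())
    new_cuis = set().union(*new.values())
    return {
        "added_terms": new_terms - old_terms,
        "removed_terms": old_terms - new_terms,
        # Approximation only: missing from the word list != retired in UMLS.
        "retired_cuis": old_cuis - new_cuis,
        "new_cuis": new_cuis - old_cuis,
        # Terms present in both releases whose CUI set changed.
        "remapped_terms": {t for t in old_terms & new_terms if old[t] != new[t]},
        # Gate-relevant terms (lower-cased) that no longer map to any CUI.
        "missing_critical": {
            t.lower() for t in critical_terms if not new.get(t.lower())
        },
    }


if __name__ == "__main__":
    # Placeholder data, not real UMLS content.
    old = {"pembrolizumab": {"C0000001"}, "egfr": {"C0000002"}}
    new = {"pembrolizumab": {"C0000001"}}
    report = diff_releases(old, new, critical_terms=["Pembrolizumab", "EGFR"])
    print(sorted(report["missing_critical"]))  # prints ['egfr']
```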

Acceptance Criteria

  • Download the latest UMLS release (2025AA or 2026AA)
  • Rebuild umls_automata.bin.zst from the updated data
  • All 18 evaluation cases still pass the 3-gate harness
  • Document the pattern-count delta (old vs. new)
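
For documenting the pattern-count delta, a one-line summary is probably enough. The sketch below uses a hypothetical helper, pattern_count_delta, and the numbers in the usage line are illustrative, not real counts.

```python
def pattern_count_delta(old_count, new_count):
    """Format an old-vs-new pattern count line for the issue or PR description."""
    if old_count <= 0:
        raise ValueError("old_count must be positive")
    delta = new_count - old_count
    # Multiply before dividing to keep round percentages exact.
    pct = delta * 100 / old_count
    return f"patterns: {old_count:,} -> {new_count:,} ({delta:+,}, {pct:+.2f}%)"


if __name__ == "__main__":
    # Illustrative numbers only, not real UMLS counts.
    print(pattern_count_delta(1000, 1100))
    # prints: patterns: 1,000 -> 1,100 (+100, +10.00%)
```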

Priority: P2

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
