Skip to content

Audit norm_case usage and consider separating path normalization responsibilities #278

@karthiknadig

Description

@karthiknadig

Background

As part of fixing #186 (Windows junctions not properly resolved), we replaced fs::canonicalize() with GetLongPathNameW in norm_case() to avoid resolving junctions/symlinks.

During this fix, we discovered that norm_case() was doing more than just case normalization - it was also:

  1. Converting relative paths to absolute paths
  2. Stripping trailing path separators (to match canonicalize behavior)
  3. Normalizing case

Current State

The function is defined in crates/pet-fs/src/path.rs and is called from ~20+ locations across the codebase:

  • crates/pet-conda/src/environment_locations.rs
  • crates/pet-poetry/src/environment_locations.rs
  • crates/pet-windows-store/src/environments.rs
  • crates/pet-pipenv/src/lib.rs
  • crates/pet-global-virtualenvs/src/lib.rs
  • crates/pet-python-utils/src/fs_cache.rs
  • crates/pet-windows-registry/src/environments.rs
  • And more...

Concern

The Windows Registry stores paths with trailing backslashes (e.g., C:\...\x64\). The old fs::canonicalize() silently stripped these. When we switched to GetLongPathNameW, we had to explicitly add trailing slash stripping to maintain backward compatibility.

This raises questions:

  1. Should path sources (like Windows Registry) sanitize their own output?
  2. Should norm_case() be split into separate functions with clear responsibilities?
  3. Are there other implicit behaviors from canonicalize() that callers depend on?

Suggested Audit Tasks

  • Review all norm_case() call sites to understand what each caller actually needs
  • Consider splitting into focused functions:
    • normalize_case() - Just case normalization
    • normalize_path() - Full path normalization (relative→absolute, trailing slashes, case)
  • Evaluate if path sources (Registry, conda rc files, etc.) should sanitize their own data
  • Add documentation for expected behavior

Related

Metadata

Metadata

Assignees

Labels

debtCode quality issuesverifiedVerification succeeded

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions