Skip to content

Fix implementation of locale region override#7335

Closed
sffc wants to merge 9 commits intounicode-org:mainfrom
sffc:region-override
Closed

Fix implementation of locale region override#7335
sffc wants to merge 9 commits intounicode-org:mainfrom
sffc:region-override

Conversation

@sffc
Copy link
Copy Markdown
Member

@sffc sffc commented Dec 21, 2025

We had partial claims for supporting this, but there were zero tests, and it turns out our region override code was not in working condition. I fixed the bugs and added test coverage.

Fixes #4413

@sffc sffc changed the title Region override Fix implementation of locale region override Dec 21, 2025
@sffc sffc marked this pull request as ready for review December 21, 2025 08:01
@sffc sffc removed request for a team, aethanyc, dminor, makotokato and nciric December 21, 2025 08:01
script: self.script,
region: match (self.region, self.ue_region) {
(Some(_), Some(r)) if region_priority => Some(r),
(_, Some(r)) if region_priority => Some(r),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zbraniecki: was there a reason you were matching Some(_) on the regular region subtag here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think it was a bug! Thanks for catching it!

/// ```
///
/// [UTS #35]: https://www.unicode.org/reports/tr35/#RegionOverride
pub fn try_from_extension_value(value: &Value) -> Result<Self, RegionExtensionError> {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zbraniecki wdyt of this new API. the signature and where it lives?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if this lived on Value, or Subtag, or was internal to the preference parser. Something like parse_unicode_subdivision_id(self) -> Option<(Region, TinyAsciiStr<4>)>.

I also think it should be private for now.

Copy link
Copy Markdown
Member

@zbraniecki zbraniecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

match Region::try_from_extension_value(v) {
Ok(r) => Some(r),
Err(_e) => {
// How should we handle the error? Ignore?
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. It's inherently lossy

/// ```
///
/// [UTS #35]: https://www.unicode.org/reports/tr35/#RegionOverride
pub fn try_from_extension_value(value: &Value) -> Result<Self, RegionExtensionError> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

TestCase {
input: "en-u-rg-gbzzzz-sd-ustx",
language_priority: "en-u-sd-ustx",
region_priority: "en-GB-u-sd-ustx", // does this make sense?
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about as much sense as the input

please don't submit these comments

Copy link
Copy Markdown
Member

@robertbastian robertbastian Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this doesn't make sense, as the rg override overrides the subdivision as well

I think we might want to remove the subdivision entirely for language priority, as that should only be used for region-specific data?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in the context of #7337 and its follow-ups

use size_test_macro::size_test;

#[cfg(test)]
use icu_locale_core::locale;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move this into test_numbering_resolution_fallback

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

///
/// ⚠️ This constructor ignores fallback priority and Unicode extension keywords,
/// and it reads only the first variant.
pub fn from_content_language_identifier(langid: &LanguageIdentifier) -> Self {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to make the content/user locale disctinction on the DataLocale level (which I still consider to be an icu_provider type). Each component decides for itself how to map locales into DataLocales, and what the semantics of its input locales are.

If you really want to name these methods (maybe something else), then we should deprecate the From implementations.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to figure out how to deprecate the From impls.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to #7346

/// ⚠️ This constructor ignores fallback priority and Unicode extension keywords,
/// and it reads only the first variant.
pub fn from_content_language_identifier(langid: &LanguageIdentifier) -> Self {
Self {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this duplicates code in From<&LanguageIdentifier>

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observation: unrelated cleanup

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// ```
///
/// [UTS #35]: https://www.unicode.org/reports/tr35/#RegionOverride
pub fn try_from_extension_value(value: &Value) -> Result<Self, RegionExtensionError> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if this lived on Value, or Subtag, or was internal to the preference parser. Something like parse_unicode_subdivision_id(self) -> Option<(Region, TinyAsciiStr<4>)>.

I also think it should be private for now.

///
/// [UTS #35]: https://www.unicode.org/reports/tr35/#RegionOverride
#[displaydoc("Invalid region or subdivision extension value")]
Invalid,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use ParseError::InvalidExtension here? We don't have per-type or per method parsing errors

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a different level than a syntax error. I was going to use ParseError but thought you would give the opposite feedback that we should instead use a narrow error type. If you confirm that you prefer ParseError then I'll switch over.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obsolete since we found the existing function that returns ParseError

Comment on lines 174 to +175
let payload_locale_override = if let Some(locale) = options.content_locale {
let locale = DataLocale::from(locale);
let locale = DataLocale::from_content_language_identifier(locale);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think renaming the identifier to content_language has a similar readability effect as renaming from

Suggested change
let payload_locale_override = if let Some(locale) = options.content_locale {
let locale = DataLocale::from(locale);
let locale = DataLocale::from_content_language_identifier(locale);
let payload_locale_override = if let Some(content_language) = options.content_locale {
let locale = DataLocale::from(content_language);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss in the context of #7346

#[diplomat::opaque]
/// An ICU4X Locale, capable of representing strings like `"en-US"`.
///
/// In `icu_capi`, this type also covers the uses of `DataLocale`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NPM/Dart/MVN users won't know what icu_capi is

Suggested change
/// In `icu_capi`, this type also covers the uses of `DataLocale`.
/// This type also covers the Rust type `DataLocale`.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// ```
/// use icu_locale_core::{locale, Locale};
/// use icu_provider::DataLocale;
/// use icu_locale_core::locale;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why split these?

why change the DataLocale import? I like importing it from icu_provider

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// // For en-US with a Germany preference override
/// let info = WeekInformation::try_new(locale!("en-US-u-rg-dezzzz").into()).unwrap();
/// assert_eq!(info.first_weekday, Weekday::Monday);
/// ```
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add an example of the fw override as well, and how it interacts with rg.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have code to parse unicode_subdivision_id.

https://docs.rs/icu/latest/icu/locale/extensions/unicode/struct.SubdivisionId.html

robertbastian added a commit to robertbastian/icu4x that referenced this pull request Dec 22, 2025
robertbastian added a commit to robertbastian/icu4x that referenced this pull request Dec 22, 2025
robertbastian added a commit to robertbastian/icu4x that referenced this pull request Dec 22, 2025
robertbastian added a commit that referenced this pull request Dec 22, 2025
@sffc
Copy link
Copy Markdown
Member Author

sffc commented Dec 24, 2025

This PR has been fully split into:

#7337
#7346
#7347
#7348

@sffc sffc closed this Dec 24, 2025
@sffc sffc deleted the region-override branch December 24, 2025 21:28
sffc added a commit that referenced this pull request Dec 29, 2025
sffc added a commit that referenced this pull request Dec 31, 2025
Feedback from #7335

Demonstrates that #4413 is fixed
sffc added a commit that referenced this pull request Jan 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support u-rg in region priority fallback (and u-sd)

3 participants