LC_MESSAGES use in PEP 668 (EXTERNALLY-MANAGED)

So I was trying to implement PEP 668 in pip and reached this part of the PEP:

If the first element of the tuple returned by locale.getlocale(locale.LC_MESSAGES), i.e., the language code, is not None, it should look for the error message as the value of a key named Error- followed by the language code.
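For reference, the lookup described in that paragraph could be sketched roughly as below. This assumes the EXTERNALLY-MANAGED file is read with configparser (no interpolation) and has an `[externally-managed]` section, as PEP 668 specifies. The function name `read_error_message` is made up for illustration and is not pip's actual code.

```python
import configparser


def read_error_message(text, lang=None):
    """Return the error message from EXTERNALLY-MANAGED file content.

    `lang` is the language code (e.g. "de_DE"), or None if unknown.
    Hypothetical helper: a missing section or a missing plain "Error"
    key is not handled here and would raise KeyError.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["externally-managed"]
    if lang is not None:
        key = f"Error-{lang}"
        # configparser lowercases option names by default, so this
        # membership test is case-insensitive.
        if key in section:
            return section[key]
    # Fall back to the non-localized message.
    return section["Error"]
```

The open question in this thread is only how to obtain `lang` when locale.LC_MESSAGES does not exist.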

but found that some CI jobs fail with AttributeError when accessing locale.LC_MESSAGES, which means the constant is not available everywhere. And indeed this is the case:

LC_MESSAGES is only added if it’s defined (in system headers such as locale.h).

So we’ll need to modify the PEP a bit to either provide a fallback when the value is not available, or do away with LC_MESSAGES altogether. Thoughts?

cc @geofft


Side note: This fact is not mentioned at all in locale’s documentation. I should submit a pull request to fix this after we discuss this.


POSIX requires LC_MESSAGES. However, the C runtime on Windows doesn’t implement it. On Windows, by default messages are fetched based on the preferred UI language of the current thread. This should match the user locale, which also defines linguistic data such as names of months and the days of the week, but they can differ.

I think the solution that stays the truest to the PEP is to treat a missing LC_MESSAGES the same way as the first element of the tuple being None. Essentially, the system does not support this feature.
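A minimal sketch of that approach, assuming a helper named `get_message_language` (a hypothetical name, not something in pip or the PEP). Catching ValueError is an extra defensive assumption on my part, since getlocale() can fail to parse some locale names.

```python
import locale


def get_message_language():
    """Return the LC_MESSAGES language code, or None if unavailable."""
    # LC_MESSAGES only exists where the platform defines it
    # (it is missing on Windows).
    category = getattr(locale, "LC_MESSAGES", None)
    if category is None:
        # Behave as if getlocale() had returned (None, None).
        return None
    try:
        return locale.getlocale(category)[0]
    except ValueError:
        # Assumption: an unparsable locale name counts as "unknown".
        return None
```

A caller would then only look for the localized key when the result is not None.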

Oh, I forgot but probably should mention there’s also this comment in locale.py:

A system lacking LC_MESSAGES may still support message localisation, but simply uses another category, since LC_MESSAGES is not in the C standard. But in practice this should only affect Windows users, and since using other values (e.g. LC_ALL) is probably equally unreliable, and most Python distributions on Windows likely won’t want to implement PEP 668 anyway, I think it’s OK to simply skip this as you suggest, at least until someone complains.

LC_MESSAGES is a standard category in POSIX[1], but it’s not standard in C. AFAIK, Windows is the only supported platform that lacks LC_MESSAGES, since all other supported platforms are POSIX systems.

If that weren’t the case, I’d suggest using ctypes to query the current thread’s preferred UI language, which will be a locale name or language name such as “es” or “es-ES”.


On POSIX, maybe the colon-delimited list of languages in the “LANGUAGE” environment variable should take priority over the LC_MESSAGES category, if the latter isn’t set to the “C” locale. I know that “LANGUAGE” is for gettext and not actually specified by POSIX, but it’s commonly supported, at least on Linux. For example:

>>> import locale, os
>>> 'LANGUAGE' in os.environ
False
>>> locale.setlocale(locale.LC_MESSAGES, 'es_ES.UTF-8')
'es_ES.UTF-8'
>>> os.stat('missing')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No existe el archivo o el directorio: 'missing'
>>> os.environ['LANGUAGE'] = 'ru'
>>> os.stat('missing')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] Нет такого файла или каталога: 'missing'
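If the PEP went this way, the priority order could be sketched as follows. The helper name `candidate_languages` and its return shape are assumptions for illustration. It only mirrors the rule described above: "LANGUAGE" wins unless the LC_MESSAGES locale is "C" (or unset).

```python
import os


def candidate_languages(lc_messages_lang, environ=None):
    """Return language candidates in priority order.

    `lc_messages_lang` is the language code from LC_MESSAGES (or None).
    Hypothetical helper, not part of the PEP or of pip.
    """
    if environ is None:
        environ = os.environ
    if lc_messages_lang in (None, "C"):
        # "LANGUAGE" is ignored when the messages locale is "C".
        return []
    # "LANGUAGE" is a colon-delimited list; skip empty entries.
    candidates = [
        lang for lang in environ.get("LANGUAGE", "").split(":") if lang
    ]
    # The LC_MESSAGES language is the last resort.
    if lc_messages_lang not in candidates:
        candidates.append(lc_messages_lang)
    return candidates
```

With LANGUAGE set to "ru" and LC_MESSAGES set to es_ES, this would try "ru" first and then "es_ES", which matches the strerror() behaviour in the session above.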

  1. See IEEE Std 1003.1-2017 and IEEE Std 1003.1, 2004 Edition

I haven’t looked at how, but the Azure CLI has localisation on Windows. 99% sure it’s not using ctypes. Might be worth digging into that (or knack, its CLI framework).

My guess is that locale.getlocale()[0] is going to be more than good enough, considering these functions aren’t really compatible with POSIX anyway and so likely aren’t used in any cross-platform scenario. locale.getdefaultlocale()[0] might also be fine - it seems to give different (unnormalised?) values from getlocale().

Most Python distributions on Windows are never released publicly, so it’s going to be real hard to make generalisations :wink: My guess is that most won’t use it, because it’s not a well known feature, though I would use it in some scenarios. Still, loss of localisation isn’t the end of the world, considering virtually no Python code does it anyway.

The default category for locale.getlocale() is LC_CTYPE, which the interpreter sets to the default locale at startup (unless an embedding application disables this step). The C runtime’s default locale name is the full English name of the language and country of the user locale (e.g. “German_Germany”). A native locale name can be set manually (e.g. “de-DE”). Using underscores instead of hyphens is also supported (e.g. “de_DE”).

locale.getlocale() tries to normalize the locale name, but it’s inconsistent. Only 3 of the full English names are normalized: “French_France” → “fr_FR”, “German_Germany” → “de_DE”, and “Spanish_Spain” → “es_ES”. Also, normalizing a native locale name that uses a hyphen (e.g. “de-DE”) fails with a ValueError.

One can call locale.setlocale(locale.LC_CTYPE) to get the real locale name, without normalization.
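To illustrate the difference, here is a small sketch with two hypothetical helpers (both names are mine). The first queries the raw name, the second wraps getlocale() so that a normalization failure is reported as None instead of raising.

```python
import locale


def raw_ctype_locale_name():
    """Return the current LC_CTYPE locale name without normalization."""
    # Calling setlocale() with only a category queries the current
    # setting and does not change it.
    return locale.setlocale(locale.LC_CTYPE)


def safe_getlocale_language():
    """Return getlocale()'s language code, or None if parsing fails."""
    try:
        return locale.getlocale()[0]
    except ValueError:
        # e.g. a hyphenated native name that cannot be normalized
        return None
```

The exact strings returned depend on the platform and on how the locale was set, so I have not listed expected output.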

locale.getdefaultlocale() raises a deprecation warning in 3.11, but I don’t think there are plans to remove the underlying builtin function on Windows, _locale._getdefaultlocale(). It returns the user locale name in normal POSIX format (e.g. “en_GB”).


By default, Windows configures the user locale to match the user’s preferred display language. But they can be different, in which case locale strings such as the names of weekdays may be in a different language from resource strings and messages. On POSIX that would be like configuring LC_MESSAGES different from other categories such as LC_CTYPE and LC_TIME. I think, if for some reason we couldn’t query the LC_MESSAGES locale on a POSIX system, it wouldn’t be unreasonable to assume that it’s the same as LC_CTYPE.

OTOH, the user locale isn’t a good substitute for the ordered list of languages in the “LANGUAGE” environment variable, as is used extensively on Linux, such as by strerror(). (GNU gettext uses the “LANGUAGE” list if LC_MESSAGES isn’t “C”.) A better choice is to check GetThreadPreferredUILanguages() with MUI_UI_FALLBACK. This is the list of language names that the system uses when searching for string and message resources, such as when FormatMessageW() is called with dwLanguageId as 0 or LANG_USER_DEFAULT.
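A rough ctypes sketch of that query is below, untested here and written under a few assumptions. The flag values are the ones I remember from the Windows SDK headers and should be checked against winnls.h. The helper names are made up, and it simply returns an empty list off Windows or on failure.

```python
import ctypes
import sys

# Assumed values from the Windows SDK (verify against winnls.h).
MUI_LANGUAGE_NAME = 0x8  # return language names instead of hex IDs
MUI_UI_FALLBACK = 0x30   # merge in the user and system fallback lists


def split_multi_string(buffer_text):
    """Split a double-null-terminated string list into non-empty items."""
    return [item for item in buffer_text.split("\0") if item]


def preferred_ui_languages():
    """Return the thread's preferred UI languages, or [] if unavailable."""
    if sys.platform != "win32":
        return []
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    flags = MUI_LANGUAGE_NAME | MUI_UI_FALLBACK
    count = ctypes.c_ulong(0)
    size = ctypes.c_ulong(0)
    # A first call with a NULL buffer asks for the required buffer
    # size, in characters.
    if not kernel32.GetThreadPreferredUILanguages(
        flags, ctypes.byref(count), None, ctypes.byref(size)
    ):
        return []
    buffer = ctypes.create_unicode_buffer(size.value)
    if not kernel32.GetThreadPreferredUILanguages(
        flags, ctypes.byref(count), buffer, ctypes.byref(size)
    ):
        return []
    # Slice the whole buffer so the embedded nulls are preserved.
    return split_multi_string(buffer[: size.value])
```

The result would be an ordered list of names such as “es-ES” followed by “es”, which could be tried against the Error- keys in turn.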