AFAIK the wcscoll(3) API is not deprecated on macOS, but as mentioned earlier in this thread it doesn’t work with multi-byte LC_CTYPE locales. That means it doesn’t work for most users, because the default LC_CTYPE is AFAIK UTF-8 (at least on all my systems, running various macOS versions).
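For anyone who wants to check what their own system uses for LC_CTYPE, here is a minimal sketch. The helper name ctype_codeset is made up for illustration, and it assumes a Unix-like platform where locale.nl_langinfo is available (it falls back to locale.getpreferredencoding otherwise).

```python
import locale


def ctype_codeset():
    """Return (LC_CTYPE locale name, codeset) for the user's environment.

    Hypothetical helper, not part of the locale module.
    """
    try:
        # An empty string asks setlocale to adopt the environment's settings.
        name = locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed; just query.
        name = locale.setlocale(locale.LC_CTYPE)
    if hasattr(locale, "nl_langinfo"):
        codeset = locale.nl_langinfo(locale.CODESET)
    else:
        codeset = locale.getpreferredencoding(False)
    return name, codeset


print(ctype_codeset())
```

A codeset of UTF-8 here means the process is running with a multi-byte LC_CTYPE, which is the case described above.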
The documentation for strcoll(3) doesn’t mention this restriction, but seems to suffer from the same issue.
Code used to reproduce:
import functools
import locale
print(locale.setlocale(locale.LC_COLLATE, 'no_NO.UTF-8'))
x = ['å', 'æ', 'ø']
print(f"In: {x}")
print(f"Out: {sorted(x, key=functools.cmp_to_key(locale.strcoll))}")
This prints:
no_NO.UTF-8
In: ['å', 'æ', 'ø']
Out: ['å', 'æ', 'ø']
The thread linked to earlier mentions that the expected output is ['æ', 'ø', 'å'].
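The same reproduction can be wrapped in a small self-checking helper, which may be handy when comparing platforms. This is only a sketch: collation_check is a made-up name, and the expected order is the one quoted from the earlier thread, not something I have verified independently.

```python
import functools
import locale


def collation_check(locale_name, letters, expected):
    """Sort letters under locale_name and compare with the expected order.

    Returns "ok", "mismatch", or "unavailable" (locale not installed).
    Hypothetical helper for illustration only.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        try:
            locale.setlocale(locale.LC_COLLATE, locale_name)
        except locale.Error:
            return "unavailable"
        result = sorted(letters, key=functools.cmp_to_key(locale.strcoll))
        return "ok" if result == expected else "mismatch"
    finally:
        # Restore the previous collation locale so callers are unaffected.
        locale.setlocale(locale.LC_COLLATE, saved)


print(collation_check('no_NO.UTF-8', ['å', 'æ', 'ø'], ['æ', 'ø', 'å']))
```

On a system showing the behaviour reported here, this would be expected to print "mismatch"; a result of "unavailable" only means the locale is not installed.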
The output is unchanged if I apply a crude patch to the locale module that switches to strcoll(3) for locale.strcoll:
diff --git a/Modules/_localemodule.c b/Modules/_localemodule.c
index fe8e4c5e30..375cbdc6b6 100644
--- a/Modules/_localemodule.c
+++ b/Modules/_localemodule.c
@@ -349,21 +349,19 @@ _locale_strcoll_impl(PyObject *module, PyObject *os1, PyObject *os2)
/*[clinic end generated code: output=82ddc6d62c76d618 input=693cd02bcbf38dd8]*/
{
PyObject *result = NULL;
- wchar_t *ws1 = NULL, *ws2 = NULL;
+ char *ws1 = NULL, *ws2 = NULL;
/* Convert the unicode strings to wchar[]. */
- ws1 = PyUnicode_AsWideCharString(os1, NULL);
+ ws1 = PyUnicode_AsUTF8(os1);
if (ws1 == NULL)
goto done;
- ws2 = PyUnicode_AsWideCharString(os2, NULL);
+ ws2 = PyUnicode_AsUTF8(os2);
if (ws2 == NULL)
goto done;
/* Collate the strings. */
- result = PyLong_FromLong(wcscoll(ws1, ws2));
+ result = PyLong_FromLong(strcoll(ws1, ws2));
done:
/* Deallocate everything. */
- if (ws1) PyMem_Free(ws1);
- if (ws2) PyMem_Free(ws2);
return result;
}
#endif
It might be possible to get the correct behaviour using CoreFoundation functions (like CFStringCompareWithOptionsAndLocale), but that has two problems: first, the extra cost of converting Python strings to CoreFoundation strings, and more importantly, it would cause more problems with os.fork, because Apple’s Cocoa frameworks are known to be problematic when using os.fork without immediately exec-ing a different program.
It’s probably better to just document this limitation.