'äöü'.casefold() should return 'aeoeue'

Artai · May 4, 2026, 9:42pm

casefold() is intended to remove all case distinctions in a string; for example, it converts the German letter ‘ß’ to its equivalent "ss". In my opinion, it should likewise transform the German umlauts ‘ä’, ‘ö’ and ‘ü’ into their international equivalents ‘ae’, ‘oe’ and ‘ue’.

Rosuav · May 4, 2026, 9:59pm

The casefold method follows the Unicode rules. If you want to change what they are, talk to the Unicode consortium.
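For reference, a minimal sketch of what the method does today, using only built-in str methods. The variable names are illustrative:

```python
# casefold() removes case distinctions; it does not transliterate letters.
folded_sharp_s = "ß".casefold()
folded_upper_umlauts = "ÄÖÜ".casefold()
folded_lower_umlauts = "äöü".casefold()

print(folded_sharp_s)        # ss
print(folded_upper_umlauts)  # äöü
print(folded_lower_umlauts)  # äöü (already caseless, so unchanged)

# Folding is enough for case-insensitive comparison of German words ...
print("STRASSE".casefold() == "straße".casefold())  # True
# ... but it does not treat "ü" and "ue" as the same spelling.
print("Müller".casefold() == "Mueller".casefold())  # False
```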

Artai · May 4, 2026, 10:15pm

Thanks for that information. I didn’t know about those Unicode rules and, from my point of view, it seemed that .casefold() didn’t do the whole job, transforming only ß but not ä, ö and ü. Still, it is good to have a discussion and learn new things.

Dutcho · May 4, 2026, 10:58pm

Also note:

Umlauts are the German interpretation; in other languages the marked vowels ä, ë, ï, ö, ü, ÿ serve a different purpose (diaeresis).
The equivalence to ae, oe, ue is also specifically German, not an international standard.
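If the German convention is what an application needs, it can be layered on top of casefold() explicitly. The following is only a sketch: `german_fold` and `GERMAN_UMLAUTS` are hypothetical names, not part of the standard library, and the mapping is assumed to be wanted for German text only.

```python
import unicodedata

# Hypothetical German-only transliteration table (not a stdlib feature).
GERMAN_UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})

def german_fold(text: str) -> str:
    """Casefold, then expand umlauts following the German convention."""
    # NFC first, so that "u" + combining diaeresis becomes a single "ü".
    composed = unicodedata.normalize("NFC", text)
    return composed.casefold().translate(GERMAN_UMLAUTS)

print(german_fold("Müller"))    # mueller
print(german_fold("Größe"))     # groesse

# Applied to a Spanish word, where ü carries a diaeresis rather than an
# umlaut, the same table produces a spelling nobody uses:
print(german_fold("pingüino"))  # pingueino
```

The last line shows why a rule like this cannot be a language-independent default.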

MegaIng · May 4, 2026, 11:22pm

At the time the casefold rules were first written, uppercase ß didn’t exist, and it still isn’t in common usage. For a long time the official recommendation was to use SS when writing a word in uppercase. This was then adopted by Unicode. At this point it is very unlikely to change, because of backwards compatibility.

AFAIK this recommendation never existed for ä, ö, ü, which always had uppercase variants: Ä, Ö, Ü.

I am not sure why you think this would be correct behavior? casefold is not crossword rules.
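The asymmetry between ß and the umlauts described above can be observed directly in Python. A small sketch, with illustrative variable names:

```python
# "ß" has no one-to-one uppercase mapping, so upper() expands it to "SS".
sharp_s_upper = "ß".upper()
print(sharp_s_upper)               # SS

# The capital sharp s (U+1E9E) exists as a character, and it lowercases
# and folds as expected, but upper() does not produce it.
capital_sharp_s = "\u1e9e"
print(capital_sharp_s.lower())     # ß
print(capital_sharp_s.casefold())  # ss

# The umlauts have ordinary one-to-one case pairs.
umlauts_upper = "äöü".upper()
print(umlauts_upper)               # ÄÖÜ
print(umlauts_upper.lower())       # äöü
```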

Artai · May 6, 2026, 1:40pm

I understood .casefold() to be a way to make strings comparable across national character sets, and in everyday international use, umlauts are converted into their base vowel followed by an e.

tiran · May 6, 2026, 1:48pm

The Unicode database in Python’s stdlib is not locale-aware. It is limited to the UCD (Unicode Character Database). If you need locale-specific folding and sorting, you have to use a library such as ICU (International Components for Unicode).
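To illustrate what the locale-independent data in the stdlib can and cannot do, here is a sketch using the `unicodedata` module. `strip_marks` is a hypothetical helper name. It removes combining marks after decomposition, which is a generic operation and not the German convention:

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Decompose to NFD and drop combining marks (locale-independent)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_marks("äöü"))  # aou
print(strip_marks("ß"))    # ß (no decomposition, so unchanged)
```

The result is 'aou', not 'aeoeue'. Getting the German spelling requires a rule that knows the text is German, which is the kind of thing ICU handles (for example through the PyICU bindings, not shown here).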

tjreedy · May 6, 2026, 5:30pm

In particular, the str.casefold doc says “The casefolding algorithm is described in section 3.13.3 ‘Default Case Folding’ of the Unicode Standard.”
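For comparing strings, case folding is usually combined with normalization, so that precomposed and decomposed forms of the same letter also match. A minimal sketch of such a caseless comparison, with the hypothetical helper names `nfd` and `caseless_equal`:

```python
import unicodedata

def nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)

def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings ignoring case and canonical composition."""
    # Normalize, fold, then normalize again, because folding can produce
    # a string that is no longer in normalized form.
    return nfd(nfd(a).casefold()) == nfd(nfd(b).casefold())

print(caseless_equal("Straße", "STRASSE"))  # True
print(caseless_equal("Ä", "a\u0308"))       # True
print(caseless_equal("Müller", "Mueller"))  # False
```

Even this stricter comparison treats "Müller" and "Mueller" as different, since that equivalence is an orthographic convention, not a case distinction.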