If folks need to normalize their strings, they can call:
import unicodedata
my_string = unicodedata.normalize('NFC', my_string)
Which is great. However, now that str is always Unicode (and has been for a LONG time), it would be nice if normalize were a str method, so you could simply do:
my_string = my_string.normalize('NFC')
or even more helpful:
a_string.normalize('NFC') == another_string.normalize('NFC')
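For anyone who hasn't run into this, here is a minimal sketch of that comparison using the API that exists today. The two example strings are my own illustration: they render identically but are built from different code points.

```python
import unicodedata

# Two spellings of the same visible text.
composed = "caf\u00e9"     # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (U+0301)

# A plain comparison sees different code point sequences.
print(composed == decomposed)  # False

# Normalizing both sides to the same form makes them comparable.
same = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
print(same)  # True
```

With a str method, the last comparison would shrink to the one-liner shown above.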
I think this goes beyond simply saving some people some typing:
As a rule, many (most?) Python developers (or any developers!) aren’t all that aware of normalized forms in Unicode, and what they mean.
But it’s an important idea, and often critical in code, to work with normalized forms.
So I think it would be very helpful if the concept, and the code, were more exposed. Having to dig into unicodedata to get it makes it much less likely that people will find it without a proper search, i.e. they have to know what they are looking for.
Whereas, if it is a str method, then folks are far more likely to notice it when looking at the str docs, and maybe ask themselves “what the heck is normalize?”, and that’s a good thing.
And the saved typing is nice, too.
Maybe is_normalized (currently unicodedata.is_normalized) could become a str method as well.
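To make the idea concrete, here is a rough sketch of how the two methods might behave, emulated with a str subclass. NormStr is a hypothetical name used only for illustration, and the default form of 'NFC' is my assumption, not part of the proposal. It relies on unicodedata.is_normalized, which needs Python 3.8 or later.

```python
import unicodedata


class NormStr(str):
    """Hypothetical str subclass emulating the proposed methods."""

    def normalize(self, form="NFC"):
        # Delegate to the existing stdlib function; the default form
        # is an assumption made for this sketch.
        return NormStr(unicodedata.normalize(form, self))

    def is_normalized(self, form="NFC"):
        # unicodedata.is_normalized was added in Python 3.8.
        return unicodedata.is_normalized(form, self)


s = NormStr("cafe\u0301")  # 'e' + combining acute accent (decomposed)

print(s.is_normalized("NFC"))                   # False
print(s.is_normalized("NFD"))                   # True
print(s.normalize("NFC").is_normalized("NFC"))  # True
```

A real str method would not need the subclass, of course; this only shows the call shape.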
Thoughts?