If folks need to normalize their strings, they can call:
import unicodedata
my_string = unicodedata.normalize('NFC', my_string)
Which is great. However, now that str is (and has been for a LONG time) always Unicode, it would be nice if normalize were a str method, so you could simply do:
my_string = my_string.normalize('NFC')
Or, even more helpful:
a_string.normalize('NFC') == another_string.normalize('NFC')
I think this goes beyond simply saving some people some typing:
As a rule, many (most?) Python developers (or any developers!) aren’t all that aware of normalized forms in Unicode, and what they mean.
But it’s an important idea, and often critical in code, to work with normalized forms.
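For anyone who hasn't run into this, here is a minimal sketch of the problem using what's available in unicodedata today. The example strings are my own illustration: the same visible text, spelled with different code points, compares unequal until both sides are normalized.

```python
import unicodedata

# Two spellings of the same visible text (illustrative values):
composed = "caf\u00e9"      # 'é' as one code point, U+00E9
decomposed = "cafe\u0301"   # 'e' followed by combining acute accent, U+0301

# As raw strings they differ, even in length.
raw_equal = composed == decomposed

# After normalizing both sides to NFC, they compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(raw_equal, nfc_equal)            # False True
print(len(composed), len(decomposed))  # 4 5
```

Nothing in the str docs hints that two strings that look identical on screen can fail an equality test like this, which is part of why I think discoverability matters here.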
So I think it would be very helpful if the concept, and the code, were more exposed. Having to dig into unicodedata to get it makes it much less likely that people will find it without a deliberate search; that is, they have to know what they are looking for.
Whereas, if it is a str method, then folks are far more likely to notice it when looking at the str docs, and maybe ask themselves “what the heck is normalize?”, and that’s a good thing.
And the saved typing is nice, too.
And maybe add is_normalized as a str method as well.
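To make the proposal concrete, here is a rough sketch of how the two methods might behave, written as a str subclass since str itself can't be patched. NormStr is a hypothetical name, and the default form of "NFC" is my own assumption rather than part of the proposal. Both methods just delegate to the existing unicodedata functions (unicodedata.is_normalized needs Python 3.8 or later).

```python
import unicodedata


class NormStr(str):
    """Hypothetical stand-in for str with the proposed methods."""

    def normalize(self, form="NFC"):
        # Delegates to the existing function; the default form is an assumption.
        return NormStr(unicodedata.normalize(form, self))

    def is_normalized(self, form="NFC"):
        # unicodedata.is_normalized was added in Python 3.8.
        return unicodedata.is_normalized(form, self)


a_string = NormStr("cafe\u0301")        # decomposed spelling
another_string = NormStr("caf\u00e9")   # composed spelling

same = a_string.normalize("NFC") == another_string.normalize("NFC")
print(a_string.is_normalized("NFC"), same)  # False True
```

A real implementation would presumably live in C on str itself; this is only meant to show the call shape.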
Thoughts?