New PyUnicode_EqualToUTF8() function

What do you mean by this? Cases left in which codebase? I’m sure the comparison functions would be very useful in Cython, for example.

Do you have concrete examples? Code that currently uses something else?

Hi, just want to comment that the more general comparison operation would also be useful in NumPy, which is going to add support for arrays of UTF-8 strings soon.

Right now, in the prototype implementation, when we compare with object dtype arrays containing Python strings, we have to convert the Python strings to UTF-8 before comparing. With this, we could in principle add loops for all the comparisons that skip this step and compare with the PyUnicode directly.

NumPy doesn’t currently implement Unicode normalization or locale-aware sorting, so string comparisons just sort by code point (e.g. strcmp for UTF-8).
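
To illustrate why a bytewise comparison is enough here: UTF-8 preserves code point order, so sorting by encoded bytes gives the same result as sorting Python strings directly. A minimal sketch in pure Python (the sample strings are made up for illustration, and this is not NumPy's actual implementation):

```python
# Sample strings chosen for illustration; they mix 1-, 2-, 3- and 4-byte
# UTF-8 sequences ("\u00df" is ß, "\u00e9" is é, "\U0001f600" is an emoji).
words = ["zebra", "apple", "\u00e9clair", "\u65e5\u672c", "\U0001f600", "Zulu", "\u00df"]

# Python compares str objects code point by code point.
by_codepoint = sorted(words)

# Comparing the UTF-8 encodings bytewise is what a strcmp-style loop would do.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

same_order = by_codepoint == by_utf8_bytes
print(same_order)
```

Neither ordering is locale-aware: uppercase "Zulu" sorts before lowercase "apple" in both.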

That said, just having equal and not equal available will be very nice.


So you can just run strcmp(PyUnicode_AsUTF8(str1), PyUnicode_AsUTF8(str2)), no? (Just add error handling.)
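
As a rough Python-level analogue of that suggestion, with the error handling included, here is a sketch of a three-way comparison over the UTF-8 encodings. The helper name `utf8_strcmp` is hypothetical, and the C version would instead check `PyUnicode_AsUTF8()` for a NULL return. One difference to keep in mind: `strcmp` stops at the first NUL byte, while this sketch compares the full byte strings, so the two can disagree for strings with embedded `"\0"`.

```python
def utf8_strcmp(a: str, b: str) -> int:
    """Hypothetical helper: three-way compare of two str via their UTF-8 bytes.

    Returns -1, 0 or 1, similar in spirit to the sign of strcmp().
    """
    try:
        ea = a.encode("utf-8")
        eb = b.encode("utf-8")
    except UnicodeEncodeError as exc:
        # Strings with lone surrogates cannot be encoded as strict UTF-8.
        raise ValueError("string is not encodable as UTF-8") from exc
    # bytes objects compare lexicographically by byte value.
    return (ea > eb) - (ea < eb)


print(utf8_strcmp("apple", "zebra"))
print(utf8_strcmp("same", "same"))
```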
