PyUnicode_FromFormat: allow the `%%` format with precision?

philg · August 2, 2022, 4:01am

The %% format in PyUnicode_FromFormat recognizes the zero flag and width but not precision:

// Recognized
PyUnicode_FromFormat(  "%% %s", "abc"); // "% abc"
PyUnicode_FromFormat( "%0% %s", "abc"); // "% abc"
PyUnicode_FromFormat("%00% %s", "abc"); // "% abc"
PyUnicode_FromFormat( "%2% %s", "abc"); // "% abc"
PyUnicode_FromFormat("%02% %s", "abc"); // "% abc"

// Not recognized
PyUnicode_FromFormat("%.0% %s", "abc"); // "%.0% %s"
PyUnicode_FromFormat("%.2% %s", "abc"); // "%.2% %s"

Can this be changed to allow precision? The original intention was to fix a crash, before the %% format was supported; see Issue #10829: Refactor PyUnicode_FromFormat() · python/cpython@9686545 · GitHub.
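
For readers who want to reproduce the listed outputs without building against CPython, here is a minimal stand-alone sketch in C. It is a toy model, not CPython's code: `toy_format_legacy` and `append` are hypothetical names, only `%%` and `%s` are handled, every `%s` receives the same single argument, and width and precision on `%s` are parsed but ignored. It also assumes that an unrecognized specifier causes the rest of the format string to be copied verbatim, which is what the outputs in the listing suggest.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append at most n bytes of src to the NUL-terminated buffer out (capacity cap). */
static void append(char *out, size_t cap, const char *src, size_t n)
{
    size_t len = strlen(out);
    if (len + 1 >= cap)
        return;                       /* buffer already full */
    if (n > cap - 1 - len)
        n = cap - 1 - len;            /* truncate instead of overflowing */
    memcpy(out + len, src, n);
    out[len + n] = '\0';
}

/* Toy model of the behavior described in the post (hypothetical helper).
 * Supports only "%%" and "%s"; arg is used for every "%s". */
static void toy_format_legacy(const char *fmt, const char *arg,
                              char *out, size_t cap)
{
    const char *f = fmt;
    if (cap == 0)
        return;
    out[0] = '\0';
    while (*f) {
        if (*f != '%') {
            append(out, cap, f, 1);
            f++;
            continue;
        }
        const char *start = f;        /* where this specifier begins */
        int has_precision = 0;
        f++;
        while (isdigit((unsigned char)*f))
            f++;                      /* zero flag and width: skipped */
        if (*f == '.') {
            has_precision = 1;
            f++;
            while (isdigit((unsigned char)*f))
                f++;                  /* precision digits: skipped */
        }
        if (*f == '%' && !has_precision) {
            append(out, cap, "%", 1); /* "%%", "%0%", "%2%", ... */
            f++;
        } else if (*f == 's') {
            append(out, cap, arg, strlen(arg));
            f++;
        } else {
            /* Assumption: unrecognized specifier, so copy the rest
             * of the format string as-is and stop. */
            append(out, cap, start, strlen(start));
            return;
        }
    }
}
```

With a `char buf[64]`, calling `toy_format_legacy("%2% %s", "abc", buf, sizeof buf)` leaves `% abc` in `buf`, while the format `"%.2% %s"` comes back unchanged, matching the two groups in the listing.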

vstinner · August 2, 2022, 2:08pm

I don’t see why anything other than %% should be allowed. What’s the use case? Can’t you fix your format string instead?

philg · August 6, 2022, 10:59am

There is no use case for allowing it, but not allowing it is inconsistent with the other format specifiers (the zero flag, width, and precision are always allowed, even if they have no effect).

I’d like to make this consistent, not because precision is useful but because there’s no reason for the inconsistency anymore.

ericvsmith · August 6, 2022, 12:26pm

I don’t think we should add something without a use case, just for consistency’s sake. If no one has ever needed it, why add to the maintenance burden?

gpshead · August 7, 2022, 12:20am

If anything, I’d prefer %% to not support numbers between the % signs at all, given that they have no meaning. But only if that is less of a maintenance burden. Nobody should intentionally be using that; most people think of %% as a way to escape % to get a % sign.

philg · August 8, 2022, 5:04am

I think I approached this the wrong way. My overall goal is to improve PyUnicode_FromFormat and there are two places where there is code left over from a previous version of the function that is no longer necessary:

Special case for %% with precision: cpython/unicodeobject.c at 330f1d58282517bdf1f19577ab9317fa9810bf95 · python/cpython · GitHub
Special case for incomplete format specifier: cpython/unicodeobject.c at 330f1d58282517bdf1f19577ab9317fa9810bf95 · python/cpython · GitHub

Both of these special cases were introduced when parsing format flags was a separate function and not handling them would crash Python (Issue #10829: Refactor PyUnicode_FromFormat() · python/cpython@9686545 · GitHub).

Since then, PyUnicode_FromFormat has been refactored, and not handling the special cases would no longer cause a crash:

%% with precision would ignore precision (cpython/unicodeobject.c at 330f1d58282517bdf1f19577ab9317fa9810bf95 · python/cpython · GitHub)
Incomplete format specifier would be unrecognized (cpython/unicodeobject.c at 330f1d58282517bdf1f19577ab9317fa9810bf95 · python/cpython · GitHub)

I believe that removing the special cases would reduce the maintenance burden (less code to think about). If that’s the case, can they be removed?
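
The idea of dropping the special cases can be illustrated as "parse the whole specifier first, then dispatch on the conversion character". The following is a rough sketch under that assumption, not the code in unicodeobject.c: `struct toy_spec`, `parse_digits`, `toy_parse_spec`, and `toy_is_percent_escape` are all hypothetical names, and there is no overflow check on the digit runs. Because flags, width, and precision are parsed the same way for every conversion, `%.2%` simply ends up as a `%` conversion whose precision is ignored, and an incomplete specifier ends up with no conversion character at all.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical record for one parsed specifier; not a CPython type. */
struct toy_spec {
    int zero_flag;   /* 1 if a leading '0' flag was present */
    int width;       /* -1 when absent */
    int precision;   /* -1 when absent */
    char conv;       /* conversion character, '\0' if the specifier is incomplete */
    size_t length;   /* bytes consumed, including the leading '%' */
};

/* Read a run of decimal digits at *p; return -1 if there are none. */
static int parse_digits(const char **p)
{
    int value = -1;
    while (isdigit((unsigned char)**p)) {
        if (value < 0)
            value = 0;
        value = value * 10 + (**p - '0');
        (*p)++;
    }
    return value;
}

/* Parse one specifier. fmt must point at a '%'. */
static struct toy_spec toy_parse_spec(const char *fmt)
{
    struct toy_spec spec = {0, -1, -1, '\0', 0};
    const char *p = fmt + 1;

    if (*p == '0') {                  /* zero flag */
        spec.zero_flag = 1;
        p++;
    }
    spec.width = parse_digits(&p);    /* optional width */
    if (*p == '.') {                  /* optional precision */
        p++;
        spec.precision = parse_digits(&p);
        if (spec.precision < 0)
            spec.precision = 0;       /* a bare '.' is treated as precision 0 here */
    }
    spec.conv = *p;                   /* '\0' means the string ended early */
    if (*p != '\0')
        p++;
    spec.length = (size_t)(p - fmt);
    return spec;
}

/* With uniform parsing, "%%" needs no special case: the conversion
 * character alone decides, and flags/width/precision are ignored. */
static int toy_is_percent_escape(const struct toy_spec *spec)
{
    return spec->conv == '%';
}
```

In this sketch, `toy_parse_spec("%.2% %s")` reports precision 2 with conversion `%`, so a caller would emit a single `%`, whereas `toy_parse_spec("%5")` reports a conversion of `'\0'`, which a caller would treat as unrecognized.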

storchaka · August 8, 2022, 12:10pm

You are right. Yesterday I wrote the code, and today I created a PR: gh-95781: More strict format string checking in PyUnicode_FromFormatV() by serhiy-storchaka · Pull Request #95784 · python/cpython · GitHub.

philg · August 8, 2022, 12:52pm

Ah okay, very cool! I thought a backwards-compatible change had the highest chance of getting accepted, but this solves all my concerns. Thanks!

vstinner · August 8, 2022, 5:21pm

I agree. Rejecting invalid format strings is better than copying them unchanged! The old behavior was just weird and wrong.
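
To make the difference between copying and rejecting concrete, here is a small stand-alone C sketch of a strict checker. It is only an illustration: `toy_first_invalid` is a hypothetical name, the toy accepts just a bare `%%` and `%s` (with optional zero flag, width, and precision), and treating `%2%` as invalid is an assumption in the spirit of the suggestion earlier in the thread, not something the thread states about the PR. Where a real implementation would report an error, the toy returns the offset of the offending specifier.

```c
#include <ctype.h>
#include <stddef.h>

/* Return -1 if every specifier in fmt is accepted by this toy checker,
 * otherwise the offset of the '%' that starts the first rejected one. */
static long toy_first_invalid(const char *fmt)
{
    const char *f = fmt;
    while (*f) {
        if (*f != '%') {
            f++;
            continue;
        }
        const char *start = f;        /* start of the current specifier */
        f++;
        if (*f == '%') {              /* only a bare "%%" is an escape */
            f++;
            continue;
        }
        while (isdigit((unsigned char)*f))
            f++;                      /* zero flag and width */
        if (*f == '.') {
            f++;
            while (isdigit((unsigned char)*f))
                f++;                  /* precision */
        }
        if (*f == 's') {              /* the only conversion the toy knows */
            f++;
            continue;
        }
        /* Anything else, including a string that ends mid-specifier,
         * is rejected instead of being copied to the output. */
        return (long)(start - fmt);
    }
    return -1;
}
```

Under these assumptions, `toy_first_invalid("%% %s")` returns -1, while `toy_first_invalid("%.0% %s")` returns 0, so a caller can fail early rather than emit the format string unchanged.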