Boolean arguments

storchaka · December 5, 2022, 7:51am

In Python, virtually every value (with the exception of NotImplemented) has a boolean value. It is a great feature: it makes simple code very convenient to write.

In builtin and extension functions, boolean arguments were historically parsed with the “i” format unit in PyArg_Parse*() functions. It accepts True and False, as well as the integers 1 and 0, so it is compatible with very old Python code predating bool. But it also accepts other integers and integer-like objects which fit in a C int, and raises OverflowError if they do not fit. Since the “p” format unit was introduced, it has been used in many functions, so they now accept arbitrary Python objects as boolean arguments. And some code uses PyObject_IsTrue() directly.
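A rough pure-Python model of the two behaviours described above may help. It is only a sketch: the names parse_like_i and parse_like_p are made up for illustration, a 32-bit C int is assumed, and the real PyArg_Parse*() code differs in its details and error messages.

```python
import operator

# Assumption: a 32-bit C int, which is typical but not guaranteed.
C_INT_MIN, C_INT_MAX = -2**31, 2**31 - 1


def parse_like_i(value):
    """Rough model of the "i" format unit: integer-like objects only."""
    number = operator.index(value)  # TypeError for str, float, None, ...
    if not C_INT_MIN <= number <= C_INT_MAX:
        raise OverflowError("value does not fit in a C int")
    return number


def parse_like_p(value):
    """Rough model of the "p" format unit: plain truth testing."""
    return bool(value)
```

With this model, parse_like_i("no") raises TypeError while parse_like_p("no") quietly returns True.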

The advantage of using the “p” format unit is consistency with pure Python code. But there is a disadvantage: it can hide bugs such as the following.

Forgetting to call a function: func instead of func().
Confusing a method with a property: x.attr instead of x.attr().
Forgetting await (although there is already a separate warning for this).
Passing an iterator where a collection is expected: an empty collection is false, but an empty iterator is true.
Forgetting to unpack a 1-tuple.
Forgetting to replace a placeholder used as a default value (like None or object()).
For positional arguments, skipping an argument or passing arguments in the wrong order.
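Two of these cases can be reproduced with a pure-Python stand-in. This is only an illustration: report() and Config are hypothetical names, and a C function using the “p” format unit would treat the arguments the same way.

```python
def report(items, verbose=False):
    """Pure-Python stand-in for a function with a boolean argument."""
    suffix = " (verbose)" if verbose else ""
    return f"{len(items)} items{suffix}"


class Config:
    def is_verbose(self):
        return False


config = Config()

# Forgotten call: a bound method is always true, so the flag is silently on.
wrong = report([1, 2], verbose=config.is_verbose)
right = report([1, 2], verbose=config.is_verbose())

# Iterator instead of collection: an empty generator is still true.
empty_list_flag = bool([])
empty_generator_flag = bool(x for x in [])
```

Neither mistake raises an exception; the only symptom is that the flag has the wrong value.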

Now, the main question: Do we want to accept arbitrary Python objects as boolean arguments in builtin and extension functions for consistency with pure Python code, or limit the set of acceptable values to help catch programming errors?

If we decide to limit it, we can introduce a new format unit (“P” or “p!”), or even several format units for a transition period, but I have questions about the details:

Should it only accept True and False, or also 1 and 0 for compatibility, or any integer?
Should it emit a warning or raise an error for non-bool and non-int arguments?
Should it emit a warning or raise an error for integer arguments other than 1 and 0?
Should it emit a warning for 1 and 0?
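To make the options concrete, here is a sketch of what one possible stricter conversion could look like if written in Python. Everything in it is an assumption for the sake of discussion: strict_bool and its allow_01 and warn_on_01 parameters are hypothetical, and the choice of exception types is just one of the possible answers to the questions above.

```python
import warnings


def strict_bool(value, *, allow_01=True, warn_on_01=False):
    """Hypothetical stricter conversion; not an existing CPython API."""
    if isinstance(value, bool):
        return value
    if allow_01 and isinstance(value, int):
        if value not in (0, 1):
            # One possible answer for integers other than 1 and 0.
            raise ValueError(f"expected True, False, 1 or 0, got {value!r}")
        if warn_on_01:
            warnings.warn("use True/False instead of 1/0",
                          DeprecationWarning, stacklevel=2)
        return bool(value)
    # One possible answer for non-bool and non-int arguments.
    raise TypeError(f"expected a bool, got {type(value).__name__}")
```

Setting allow_01=False gives the strictest variant, which accepts only True and False.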

gpshead · December 5, 2022, 10:13am

I like Serhiy’s summary above.

Some additional context is that I recently merged a PR which was a move of a bunch of builtins and extension modules towards consistency with the truthiness style logic we usually write in pure Python code. But for the reasons listed above, maybe we don’t actually want that everywhere?

I suspect there isn’t a single right answer and it is going to depend on the API as to whether being strict about True|1|False|0 vs the object truthiness __bool__ test will be helpful or potentially hide easy bugs for the majority of users not using a type checker.

vstinner · December 5, 2022, 11:52am

I would prefer that C code behaves as Python code:

def func(option=True):
    if option:
        print("yes, the option is true!")

In Python, func() basically accepts “anything”, except for the very few objects where bool(option) raises an exception.

If func() is rewritten in C, I would expect it to behave exactly the same. I dislike it when a function behaves differently depending on whether it’s implemented in C or Python. See PEP 399 by the way. That’s one of the reasons why positional-only arguments were added to Python, PEP 570.

So yes, func() accepts things which “should not” be used as a boolean. But well, Python has a long history of “duck typing” and changing that is likely to break many cases.

If you want a C function to not accept obj.method but only obj.method(), the method.__bool__() method should be modified to raise an error. The problem is not specific to C.

>>> def func(option=True):
...     if option:
...         print("yes, option is true")
... 

>>> func()
yes, option is true
>>> func(True)
yes, option is true
>>> func(1)
yes, option is true
>>> func("no")
yes, option is true

>>> class MyClass:
...     def method(self): pass
... 
>>> obj=MyClass()
>>> func(obj.method)
yes, option is true
>>> func(obj.method())
>>> func(None)

Jelle · December 5, 2022, 10:15pm

As a datapoint: In typing (e.g. in typeshed stubs), we tend to use bool as an annotation for boolean parameters. Strictly speaking that’s usually not accurate, because any Python object will work in the sense that it won’t throw an error, but as Serhiy points out, using a non-bool value is often indicative of a bug. I don’t remember any user complaints about this.

gpshead · December 5, 2022, 10:34pm

Agreed, I’m actually happy to see source analysis tooling being more strict about this than our implementation.

storchaka · December 7, 2022, 7:40am

Python has a long history of accepting only int and bool as boolean arguments of functions implemented in C. Accepting arbitrary Python objects is a relatively recent feature and was used only in some functions.

The question is what behavior is desired. Do we prefer purity or safety? If you really need to accept an arbitrary Python object as a boolean argument in a particular function (I do not know of use cases for this, but perhaps you have some), you can do this. The question is whether we want to be stricter in the majority of functions. And if yes, what should be the final state and what is the transition plan for different types and values (int/non-int, 0/1/other values)?

Several years ago I tested what would break if they accepted only True and False. Very little code broke, and I changed it to always use True/False instead of 1/0. A few days ago I also fixed a few tests added since then. Apart from this, no tested code in the stdlib would be affected by stricter rules.

mdickinson · December 7, 2022, 8:38am

NumPy bools (objects of type numpy.bool_) are a potential issue. Do we want to be able to use them in a context where a Python bool is expected? (Personally, I think we do: it happens a lot in our own scientific code that we naturally end up with a NumPy bool_ instead of a regular bool and we’re passing it on to some non-NumPy-aware library or built-in.)

But if we do want them to be usable but we don’t want to allow general objects, what’s the practical test for whether something is bool-like?

Somewhat related: regression when passing numpy bools to sorted(..., reverse=r) · Issue #82161 · python/cpython · GitHub

storchaka · December 7, 2022, 1:16pm

Very good point.

We could distinguish “bool-like objects” from other objects by the existence of the __bool__ method, but some collections define it for performance or to avoid OverflowError if __len__() returns too large an integer.
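A quick check shows why the __bool__ test is not selective enough. This is a sketch with a hypothetical helper, has_own_bool, and it assumes a recent CPython in which range defines __bool__.

```python
def has_own_bool(obj):
    """Naive test: does the object's type define __bool__ at all?"""
    return hasattr(type(obj), "__bool__")


huge = range(2**100)

checks = {
    "True": has_own_bool(True),    # a real bool
    "3.14": has_own_bool(3.14),    # a number, not a flag
    "range": has_own_bool(huge),   # a collection with its own __bool__
    "list": has_own_bool([]),      # relies on __len__ instead
}

# range defines __bool__ so truth testing works even when len() cannot.
try:
    len(huge)
except OverflowError:
    len_overflowed = True
else:
    len_overflowed = False
huge_is_true = bool(huge)
```

The test accepts a float and a range but rejects a list, so it does not separate flags from collections in either direction.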

We could distinguish “bool-like objects” from other objects by introducing a new special method (like we did with __int__ and __index__), but I think that is an overhead.

So I think we will just be left with accepting arbitrary Python objects (as was done in https://github.com/python/cpython/pull/15609) until someone complains about it and proposes a good solution.

Rosuav · December 7, 2022, 2:20pm

As a multilingual programmer, I greatly appreciate this choice. Working in a language with strict bool rules (where you can’t say if (x) but have to say if (x != 0) instead) is annoying already, and it would be far more so if this limitation applied in certain situations but not others.

Accepting arbitrary Python objects means the language is consistent. This makes everything easier IMO.

storchaka · December 7, 2022, 3:12pm

I did not propose to change this. I only asked about boolean arguments of builtin and extension functions.

For example: string.splitlines("\n"). It works, but perhaps not in the way you might expect.

Sorry if it was not clear.
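The splitlines case can be illustrated with a small stand-in. This is a sketch: split_lines is a hypothetical wrapper that truth-tests keepends the way the “p” format unit would, so the result does not depend on how a particular Python version parses the real argument.

```python
def split_lines(text, keepends=False):
    """Hypothetical stand-in that truth-tests keepends like the "p" format unit."""
    return text.splitlines(bool(keepends))


text = "a\nb\n"

# What the caller probably meant: split on the "\n" separator.
intended = text.split("\n")

# What happens instead: "\n" is a non-empty string, so keepends is true.
actual = split_lines(text, "\n")
```

Here intended is ['a', 'b', ''] and actual is ['a\n', 'b\n']: no error, just a different result.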

Rosuav · December 7, 2022, 6:58pm

Yep, that’s exactly what I mean. To be fair, I wouldn’t often pass "\n" as a boolean, but there are certainly situations where I’d use zero for false and nonzero numbers for true. It’s handy when that works, and really annoying when it mostly works but occasionally fails.

MRAB · December 7, 2022, 7:08pm

Would that be a use-case for a __nonempty__ method?

brettcannon · December 7, 2022, 8:24pm

That’s the equivalent of __bool__ in most situations and semantically __nonempty__ is only useful from an extensions perspective since Python itself has integers of arbitrary size.

vstinner · December 9, 2022, 10:05am

I’m not sure that it’s a good idea.

The int type has a __bool__() method. Does it mean that 0 and 1 are considered as boolean? Other examples:

>>> (3.14).__bool__()  # float
True
>>> (5j).__bool__()  # complex
True

The numpy.bool_ type does not inherit from Python’s built-in bool type.

While I’m not surprised by flag=1 instead of flag=True, using float or complex as boolean sounds wrong to me. Either we accept any Python object which can be converted “somehow” to bool (ex: use __len__() method), or we only accept the exact bool type (sorry numpy.bool_ and other variants).

By the way, collections.abc has no Boolean Abstract Base Class (ABC) which would check for __bool__() or __len__().
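For the sake of argument, such an ABC could be sketched as follows. BooleanLike is purely hypothetical, and the sketch mostly shows why it would not help: it rejects a function passed by mistake, but it cannot tell flag=1 from flag=3.14 or from a list.

```python
from abc import ABCMeta


class BooleanLike(metaclass=ABCMeta):
    """Hypothetical ABC; nothing like it exists in collections.abc."""

    __slots__ = ()

    @classmethod
    def __subclasshook__(cls, C):
        if cls is BooleanLike:
            # Accept any class that defines __bool__ or __len__ in its MRO.
            if any("__bool__" in B.__dict__ or "__len__" in B.__dict__
                   for B in C.__mro__):
                return True
        return NotImplemented


def callback():
    return True


results = {
    "True": isinstance(True, BooleanLike),
    "1": isinstance(1, BooleanLike),
    "3.14": isinstance(3.14, BooleanLike),
    "[]": isinstance([], BooleanLike),
    "callback": isinstance(callback, BooleanLike),
    "object()": isinstance(object(), BooleanLike),
}
```

Only the last two entries are False; everything that defines __bool__() or __len__() passes.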