~bool deprecation

barry · August 28, 2024, 6:49pm

A few days ago, I noticed an email notification from GitHub about a change in deprecation period for ~bool, i.e. bitwise inversion of a bool type. The precipitating PR is GH-82012, which is linked to the original issue GH-122982.

This surprised me because I couldn’t recall any more visible discussion about what seems to me to be a fairly impactful change for users. Granted, this is only the deprecation, but the root topic of whether this should even be deprecated wasn’t, to my knowledge, discussed anywhere but GitHub, and it was pure happenstance that I noticed this thread in the deluge of GitHub notification emails.

I’ll start a separate thread about the wider topic of change visibility^[1], but I wanted to raise this concern so it gets more discussion. I’m not presenting my opinion, either as an SC member or with my individual core dev hat on, as to whether this change is good or not. But I think it’s at least controversial enough to warrant a wider discussion before any removal actually takes place^[2].

So… should ~bool be deprecated? Discuss!

^[1] please don’t post here, it’ll just likely split the topic anyway
^[2] deprecations are much easier to reverse, but once it’s removed, that’s a path that’s much more difficult to walk back

brettcannon · August 28, 2024, 7:22pm

If we want to continue to treat booleans as proper integers then no. If we don’t quite care about maximizing that compatibility then I’m indifferent.

storchaka · August 28, 2024, 7:36pm

The original issue is bool(~True) == True · Issue #82012 · python/cpython · GitHub.

Barry, something is wrong with your links.

I agree that this change is slightly controversial, but I think that it is net positive. There is more confusion than benefit from having ~True == -2.

The problem would go away if True were equal to -1 instead of 1, but it is a bit late for that.
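A quick check of why -1 would have helped, using plain ints only. In two's complement -1 has every bit set, so inverting 0 and -1 swaps them. The names FALSE_ALT and TRUE_ALT are made up for illustration and are not part of any proposal.

```python
# Hypothetical alternative encoding, for illustration only: truth is -1 (all bits set).
FALSE_ALT = 0
TRUE_ALT = -1

# With this encoding, bitwise inversion coincides with logical negation.
assert ~FALSE_ALT == TRUE_ALT
assert ~TRUE_ALT == FALSE_ALT

# With the actual encoding (True == 1, False == 0) it does not:
# inverting 1 and 0 gives -2 and -1, and both are truthy.
inverted = [~1, ~0]
print(inverted)                      # [-2, -1]
print([bool(v) for v in inverted])   # [True, True]
```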

AA-Turner · August 28, 2024, 7:58pm

I’d agree that eventually removing ~bool makes sense - it is surprising in a bad way for users coming from other languages that might expect ~False is True, and also confusing for beginners. Extending by 2 years seems fine, though.

A

pf_moore · August 28, 2024, 9:23pm

IMO, removing ~bool simply changes the nature of the confusion. Why remove ~, but leave +, -, * and /? For that matter, leaving & and | simply because they happen to behave as if bool was a full-fledged type of its own, while removing ~ because it doesn’t, feels weird to me.
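For reference, a small sketch of which operators currently hand back a bool and which fall through to int. The helper name result_types is made up for this example. As far as I know, ~ on a bool emits a DeprecationWarning on Python 3.12 and later, so the sketch silences it. On a version where ~bool has actually been removed, the last entry would raise instead.

```python
import warnings


def result_types(a: bool, b: bool) -> dict:
    """Collect the result type of several operators applied to bools."""
    with warnings.catch_warnings():
        # Silence the deprecation warning that ~ on bool emits on newer Pythons.
        warnings.simplefilter("ignore", DeprecationWarning)
        return {
            "a + b": type(a + b).__name__,
            "a * b": type(a * b).__name__,
            "-a": type(-a).__name__,
            "a & b": type(a & b).__name__,
            "a | b": type(a | b).__name__,
            "~a": type(~a).__name__,
        }


result = result_types(True, False)
for expr, type_name in result.items():
    print(f"{expr:6} -> {type_name}")
```

Only & and | (and ^) are overridden by bool to return a bool. Everything else, including ~, is inherited int arithmetic.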

I’d be a lot happier if this were part of a larger change to make bool a full-fledged boolean type. Either that, or I’d prefer to simply leave it alone.

I know practicality beats purity and all that, but this feels more like a perlish “do what I mean” (or in this case, “don’t do what I don’t mean”).

oscarbenjamin · August 28, 2024, 9:34pm

What about deprecating ~bool for a while but with the intention to make it work properly in the end rather than just removing it altogether?

The use of ~ for logical negation is widespread e.g. numpy:

In [11]: a = np.array([1, 2, 3, 4])

In [12]: cond = a % 2 == 0

In [13]: cond
Out[13]: array([False,  True, False,  True])

In [14]: ~cond
Out[14]: array([ True, False,  True, False])

In [15]: a[~cond] = -1

In [16]: a
Out[16]: array([-1,  2, -1,  4])

Also sympy:

In [17]: import sympy

In [18]: x, y, z = sympy.symbols('x, y, z')

In [19]: cond = x & (y | ~z)

In [20]: cond
Out[20]: x ∧ (y ∨ ¬z)

In [21]: cond.subs(z, True)
Out[21]: x ∧ y

In [34]: x | ~sympy.S.true
Out[34]: x

In [35]: x | ~True
---------------------------------------------------------------------------
TypeError

Many more examples of libraries using these can be found. The ones I know of all broadly correspond to the above two cases though: arrays of booleans or symbolic boolean expressions.

Ultimately it would be better if bool was not a subclass of int. Having it be a subclass of int now does create a compatibility problem for any change because of e.g. if isinstance(obj, int): return ~obj but I doubt that a lot of code depends on ~bool given how useless it is: just use an int if you want an int! Even ~int is rarely needed in Python and I can’t imagine doing that in some context where bools and ints are mixed so that I’m going to do ~obj without knowing whether obj is an int or not.

A future non-int bool type could still support arithmetic operations like True - True -> 0 for compatibility but could do the right thing for operators like ~ that might reasonably be expected to handle boolean logic properly.
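A rough sketch of what such a type might look like, purely to make the idea concrete. LogicalBool is a hypothetical name, not an existing or proposed API, and a real design would need far more than this (comparisons, hashing, the remaining operators).

```python
class LogicalBool:
    """Hypothetical boolean type that is NOT a subclass of int (illustration only)."""

    def __init__(self, value):
        self._value = bool(value)

    def __bool__(self):
        return self._value

    def __int__(self):
        return int(self._value)

    def __invert__(self):
        # ~ performs logical negation rather than bitwise inversion.
        return LogicalBool(not self._value)

    def __sub__(self, other):
        # Arithmetic still yields plain ints, for compatibility.
        return int(self) - int(other)

    def __rsub__(self, other):
        return int(other) - int(self)

    def __repr__(self):
        return f"LogicalBool({self._value})"


T = LogicalBool(True)
print(T - T)      # 0
print(~T)         # LogicalBool(False)
print(bool(~T))   # False
```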

jeff5 · August 29, 2024, 7:27am

numpy does things I would not emulate.

While bool is a subclass of int, I think the arithmetic operators ought to do what they do. I’m with @pf_moore that the confusion is worse when one arithmetic operation is “corrected” and the others do … arithmetic.

Let’s make True == -1 … . (Edit: I see this idea “if only we had a time machine” has been discussed. Now I realise why some languages make this implementation choice internally.)

bjorn-martinsson · August 29, 2024, 12:09pm

I’ve found using ~ to be very useful in many different situations, and I’ve been using it a lot, both in Python and C++. I’ve identified 3 cases where my Python code breaks because of this change: code involving bitmasks, code making use of ~ for “reverse indexing” of a list, and the codegolf trick of using -~x to increment x by 1.

There is a relatively easy workaround: simply search-and-replace ~ with ~+. The + will convert any Boolean to an int.
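A minimal sketch of that workaround. The function names invert_old and invert_new and the sample list are made up for the example. The old spelling is run with DeprecationWarning silenced, since newer Pythons warn on ~ applied to a bool.

```python
import warnings


def invert_old(x):
    # Original spelling: relies on ~ being defined for bool.
    return ~x


def invert_new(x):
    # Workaround: unary + turns a bool into an int before inverting.
    return ~+x


with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    for x in (False, True, 0, 1, 5):
        # Both spellings agree with the fully explicit ~int(x).
        assert invert_new(x) == invert_old(x) == ~int(x)

items = ["a", "b", "c", "d"]
print(type(+True).__name__)                         # int
print(invert_new(True))                             # -2
print(items[invert_new(0)], items[invert_new(1)])   # d c  (reverse indexing)
print(-invert_new(True))                            # 2    (the -~x increment trick)
```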

I’m against deprecating ~ on Booleans in general, but I am strongly against changing ~True to False. A change like that is really dangerous since it could (and would) create silent bugs. It also would make new code incompatible with older versions of Python.

Something I also want to note is that ~ in numpy is always bit-inverse. ~0 is always -1, even in numpy. However, depending on the data type, -1 can have different representations. Modulo two, -1 is equal to 1.

pitrou · August 29, 2024, 12:14pm

But that’s only because it’s not possible to override not, right?

In any case, ~bool behavior is confusing enough that it does seem desirable to deprecate it. “Codegolf” is not a good enough reason to keep supporting it

pitrou · August 29, 2024, 12:18pm

That doesn’t seem to be the case here (NumPy 2.1.0):

>>> int_arr = np.array([-2, -1, 0, 1, 2], dtype=np.int8)
# Inverse then view as bool => bitwise complement
>>> (~int_arr).view(np.bool_)
array([ True, False,  True,  True,  True])
# View as bool then inverse => logical not
>>> ~(int_arr.view(np.bool_))
array([False, False,  True, False, False])

oscarbenjamin · August 29, 2024, 12:26pm

I’m a bit confused by this:

Are you using ~x to index a list like stuff[~x] but where x is a bool rather than an int?

Are you doing bitmasks like flags &= ~x but where x is a bool rather than an int?

These seem like situations where I would definitely want x to be an int rather than a bool.

bjorn-martinsson · August 29, 2024, 12:40pm

Nothing in your example there contradicts my claim. np.bool is a single bit datatype, and -1 = 1 (mod 2).

If you try using ~ on unsigned datatypes, like dtype=np.uint8, dtype=np.uint16, dtype=np.uint32, … You’ll see how ~ is always bit inverse, ~0 is always -1. The only thing that changes is how many bits the data type supports.

bjorn-martinsson · August 29, 2024, 12:50pm

Good question.

For bitmasks, I do use things like flags &= ~(x == 0). For example in my implementation of a special kind of segment tree that I’ve shared with tons of people, I have the line for i in range(n & ~(bit == 0)): (see link to blog).

Could I have coded this differently? Absolutely! But I never expected Python to suddenly make a breaking change by removing ~ from Booleans.
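To spell out what that mask does: when bit == 0 the mask is ~True, i.e. -2, which clears the lowest bit of n. Otherwise it is ~False, i.e. -1, which leaves n unchanged. Here is a sketch of two equivalent spellings that avoid ~ on a bool. The function names are made up for illustration and are not taken from the linked code.

```python
def masked_explicit(n: int, bit: int) -> int:
    # Convert the comparison result to int before inverting it.
    return n & ~int(bit == 0)


def masked_readable(n: int, bit: int) -> int:
    # Same effect written out: clear the lowest bit of n only when bit == 0.
    return (n & ~1) if bit == 0 else n


# The two spellings agree on a small range of inputs.
for n in range(32):
    for bit in range(4):
        assert masked_explicit(n, bit) == masked_readable(n, bit)

print(masked_explicit(7, 0))   # 6  (lowest bit cleared)
print(masked_explicit(7, 1))   # 7  (unchanged)
```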

oscarbenjamin · August 29, 2024, 12:51pm

Yes (PEP 335), and also because other operators were never added that could be used for these things although various proposals were made and rejected. Ultimately the Python ecosystem had to make do with the operators available and as a result there is a widespread convention that ~ is used for logical negation of booleans.

There is no reason why bool couldn’t do the same apart from the accident of history that it inherited less useful behaviour from int. Hence someone opened an issue asking to change ~bool to do the right thing but the discussion then just decided to deprecate and remove it instead.

The usefulness of an operator like ~ behaving consistently across different types is that you can write generic code that works with different types so that the same code works with either numpy boolean arrays, or sympy logical expressions or plain bool. There is no way to write a function that works like that though because bool defines ~ wrong and not does not work with anything other than bool:

>>> not np.array([True, False]) # numpy (multiple values)
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

>>> not (x > 0) # sympy (symbol may or may not be positive)
...
TypeError: cannot determine truth value of Relational

pitrou · August 29, 2024, 1:08pm

What was your point exactly? You declared against deprecating ~bool in Python and brought NumPy as an example. But the thing is, NumPy’s boolean arrays behave differently from Python’s bool.

bjorn-martinsson · August 29, 2024, 9:19pm

This is what I mean when I say that in Numpy, ~0 is always -1.

import numpy as np
print(~np.uint64(), -1 % 2**64)
print(~np.uint32(), -1 % 2**32)
print(~np.uint16(), -1 % 2**16)
print(~np.uint8(),  -1 % 2**8)
print(~np.bool_(),  -1 % 2**1)

Output:

18446744073709551615 18446744073709551615
4294967295 4294967295
65535 65535
255 255
True 1

As you can see, np.bool_ is essentially a "np_uint1". The result of ~0 is always -1.

Since Python represents Boolean values as the integers 0 and 1 (and not as “uint1”), it is completely natural that ~False should be -1.

timhoffm · August 29, 2024, 10:12pm

I was the one to propose the deprecation and authored the PR. I believe it’s helpful for the discussion to summarize the motivation and considerations:

TL;DR: ~ on bool is prone to misuse. Changing behavior to logical negation would be too risky as an API change, but disallowing it can be done without significant user impact.

Most of this is guided by “practicality beats purity”

For better or worse a number of users associate ~ with negation because many downstream libraries have that notion (numpy, sympy, …). This leads to code like if ~condition, which is a hard-to-spot bug, because the code runs, but bool(~condition) is True no matter whether condition is True or False.
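A minimal reproduction of that bug. The function names are invented for the example, and DeprecationWarning is silenced because ~ on a bool warns on Python 3.12 and later.

```python
import warnings


def buggy_is_disabled(enabled: bool) -> bool:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        # Bug: ~True == -2 and ~False == -1, and both are truthy,
        # so this branch is taken regardless of the argument.
        if ~enabled:
            return True
        return False


def fixed_is_disabled(enabled: bool) -> bool:
    # Logical negation is spelled `not` for plain bools.
    return not enabled


for flag in (True, False):
    print(flag, buggy_is_disabled(flag), fixed_is_disabled(flag))
# True True False
# False True True
```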

The fundamental question is, can we get rid of this potential footgun? Or do we tell users to RTFM and understand that bools are ints (having more than one bit) and ~ is the bitwise inversion of the underlying int?
If we decide to remove the footgun, this is an API change. We have two options:
1. change the behavior to logical negation - I regard this as too risky. There is no good migration path. Basically we’d have to hard-switch the behavior. This is problematic for the users. If they have used it in the buggy unintended way above, silently fixing it is not ideal; it would be better to inform them that their code has not been working as intended so far. Additionally, we cannot exclude that there are a few users who used that behavior in a non-broken way (in fact there are: https://github.com/python/cpython/pull/103487#issuecomment-1953913848).
  Side-remark: I would not be opposed to reintroducing ~ on bool as logical negation in some far future, but I believe that could only happen after it raised for several versions. Users do not update with every Python release (sometimes they step up two or three versions) and I would want to make sure they still touch a version that raises.
2. prohibit ~ on bool (i.e. deprecate and eventually raise) - Technically, this breaks the Liskov Substitution Principle. However, I claim that in practice this is not an issue. It should be very rare to want the bitwise inversion of the underlying int representation of a bool (i.e. map ~False → -1; ~True → -2). But if a user really wants this, they can always write ~int(b) explicitly, which is easier to understand than ~b.
  Note also that we already have not as the logical negation operator. In practice that’s enough and we do not necessarily need ~ with the same semantics.

merwok · August 29, 2024, 10:50pm

Operator behaviour can be changed by a future import!
But adding a new magic method for this (similar to div/truediv) seems heavy-handed.

Mukundan314 · August 30, 2024, 4:31am

Is there a specific reason for implementing this as a language-level change rather than introducing a new linter rule? A linter-based approach could potentially address the confusion for new Python users while being less disruptive for existing codebases (even if they are few).

storchaka · August 30, 2024, 7:30am

This cannot be caught by a linter (even with static typing).

def f(x: int) -> int:
    return ~x
f(True)  # bool is a subclass of int