I like the idea of a warning. However, I don’t see the urgency of removing `~bool`. As long as there is a warning to alert the user that this might not do what they intended, the change serves its purpose. Why not make it just emit a warning but keep the quirk indefinitely, just like `x is 5` emits a warning but will never raise an error?
A linter can use abstract interpretation to deduce that `~` will be applied to a bool, irrespective of type declarations.
Whether it’s worth the effort is another matter.
A more practical approach would be to instruct the linter and/or type checker that doing arithmetic on a bool or passing a bool for an int should be flagged. You would then write `int(b)` in the code whenever you wanted a bool `b` to be used as an int.
Please don’t do this. Arithmetic on a bool is useful and idiomatic:

```python
sum(predicate(x) for x in iterable)
f'{count} object{"s" * (count != 1)}'
def cmp(a, b): return (a > b) - (a < b)
```
I would be okay with a type checker complaining about using `~` on a `bool`, but I would like to be able to continue to pass bools as ints without complaint. As long as the function says it operates on ints and returns ints, I don’t expect to use it in a context that preserves bools. That function may do whatever it pleases, bitwise or not, and I expect to get an integer back.
TBH, even a warning has the potential to be annoying here, but it’s probably worth the cost, given the rarity of bitwise inversion on Python ints.
I agree with you that the important thing is to ask “what’s idiomatic?” I think a lot of people forget that just because they understand how Boolean numbers map to integers, it doesn’t mean that the average reader of their code has internalized that fully.
This is why the `~bool` deprecation makes a lot of sense to me: most Python readers probably don’t know what `~False` evaluates to.
As for arithmetic, while `sum` may be idiomatic, I think `&` for example is not. Would the average reader know immediately what `True & 4` equals?
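For the record, the actual values (exactly the surprises under discussion):

```python
>>> ~False     # bool falls back to int: ~0 == -1, which is truthy!
-1
>>> ~True      # ~1 == -2, also truthy
-2
>>> True & 4   # 1 & 4 == 0: the bits don't overlap
0
```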
Of course, the average reader of code depends on the project. For your personal projects, it’s just you. At work, it may be one kind of engineer, and in an open-source project, another kind. Also, the average constantly changes. As Python attracts non-programmers or new programmers, the average becomes less knowledgeable; as those programmers mature, the average swings the other way. What is idiomatic code should work in nearly all contexts, and this naturally changes with readers.
In my opinion, the preferences of experienced Python program writers do not matter. The experience level of the average Python program reader is what determines idiomatic code.
I think the average reader has problems with bit operators, not with `True == 1`. `&`, `|`, `^`, `~`, `>>`, etc. (as they act on integers) are confusing operations for modern people who’ve never had to deal with the hyper-optimised bit-wrangling of the 80s and 90s. I don’t see how these operations have a place in Python, but people still use them, so I’d leave them be.
But if you want to prevent the footguns `~False` and `True & 4`, it seems much more reasonable to require an import to enable bit-wise operators, rather than to mutilate booleans.
I do arithmetic on bools often enough myself. I just write

```python
sum(int(predicate(x)) for x in iterable)
```

instead, to make it explicit that I’m using the integer value of the bool. Using a bool as an int is rare enough that the extra typing is really no bother.
But my point was not to push my style onto others. I was just reacting to @storchaka’s assertion that linters can’t handle this. The point being that if you want to phase out using bool in an inty way, you are free to do that in your own code base, and tools can be made to support that. It doesn’t need a language change. And if static checkers grow a `--warn-int-operations-on-bool` option, then it doesn’t have to be on by default.
I think it’s worth distinguishing between arithmetic and bitwise arithmetic. Python has gone out of its way to apply intuitive meanings to arithmetic on bools and strings (although `/` doesn’t make a lot of sense). Taking advantage of that for clear, compact expressions is what I mean by idiomatic, and that is what I do not want to see go away or be discouraged by tooling.
Bitwise arithmetic is pretty much the opposite of idiomatic, and should be recognized as a clear sign of wizards at work to the novice. It may be a failure of documentation and instruction that it is unclear and people go on to decide what these symbols should mean rather than finding out what they do mean. This is unfortunate, but making it so that `f(mybool)` will suddenly fail in Python 3.N because `f` accepts an int and calls some function three levels down that gets its job done by using a `~` seems like excessive splash damage to me.
All that said, a type checker or linter could definitely look for `~(boolean variable or expression)` and error on that, as that is almost certainly a sign of misuse.
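As a rough illustration, the purely syntactic half of that check is cheap to write (a minimal sketch; the class name is made up, and a real linter would add type inference to also catch variables known to be `bool`):

```python
import ast

class InvertOnBoolChecker(ast.NodeVisitor):
    # Flag ~ applied to expressions that are syntactically boolean:
    # comparisons, and/or chains, `not`, and True/False literals.
    def visit_UnaryOp(self, node: ast.UnaryOp) -> None:
        if isinstance(node.op, ast.Invert):
            op = node.operand
            looks_boolean = (
                isinstance(op, (ast.Compare, ast.BoolOp))
                or (isinstance(op, ast.UnaryOp) and isinstance(op.op, ast.Not))
                or (isinstance(op, ast.Constant) and isinstance(op.value, bool))
            )
            if looks_boolean:
                print(f"line {node.lineno}: '~' applied to a boolean expression")
        self.generic_visit(node)

InvertOnBoolChecker().visit(ast.parse("flag = ~(a > b)"))
# -> line 1: '~' applied to a boolean expression
```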
FWIW, if I were reviewing code like this I would probably ask the author to rewrite the code in a clear and explicit way. `"s" * (count != 1)` might seem smart but it’s definitely less readable than `"s" if count != 1 else ""`.
I recommend not claiming this without an implementation deployed and accepted by users. Determining whether something is actually a `bool` type vs an `int` vs anything Pythonically truthy doesn’t smell solvable sufficiently often to catch all cases to me.
(read: prove me wrong)
A conservative take: a bool is an int. It does not need conversion to pass into things taking an int. Therefore, an operator behaving differently on bool than on int should come as a surprise. People do get clever and use known bools as ints because of this, so the behavior change won’t go unnoticed, as actual users have piped up to state.
The original issue argued in favor of `~` raising on bool because users, particularly newbies, mistakenly type `~` instead of `-` and get a surprise. The user-friendly solution to that is much more drastic: stop allowing the evil `~` character, which kerns like a `-` on many displays, font renderings, and eyeballs, in the language entirely. That’d be a much larger breaking change that we’re just not going to do.
I’d personally just document the logical curiosity and move on. How two’s complement logic expresses itself on variable-sized signed types always blows people’s minds. A deprecation warning suggesting “you probably didn’t want to do this” seems fine, as it should only show up in REPLs or tests, but actually changing the behavior to raise… may never be feasible. Time will tell. (So thanks for extending the claimed deprecation period already, but I’m not convinced it should ever actually be realized.)
I do find the LSP violation here quite problematic, and sufficient reason that this “deprecation” should never go beyond a quiet-by-default pending deprecation warning (and I would prefer it didn’t even go that far). Since `bool` is a subtype of `int` (which is not likely to change, given the amount of code it would impact), there is no way that a type checker can help you avoid your code blowing up due to use of `~` on a boolean. You would just have to never use `~` on integers at all, because you never know when a caller might pass you a bool instead of an int.
Put differently: if you have a function that accepts an integer and uses `~` on it, that function is correct. If you have a caller passing a bool to a function that expects an int, that call is also correct. So which code is to blame if the combination of these fails in the future with “cannot bitwise negate a boolean”? It effectively requires any code operating on integers that uses bitwise negation to loudly document “even though booleans are integers in Python, I can’t actually handle booleans, please don’t pass me one!” And this requirement (which can only be manually documented, not automatically enforced) is infectious to all code that might ever transitively call such a function.
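To make the failure mode concrete (a minimal sketch; `invert_bits` is made up, but each piece is individually type-correct):

```python
def invert_bits(n: int) -> int:
    # Relies on the standard int identity ~n == -n - 1.
    return ~n

flag = 10 > 3        # a bool, which is also an int
invert_bits(flag)    # fine today: returns -2, and type checkers accept it;
                     # it would start failing if ~ were removed from bool
```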
I don’t see why typecheckers couldn’t implement some sort of strict mode where `bool` is not considered a subclass of `int` and operations such as `~` are disallowed on `bool`, as well as other things like `2 & True`. I am sure it would be a bunch of work but it seems within the realms of what type checkers can do and are used for.
Note that in Python typecheckers don’t necessarily follow runtime semantics strictly, e.g. mypy just decided at some point that strings are not iterable, so `a, b, c = "abc"` is disallowed even though it works at runtime.
They certainly could; I covered that possibility in a parenthetical. If we had a time machine, it would clearly be the best answer to have done this from the start. But making this change today would cause collateral backwards-compatibility damage that would dwarf the scale of all problems ever caused by bitwise negation of booleans.
Of course. Type checkers, like any static analysis tool, provide a conservative approximation of runtime semantics. If they followed runtime semantics strictly, then they would be runtimes, not type checkers.
This is not a parallel case, or really relevant to the point I was making at all; unless you’re just trying to reinforce that the type system could choose not to have `bool` be assignable to `int`, which I agree with.
My point is that if we have all three of “~ is usable on integers”, “bool is assignable to int”, and “~ is not usable on booleans”, the result is a bug-prone contradiction which impacts anyone wanting to use ~ on integers. To avoid this, one of these three must be eliminated. The best choice if we were designing from scratch, IMO, would be to eliminate “bool is assignable to int.” The best choice in the real world, considering backwards-compatibility impact, is IMO to eliminate (or rather, not introduce) “~ is not usable on booleans.”
yup
But making it a run-time warning, similar to what we get with, e.g., `x is 3`. Though it wouldn’t be a syntax warning – is there any other precedent for a runtime warning of this nature?
I think this would end up essentially warning the “right” person: as @carljm points out, it’s perfectly allowed for code to use `~` on an int, and it’s allowed for a caller to pass in a bool where an int is expected. So the warning would be presented to the caller of the function that passed in a bool, not the writer of the function.
The trick is that you are warning a user about code that they may not have written, and may not understand, and the issue arises somewhere different from the code that caused it. Would this cause more confusion than help? Maybe.
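For what it’s worth, the general shape of such a runtime warning is easy to sketch (a hypothetical stand-in, not CPython’s implementation; the message text is made up):

```python
import warnings

def invert(value: int) -> int:
    # Hypothetical: warn when ~ is applied to a bool,
    # but keep the current behavior so nothing breaks.
    if isinstance(value, bool):
        warnings.warn(
            "Bitwise inversion '~' on bool may not do what you expect; "
            "use 'not' for logical negation",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
    return ~value
```

Note the `stacklevel=2`: the warning points at the call site, which may be far away from whoever actually wrote the `~`.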
I disagree. A sudden influx of new programmers would mean that `x += 1` is no longer idiomatic, nor `[l.strip() for l in some_lines]`, nor

```python
if (m := re.match(...)):
    # do something with m
```

nor, for that matter, just about anything with enums.
A sudden influx of more Python programmers than there currently are? I guess that could happen even if it seems unrealistic.
But I think that is what I’m saying. Just like in descriptive linguistics, a language is defined by its speakers. If a “sudden influx” of two billion new English speakers speak it in a different way, then English has changed as far as the average speaker is concerned. In fact, something like this is theorized to have happened—it was a possible cause of the Great Vowel Shift.
Even if something like this were to happen to Python, you can keep writing Python the way you want to—just as ancient English speakers could persist in their outdated speech. You just won’t be understood by most of the other programmers.
It’s not like I’m not sympathetic to a prescriptive viewpoint. For example, I love Python’s cooperative multiple inheritance. But I’ve had to accept that many people find its corner cases confusing, and that even if these corner cases make sense to me, they’re best avoided.
Your first comment was about “program readers” but now we’re talking about program writers. I think that distinction matters because code is read more than it’s written, and some code is read a lot more than other code.
I think of “idiomatic python” as the kind of code that is written by experienced programmers and read by many people. Obviously that’s a fuzzy definition but “idiomatic” is a fuzzy concept.
I find the argument by @effigies and @carljm, that functions that accept ints should not ‘randomly’ fail if you put in `True` instead of `1`, quite convincing.
I also think that treating True as 1 and False as 0 is very idiomatic Python. It’s a shorthand which saves keystrokes, is easy to use, looks clean, and allows the user to make mistakes that other languages might prevent.
What’s not idiomatic Python is bit-wise operators such as `~` (acting on ints). In Python you’re not supposed to need to worry about what’s inside a byte, or how a number is implemented. That’s why ints get arbitrarily long (and you don’t need to worry about whether they’re signed), floats are actually doubles, and division converts ints to floats.
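For instance:

```python
>>> 2 ** 100   # ints have arbitrary precision; no overflow, no sign bit
1267650600228229401496703205376
>>> 7 / 2      # division converts ints to floats
3.5
>>> ~5         # yet ~ exposes the two's-complement view: ~n == -n - 1
-6
```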
Having given it some more thought, I would double down on my statement that the best way to prevent `~bool` abuse is to lock bit inversion behind an import, so that the bit-wizards can still use it but the rest of us are protected.
Ideally `~True` would raise

```
TypeError: bad operand type for unary ~: 'bool'. To invert booleans, use 'not'. Please don't use ~ on booleans.
```

whilst `~1` would raise

```
TypeError: bad operand type for unary ~: 'int'. To enable ~ acting on ints (as bit inversion), add the following to your code: from __past__ import bitwiseoperators
```
Although using imports for this has a weakness: if something you’ve imported enables bitwise operators, they are potentially also enabled in your main Python file. I don’t understand the implementation details of Python well enough to know whether that is a solvable problem.
Right, that’s why I said readers. “English speakers” means people who both speak and understand English.
This is analogous to linguistic prescriptivism (the theory that the preferred usage of a language is defined by experts). In linguistics, prescriptivism has been almost totally supplanted by descriptivism. It’s not surprising that the Python forum attracts prescriptivists though.
The issue with blindly following prescriptivism is that it’s not pragmatic. Even if you think that code should be written one way, if you confuse the readers of your code, you’re not achieving your goal. It’s just like in language: if you start confusing people with who-vs-whom or odd uses of common words, then you’re no longer communicating well.
When it comes to communication (and I think programming is a kind of communication), the speaker mostly bends to the world, and not the other way around. Unless you’re Shakespeare.
I think the difference is that, well, linguistics is descriptive, but programming language design is not. A linguistic description of a language is just a statement of how things are. The design of a programming language is a normative statement of how things should be. Programming languages aren’t natural languages to be described; they’re more like conlangs that people make up along with all their rules.
Also, part of the reason for descriptiveness in linguistics is that natural languages change via evolution, and such change can be cyclical and redundant (e.g., a language may have many ways to say the same thing, or a particular structure may fade out and then return later). We probably don’t want that in a programming language.
Maybe most important, though, there’s a reason why it’s a lot easier to become “fluent” in a programming language than in a natural language. In my view, adopting a descriptive view of programming leads to a language that many people find annoying to work with, because it is confusing, illogical, redundant, inconsistent, and so on.[1] We are better off with normative language design.
JavaScript is a prominent example of a language that has more or less followed this path, with the result being that every year you basically have to learn a new bunch of features that remain part of the language but “you shouldn’t use” because they added a new, better way without removing the old way, even though everyone agrees it is worse. ↩︎
I think we’re getting off topic. We’re not talking about whether language design is descriptive. (Obviously it’s prescriptive as you say.) My point is that what is idiomatic is descriptive. This was my original comment: ~bool deprecation - #24 by NeilGirdhar