Mismatch between assert's semantics and how it's used (-O, -OO, disable)

kknechtel · July 14, 2023, 5:12pm

There is nothing confusing about the feature. It tells you right up front what it does:

-O     : remove assert and __debug__-dependent statements; add .opt-1 before
         .pyc extension; also PYTHONOPTIMIZE=x

Getting this information is as simple as python -h at the command line. You don’t even need to look up documentation.
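You can also see this behaviour from inside the interpreter. The built-in compile() accepts an optimize argument, and optimize=1 applies the same transformation that -O does. Here is a minimal sketch; SOURCE, build and assertion_fires are hypothetical names chosen for this illustration:

```python
# Compile the same source at two optimization levels with the built-in
# compile(); optimize=1 corresponds to running under -O.
SOURCE = """
def check(x):
    assert x > 0, "x must be positive"
    return x
"""

def build(optimize: int):
    """Compile SOURCE at the given level and return the resulting check()."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["check"]

def assertion_fires(optimize: int) -> bool:
    """Return True if check(-1) raises AssertionError at this level."""
    try:
        build(optimize)(-1)
    except AssertionError:
        return True
    return False

if __name__ == "__main__":
    print("optimize=0:", assertion_fires(0))  # True: the assert is executed
    print("optimize=1:", assertion_fires(1))  # False: the assert was removed
```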

If someone uses a flag that removes assert statements from the code, that is clearly described as such, and then the code breaks because the assert statements have been removed, then that is not a confusion about the flag; that is a confusion about how to use assert statements.

But assert in Python is not confusing, either. It works the same way, and has the same purpose, as assert in every other language that has it.

The purpose of assert is twofold:

Document to the reader of the code that something is necessarily true at a certain point in the code’s operation.
Fail loudly if the expected condition is false, in a way that the code should not attempt to recover from, so that the apparent logical contradiction can be resolved by fixing the code, which has thereby been proven to be incorrect. The problem cannot be resolved in any other way.
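Here is a small hypothetical example of an assert used in that sense. For integer input, the check can only fail if the two comprehensions disagree with each other, which would mean a bug in the function itself rather than bad data from the caller. split_even_odd is an illustrative name, not from any library:

```python
def split_even_odd(numbers: list[int]) -> tuple[list[int], list[int]]:
    """Partition integers into (evens, odds)."""
    evens = [n for n in numbers if n % 2 == 0]
    odds = [n for n in numbers if n % 2 != 0]
    # Invariant: every input lands in exactly one bucket. If this ever fails,
    # the comprehensions above are wrong; it documents the logic, it does not
    # validate the caller's data.
    assert len(evens) + len(odds) == len(numbers)
    return evens, odds
```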

That does not mean “we cannot continue if this is false”. It does not mean “oh, well this file with a .json file extension actually doesn’t contain JSON at all”. It means “if this assertion fails, that fact proves that there is a logical error in the code itself”. In hypothetical code that is actually completely correct and bug-free, which uses assert properly and does not have a bug in the assertion itself, running on a machine without defects, the assert cannot fail, and evaluating it cannot impact the program state (unless we count the time taken for verification). It’s the sort of thing that points to memory corruption (undefined behaviour in languages that have that; another process scribbling on our memory; a freak cosmic ray incident…).

Code that depends on the assert being left in - i.e., because evaluating it meaningfully impacts the program state, or because it’s used for data validation - is, inherently, incorrectly written code. This is not a realistic point of confusion. It is something that I was repeatedly warned about in my own programming education, using C, more than 20 years ago.
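To make the distinction concrete, here is a sketch of the JSON case from above, handled as validation rather than as an assertion. load_settings is a hypothetical function. The point is that the check on external data uses an explicit raise, which -O leaves alone:

```python
import json

def load_settings(text: str) -> dict:
    """Parse text as a JSON object, rejecting any other kind of document."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed input.
    data = json.loads(text)
    # The file may simply be wrong even if our code is correct, so this is
    # validation, not an assertion, and it must still run under -O.
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data

# For contrast, the patterns the paragraph above warns against:
#     assert isinstance(data, dict)   # validation that vanishes under -O
#     assert cache.pop(key)           # side effect that vanishes under -O
```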

The purpose of -O is to remove assertions and other code meant only for debug mode (i.e. guarded by the built-in __debug__ value, which is provided for the explicit purpose of guarding such code, the execution of which is not supposed to impact on program state in a way that matters in production). There is, practically speaking, only one way to achieve this purpose: by… just doing it. There is no way to be less confusing or more explicit about it, because it is a trivial task that is fully and clearly documented.
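The following sketch shows a __debug__-guarded block. It is compiled in-process with compile(..., optimize=...) so both modes can be compared side by side. The record function and events list are hypothetical stand-ins for real debug-only bookkeeping:

```python
SOURCE = """
events = []

def record(event):
    if __debug__:
        # Debug-only bookkeeping: this block is dropped when __debug__ is False.
        events.append(event)
    return event
"""

def events_recorded(optimize: int) -> list:
    """Compile SOURCE at this level, call record() once, return what was logged."""
    namespace = {}
    exec(compile(SOURCE, "<debug-demo>", "exec", optimize=optimize), namespace)
    namespace["record"]("start")
    return namespace["events"]

if __name__ == "__main__":
    print(events_recorded(0))  # ['start']
    print(events_recorded(1))  # []
```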

If you believe that these uses of assert are improper, file a bug report with the library.

No, not “by default” and not “for historical reasons”. It gets stripped in release mode because that’s part of the definition of release mode; and it happens for the very explicit purpose of having a “release mode” that avoids spending time on things that are only needed during development.

Using a release mode is entirely optional. Nothing prevents C programmers from compiling and distributing debug-mode executables and expecting others to use those.

However, programmers are expected to understand the proper use of assertions, in any language that has them. An educator who allows the student to know that assert exists, without clearly explaining its purpose, is ipso facto negligent.

For that matter, I’m not convinced that most Python programmers “learn about assert in their lives” - unless, of course, Python is not their first programming language and they’ve seen assert in other languages, in which case they should already understand how this works.

jamestwebber · July 14, 2023, 9:06pm

I think the primary difference between the python implementation and others is that it’s not made clear in most tutorials/instructions that running the command python is “debug mode” (this includes the official documentation…I’m sure it’s spelled out somewhere but not in some places you might expect, like the CLI instructions).

In fact the documentation makes it seem like it’s not in debug mode, because there’s a separate -d option that turns on debug mode for the parser.

So many people learn the language thinking that running python without options is the standard, “production” mode, and that using -O is some specialized thing they don’t need to worry about. Other languages, by contrast, are more explicit up front that there’s a debug mode you’ll use when developing and a release mode for later, and to be fair, it usually matters a lot more in those languages.

Rosuav · July 14, 2023, 9:08pm

Is that really a difference? Please, can you start supporting these bold assertions (pun intended) with some actual data? Give me a list of implementations or languages or compilers or whatever in which it IS made clear that the CLI runs in debug mode and will run assertions, and show that Python is somehow unusual.

PLEASE stop making these unfounded claims as if they somehow prove that Python is the one and only place where this happens.

jamestwebber · July 14, 2023, 10:30pm

The tone in this thread feels like it’s become bizarrely hostile. No one is trying to attack python or python developers. I thought the point was to discuss ways to make it more likely that people write correct code.

Karl Knechtel just said that he was taught this explicitly and repeatedly when learning programming in C. Paul Moore also mentioned that the current behavior follows his expectations because of C. NDEBUG is a standard macro in C that people learn early on when starting that language. Rust is similar in its use of profiles and the need to explicitly use --release when required, unless you’ve configured your project otherwise.

I don’t know how to prove that the python docs don’t mention __debug__ much, particularly in the tutorials and quickstart guides. It’s mentioned in the C API, in the language specification, and tangentially in the command line docs, but even then it doesn’t say “this constant is true if you run without -O”. That’s on the constants page.
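One way to see the relationship directly is to ask fresh interpreters. The sketch below assumes the current interpreter (sys.executable) can be launched as a subprocess. It removes PYTHONOPTIMIZE from the child environment so that variable does not affect the baseline run. probe is a hypothetical helper:

```python
import os
import subprocess
import sys

# Code run in a fresh interpreter: report __debug__ and the optimization level.
PROBE = "import sys; print(__debug__, sys.flags.optimize)"

def probe(*flags: str) -> str:
    """Run PROBE under the current interpreter with the given command-line flags."""
    # Drop PYTHONOPTIMIZE so an inherited setting can't skew the baseline.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    result = subprocess.run(
        [sys.executable, *flags, "-c", PROBE],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    for flags in ((), ("-O",), ("-OO",)):
        print(" ".join(flags) or "(no flags)", "->", probe(*flags))
    # (no flags) -> True 0
    # -O -> False 1
    # -OO -> False 2
```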

A totally reasonable solution to this topic is just to mention this earlier in the documentation and tutorials. It’s not secret, but the python docs are vast and it’s very easy to learn a lot of the language without knowing this detail.

jamestwebber · July 14, 2023, 10:56pm

I didn’t “blame the language” for anything.

It’s called expressing an opinion, dude? I think it’s reasonable. That’s my opinion.

I really don’t know why you’re getting this intense but I’ve clearly made a mistake by trying to say something again.

hauntsaninja · July 15, 2023, 12:25am

Chris, I appreciated several of your contributions in this thread.

I’m glad that you’re a strong voice for backward compatibility (this is something we need to take very seriously) and your passion for today’s Python shines through (I love this language too). Some of your messages are really useful, e.g. I thought your audit of assert statements in Lib/threading.py was a fantastic contribution to the discussion!

With all that said, I found your last couple messages quite hostile and your rhetoric needlessly inflammatory. Consider a little less flame?

James, consider sleeping a little longer between posts as a way to avoid escalation.

Rosuav · July 15, 2023, 12:30am

Fair. I’m definitely stepping away from this thread. If it has any chance of being productive, it’s without me.

thejcannon · July 15, 2023, 2:30pm

So, with how strongly people feel against this change, I can’t help but think I’m simply not understanding something. And that doesn’t sit right with me, because I’d hate to be misinformed or have incorrect assumptions (the irony of which in this thread is not lost on me).

So, can someone help me see what I’m missing?

Here’s what I have for the technical:

asserts most certainly are performing exactly as documented and expected from a technical standpoint. They aren’t broken or incorrect
asserts may be stripped given certain incantations of executing Python
the usefulness of stripping out asserts is a bit of a performance bump, as the statement is not executed
this is perceived as large enough a problem to some to recommend not using asserts as a blanket security measure. Some

And then the more subjective/cultural:

Although it is documented that asserts may be stripped, not everyone who writes Python knows this. And given the popularity of Python (especially among new programmers) and the size of the community, this percentage is likely significant (maybe not large, but significant)
Largely, Python isn’t run under these incantations that strip asserts. That unfortunately strengthens the incorrect assumptions that some make.
because of this some security-minded people/tools recommend not using asserts as a way of guarding against this as a potential security concern
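The security concern can be reproduced in a few lines. In this hypothetical sketch, delete_account uses an assert as a permission check and delete_account_safe uses an explicit exception. Compiling with optimize=1 mimics -O:

```python
SOURCE = """
class PermissionDenied(Exception):
    pass

def delete_account(user):
    # WRONG: an access check written as an assert is stripped by -O.
    assert user["is_admin"], "only admins may delete accounts"
    return "deleted"

def delete_account_safe(user):
    # An explicit check survives every optimization level.
    if not user["is_admin"]:
        raise PermissionDenied("only admins may delete accounts")
    return "deleted"
"""

def build(optimize: int) -> dict:
    """Compile SOURCE at this level and return its namespace."""
    namespace = {}
    exec(compile(SOURCE, "<security-demo>", "exec", optimize=optimize), namespace)
    return namespace

if __name__ == "__main__":
    ns = build(optimize=1)  # same effect as running under -O
    guest = {"is_admin": False}
    print(ns["delete_account"](guest))  # prints "deleted": the check vanished
    try:
        ns["delete_account_safe"](guest)
    except ns["PermissionDenied"] as exc:
        print("refused:", exc)
```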

And then to the proposal to deprecate the current flags and introduce a new one:

Most asserts won’t change. Since most executions of Python don’t strip them, it’s reasonable to assume most assert authors are OK with them being executed (even in prod situations)
the asserts that do need to change will ideally be obvious. This is performance-sensitive code, and once the flag is deprecated, the tester/runner will know something has changed. They hopefully are also performing performance testing.
Python, a language usually touted for being beginner friendly, is now MORE beginner friendly
additionally, some places that incorrectly used asserts for verification are no longer a security concern. Python got a little bit more secure

Lastly, somewhat editorial observation:

People who cared enough about performance to strip asserts only get bitten by the deprecation when they upgrade. BUT thanks to faster CPython your code got faster just by upgrading

Those are my observations. As they stand, from my viewpoint the pros outweigh the cons, which is why I’m in favor. But again, I get the sense that either my perception of the pain of the cons is wildly off, or I’m just missing additional cons.

(And please, I’m being genuine in wanting to understand how other people perceive this. I’d love if others also treated this as an educational/collaborative discussion)

pf_moore · July 15, 2023, 3:27pm

I’ll try. But honestly, I don’t have much more to say that hasn’t already been said, so I have little interest in further extending this discussion.

There’s a whole bunch of questions I have about your claims and assertions (even some of the “factual” points you make). But I’m going to skip over those in order to be very clear on the major, glaring problem with your proposal.

This is where your proposal is completely missing the point. You are not proposing to change asserts, you’re proposing to change -O. And in a way that will affect every current user of -O. You can’t assume that every user of -O is OK with it being deprecated. Nor can you assume that there are “too few users of -O to matter”.

Everything you say after this point (and most of what you said before) is irrelevant, because it’s not addressing the real concern with this proposal, which is that it deprecates a current behaviour of Python, without looking at the impact on users of that behaviour. Instead, you look at users of assert, who by your own claim are mostly not users of -O.

The only reason I’m still responding is because people make this mistake all the time, of making a proposal but not actually looking at who it will affect and whether the impact on them is justified or acceptable. So I appreciate that you’re trying to understand the problem here. Hopefully this helps.

PS The other proposals in this thread have different problems, and that has muddied the waters a lot. But the problem with your proposal is relatively simple. I think you’re mostly just confusing objections to other proposals with objections to yours, which is easy to do given the mess this thread has become.

thejcannon · July 15, 2023, 4:20pm

Oh that helps immensely. I appreciate you taking the time and effort to chime in again with very useful perspective. I think that’s great feedback for myself, but also for the OP. Truly, thanks.

@SonOfLilit I think honestly, you have your next steps before this can reasonably continue as a PEP or as a general discussion. You need to reasonably show the impact of the change. What percent of people use -O? What’s their pain? What’s their migration like? What percent of people benefit from the proposal? Why isn’t education enough?

And most importantly, the answers to these questions should clearly show that the change is truly a net win for Python (otherwise it wouldn’t be a proposal of an enhancement).

kknechtel · July 15, 2023, 8:18pm

… Hold on, I’m confused now. Did @thejcannon make a concrete proposal that’s different from the one in the OP?

thejcannon · July 15, 2023, 8:42pm

Not at all, just trying to help OP in channeling some of the energy/brains here. (As well as learn myself)

brettcannon · July 18, 2023, 7:39pm

FYI the admins have gotten a lot of flagged posts on this topic. I have turned on slow mode in hopes it cleans up, else I will lock it.

And FYI, you need to directly mention the admins or flag posts for us to see things as we do not read every single post.

SonOfLilit · July 19, 2023, 4:58pm

I agree about evidence on the effect of deprecating -O and am trying to think of how to gather it.

If anyone here knows anyone who uses -O or an industry where it is common, I would love to hear from you please.

AndersMunch · July 21, 2023, 11:22am

I use it.

It means I am much quicker to write asserts: I don’t hesitate to consider if it might negatively affect performance, because I know it won’t. Without -O, many of the asserts that I write would not be written. They would not be replaced with other checks, because they’re for something that I “know” won’t happen.

Removing -O would compel me to check all the asserts that I’ve written for the last 20 years, almost 2000 of them, to see if any needs to go. (The easy fix is to unconditionally remove every single assert, but I probably wouldn’t do that.) I’m not complaining about the work, most asserts are very simple and I could probably do that pretty quickly, but it would feel strange investing time in an effort that makes the code worse.

thejcannon · July 21, 2023, 12:58pm

I think in the OP, your remedy if you wanted to keep your asserts would be to unconditionally replace your assertion statement to use __debug__ or ...

So assert thing becomes assert __debug__ or thing. They would get executed, but the penalty would be a globals lookup and a conditional.

Have you measured the timing difference in your programs with and without -O?
Any numbers you can share?
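If you want a starting point for that measurement, here is a micro-benchmark sketch. It uses timeit to time a hypothetical loop with a cheap assert, compiled at optimize=0 and at optimize=1. Results will vary by machine and Python version. A hot loop like this will probably exaggerate the effect compared with a real program:

```python
import timeit

SOURCE = """
def work(xs):
    total = 0
    for x in xs:
        assert x >= 0, "negative input"
        total += x
    return total
"""

def time_level(optimize: int, repeat: int = 5, number: int = 50) -> float:
    """Best-of-`repeat` time in seconds for `number` calls of work() at this level."""
    namespace = {}
    exec(compile(SOURCE, "<timing>", "exec", optimize=optimize), namespace)
    work = namespace["work"]
    data = list(range(10_000))
    timer = timeit.Timer(lambda: work(data))
    return min(timer.repeat(repeat=repeat, number=number))

if __name__ == "__main__":
    with_asserts = time_level(0)
    without_asserts = time_level(1)
    print(f"optimize=0: {with_asserts:.4f}s  optimize=1: {without_asserts:.4f}s")
```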

carljm · July 21, 2023, 10:22pm

We use -OO in production at Instagram. The primary motivation is memory savings from stripping docstrings (this we absolutely must have), but it’s currently impossible to get that without also stripping asserts.

Personally I think it would be nice if one could opt in to docstring stripping and/or assert stripping orthogonally from each other, rather than in an arbitrary layering. (In what way is stripping docstrings “more optimized” than stripping asserts?) This wouldn’t require deprecating -O or -OO; they could continue to have their current meaning, and it would just mean adding new options. It would require changing the pyc file tagging strategy a bit, but that’s not a big deal.
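You can see the layering directly by compiling the same function at each optimize level and checking what survives. Levels 0, 1 and 2 correspond to no flag, -O and -OO. Here is a sketch; inspect_level is a hypothetical helper:

```python
SOURCE = '''
def f():
    """I am a docstring."""
    assert False, "assert still present"
'''

def inspect_level(optimize: int) -> tuple[bool, bool]:
    """Return (docstring kept, asserts kept) for SOURCE compiled at this level."""
    namespace = {}
    exec(compile(SOURCE, "<levels>", "exec", optimize=optimize), namespace)
    f = namespace["f"]
    try:
        f()
        asserts_kept = False
    except AssertionError:
        asserts_kept = True
    return f.__doc__ is not None, asserts_kept

if __name__ == "__main__":
    for level in (0, 1, 2):
        print(level, inspect_level(level))
```

On CPython, level 0 keeps both, level 1 keeps only the docstring, and level 2 keeps neither. No level drops docstrings while keeping asserts, which is the combination described above as currently impossible.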

I don’t actually know how much efficiency gain we get from assert stripping; I suspect not much. I know at some point in the distant past there was an outage because an engineer used an assert wrongly, and assert stripping in production made a problem worse than it would have otherwise been. Because of that, there’s been some cultural holdover ever since (even reified in the form of lint rules) to avoid asserts entirely. So that’s a point of evidence that developers not understanding assert stripping can cause problems.

I still don’t think assert stripping as an option should ever go away, though. It is a useful tool to be able to write (possibly costly) invariant checks for development and testing, and skip checking them in production where performance matters.
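Here is a hypothetical example of such a check: the binary search below asserts an O(n) precondition on an O(log n) operation. It assumes the list always comes from elsewhere in the same program. An unsorted list would therefore indicate a bug rather than bad input. contains is an illustrative name:

```python
from bisect import bisect_left

def contains(sorted_items: list[int], target: int) -> bool:
    """Binary search; the caller (our own code) guarantees sorted_items is sorted."""
    # O(n) invariant check on an O(log n) lookup: useful in development and
    # tests, and skipped entirely when running under -O.
    assert all(a <= b for a, b in zip(sorted_items, sorted_items[1:])), "input not sorted"
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target
```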

fonini · July 27, 2023, 12:20am

There is no globals lookup – __debug__ is replaced by True or False as appropriate at compile-time:

$ python -m dis <<<"x = __debug__"
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (True)
              4 STORE_NAME               0 (x)
              6 LOAD_CONST               1 (None)
              8 RETURN_VALUE

$ python -O -m dis <<<"x = __debug__"
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (False)
              4 STORE_NAME               0 (x)
              6 LOAD_CONST               1 (None)
              8 RETURN_VALUE
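You can check the same folding without reading dis output. Compile the snippet at both levels and look at co_names, which is the tuple of names the bytecode looks up at run time. Here is a sketch; debug_lookup is a hypothetical helper:

```python
def debug_lookup(optimize: int) -> tuple[bool, bool]:
    """Return (value assigned to x, whether __debug__ is looked up by name)."""
    code = compile("x = __debug__", "<fold>", "exec", optimize=optimize)
    namespace = {}
    exec(code, namespace)
    return namespace["x"], "__debug__" in code.co_names

if __name__ == "__main__":
    print(debug_lookup(0))  # (True, False): a constant, no name lookup
    print(debug_lookup(1))  # (False, False)
```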

thejcannon · July 27, 2023, 1:04am

That’s even better. Hell, some smarty pants could come along and omit the bytecode for the assert if we undeniably know it’ll evaluate to true (__debug__ or ...).

(That probably explains why assigning to __debug__ is illegal)
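Whatever the reason, CPython rejects the assignment at compile time with a SyntaxError. A quick check, using assignment_rejected as a hypothetical helper:

```python
def assignment_rejected(source: str) -> bool:
    """Return True if compiling source raises SyntaxError."""
    try:
        compile(source, "<assign>", "exec")
    except SyntaxError:
        return True
    return False

if __name__ == "__main__":
    print(assignment_rejected("__debug__ = False"))  # True
    print(assignment_rejected("x = __debug__"))      # False
```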

Rosuav · July 27, 2023, 1:14am

That’s what happens! If __debug__ is false, the bytecode for the assertion gets omitted! It’s a brilliant optimization!

Did you just reinvent the original semantics?