It may be useful to compare how C’s assert statement works.
It is used in the debug/testing phase for things that “should not happen”. (I like how @barry expressed it: “I never write asserts expecting them to ever be triggered”.) One asserts that something is true; if it’s not, then something is severely broken somewhere else.
Compiling in gcc or clang with -Oanything disables the asserts, so they have zero runtime effect in production.
It makes sense to me for Python’s assert statement to have the same behaviour.
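This behaviour can be checked directly from Python today. A minimal sketch that spawns child interpreters so the `-O` flag can be toggled (the assert message is illustrative):

```python
import subprocess
import sys

# Run the same assert in a child interpreter with and without -O.
code = "assert False, 'should not happen'"

normal = subprocess.run([sys.executable, "-c", code], capture_output=True)
optimized = subprocess.run([sys.executable, "-O", "-c", code], capture_output=True)

print(normal.returncode)     # non-zero: AssertionError is raised
print(optimized.returncode)  # 0: the assert was compiled away
```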
The optimization level and debug-information level (e.g. symbols, source lines, etc.) used by gcc and cl (MSVC) are unrelated to disabling debug assertions. I don’t know what clang does. For example, use gcc to compile a test program with -O3 optimization that calls a function containing the single statement assert(0). Remember to #include <assert.h>. The program will still abort due to the assertion failure. Next, compile it with -DNDEBUG to make the compiler ignore debug assertions. From the Linux man page for assert:
If the macro NDEBUG is defined at the moment <assert.h> was last included, the macro assert() generates no code, and hence does nothing at all. It is not recommended to define NDEBUG if using assert() to detect error conditions since the software may behave non-deterministically.
From the POSIX specification of assert
:
Forcing a definition of the name NDEBUG, either from the compiler command line or with the preprocessor control statement #define NDEBUG ahead of the #include statement, shall stop assertions from being compiled into the program.
Thus far, I’m in favor of keeping debug assertions enabled by default, as in C. I’m marginally in favor of making debug assertions independent of the optimization level, as in the C compilers that I’ve used, and adding a new -X ndebug option to disable __debug__ blocks and assertions. However, I don’t know enough about what has to change to realize this and what will be affected. I assume it requires updating py_compile and compileall to support compiling in release mode and debug mode, which will in turn be reflected in the names of PYC files, in addition to the “opt-N” optimization level in the name.
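The opt-N tagging mentioned above is already observable via importlib; a quick sketch (the module name is made up, and the exact interpreter tag depends on your version):

```python
import importlib.util

# Cached bytecode filenames encode the optimization level: no tag for
# level 0, and an "opt-N" tag for -O / -OO.
plain = importlib.util.cache_from_source("module.py")
opt2 = importlib.util.cache_from_source("module.py", optimization=2)

print(plain)  # e.g. __pycache__/module.cpython-312.pyc
print(opt2)   # e.g. __pycache__/module.cpython-312.opt-2.pyc
```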
Ah yes; C’s assert is a macro, so it is influenced by NDEBUG.
Yes, me too, in favor of keeping debug assertions enabled by default, as in C.
Also, if you’re making an executable using something like PyInstaller, this can usually save a minimum of 0.5 MB on the installer and sometimes a few MB off the produced binary.
It seems that all we need for now is the simplest option:
- An interpreter argument that causes assertions to be retained independently of other options.
Featured as #1 Python Security Pitfall here:
Just to note, the text doesn’t appear to imply that the order has any meaning, so it would be more accurate to say that it was one of the ten security pitfalls mentioned. Also, the article’s representation of “optimized mode” is rather vague and arguably misleading: it implies that removing asserts is one side effect rather than the documented purpose of the mode, and the example is a spectacularly poor misuse of assert as intended and documented, which the text does not call out nearly strongly enough IMO.
This might be a little off-topic, but would it make things better or worse if __debug__ could be changed at runtime? (Assuming no negative performance impact.)
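For what it’s worth, __debug__ currently cannot be rebound at all: the compiler itself rejects the assignment, so making it mutable would be a language change rather than a runtime tweak. A quick check:

```python
# Assigning to __debug__ is a SyntaxError, caught at compile time,
# before any code runs.
try:
    compile("__debug__ = False", "<demo>", "exec")
    outcome = "accepted"
except SyntaxError as exc:
    outcome = f"SyntaxError: {exc.msg}"

print(outcome)
```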
One thing that makes assert such a tempting shortcut for raising errors is that it so nicely explains itself in the stack trace even without providing an error description. Short of displaying the given value of port, I don’t think that the following message could be any more helpful in telling you what you did wrong:
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    assert isinstance(port, numbers.Integral) and 0 <= port <= 65535
           ^^^^^^^^^^^^^^^^^^
AssertionError
Compare to whenever you do an if not condition: raise ProperExceptionType("..."): the stack trace will point to the raise line instead of the line containing the failing condition, so the string you pass to the raised exception’s constructor must effectively paraphrase the condition in wordy, fluffy English just to get something at least as informative as the assertion was. I agree that it’s still laziness, but it certainly had me dragging my feet for a long while after I discovered that assert isn’t for raising usage errors. (I also first had to make the startling “if you make your API more intuitive, you don’t need quite so many usage error checks” discovery before turning one-line assert statements into several-line exceptions felt acceptable.)
One potential remedy that springs to mind (which I can’t decide if I like or not) is to add some kind of .enforce() method to the bool class, i.e. you’d do:
isinstance(port, numbers.Integral).enforce(TypeError, "optional message")
(0 <= port <= 65535).enforce(ValueError)
That way, at least the condition would be in the stack trace, so you could usually avoid having to add additional textual descriptions. Alternatively, I suppose we could just banish the idea that one-line if statements are a code-style blasphemy, since that too leads to a stack trace with the condition in it:
if not condition: raise Error(
    "message here"
)
So your proposal is for assertions to stop being assertions? Having assertions disappear when running under -O is fundamental to what makes them assertions.
This is only true if your interpretation of the word assertion comes from being already familiar with C’s assert. To me, the word assert means check, not check …oh, but skip over it if you’re in a hurry. I don’t think that the current meaning of assert makes much sense outside the context of compiled languages. Python has no concept of compile-time macros, constants or compiler directives (which is good!), so it seems odd that this single compile-time setting exists. Whichever side of this fence you choose to sit on, having Python developers on both sides effectively makes both ideals unusable: you can’t use -O in case any one of your (indirect) dependencies uses asserts as error checks, and you yourself can’t use assert for error checks because someone might run your code under -O. The only way to avoid trouble from both sides is to pretend that neither feature exists.
Does -OO ever make a significant difference to memory consumption? The most significant case I can find is Inada’s example of import sqlalchemy (even more so than NumPy, which likes to write essay docstrings for even non-public and obvious one-liner functions) at about 9% savings in RSS (no, I don’t know what the different memory stats mean) and negligible changes everywhere else. Bearing in mind that this is very much an upper bound - a realistic program does more than import libraries, and the doing of stuff will eat more memory without loading more docstrings - is <9% really worth having a separate mode for?
> python memory-test.py sqlalchemy
rss 36.2 MB | vms 265.8 MB | shared 12.7 MB | text 4.1 kB | lib 0 Bytes | data 24.3 MB | dirty 0 Bytes
> python -OO memory-test.py sqlalchemy
rss 33.8 MB | vms 263.3 MB | shared 12.6 MB | text 4.1 kB | lib 0 Bytes | data 21.8 MB | dirty 0 Bytes
> python memory-test.py numpy
rss 38.4 MB | vms 601.1 MB | shared 18.0 MB | text 4.1 kB | lib 0 Bytes | data 316.2 MB | dirty 0 Bytes
> python -OO memory-test.py numpy
rss 35.6 MB | vms 597.9 MB | shared 18.0 MB | text 4.1 kB | lib 0 Bytes | data 312.9 MB | dirty 0 Bytes
Crude memory-test.py script:
import sys
[__import__(i) for i in sys.argv[1:]]
import humanize
import psutil
print(*(f"{i} {humanize.naturalsize(j)}" for (i, j) in psutil.Process().memory_info()._asdict().items()), sep=" | ")
Linux packages containing Python code (including Python itself) are installed in root-owned locations, meaning that the Python bytecode files need to be installed as part of the package. Needing 3 variants of __pycache__ roughly doubles the installed footprint. This rather stings on Alpine (where the whole distribution is designed to be runnable in memory): 18MB of the 48MB python3 package is just optimised bytecode that will likely never be used. As a result, running Python in -OO mode in an in-memory Alpine container actually consumes about 15MB more memory than if the optimised bytecode had been removed and Python were running in its normal mode. If -OO mode could instead read the unoptimised .pyc files but skip loading the docstrings, then -OO would be an optimisation - but currently, it’s an un-optimisation in container-land.
The global nature of -O and -OO - or production and non-production - is too coarse to be usable. To me, the production environment for any package I write is anywhere that isn’t that package’s own test suite. I wouldn’t even want my assertions slowing down the test suites of dependent packages. Since that per-package mode doesn’t exist (and to be honest, I’m glad that it doesn’t), and since I certainly don’t want to force people to use -O to get decent performance out of my code, I make do without Python’s assert (usually I find that if I break my overly long functions up and move the checks into some low-level unit tests, I’m happy without them).
The same is true for -OO. Take pycparser, which, being a dependency of cffi, is a classic xkcd 2347 project. It stores its parsing rules in docstrings, which vanish under -OO, thus making it and everything that depends on it unusable under -OO. It even knocks out Damian’s point about running PyInstaller with -OO to reduce the application size. (Admittedly, here I think the correct solution is to change the underlying parser to not use docstrings. The poor maintainer is not convinced.)
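The docstring dependence is easy to demonstrate with a child interpreter; a small sketch (the rule text is made up):

```python
import subprocess
import sys

# A function's docstring survives a normal run but is stripped under -OO,
# which is exactly what breaks docstring-as-data parsers.
code = "def f():\n    'rule: expr -> term'\nprint(f.__doc__)"

normal = subprocess.run([sys.executable, "-c", code],
                        capture_output=True, text=True)
stripped = subprocess.run([sys.executable, "-OO", "-c", code],
                          capture_output=True, text=True)

print(normal.stdout.strip())    # rule: expr -> term
print(stripped.stdout.strip())  # None
```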
Given all I say above (apologies for the length, by the way), I’d personally be strongly in favour of removing both -O and -OO modes. I think they’re both micro-optimisations for when the sun is shining and far too much trouble the rest of the time.
What if the optimisation modes were changed to become per-module, instead of global? The key problem seems to be that some packages may not be compatible with the optimisations, but application developers may want the slight improvements. I’m imagining some method of specifying a module/package name and one of the optimisation levels. That would then apply to it and all submodules (if any). When importing, that is checked, and the module is compiled with __debug__ replaced with the appropriate LOAD_CONST. That should handle most usages of the constant, though dynamic access via builtins would be inconsistent.
This way libraries could advertise whether they do/do not support the optimisation levels, and then the author of an application could enable optimisations for all libraries that can handle it. It might be too much effort to be worth the small improvements the optimisation levels give though.
You could define an enforce function which takes a boolean and an exception class. Mess around with that to see if you really like the idea.
Yeah, that’s true. Just fleshing it out, this
import numbers

def enforce(ok, type, message=""):
    if not ok:
        raise type(message)

port = 12345432
enforce(isinstance(port, numbers.Integral), TypeError)
enforce(0 <= port <= 65535, ValueError, "Out of bounds port number")
Gives:
Traceback (most recent call last):
  File "error.py", line 9, in <module>
    enforce(0 <= port <= 65535, ValueError, "Out of bounds port number")
  File "error.py", line 5, in enforce
    raise type(message)
ValueError: Out of bounds port number
The last frame in the stack trace is distracting (there’s no way to address that short of writing enforce() in a C extension, right?), but the rest of the output is quite nice.
The one thing that really kills the idea for me, though, is that it’s dependent on formatting. If the enforce() call was written across multiple lines (which formatters like black will insist you do if the line gets too long), you get
Traceback (most recent call last):
  File "error.py", line 9, in <module>
    enforce(
  File "error.py", line 5, in enforce
    raise type(message)
ValueError: Out of bounds port number
which is useless.
You’re able to manipulate traceback stacks with the traceback module.
At a higher level, however, I’m not sure you should care: I think you should either let downstream developers decide which frames to mentally ignore when reading the traceback, or not show tracebacks at all to end users (using sys.tracebacklimit = 0).
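As a sketch of that, the frames belonging to a helper such as enforce() could be filtered out before printing. The helper name and the match-by-function-name convention here are my own assumptions:

```python
import traceback

def enforce(ok, exc_type, message=""):
    if not ok:
        raise exc_type(message)

try:
    enforce(False, ValueError, "Out of bounds port number")
except ValueError as exc:
    # Drop frames whose function is "enforce" so the report points
    # at the caller rather than the helper's internals.
    frames = [f for f in traceback.extract_tb(exc.__traceback__)
              if f.name != "enforce"]
    print("Traceback (most recent call last):")
    print("".join(traceback.format_list(frames)), end="")
    print(f"{type(exc).__name__}: {exc}")
```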
You could define an enforce function which takes a boolean and an exception class. Mess around with that to see if you really like the idea.
The problem with the enforce function ideas, compared to assert, is that they require message to be evaluated up front. Asserts are supposed to “never happen” in normal use, and the message might be arbitrarily costly to produce.
This would be a non-issue if Python supported inlining. I don’t know if we can expect anything like that any time soon, though.
If it’s that costly to generate the message text, the classic if ...: raise Ex(...) syntax should be fine. Of course, if they really, truly “never happen”, then regular assertions are fine, since optimizing them out won’t be a problem.
If it’s that costly to generate the message text, and you hate the if statement, you can use the or operator:
import numbers

def die(type, /, *args, **kwargs):
    raise type(*args, **kwargs)

port = 12345432
isinstance(port, numbers.Integral) or die(TypeError)
(0 <= port <= 65535) or die(ValueError, "Out of bounds port number")
It’s useful when you want to trigger an alternative procedure on either an error or some condition not being met. Here is a sample from one of my scripts, where I refresh the cache if it is out of date, does not exist, or has been corrupted, or if I force the update manually:
try:
    assert not force_update
    # Check whether the locally stored cache needs an update
    timestamp = Parser().load(timestamp_filename)
    # Only prune cache if new patch has been released
    if datetime.now() > datetime.fromisoformat(timestamp["timestamp"]):
        patch = get_latest_patch()
        assert patch == timestamp["patch"]
        update_timestamp(patch)
    # Load the locally stored cache, if it exists
    data = Parser().load(cache_filename)
except (FileNotFoundError, OSError, AssertionError, KeyError):
    ....
Technically, I could do the same with a tracking variable and an if ...: raise ValueError statement, but the way above does not offend my aesthetic taste in any way.
Well, it’s broken code, so whether it offends your aesthetic taste or not, this code has to be assumed to be buggy. Even if a proposal like this goes through, it will be fragile code that will LOOK like it works on all versions of Python, but will be subtly broken on every version up to X.Y where the change happens.
TBH, exception handling is a pretty poor way to handle force_update here anyway. Here’s how I would code that sort of logic:
def need_update():
    if force_update: return "forced"
    timestamp = Parser().load(timestamp_filename)
    if datetime.now() > datetime.fromisoformat(timestamp["timestamp"]):
        patch = get_latest_patch()
        if patch != timestamp["patch"]: return "patch"
        update_timestamp(patch)
    try:
        data = Parser().load(cache_filename)
    except (FileNotFoundError, OSError):
        return "not-found"
    return ""

if need_update():
    ...
(There’s now an encapsulation problem in that data is local to the function, but the precise solution to that depends on where you want to put the data. Alternatively, you could have the failure modes return None, as long as the data itself will never be None, and then check for that instead of looking for an empty cause-of-reload keyword. In either case, the logic is the same.)
There’s really no reason to use assert for this kind of check. Honestly, I don’t think that raise ValueError is right either, but it’s certainly less wrong than assert.
At first glance, that looks like an abuse of AssertionError, and risky code (non-obvious bug).
I call it an abuse of AssertionError because assertions have a very clear set of well-established semantics. In the same way that we expect that SyntaxError should only be used for actual syntax errors, and ImportError should only be used for import errors, AssertionError should only be used for failed assertions and (maybe) failed tests (unit tests, regression tests, etc).
We would never use assertions in that way in Java, C/C++ or Eiffel, and we shouldn’t make a practice of it in Python either.
Whether your application runs with assertions switched on or not is not under your control as developer, it is under the control of the user running the application. If they are disabled, your application may break.
I think that perhaps the best way to think of assert is that it is a checked comment: a comment to the reader that this statement is true, which the interpreter happens to (sometimes) actually check at runtime too.
Checks which are expected to sometimes fail should not use assert, for the same reason you wouldn’t write a class that used the __eq__ special method to implement addition and the __add__ method to implement equality. Sure, it works, but it is bad code because it is misleading and surprising to the human reader.
Except that misusing operator overloading actually does work. Misusing assert risks breaking your code if the end-user runs it with the -O command-line switch.
Even if -O were removed, I’d still consider that to be wrong. I prefer to handle the error-or-false-condition scenario by using return to escape when the condition is true:
try:
    if not force_update:
        upstream_timestamp = (whatever you need here)
        if cache.stat().st_mtime >= upstream_timestamp:
            # Everything up to date. Nothing to do
            return
except FileNotFoundError:
    pass
(refresh the cache here)