Flat exception groups (alternative to PEP 654)

njs · September 1, 2021, 2:15am

The steering council asked me to write up my thoughts on PEP 654 and the alternative design that I think should also be considered, so, here’s an attempt!

Flat exception groups

Philosophy

My favorite thing about Python is the smooth, incremental learning curve. Programming is complicated, and Python is a powerful and complicated language – but it doesn’t feel complicated, because the complexity is carefully arranged so you can start being productive right away with a minimal investment, and then learn more as you go, only when needed.

For example: the semantics of Python’s ubiquitous . operator are extraordinarily complex. But you can get started with just:

# You can assign attributes to an object
some_object.my_attr = "hello world"

# You can read them back again
print(some_object.my_attr)

Later, you learn about @property. Later, __getattr__. And maybe, eventually, metaclass dunders and the difference between data and non-data descriptors and all that stuff. But at each point you can be productive, and the only reason to move on to the next step is if it helps you solve some problem. It’s like how in C++ people say “you only pay for what you use”, except in C++ they’re talking about runtime overhead, and in Python it’s conceptual overhead.

Python’s error handling is similar: it’s very common to start out by writing code that only handles the happy-path, and then add error handling incrementally when you actually see exceptions. This is very different from, say, Java’s checked exceptions or Rust’s fully-typed Results, where the language forces you to think about all the errors that could happen up front. These languages are also great! But they’re not Python.

This is dear to my heart; my goal with Trio has been to extend this Python philosophy to concurrency. With Trio, if you know that await fn() is the way to call async functions and how to use a nursery, then that’s already enough to write useful scripts. Of course there’s a ton more sophistication there when you need it, but the minimum investment is very small, and (hopefully) everything past that is a series of bite-size pieces.

PEP 654

Error handling in concurrent programs is necessarily a gnarly, complex topic, and experts are always going to have to deal with that. My concern with PEP 654 is that it forces users to confront the potential gnarliness up-front, before they need it.

First off: the point of exception groups is to be able to handle the case where concurrent code raises multiple exceptions simultaneously. But with PEP 654, if concurrent code raises just one exception, then asyncio/trio will still wrap it into an exception group. That’s because we’re worried about code like this:

# Run 'child1' and 'child2' concurrently:
async def parent():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(child1)
        nursery.start_soon(child2)
        
# Simulate two tasks that can independently fail due to external conditions
async def child1():
    if coin_flip():
        raise ConnectionResetError
        
async def child2():
    if coin_flip():
        raise FileNotFoundError

This code might end up raising ConnectionResetError, FileNotFoundError, or both at once. So if asyncio/trio let solo exceptions propagate normally without wrapping, you could have:

User runs the code, gets ConnectionResetError
User adds try: ... except ConnectionResetError: ... to handle that case
User runs the code again, the exact same ConnectionResetError happens again… but their except doesn’t run, because this time they lost the race and got an ExceptionGroup([ConnectionResetError, FileNotFoundError]), and that doesn’t trigger except ConnectionResetError.

So now we’ve accidentally tricked the user into adding the wrong exception handling code – they need to go back and replace it with except* at least.

OTOH, by making wrapping unconditional, in the first step the user gets ExceptionGroup([ConnectionResetError]), so they’re alerted up-front that they might have to deal with multiple exceptions in the future. This is the same idea as Java’s checked exceptions: we’re going to force you to be prepared for things that haven’t happened yet, just in case. It’s not ideal, but it’s still better than actively sending you down the wrong path.

In summary: except doesn’t work with PEP 654 ExceptionGroups → therefore libraries are forced to use ExceptionGroups even for individual exceptions → therefore users end up getting ExceptionGroups all over the place, even in programs that never actually have multiple concurrent exceptions.

And then, once you get an ExceptionGroup, you need to know a lot to handle it appropriately. You need to understand:

what the internal nodes in the tree mean. (Usually the structure just represents the path the exceptions took, like traceback data, and only the leaves are interesting. But sometimes the internal nodes reflect semantic information, e.g. if a library subclasses ExceptionGroup to give the internal nodes custom types.)
the different options for iterating over an ExceptionGroup, and how to choose between them
how traceback information is stored (it’s spread out across the whole tree, so working with individual exceptions inside the tree – e.g. by re-raising them or accessing their __traceback__ attributes – will usually produce incomplete or corrupted traceback info)
when to use except SomeErrorType, except ExceptionGroup, except* (all three are supported, even if the first two are usually not helpful, so before you can write any of them you need to understand the pitfalls so you can pick the right one)

Again, the problem isn’t that PEP 654 can represent these details – these all reflect real complexities of the underlying domain, and experts will want to understand them regardless. The problem is that these details are front-and-center for all users whether they need them or not.

An alternative approach: “flat” exception groups

The core idea proposed here is to “denormalize” tracebacks, making “flat” exception groups (versus PEP 654’s “nested” exception groups). Every time an ExceptionGroup traverses a stack frame, that stack frame gets appended to all of the individual exceptions’ __traceback__s, rather than just a single ExceptionGroup.__traceback__. Of course we’ll renormalize when printing tracebacks, to avoid overwhelming the user with the same stack frames over and over – we have all the same data, we’re just using a different intermediate representation.

Implementation-wise, this is complicated by the way the interpreter’s unwinding code builds tracebacks in tstate->curexc_traceback, and only writes them back to exc.__traceback__ when the exception is caught. But this is easy to fix: just special case the writeback code (e.g. PyException_SetTraceback) to detect ExceptionGroup objects and pass through the writes to the exceptions inside them, instead of storing the traceback on the EG object itself. For example, exception groups could have the C equivalent of:

    @property
    def __traceback__(self):
        return None
        
    @__traceback__.setter
    def __traceback__(self, new_frames):
        for exc in self:
            exc.__traceback__ = concat_tb(new_frames, exc.__traceback__)

You can also imagine optimizations, like deferring the traceback update until someone actually accesses a __traceback__ attribute. But in any case, all this is invisible to users: from the Python level, the rule is just that every exception in an exception group has an appropriate __traceback__.
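The snippet above uses concat_tb without defining it. Here is one way it might look in pure Python, as a sketch only: it relies on types.TracebackType being constructible from Python code (possible since Python 3.7), and capture_tb, frame_names, task and nursery_exit are names invented for this example.

```python
import types


def concat_tb(head, tail):
    """Return a traceback chain with the entries of `head` followed by `tail`.

    The entries of `head` are copied, so neither input chain is modified.
    """
    if head is None:
        return tail
    return types.TracebackType(
        concat_tb(head.tb_next, tail),
        head.tb_frame,
        head.tb_lasti,
        head.tb_lineno,
    )


def capture_tb(fn):
    """Call fn() and return the traceback of the exception it raises."""
    try:
        fn()
    except Exception as exc:
        return exc.__traceback__


def frame_names(tb):
    """List the function names along a traceback chain, outermost first."""
    names = []
    while tb is not None:
        names.append(tb.tb_frame.f_code.co_name)
        tb = tb.tb_next
    return names


def task():
    raise ValueError("raised inside a child task")


def nursery_exit():
    raise RuntimeError("stands in for the frames the group travels through")


# Prepend the "outer" frames to the traceback of the child's exception.
combined = concat_tb(capture_tb(nursery_exit), capture_tb(task))
print(frame_names(combined))
```

The outer frames come first because a traceback chain runs from the frame where the exception was caught down to the frame where it was raised.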

By itself, this denormalization is a small change; but its ripples spread out and affect every other aspect of the design. Now that internal nodes aren’t needed to hold traceback information, exception group objects can be a single flat list, which means there’s just one obvious way to iterate them. And if you do, it’s safe to work with the individual exceptions without any special precautions. So this immediately removes a bunch of the concepts that users would have to learn with PEP 654.

And, it doesn’t actually reduce expressiveness at all: Say you have a library that really wants to bundle up a group of exceptions into a single meaningful exception – like a HappyEyeballsError that holds multiple OSErrors representing individual connection attempts; a HypothesisError that holds multiple failures for different randomly generated test cases. The library can still do that explicitly with code like raise HypothesisError from ExceptionGroup([exc1, exc2, ...]). Now the nested exceptions will automatically be included in tracebacks, and are accessible to handling code if desired. But this way the tree structure only occurs in cases where it’s actually meaningful to the user, and users who don’t need this kind of tree structure never see it at all.

But that’s not all: since the exceptions now have self-contained metadata, it becomes possible to give plain except useful semantics (which I’ll discuss in their own section below). And if plain except can do something useful with exception groups, then asyncio/trio aren’t forced to wrap solo exceptions! Most of the time, programs don’t experience multiple simultaneous exceptions, so this hugely reduces how often users are exposed to ExceptionGroups (I’d guess by maybe two orders of magnitude?) – and it means that when users do finally see an ExceptionGroup, it’s actually relevant to them, because it means their program actually has multiple exceptions raised simultaneously.

Of course, this depends on except having useful default behavior with exception groups, which is the most complicated and controversial part of the proposal, so it gets its own section.

Exception groups and `except`

Recall our example above, where a user attempted to use a regular except to catch a ConnectionResetError from concurrent code:

async def parent():
    try:
        async with trio.open_nursery() as nursery:
            nursery.start_soon(child1)  # might raise ConnectionResetError
            nursery.start_soon(child2)  # might raise FileNotFoundError
    except ConnectionResetError as exc:
        print(f"Connection lost: {exc!r}")

The core intuition is that:

If the block raises ConnectionResetError, then the message should be printed and then the program should terminate normally.
If the block raises FileNotFoundError, then the program should terminate with a FileNotFoundError
If the block raises both exceptions, then the message should be printed and the program should terminate with a FileNotFoundError.

Or put another way:

If ConnectionResetError is raised, then the ConnectionResetError handler should run
If FileNotFoundError is raised, then the user should get a FileNotFoundError and traceback.

And if they do get a FileNotFoundError, then they can modify their program, in the obvious way:

async def parent():
    try:
        async with trio.open_nursery() as nursery:
            nursery.start_soon(child1)  # might raise ConnectionResetError
            nursery.start_soon(child2)  # might raise FileNotFoundError
    except ConnectionResetError as exc:
        print(f"Connection lost: {exc!r}")
    except FileNotFoundError as exc:
        print(f"File not found: {exc!r}")

…and now either or both messages might be printed, depending on what the child tasks do.

So the big change here is that except handles ExceptionGroups by running all matching clauses, as many times as necessary until all handleable exceptions have been handled.

This is a substantial change to except's invariants, and I expect we’ll have a lot more discussion about it :-). But I think it’s justified given that:

This only affects code that starts raising ExceptionGroups; existing programs are completely unaffected.
If you do have an ExceptionGroup([ConnectionResetError, FileNotFoundError]), then this behavior is surprising, but every other behavior would be even more surprising
For existing code, these semantics won’t be correct 100% of the time, but I think they’ll be correct more often than PEP 654’s semantics. (In PEP 654, existing handlers will never run on ExceptionGroup objects, even if it would make sense.)
For new code, this makes it easier to fall into the “pit of success”, as in our example – the first thing Python programmers try will work.

I think we do still want except*, mostly for cases where you want to print a traceback: catching a whole group at once lets you print better tracebacks, because you can merge duplicated parts. Maybe also for cases where you specifically want to handle a specific combination of exceptions in a special way? But in this proposal except* becomes much less emphasized.

Detailed semantics

(Treat this section as a first draft – it’s detailed for concreteness, but I’ve mostly been focused on the overall concepts so this isn’t super polished.)

Here’s the basic type – it’s basically an immutable list that is also a BaseException. Not much going on here:

import collections.abc

class ExceptionGroup(BaseException, collections.abc.Sequence):
    def __init__(self, excs):
        self._excs = []
        for exc in excs:
            if isinstance(exc, ExceptionGroup):
                # Flatten: absorb the contents of a nested group
                self._excs += exc._excs
            elif isinstance(exc, BaseException):
                self._excs.append(exc)
            else:
                raise TypeError

    # No subclassing -- this is a pure carrier for other exceptions, and has no semantics
    # beyond that, so subclassing doesn't make sense.
    def __init_subclass__(cls):
        raise TypeError

    # Acts as an immutable sequence
    def __len__(self):
        return len(self._excs)

    def __getitem__(self, idx):
        return self._excs[idx]

    # Tracebacks are stored on the contained exceptions
    @property
    def __traceback__(self):
        return None

    @__traceback__.setter
    def __traceback__(self, tb):
        # concat_tb is the (undefined here) helper that joins two traceback chains
        for exc in self._excs:
            exc.__traceback__ = concat_tb(tb, exc.__traceback__)

    # todo: what to do about __context__/__cause__?

And then the semantics of try/except/except*:

After running the try block, if it raised an exception:

Set unhandled to the exceptions in the exception group; or, if this is a regular non-exception-group exception, set unhandled to the singleton set containing that exception.
Set raised to an empty set.
From top to bottom, for each except or except* clause:
- If ExceptionGroup appears in list of exception types, raise a RuntimeError.
- If this is an except* clause:
  - Scan over unhandled to find all exceptions that match the requested types, and remove them from unhandled.
  - Let current = ExceptionGroup([matched exceptions])
  - Set tstate->cur_exc to current
  - Bind the except* clause’s as variable (if any) to current
  - Run the body of the except* clause
  - If this raises an exception:
    - If it’s a group, append the contents to raised
    - Otherwise, append the single exception to raised
  - Unset the as variable
- If this is an except clause:
  - If the except clause matches BaseException (either because it’s explicitly mentioned, or because it’s a bare except:) AND len(unhandled) > 1:
    - Set matched = [ExceptionGroup([unhandled])] and clear unhandled (i.e., matched is a list containing one exception, where that exception happens to be an ExceptionGroup)
  - Otherwise:
    - Set matched = [exc for exc in unhandled if matches_this_clause(exc)], and remove these exceptions from unhandled
  - for match in matched:
    - Set tstate->cur_exc to match
    - Bind the as variable (if any) to match
    - Run the body of the except clause
    - If this raises an exception:
      - If it’s a group, append the contents to raised
      - Otherwise, append the single exception to raised
    - Unset the as variable
Once all except and except* clauses have been run:
- Let raised += unhandled
- If len(raised) == 1, then set tstate->cur_exc = raised[0]
- If len(raised) > 1, then set tstate->cur_exc = ExceptionGroup(raised)
- Run the finally block (if any)
- If tstate->cur_exc is set, then continue unwinding
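The plain except part of the steps above can be approximated in ordinary Python, which makes it easier to poke at the edge cases. This is a simulation under stated assumptions, not an implementation: FlatGroup and run_except_clauses are invented names, clauses are modelled as (type, handler) pairs, and the except* branch, the RuntimeError check, as-variable binding and finally are all left out.

```python
class FlatGroup(BaseException):
    """Hypothetical stand-in for the proposed flat ExceptionGroup."""

    def __init__(self, excs):
        super().__init__()
        self.excs = []
        for exc in excs:
            # Nested groups are flattened into one list.
            self.excs.extend(exc.excs if isinstance(exc, FlatGroup) else [exc])


def run_except_clauses(exc, clauses):
    """Simulate the proposed plain-`except` dispatch.

    `clauses` is a list of (exception_type, handler) pairs in source order.
    Returns whatever would keep propagating afterwards, or None.
    """
    unhandled = list(exc.excs) if isinstance(exc, FlatGroup) else [exc]
    raised = []
    for exc_type, handler in clauses:
        if exc_type is BaseException and len(unhandled) > 1:
            # Catch-all clauses see everything at once, so they run at most once.
            matched, unhandled = [FlatGroup(unhandled)], []
        else:
            matched = [e for e in unhandled if isinstance(e, exc_type)]
            unhandled = [e for e in unhandled if not isinstance(e, exc_type)]
        for match in matched:
            try:
                handler(match)
            except BaseException as new_exc:
                # Exceptions raised by a handler join the outgoing set.
                raised.extend(
                    new_exc.excs if isinstance(new_exc, FlatGroup) else [new_exc]
                )
    raised += unhandled
    if not raised:
        return None
    return raised[0] if len(raised) == 1 else FlatGroup(raised)


log = []
clauses = [
    (ConnectionResetError, lambda e: log.append("connection lost")),
    (FileNotFoundError, lambda e: log.append("file not found")),
]
leftover = run_except_clauses(
    FlatGroup([ConnectionResetError(), FileNotFoundError(), KeyError("k")]),
    clauses,
)
print(log, repr(leftover))
```

Both handlers run once each, and the lone unmatched KeyError comes back unwrapped rather than as a one-element group.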

Notes:

For except + non-EG exceptions, this ends up producing the same behavior as classic try/except.
The except* behavior is identical to PEP 654 (I think)
The behavior of raise SomeError and raise ExceptionGroup([SomeError]) ends up being identical. The only way to distinguish between these is to explicitly peek at sys.exc_info() while an exception is in flight – and maybe raise should unwrap singleton ExceptionGroups to make them fully identical?

The special case for except BaseException deserves more discussion. The rationale is:

except BaseException is saying that it can handle any kind of exception; it doesn’t make any assumptions at all about the exception type. And ExceptionGroup is an exception, so it’s safe to bundle up all the remaining exceptions and pass them in together.
This means that except: and except BaseException: continue to run at-most-once, which reduces compatibility risks in existing code
This makes it possible to catch ExceptionGroup without using except* syntax, which is important for one very specific use case: writing six-style helpers that handle ExceptionGroup correctly on new Python and emulate it on old Python, without using new syntax like except*.

Possible extension: We could also steal the idea of PEP 654’s BaseExceptionGroup/ExceptionGroup split, where the two types are identical except that ExceptionGroup is guaranteed to contain only Exceptions. And then we could extend the except BaseException special case to also apply to except Exception. The advantage would be to improve backwards-compatibility by giving except Exception at-most-once semantics. In fact, we could do slightly better than PEP 654 here. Consider this code:

try:
    raise BaseExceptionGroup([KeyboardInterrupt, RuntimeError, KeyError])
except Exception as exc:
    print(f"Squashing boring exception: {exc!r}")

With PEP 654 semantics, the presence of the KeyboardInterrupt turns the whole exception into a BaseExceptionGroup, so the except Exception doesn’t catch the Exceptions. With this version, the except Exception would catch ExceptionGroup([RuntimeError, KeyError]), and then the KeyboardInterrupt would continue propagating.

EpicWink · September 1, 2021, 2:47am

What happens if multiple of the exception handlers return? Does a return break out of this loop?

njs · September 1, 2021, 1:28pm

Excellent question! It’s a bit of a weird situation, because there might be uncaught exceptions in-flight in unhandled or raised. I don’t have a strong intuition about what’s “right”, but the situation is similar to this existing code:

try:
    # something that raises an exception
finally:
    return

Here there’s an uncaught exception in-flight during the finally block.

In current Python, the return wins, and the uncaught exception is discarded. So I guess we’d copy that for return inside except/except*? And likewise for break and continue.
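For reference, here is a runnable version of that existing behavior. The function name and strings are made up, and recent Python versions may emit a SyntaxWarning for a return inside finally, though the code still runs.

```python
def swallow():
    """Return normally even though the try block raised."""
    try:
        raise RuntimeError("in flight while the finally block runs")
    finally:
        # The return replaces the pending exception, which is discarded.
        return "returned"


print(swallow())
```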

I’m not super excited about this solution, but it’s enough of an edge case that it’s probably fine in practice – most people will never encounter it, and if you do encounter it there are enough clues to figure out what’s going on.

guido · September 4, 2021, 12:48am

FYI, Irit, Yury and I (the authors of PEP 654) have posted a response to the SC tracker issue. For completeness here is the text we posted there:

We have read Nathaniel’s alternative proposal, and we believe that the two approaches are now clear and that they are unreconcilable. We would like your guidance on how to proceed.

The following are what we see as the main differences:

Nathaniel proposes to change the semantics of (regular) try-except such that multiple except clauses can execute (multiple times). try-except is a decades-old feature which has similar semantics in other languages, and we don’t know how to evaluate the risks of (a) backwards compatibility breakage; (b) language ergonomics and predictability when breaking away from the semantics in other languages. We added except* because we assumed that such changes to except should not be considered. We would like a clear indication from the SC whether this aspect of the proposal should be discussed further.

A primary design goal of Nathaniel’s proposal is to make it easy to iterate over an exception group. In an early draft of PEP-654, ExceptionGroup was iterable. We chose to remove that feature in order to de-emphasize iteration as the way to handle exception groups. (We believe that a correct usage pattern will need to do “if there were CancellationErrors in my async task, do X” rather than “for each CancellationError in my async task, do X”). Our point here is that the discussion about iteration is not about how to provide this capability – PEP 654 ExceptionGroup can provide an iteration API. The question is whether we should, and if it will turn out that we are wrong and this is useful, then adding it to a PEP-654 ExceptionGroup is a matter of implementing an iterator along the lines of the recipe we provide in the PEP.

While the choice of data structure that the interpreter uses internally to represent an exception group is of secondary importance relative to questions of semantics, we wish to point out that Nathaniel did not discuss how the __context__ and __cause__ links of exception groups and the exceptions nested in them will be handled. They cannot be flattened like the __traceback__ s, and this will add major complications or limitations to his design. The whole point of ExceptionGroups is to make it possible to handle multiple exceptions without loss of error information. The integrity of the exceptions information, including the cause and context links, must be preserved for this to be a robust language feature. We do not believe that the exception group data structure, which is a tree that has meaningful context/cause information on its internal nodes, can be flattened to a list without loss of information.

Irit, Yury and Guido

h-vetinari · September 4, 2021, 10:45am

I really like the ergonomics of Nathaniel’s proposal. Provided the tracebacks enable the user to understand where and why each of the exceptions originated (especially those that were raised while handling other exceptions), it feels much more natural - but I get that the context & cause situation is exactly one of the sticking points raised by the PEP 654 authors (see point 3. above)

I think it would be very intuitive for except that each instance of an Error gets a corresponding exception (since it’s the simplest from the user’s POV - getting one exception per error). For users that then get flooded by 100x the same exception, they could easily find out that changing their code to except* allows them to handle all instances of the same error at once.

Can you specify that a bit? I mean, I can read your point 1. to see the argument why you didn’t consider changing the semantics of except, but now that it’s explicitly on the table, I don’t see how letting except run exhaustively on top of underlying tree-like EGs would be irreconcilable.

Something along those lines would allow the gradual learning curve Nathaniel is talking about (i.e. it’s easy to fall in the “pit of success” even with simple except), while leaving the full control for those who want to learn about except*.

njs · September 6, 2021, 2:30am

This is challenging for sure. And I love stealing mature ideas, it’s fantastic when it works. But the problem here is… do you know of other languages that have succeeded at making concurrent error handling ergonomic? I don’t. So trying to stick close to prior art is also very risky.

Python’s async/await diverged substantially from prior languages, in that it doesn’t hard-code a Future concept into the language… and without this decision, trio wouldn’t exist, no-one would have heard of structured concurrency, and we wouldn’t be having this discussion. So that bet paid off! I also note that back in the day, Python’s threading APIs were closely inspired by Java, because Java was the state-of-the-art. But Java’s next-generation threading API is copying from Python, because apparently now we’re the most advanced. The cost of being at the head of the class is that you can’t copy other people’s homework.

Of course, none of this proves that this particular proposal is a good one. But I think we should consider it on the merits, not just dismiss it because it’s novel.

Hmm, sort of, but sort of not? It’s not that I think iterating over exception groups is the best way to work with them. It’s that… our users already understand lists, they use them all the time. If an exception group is basically just a list, then it empowers our users to figure out for themselves whether they want to iterate or not. In the PEP 654 approach, we can certainly use our expert understanding to write helpers that do the Right Thing for most users. But then non-expert users just have to copy-paste our examples and hope that they do the right thing; they can’t figure it out for themselves from first principles.

That’s true, I didn’t talk about __context__ and __cause__. Mostly because I don’t know what they would mean :-). I went to the store because I needed milk; you went to the airport to pick up your friend. What’s the cause of [I went to the store AND you went to the airport]? To me cause/context seem like properties of individual exceptions, not groups of exceptions.

What are you imagining PEP 654’s intermediate nodes would do with __context__/__cause__? Do you have a use case in mind?

(FWIW: In Trio, __context__/__cause__ on MultiErrors have just been a nuisance, because Python keeps trying to tack them on and creating reference loops and stuff, and we need to be robust against that. We don’t actually use them for anything, and if ExceptionGroup just hard-coded them to None that would be fine for us. We have considered potentially abusing __context__ on intermediate nodes to record preempted exceptions. But (a) this is a gross hack because it’s not what __context__ means, and the standard __context__ traceback formatting will be confusing, (b) in the flat exception groups approach, this problem is solvable by allowing richer traceback entries. I won’t go into more detail here because that would be like, its own PEP :-). But the point is that AFAIK __context__ support isn’t urgent, and we aren’t ruling out further extensions to support context-like features in the future.)

yselivanov · September 8, 2021, 8:46pm

I’d like to share my perspective on the alternative semantics of try..except of the “flat exceptions” proposal. While working on PEP 654 we considered that option, as well as several variations of it. We hoped to be able to make except work with exception groups without the need for the new except* syntax, but came to the conclusion that this would not work well. We documented some of our thoughts about this in the PEP’s rejected ideas section, but it appears now that it is necessary to cover this in more detail, and I do that below.

Let me first outline the high-level difference between the proposals. With PEP 654, one can use the special except* syntax to handle exceptions raised from concurrent execution of asynchronous tasks:

try:
    async with TaskGroup() as g:
        ...
except *DatabaseNotAvailable as e:
    # This `except*` clause runs at most once; if it does,
    # `e` would be bound to an exception group containing instances
    # of `DatabaseNotAvailable`, e.g.:
    #
    #    ExceptionGroup(
    #       "", [DatabaseNotAvailable(...),
    #            DatabaseNotAvailable(...), ...])
    ...
except *SyslogNotAvailable as e:
    # This `except*` clause also runs at most once; `e` would be
    # similarly bound to a group of `SyslogNotAvailable` errors.
    ...

With the “flat exceptions” proposal you would have:

try:
    async with TaskGroup() as g:
        ...
except DatabaseNotAvailable as e:
    # This `except` clause can run multiple times; every run
    # `e` would be bound to a different `DatabaseNotAvailable()`.
    ...
except SyslogNotAvailable as e:
    # This `except` clause can also run multiple times.
    ...

With PEP 654:

Multiple except* clauses can be evaluated if a try: ... block fails. Every except* clause can be evaluated at most once.
It is prohibited to mix except and except* clauses in the same try block.

The user has to explicitly choose if they want to use the classic try..except (which PEP 654 does not alter in any way) or try..except* (which is a new language construct).

With the “flat exceptions” proposal:

Multiple except clauses can be evaluated if a try: ... block fails. Every except clause can be evaluated more than once.

The “flat exceptions” proposal changes the behavior of Python’s try..except block.
except* is also available and defined very similarly to PEP 654.

In my opinion, the “flat exceptions” proposal has serious flaws: lack of predictability and backwards compatibility issues.

Before we talk about these flaws in detail I’d like to discuss the usage pattern of the except* syntax proposed in PEP 654.

PEP 654: `except*` usage pattern

Quoting the “flat exceptions” proposal:

In summary: except doesn’t work with PEP 654 ExceptionGroups → therefore libraries are forced to use ExceptionGroups even for individual exceptions → therefore users end up getting ExceptionGroups all over the place, even in programs that never actually have multiple concurrent exceptions.

I think this is blowing it out of proportion.

I argue that the list of use-cases when one would need to reach for except* is short and distinct. The design of PEP 654 is informed by the simple fact that most of the actionable error handling is happening right where the potentially failing operation is performed.

Some examples:

Handling a KeyError around a dict operation allows using a default value;
Handling an OSError around a block that calls low-level OS functions allows for accurate diagnostics or for trying an alternative code path;
Handling a library.DatabaseConnectionError around a block that opens and then uses a connection to a database allows retrying the operation; etc.

And here’s an example that does not make much sense:

try:
    async with TaskGroup() as g:
        for _ in range(jobs):
            g.create_task(start_job())
except *KeyError:
    # There is no context here to handle a KeyError!
    # It should be handled in the `start_job()` implementation.
    ...

This does not make sense because when a set of concurrent tasks is running it is very hard to meaningfully interpret individual low-level errors. Correlating errors with individual asynchronous tasks is a lot of effort, which is better spent by moving the error handling logic inside those asynchronous tasks.

Which brings us to a simple set of rules:

Use try..except* around any API call that is explicitly documented to raise an ExceptionGroup, such as an asyncio TaskGroup or a Trio Nursery. This is the main motivation to add exception groups in the first place.
Use try..except* to intercept and react to control-flow errors like asyncio.CancelledError or KeyboardInterrupt. Typically this isn’t needed except in the application entry point and a few select places.
Use regular try..except in every other situation.

Lastly, libraries should not have APIs that “leak” ExceptionGroups to the user code. For the same reason as it would be a bug for a library like sqlalchemy to leak an internal KeyError to the user code. Libraries should instead handle exception groups and produce single and clear exceptions that callers can handle.

Lack of predictability

To get the obvious out of the way: many languages have a concept of try..except. It works more or less the same everywhere.

With the “flat exceptions” proposal an except clause might suddenly become a loop. This behavior will be unexpected for anybody who can read Python code but isn’t intimately familiar with this new feature.

Let’s construct an example to show how this can be confusing:

while True:
    try:
        async with TaskGroup() as g:
            for ids in grouped_ids:
                g.create_task(fetch_ids(ids))
        break
    except mydb.ConnectionError:
        await sleep(1)

The intent here is clear: if a database connection is interrupted in any of the tasks, wait 1 second and retry the entire operation. Most of the time everything would work as expected in the “flat exceptions” proposal, but sometimes two or more tasks would crash and the wait time would increase to two or more seconds.
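The effect can be imitated without any real exception groups. In this rough simulation, `run_except_clause` is a hypothetical helper (not part of either proposal) that runs a handler body either once for the whole group or once per matching leaf exception, and a counter stands in for `await sleep(1)`.

```python
class ConnectionLost(Exception):
    """Stand-in for mydb.ConnectionError (hypothetical)."""


def run_except_clause(leaf_exceptions, exc_type, handler, looping):
    """Simulate one except clause handling a group of leaf exceptions.

    looping=True models the "flat exceptions" idea: the handler body runs
    once per matching leaf. looping=False models a handler body that runs
    at most once for the whole group.
    """
    matching = [e for e in leaf_exceptions if isinstance(e, exc_type)]
    if not looping:
        matching = matching[:1]
    for exc in matching:
        handler(exc)
    return len(matching)


seconds_slept = 0


def retry_handler(exc):
    global seconds_slept
    seconds_slept += 1  # stands in for `await sleep(1)`


failures = [ConnectionLost("task 1"), ConnectionLost("task 2"), ConnectionLost("task 3")]

run_except_clause(failures, ConnectionLost, retry_handler, looping=False)
slept_once = seconds_slept

seconds_slept = 0
run_except_clause(failures, ConnectionLost, retry_handler, looping=True)
slept_looping = seconds_slept

print(slept_once, slept_looping)  # 1 3
```

With three failed tasks the looping variant accumulates three seconds of delay before the retry instead of one.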

The proposal does not give any visual clue for this behavior. The magic is implicit and requires people who read, write, and review code to always keep the new try..except semantics in mind.

This also affects the learning curve of the entire language. Quoting the proposal:

My favorite thing about Python is the smooth, incremental learning curve. Programming is complicated, and Python is a powerful and complicated language – but it doesn’t feel complicated, because the complexity is carefully arranged so you can start being productive right away with a minimal investment, and then learn more as you go, only when needed.

Well, with PEP 654 there is no need to know about exception groups until you start to learn async/await or an API that produces them. The learning curve is incremental.

With the “flat exceptions” proposal, one would need to be aware that an except clause can run more than once and that will show up in the documentation, and in early examples and tutorials of even simple code (otherwise those examples would be misleading and teach unsafe practices.)

Backwards compatibility

Strictly speaking both proposals are backwards compatible. If you take existing Python code and run it on a newer Python with either proposal implemented, the code would work.

The actual issue is more subtle here. The proposed “flat exceptions” semantics of try..except can lead to sporadic unexpected errors or to a surprising and hard-to-track behavior.

Sporadic unexpected errors can be illustrated with a common pattern:

resource = create_resource()
try:
    await some_code(resource)
except ResourceError:
    resource.close()
    resource = None

With the “flat exceptions” semantics, this code can produce an AttributeError: 'NoneType' object has no attribute 'close' error from time to time.
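A small simulation of how that failure would arise, assuming the handler body is re-run for each `ResourceError` in a group. `Resource` and `handle_group` are hypothetical stand-ins, and the explicit `for` loop plays the role of the implicit looping.

```python
class ResourceError(Exception):
    pass


class Resource:
    """Hypothetical resource with a close() method."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def handle_group(leaf_exceptions):
    resource = Resource()
    # Simulated looping: the handler body runs once per ResourceError leaf.
    for exc in leaf_exceptions:
        if isinstance(exc, ResourceError):
            resource.close()
            resource = None
    return resource


# A single error behaves exactly like today's try..except.
handle_group([ResourceError("a")])

# Two errors re-enter the handler after `resource` was set to None.
message = None
try:
    handle_group([ResourceError("a"), ResourceError("b")])
except AttributeError as err:
    message = str(err)

print(message)
```

The second pass through the handler body is what fails, which is why the error would only show up when two or more tasks happen to fail together.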

With PEP 654, if some_code() propagates an ExceptionGroup this would fail with “Unhandled ExceptionGroup exception” error. The user will learn quickly that they need to switch to except*.

The other popular pattern is to have an except Exception clause in applications and frameworks to run error reporting and potentially some cleanup code. If the “flat exceptions” proposal is adopted, people would need to thoroughly audit code like that to make sure it is reentrant.

To illustrate the surprising and hard-to-track behavior, suppose that there’s error reporting in the above example:

resource = create_resource()
try:
    await some_code(resource)
except ResourceError as e:
    await report_error(e)
finally:
    resource.close()

With the “flat exceptions” proposal, if some_code() propagates an ExceptionGroup the function would run the except clause for every ResourceError error in it. If some_code() spawns a big number of tasks, error reporting might suddenly require more resources leading to all kinds of production problems: degraded performance due to excessive logging or to exceeding the error reporting API quota.

To summarize, I believe that these examples prove the point: the sudden change of the regular try..except semantics can be quite tricky to deal with, especially in complex code bases.

ncoghlan · September 9, 2021, 11:02am

Just noting that the except/continue behaviour wouldn’t change in the flat exception groups proposal: return/break/continue control flow commands would interrupt the exception group handling, just as they interrupt finally clause execution today.

That said, Yury’s overall point regarding the riskiness of allowing existing exception handling clauses to run multiple times still stands.

FWIW, while I think the ergonomics of Nathaniel’s idea do sound potentially attractive, I’d personally side with the PEP 654 authors in considering it too great a compatibility risk compared to the more conservative “new syntax for new semantics” approach that PEP 654 takes.

yselivanov · September 9, 2021, 3:41pm

Just noting that the except / continue behaviour wouldn’t change in the flat exception groups proposal: return / break / continue control flow commands would interrupt the exception group handling, just as they interrupt finally clause execution today.

Good catch Nick, I’ve fixed that example by removing the continue command.

njs · September 17, 2021, 11:13am

Hey Yury, thanks for the thoughtful comments! I know it’s a lot of work to articulate these things, but I’m still optimistic that if we keep digging in we’ll at least understand the tradeoffs better, and hopefully even find a consensus. I’ve been working on a more detailed response about except semantics, but I’ve been struggling with health a bit this week again so it’s only ~3/4 done, sorry about that.

But while I’m finishing that up, a question – except semantics is clearly the thorniest part of PEP 654 vs flat EGs, but there’s also the somewhat separate question about which EG representation to use. In particular, flat EGs are simpler, but lose __cause__, __context__, and subclass typing for internal exception nodes. Are these things that you all still think are important? And if so, can you give some concrete examples of when they’re needed? Or would the flat representation + PEP 654 semantics for except be a viable option to consider?

guido · September 19, 2021, 3:34pm

(I’m replying for Yury & Irit here. All our responses have been jointly drafted.)

In particular, flat EGs are simpler,

We feel flat EGs have the following disadvantages:

they take up more space (because frames are duplicated);
the interpreter needs more time to update the traceback when adding a frame;
they require re-normalization for display.

but lose __cause__, __context__, and subclass typing for internal exception nodes. Are these things that you all still think are important? And if so, can you give some concrete examples of when they’re needed?

Yes.

For example, take a try block that runs a group of tasks, where if any tasks fail, a further group of cleanup tasks has to be run. If some cleanup tasks fail, the resulting exception group needs context showing which of the original tasks failed.

njs · September 28, 2021, 6:46am

Can I step back for a moment and say I’m getting a weird vibe here? Like, the actual conversations are apparently happening in some backchannel where I’m not allowed to participate, and all your jointly-crafted public posts are rigorously on-message and never acknowledge any downsides to PEP 654 or upsides to alternatives. I don’t know if you’re angry, or frustrated, or feel blindsided, or aren’t interested in spending energy on improving a “good enough” solution, or are just trying to perform the political games required to get proposals through python-dev’s toxic culture, or what. But I end up feeling like at some point you decided that I am actually some kind of enemy trying to waste your time with impractical nonsense.

I promise, I’m not here to try to attack you or waste your time or anything. All I wanted, and all I want, is to collaborate with my friends and come up with beautiful code where we’re satisfied we’ve picked the best tradeoffs. I don’t care if that’s PEP 654, or my proposal at the top of this thread, or some compromise, or what. I’ve been trying to discuss this stuff with you since October last year, and I’m tired of fighting about it too. I don’t know if I offended you somehow. If I did I’m sorry. Is there any way I can make amends, or repair the relationship, or something? Because this is miserable, and I don’t think it’s good for Python either. Can we talk offline, maybe do a call?

I agree, these are disadvantages. There’s a lot of room to optimize the frame duplication’s extra space/time costs – I think they’ll end up negligible in real programs. (If you disagree, I’d love to see your reasoning – maybe I missed something.) But it’s true they’re not zero. And renormalization does require some extra code when displaying tracebacks – it’s a pretty simple linear-time algorithm, just a classic prefix trie construction, and very few users write their own traceback printing code from scratch, but again, the cost isn’t zero.
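For readers unfamiliar with the technique, here is a rough sketch of what such a renormalization could look like. The data layout is an assumption made for illustration: each leaf exception is paired with a tuple of frame labels (outermost first), whereas a real implementation would walk traceback objects. `build_traceback_trie` and `render` are hypothetical names.

```python
def build_traceback_trie(flat_group):
    """Merge shared frame prefixes of a flat group into a tree.

    flat_group is a list of (frames, exc) pairs, where frames is a tuple
    of frame labels ordered from outermost to innermost. The work done is
    linear in the total number of frames.
    """
    root = {"children": {}, "exceptions": []}
    for frames, exc in flat_group:
        node = root
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"children": {}, "exceptions": []}
            )
        node["exceptions"].append(exc)
    return root


def render(node, depth=0):
    """Return indented display lines; shared frames appear only once."""
    lines = []
    for frame, child in node["children"].items():
        lines.append("  " * depth + frame)
        lines.extend(render(child, depth + 1))
    for exc in node["exceptions"]:
        lines.append("  " * depth + repr(exc))
    return lines


flat_group = [
    (("main", "nursery", "task_a"), ValueError("bad value")),
    (("main", "nursery", "task_b"), KeyError("missing")),
]
lines = render(build_traceback_trie(flat_group))
print("\n".join(lines))
```

The duplicated `main` and `nursery` frames collapse into a single shared prefix, so the display regains a tree shape even though the stored representation is flat.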

But the question isn’t “do flat exception groups have costs”; it’s about weighing the tradeoffs. To me, the big thing is: I’ve spent a lot of time helping confused beginners making their first steps into writing concurrent programs, and based on that, I’m confident I can explain how to use flat exception groups, but I don’t think I can do that with PEP 654 exception groups. I’m very willing to accept some minor runtime penalties and a bit of extra code in the traceback printing libraries if it makes Python easier to “fit in your head” for all users.

I wonder if part of the difference in attitude here is coming from asyncio’s history? Asyncio is excellently designed for its era, but I think everyone agrees that writing asyncio programs is still radically more difficult than writing regular synchronous Python programs. At the time asyncio was designed, we just didn’t know how to do better than that – “easy to use” for concurrency libraries meant “possible to use correctly if you’re an expert and think really hard”. Against that background, the complex structure of nested exception groups + the attitude of “you don’t have to understand it, just use the tricky helpers someone wrote for you” makes a lot of sense. [Edit after seeing the latest SC post: I guess this perspective is also implicit in Thomas’s comment that EGs are addressing a rare/niche use case.]

My perspective is more optimistic: concurrency is ubiquitous in the real world – we all multitask, and split up work (“you wash and I’ll dry”), it’s so natural that non-programmers take it for granted. I think the main reason programmers consider concurrency such an advanced topic is mostly not because it’s intrinsically impossible, but because our tools have always been so low-level and difficult to use. I believe we can make concurrent Python “fit in your head” almost as well as regular Python. Of course I don’t want to force concurrency on anyone, or compromise Python’s usability for sequential use-cases; but I think we can make the step from sequential to concurrent programming much, much more approachable than it is now. So I think it’s really important to make EGs “fit in your head” as much as possible, however that’s accomplished.

I’m not attached to my exact proposal in all its details, but I do think we can find ways to simplify PEP 654 quite a bit without losing anything important.

Ah, yeah, this case worried me too! For concreteness, we’re talking about code like:

try:
    ...
except ...:
    # Concurrent cleanup version:
    async with io_lib.open_nursery() as nursery:
        nursery.start_soon(cleanup)
        ...
    # Sequential cleanup version:
    await cleanup()
    ...

In both of these cases, if cleanup() raises an exception then it’s really nice to get the original exception attached to it as __context__.

However… we already have a good place to put that info, in the __context__ on the original exception. That’s where it goes with the sequential cleanup case, and it’s still available in the concurrent cleanup case. So with nested EGs, I think what you’re talking about is actually a second redundant place to store this same information?

And it’s kind of an awkward place, in the middle of the second exception’s traceback. Consider a case where exception A leads to exception B leads to exception C. Right now you get:

[traceback for exception A]
While handling this exception, another exception occurred:
[traceback for exception B]
While handling this exception, another exception occurred:
[traceback for exception C]

and it’s represented as C.__context__ → B, B.__context__ → A.
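This linear chain is how Python behaves today and can be checked directly. In the sketch below the exception types are arbitrary stand-ins for A, B and C, and `raise_chain` is a hypothetical helper.

```python
def raise_chain():
    try:
        raise ValueError("A")
    except ValueError:
        try:
            raise KeyError("B")
        except KeyError:
            # Raised while B is being handled, so C.__context__ is B.
            raise RuntimeError("C")


try:
    raise_chain()
except RuntimeError as c:
    b = c.__context__
    a = b.__context__
    chain = [type(c).__name__, type(b).__name__, type(a).__name__]
    end_of_chain = a.__context__

print(chain)
```

Each exception records only the one that was being handled when it was raised, which is what lets the traceback printer emit A, then B, then C in order.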

If we use the EG’s __context__, then we have an ExceptionGroup holding:

half of C’s traceback
a __context__ pointing to A
and as payload, the exception C, which holds:
- a __context__ pointing to B
- the other half of C’s traceback

I’m not sure how you untangle that to produce the nice linear printed output we want.

That said, it’s true that setting the leaf exceptions’ __context__ is a bit awkward right now, but only because of an unrelated limitation in Python’s exception handling system: it’s not easy for asyncio/trio to propagate excinfo into new tasks, so __context__ propagation currently doesn’t work automatically across task boundaries. But, this is fixable, and then I think that would be the superior approach even with nested EGs.

So I think this is actually an example where nested EGs are slightly worse than flat EGs: they provide this extra representational option (__context__ on intermediate nodes), but it turns out to be just a red herring.

Hmm. I think you misunderstood my argument. I’m not saying “except* will be used a lot, so requiring it will make code complicated”. I actually agree with everything you wrote here; except* is only needed in relatively rare circumstances. But, this doesn’t mean users will automatically know when except* is irrelevant – they still have to figure that out themselves in each case.

With PEP 654, exception groups are ubiquitous, so users will see them and have to look them up to figure out what’s going on, and they’re hard to understand, so users will have to wrap their head around red herrings like except* and except ExceptionGroup before they figure out that actually all they want is a regular except close to where the exception is raised.

Probably the worst part of this is allowing except ExceptionGroup – I see people trying to use this all the time with Trio’s current EG-equivalent, and it’s never what they actually want. Even if except doesn’t loop, it would still help to make except ExceptionGroup an error – then it’s at least obvious that your options really are except* for ExceptionGroups or except before you have an ExceptionGroup. (And the other simplifications of flat EGs are also still helpful.)

Have you seen the paragraph at the very end of my first post, the one that starts “Possible extension:”? I skimmed over it pretty quickly so it was easy to miss, but flat EGs can use the same trick that was added to PEP 654 for improved compatibility with existing except Exception clauses specifically. It’s not as obviously necessary for flat EGs as it is for nested EGs, but it’s still available and works just as well.

Yeah, this concerned me too, but once I thought through the details I think it’s actually fine. By assumption, your system has the resources to run a large number of tasks, and then unwind them all. Looping over report_error just adds a small extra amount of work for each task.

Put another way: if calling report_error for each leaf exception is prohibitively expensive, then it’s also prohibitively expensive to put try blocks inside individual tasks – they do the same thing in the end. But we don’t worry about that cost, so we shouldn’t worry about this cost either.

Huh, this is a fascinating point! The first time I read this, I was like “oh whoa that’s an important insight, hmmmmm what do I make of it”. And then the second time I read it I was like “wait a second, if we take what Yury wrote literally then it makes no sense at all”. Which is a weird dichotomy!

Like, if you read the actual quote above carefully, maybe you’ll see what I mean. Yury points out – correctly – that with PEP 654, if you never use an API that produces exception groups, then you can get along just fine without knowing about them. But with my proposal, on the other hand… if you never use an API that produces exception groups, then again, you can get along just fine without knowing anything about them. Like, by definition, right? If we start by assuming EGs never happen, then you never need to deal with them; if we start by assuming that they do happen, then you do need to deal with them. This isn’t a difference between the proposals at all!

So I think there’s an important insight here, but it’s something more nuanced that we haven’t quite articulated yet. Which is cool! That usually means we’re learning something. I’ll take a stab at trying to draw out Yury’s comment into something more concrete – Yury, lmk if these cover what you’re thinking or not?

One argument I thought about while reading Yury’s comment: With the flat EG proposal, ExceptionGroups will be easier to use and more-integrated with the language. So, people will use them more than if they’re hard to use and quarantined off in the async-only box. And that means people will encounter them sooner, and that ends up making Python harder to learn.

Or put another way: EGs always make APIs worse, and should only be used if absolutely necessary. But with flat EGs the downsides are hidden better, so API designers won’t notice the problems until it’s too late, while PEP 654 EGs make their downsides more obvious so API designers will naturally shun them.

This is an interesting argument, and I keep going back and forth on it. Like – at some level yeah obviously, all else being equal, nice features are used more than awkward features. But it also feels weird to argue for a feature because it’s harder to use and makes APIs rigid and harder to refactor! Neither proposal forces APIs to raise EGs – either way we can have the same documented conventions about when they’re appropriate (“it’s for concurrency, don’t use it just to be cute”) and API designers will have the same options for avoiding exposing them (“prefer raise MyLibraryError(...) from EG(...) whenever it makes sense”). And either way, they only show up if an API designer consciously decides that using them will produce a better API than not using them.

My intuition is that there are two kinds of developers out there:

Ones who like to experiment with exotic features and will raise EGs no matter what, just because they can. Their downstream users are going to have to learn about and cope with EGs no matter what design we use. Fortunately, these kinds of libraries don’t tend to see widespread usage.
Ones who are smart enough to read the docs and avoid using EGs unless it really is a good idea. For these, the most important thing is good docs and clear use cases, and flat EGs’ simpler and more focused design might help with that?

It’s especially weird that PEP 654 both takes the position that APIs should not raise ExceptionGroups unless absolutely necessary, and talks up the possibility of doing class HypothesisError(ExceptionGroup). In my proposal, that’s not even possible – you have to do raise HypothesisError from ExceptionGroup instead. So these two aspects of PEP 654 seem to contradict each other?

Overall I’m feeling like this topic is something to keep in mind, but it doesn’t provide much clear guidance for any specific technical questions.

Another argument I thought about while reading Yury’s comment: There are two kinds of exceptions. The ones that you expect – they happen in some well-defined situation, you know about that situation, you write code to handle it appropriately. For these, it doesn’t really matter whether except has automatic looping or not – if you’re explicitly writing code to handle an ExceptionGroup then you’ll pick the right tools for the job.

But then there are the exceptions that catch you by surprise. Your program runs into some situation that never even occurred to you as a possibility, and things start falling apart. Of course, there’s no way to handle these situations 100% reliably – by definition, your program is now in some unknown state that you don’t understand. But it’s still worth trying to do some kind of last-ditch recovery – like logging an error and retrying, or trying to clean things up before crashing the program. It’s not guaranteed to work, but often you get lucky and it works Well Enough™. Or maybe you just have some code and you want to make some predictions about what could happen if it saw an unexpected new error.

Currently, and with PEP 654’s except semantics, you can’t predict exactly what will happen with an unexpected exception, because of the whole “system is in an unknown state” thing. But you at least know that there are only two possibilities: try will either execute an except block or not, and you can make some approximate guess about what happens in each case. But, if except starts automatically looping on ExceptionGroups, then you also need to consider the looping case. So it’s increasing the number of possibilities that experts need to keep in mind, and it creates potential control-flow paths that are impossible for beginners to anticipate if they don’t even know about ExceptionGroups.

I think this might be the most important issue with automatic looping in except – the one that’s really making everyone (including me) nervous. Yury’s except mydb.ConnectionError: await sleep(1) and except ResourceError: resource.close() snippets are both examples of this issue.

The weird thing is, on the practical-to-mathematical spectrum of programmers, my natural disposition puts me way over on the mathematical side – I’m the kind of person who wants to meticulously enumerate all possible states a program could possibly get into and handle them all correctly and verify that to the maximum extent possible, and I have to fight against that urge to get stuff done. So you’d expect me to be super creeped out by the idea of these new, unaudited control flow paths through programs – and in fact, I was. Like everyone else here, I started out with the intuition that automatic looping in except was obviously unacceptable.

So what changed? Why am I even raising the possibility?

Well… I got to thinking. The vast majority of non-trivial Python programs contain some short windows where a KeyboardInterrupt will irrecoverably corrupt their internal state. And it’s… basically fine. The interpreter itself has had bugs like this that persisted for years and the world didn’t end. Theoretically, the chance of things going wrong is definitely larger than zero, but numerically, it’s rare enough that almost no-one cares, or even notices. It disappears into the background noise of programs doing weird stuff.

And it’s not just KeyboardInterrupt – it’s a particularly blatant example, but in some sense, the whole point of Python choosing an exception-based model for error handling instead of, say, Rust’s type-checked error types or Java’s checked exceptions, is to let Python users lean into the philosophy of “well we might end up in some undefined state but whatever, let’s YOLO ahead anyway and deal with it later if it’s a problem”. Practically every bytecode in Python is allowed to bail out with a variety of different exceptions, and even mypy doesn’t offer any way to track which exceptions are possible and make sure you’re handling them all. Actual existing Python programs never properly handle all possible exceptions. And it’s fine.

Of course, this isn’t an accident – it’s critical that Python does something reasonable with unhandled exceptions (propagate, crash the program, print diagnostics, etc.). It might not be the right thing for any particular case, but it’s close-enough to right, often enough, that it’s fine.

So with that in mind – for the interaction between except + EG to cause a problem, you need a case where a bunch of things line up:

you’re calling some code where you don’t understand the error-reporting API, so you can’t make any guarantees, just do “best effort” defensive programming.
The specific way you don’t understand the error-handling is that the API can raise EGs and you didn’t know about it.
you hit a situation where an EG is actually raised, which may require losing a race condition
you have a try block to handle exceptions from this API, but it’s not an except: or except BaseException: or except Exception: or finally, because those all retain at-most-once semantics in every EG proposal
the default except semantics turn out to be the wrong thing for your situation

Note that everything above applies equally to PEP 654 and the flat EGs/automatic looping approach. The only difference is that they have different fallback semantics if you finally reach the bottom of that list: for PEP 654 you get an unhandled exception escaping from the except block that you might expect to catch it (err on the side of potentially running except too few times), while with automatic looping we err on the side of potentially running except too many times.

Obviously you can construct examples where either of these fallbacks is arbitrarily broken. But is one of them broken more often? None of us have collected data on this, but it’s at least plausible that automatic looping is right more often than letting unhandled exceptions escape. Do these situations even happen often enough to materially affect the bug rate of Python programs? Also unclear – again, all Python programs have bugs with unexpected exceptions. (How many of your programs handle ENOSPC correctly?)

So… I’m not saying this argument makes a slam-dunk case for automatic looping in except being perfect and wonderful. It doesn’t. But I do think it makes the case that instead of automatically rejecting it out-of-hand, we should try to gather more data (e.g. grep through some projects and see how many try blocks are broken under each design), and consider whether the other advantages might be enough to outweigh the problems.

guido · September 28, 2021, 7:17pm

Nathaniel, your message is too long for me to read. You don’t make it easy for me to even want to read it by starting with what sounds like a personal accusation. We can chat in person (well, online) during the core dev sprint which will be in a few weeks.

ambv · September 28, 2021, 8:11pm

The prosaic truth is that co-authors of a PEP communicate with each other, as I imagine was true through the entire process of writing the PEP in question. When Guido, Jukka, and I worked on PEP 484, we did the same. We even met in person a few times to speed up some of the discussions. The end result was the PEP document, still the longest one we have.

brettcannon · September 28, 2021, 9:51pm

I also view it as Guido saying he wasn’t speaking for himself but also on behalf of the other PEP 654 co-authors so everyone knew it wasn’t personal opinion that Yury and Irit may disagree with.

Another way to look at it is Guido bothered to have editors of his post instead of just tossing a reply up w/o some external proof-reading.

h-vetinari · September 28, 2021, 10:23pm

I find it troubling that the SC pronounced on PEP654 apparently without awaiting Nathaniel’s response - especially in the face of:

Python 3.11b1 is more than half a year away, so there isn’t even any real urgency.

brettcannon · September 28, 2021, 11:45pm

There are details you might be missing. The SC went so far as to meet w/ Nathaniel personally on July 5 and waited on pronouncing on PEP 654 until he was able to provide more details on his alternative proposal to accommodate him. This was after the initial PEP was posted back in February and Nathaniel participated then as well. So the timeline isn’t 28 days since this topic was opened, but 7 months (the co-authors of PEP 654 have been very patient waiting this long for the SC to make a decision).

Also please go back and read Accepting PEP 654 (Exception Groups and except*) for why we don’t think waiting longer for Nathaniel’s response was going to change our minds on his alternative proposal. But also note we left the door open for changes if a compelling case can be made even with our acceptance of PEP 654 based on lessons learned from implementing the PEP.

Dates can sneak up on you. b1 means initial implementations need to be done, stdlib needs updating, docs, discussion of all the code and APIs have to be resolved, we have to agree that nothing is changing, etc. Add on to the fact that this all takes a while due to limited volunteer time for everyone involved and it makes the timeline not as expansive as you might think (for instance, I don’t know what people’s vacations are like between now and then, people change jobs, have kids, etc.; remember the co-authors submitted this back in February when their availability may have been widely different than it is now). And if people want to argue that PEP 654 doesn’t work then nothing beats having live code showing how it doesn’t work and how there’s a better alternative (hence why we said we are open to changing things if it makes sense, but we think getting PEP 654 implemented now is a good thing).

Delaying the decision longer also assumes the SC would have enough time to review any alternatives to give enough time for landing anything for 3.11b1 (this already took so long that this missed 3.10 even though the PEP was written over 2 months before 3.10b1).

Do understand it’s not just Nathaniel’s schedule, but also all five members of the SC and the 3 co-authors of PEP 654 we have to try and accommodate as well.

h-vetinari · September 29, 2021, 12:46am

I’m aware of the discussions all the way back to MultiError. Still - with respect - the patience of the PEP654 authors should not be a deciding factor here. Nathaniel wasn’t available to respond in time for settling the discussion before 3.10, which is a pity, but a design that all the key domain experts can (perhaps reluctantly) agree on should IMO be more important than how long it takes to get there.

Everything you write in the following is certainly correct, but a bit of a strawman. I wasn’t saying “delay the decision indefinitely”, but “wait for the response that came a week later” (presumably the ETA of that could have been determined by reaching out). Despite the validity of your arguments, I doubt that a week or two would have seriously changed anything.

That sounds a bit like “there might be a better design, but we need something like ExceptionGroups ASAP”. Is that the view of the SC?

iritkatriel · September 29, 2021, 7:55am

You could easily find out because we provided a complete implementation with the PEP (the same one we asked you to try with Trio when we invited you to work with us on the PEP before we submitted it).

thomas · September 29, 2021, 9:57am

There might be a better design, but radically changing try/except’s semantics as proposed by Nathaniel isn’t it. We’ve received no other proposals, and PEP 654 is a complete, well-thought out solution for a problem that’s worth fixing. What’s more, we do not want to accept this at the last minute, exactly because there might be better choices available. Waiting around isn’t going to make them show up, however; practical experience with the proposed design might.

Right now the number of people with experience in the field is limited, and we want more eyes on the new API design. Up until 3.11b1 we can change the design, even revert it entirely, and we should make as good a use of that time as we can. Any delay in moving forward with PEP 654 eats into that time.