Introducing a Safe Navigation Operator in Python

jamesdow21 · October 13, 2024, 11:11am

I’m not sure if I exactly follow what you mean here, but I believe you’re asking about examples like these as comparisons to sensor.machine?.line?.department.engineer?.email

if (line := getattr(sensor.machine, "line", None)) is not None:
    address = getattr(line.department.engineer, "email", None)
else:
    address = None

if (line := sensor.machine?.line) is not None:
    address = line.department.engineer?.email
else:
    address = None

The “find an engineer for this machine” example could also then be

engineer = getattr(machine.maintenance_team, "enigneer", None)
if engineer is None and (line := machine.line) is not None:
    engineer = line.engineer or line.department.engineer

engineer = machine.maintenance_team?.engineer
if engineer is None and (line := machine.line) is not None:
    engineer = line.engineer or line.department.engineer

Or even

engineer = (
    getattr(machine.maintenance_team, "engineer", None)
    or getattr(machine.line, "engineer", None)
    or getattr(getattr(machine.line, "department", None), "engineer", None)
)

I think that last line shows another benefit of ?.

Going from machine to department engineer requires just a single optional lookup machine.line?.department.engineer whereas using getattr means that every lookup after the optional one also needs to use getattr.

And to mention it again, getattr drops my type annotations

Speaking of problems with the getattr approach, I intentionally left a typo I made in one of the strings above so there’s actually a bug somewhere up there that would have been caught when using ?.
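
To make the effect of that typo concrete, here is a small sketch. The objects are hypothetical SimpleNamespace stand-ins, not the real model behind these examples. Because the misspelled name is only a string, getattr quietly returns None and the lookup falls through to the line's engineer, while the plain attribute access that ?. would perform fails immediately (and a type checker could flag it before the code ever runs).

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the machine/line/team objects in the examples above.
team = SimpleNamespace(engineer="team engineer")
line = SimpleNamespace(engineer="line engineer")
machine = SimpleNamespace(maintenance_team=team, line=line)

# The misspelled name is just a string, so getattr silently falls back to None...
engineer = getattr(machine.maintenance_team, "enigneer", None)
if engineer is None and (line := machine.line) is not None:
    engineer = line.engineer
print(engineer)  # line engineer (the wrong person, with no error)

# ...whereas a real attribute access with the same typo fails loudly.
try:
    machine.maintenance_team.enigneer
except AttributeError as exc:
    typo_error = exc
    print(type(exc).__name__)  # AttributeError
```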

The semantics laid out in the PEP seem clear to me, but a ton of the discussion is talking about all kinds of other behavior, so I’ve also been confused. Maybe I’m misreading and the discussion is actually more about proposing changes for how it should work instead.

Quoting (most of) the example from the grammar changes section of the PEP here (I’ve taken out the await parts since they’re only included to complete the grammar specification)

For example, a.b(c).d[e] is currently parsed as ['a', '.b', '(c)', '.d', '[e]'] and evaluated:

_v = a
_v = _v.b
_v = _v(c)
_v = _v.d
_v = _v[e]

When a None-aware operator is present, the left-to-right evaluation may be short-circuited. For example, a?.b(c).d?[e] is evaluated:

_v = a
if _v is not None:
    _v = _v.b
    _v = _v(c)
    _v = _v.d
    if _v is not None:
        _v = _v[e]

The proposal is just to insert a check for None any time you hit a ?, and to stop evaluating the rest of the chain if it is None
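
As a runnable illustration of that expansion in today's Python, here is a sketch that wraps the PEP's desugaring of a?.b(c).d?[e] in a function. The function name and the test objects are made up for this example. It also shows what the expansion does not do: a non-None object that lacks the attribute still raises.

```python
from types import SimpleNamespace


def chain(a, c, e):
    """Hand-written equivalent of a?.b(c).d?[e] under the PEP's expansion."""
    _v = a
    if _v is not None:
        _v = _v.b
        _v = _v(c)
        _v = _v.d
        if _v is not None:
            _v = _v[e]
    return _v


# Hypothetical objects, only for exercising the different outcomes.
full = SimpleNamespace(b=lambda c: SimpleNamespace(d={"k": c * 2}))
no_d = SimpleNamespace(b=lambda c: SimpleNamespace(d=None))

print(chain(full, 21, "k"))  # 42
print(chain(no_d, 21, "k"))  # None: d was None, so [e] is skipped
print(chain(None, 21, "k"))  # None: a was None, so everything after it is skipped

# A non-None object that lacks .b is NOT turned into None.
try:
    chain(SimpleNamespace(), 21, "k")
except AttributeError:
    print("AttributeError escapes")
```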

Alyssa Coghlan:

jamesdow21:
I don’t even need to remember which relationships are optional, since I can start out writing it as
address = sensor.machine.line.department.engineer.email
and mypy will show 3 [union-attr] typing errors, reminding me to put the 3 ?'s where they’re needed.
Working through the process of resolving a union-attr error like this would also make a good example.

It’s incredibly straightforward, the output from running mypy on the file where I’ve been saving all the scratch work for these examples is

$ mypy better_pep505_example.py
better_pep505_example.py:60: error: Item "None" of "Machine | None" has no attribute "line"  [union-attr]
better_pep505_example.py:60: error: Item "None" of "Line | Any | None" has no attribute "department"  [union-attr]
better_pep505_example.py:60: error: Item "None" of "Person | Any | None" has no attribute "email"  [union-attr]
Found 3 errors in 1 file (checked 1 source file)

So I need to add three ?'s: before line, department, and email

address = sensor.machine.line.department.engineer.email

becomes

address = sensor.machine?.line?.department.engineer?.email

I’ve been meaning to read that thread, but haven’t gotten a chance yet. Seems very similar to the ? Unary Postfix Operator in PEP 505’s Rejected Ideas.

I’ve looked at some of the implementations of Result and Maybe classes before and it seems quite noisy, so I would hope that any consideration of adding those also had nice ways to use them very simply, but I imagine that’s impossible to do in a type-safe way without syntax changes like what PEP 505 proposes.

For instance, using the Maybe class from the returns package, my example of

address = sensor.machine?.line?.department.engineer?.email

becomes

address = (
    Maybe.from_optional(sensor.machine)
    .bind_optional(lambda machine: machine.line)
    .bind_optional(lambda line: line.department.engineer)
    .map(lambda engineer: engineer.email)
    .value_or(None)
)

Michael H:

Not sure if that should be split off into its own thread, or if discussing it in context would be better so that those in favor of this syntax can directly compare it as a reason not to accept the syntax. For now I’ll discuss it here, but if it needs to be split, please just move the messages.

Perhaps a function
def extract_value(
    obj: Any,
    path: str,
    type: type[T] = object,
    on_exception: Callable[[Context], T] | Literal[MISSING] = MISSING,
    default_value: T | Literal[MISSING] = MISSING,
) -> T:
    ...
where MISSING is an internal sentinel.

would allow type-safe handling and user-defined behavior on access failure, but if both default_value and an exception handler are omitted, the specified path not existing should raise an error.

For the “type safe” handling part of that, in my eyes, there’s not much point.
I pass in type = str | None and then a type checker says “Great! This function must return str | None”, but the type checker is not actually checking anything.

jcampbell05 · October 13, 2024, 11:22am

I think in this case the library is limited, as you noted in your example, without syntax changes

The idea would be this for your sample

email = sensor?.machine?.line?.department?.email

match email:
    case Result.Success:
        send_email(email.value)
    case Result.Error:
        logger.exception("Could not send email: %s", email.exception)

mikeshardmind · October 13, 2024, 11:39am

That’s actually there for the function to check that it’s that type as part of validation. defaulting to object so that it’s entirely optional to enforce a specific type (since everything’s an object), while providing the info to a type checker should a user use one.

an actual implementation rather than ... would look something more like

def extract_value(
    obj: Any,
    path: str,
    typ: type[T] = object,
    on_exception: Callable[[Context], T] | Literal[MISSING] = MISSING,
    default_value: T | Literal[MISSING] = MISSING,
) -> T:
    try:
        value = _internal_fetch_by_path(obj, path)
    except NoSuchObject:
        if default_value is not MISSING:
            return default_value
        if on_exception is not MISSING:
            return on_exception(...)
        raise
    else:
        if isinstance(value, typ):
            return value
        if on_exception is not MISSING:
            return on_exception(...)
        raise TypeMismatch

jamesdow21 · October 13, 2024, 11:53am

James Campbell:

James Dow:

I’ve looked at some of the implementations of Result and Maybe classes before and it seems quite noisy, so I would hope that any consideration of adding those also had nice ways to use them very simply, but I imagine that’s impossible to do in a type-safe way without syntax changes like what PEP 505 proposes.

I think in this case the library is limited, as you noted in your example, without syntax changes

The idea would be this for your sample
email = sensor?.machine?.line?.department?.email

match email:
    case Result.Success:
        send_email(email.value)
    case Result.Error:
        logger.exception("Could not send email: %s", email.exception)

If that’s actually what the alternate proposal is, then I would say that’s much worse.

Original PEP 505 has exactly 5 scenarios

address = sensor.machine?.line?.department.engineer?.email
if address is not None:
    send_email(to=address)

sensor.machine is None, so address gets assigned None
sensor.machine.line is None, so address gets assigned None
sensor.machine.line.department.engineer is None, so address gets assigned None
address gets assigned the full expression sensor.machine.line.department.engineer.email
An exception was raised somewhere in there, for instance in a property or a __getattr__ method

For my example, I wouldn’t include the exception logging because the vast majority of the time, it’s not an exception, it is not exceptional for one of those intermediate fields to be None

So it would actually be

maybe_address = sensor?.machine?.line?.department?.email

match maybe_address:
  case Result.Success(address):
      send_email(to=address)
# there's no requirement to handle every case from a Python `match`

But now I’m actually losing the information about the only actual exceptions that could occur, because I’m inadvertently catching and ignoring them

That’s the equivalent of me taking my original PEP 505 expression

address = sensor.machine?.line?.department.engineer?.email

and replacing it with

try:
    address = sensor.machine?.line?.department.engineer?.email
except Exception:
    pass
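
A small sketch of scenario 5 makes the difference visible. Everything here is hypothetical: the Engineer class with a deliberately broken property and the two helper functions exist only for this illustration. The None-aware version lets the real error surface, while the version that ignores the failure case reports "no address" for a bug.

```python
class Engineer:
    """Hypothetical class whose email property has a genuine bug."""

    @property
    def email(self) -> str:
        raise RuntimeError("directory lookup failed")


def address_none_aware(engineer):
    # What engineer?.email means: only a None engineer short-circuits.
    return None if engineer is None else engineer.email


def address_swallowing(engineer):
    # What an ignored failure case amounts to: every error becomes "no address".
    try:
        return None if engineer is None else engineer.email
    except Exception:
        return None


print(address_none_aware(None))  # None
print(address_swallowing(Engineer()))  # None, and the RuntimeError is lost

try:
    address_none_aware(Engineer())
except RuntimeError as exc:
    print("surfaced:", exc)  # surfaced: directory lookup failed
```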

jamesdow21 · October 13, 2024, 12:03pm

Michael H:

James Dow:

For the “type safe” handling part of that, in my eyes, there’s not much point.
I pass in type = str | None and then a type checker says “Great! This function must return str | None”, but the type checker is not actually checking anything.

That’s actually there for the function to check that it’s that type as part of validation. defaulting to object so that it’s entirely optional to enforce a specific type (since everything’s an object), while providing the info to a type checker should a user use one.

an actual implementation rather than ... would look something more like
def extract_value(
    obj: Any,
    path: str,
    typ: type[T] = object,
    on_exception: Callable[[Context], T] | Literal[MISSING] = MISSING,
    default_value: T | Literal[MISSING] = MISSING,
) -> T:
    try:
        value = _internal_fetch_by_path(obj, path)
    except NoSuchObject:
        if default_value is not MISSING:
            return default_value
        if on_exception is not MISSING:
            return on_exception(...)
        raise
    else:
        if isinstance(value, typ):
            return value
        if on_exception is not MISSING:
            return on_exception(...)
        raise TypeMismatch

What if my expected type was a Protocol, a Callable, or a type[T]? Those can’t be checked with isinstance

Also, that’s not writing type-safe code, that’s writing code that will pass a type checker and then raise a TypeMismatch at run time

jcampbell05 · October 13, 2024, 12:08pm

To add a bit more context, the Failure case would have the exception from wherever it happened in that chain

The Exception itself would express which key it was that failed.

If you only care about getting the value, you can unwrap it with “email.value”: if there is a value it returns that, otherwise it throws an error

The precise semantics around what exceptions you would get or if you would return None as a successful result are for future discussion

mikeshardmind · October 13, 2024, 12:10pm

That’s considered type-safe. calling this function will only give you back a value at runtime if it matches the type checker’s knowledge.

runtime validation of types that can’t be runtime validated with isinstance would not be in scope. You can write more complex validation yourself for those cases (or just use a more in depth validation library, the idea here is something relatively basic, but in the standard library to ease the basic script use cases without requiring every user reimplement the wheel)

pf_moore · October 13, 2024, 12:47pm

Yes, I had assumed that from how the code was clearly expected to work (I didn’t bother going back to read the proposal, because no-one ever reads the documentation).

My point is that we need the proposal to discuss how we make sure that people don’t talk about the new feature in terms of being “like getattr()”, because that’s confusing. And how we stop people thinking that the new operators catch lookup errors. And all of the other ways in which people have misunderstood the proposal in this, and the various previous threads this has been discussed in.

If it’s as easy as it seems to have a confused understanding of how the feature works, it will be easy to (1) write code that doesn’t work the way you expect, and (2) not spot such errors in code review.

This, IMO, is the biggest problem with the proposal. It’s just very hard to be sure precisely how it works - especially given that, as I said above, no-one ever reads the documentation… Add to that the fact that LLMs have probably been trained by now on all of the discussions that have got confused over the semantics, and you’ll get people using coding assistants that suggest flat-out wrong code, based on those misinterpretations. How do we help those developers spot such errors?

ncoghlan · October 13, 2024, 1:00pm

In the specific example given, there are several cases where “x is None” and “x.attr is None” need to trigger the same code path, rather than needing to ever deal with attr not existing on a non-None value. When that’s the case, the following two examples are roughly equivalent (except for the first one silently eating the AttributeError if sensor.machine is ever set to a non-None value that doesn’t provide a line attribute):

if (line := getattr(sensor.machine, "line", None)) is not None:
    ...

machine = sensor.machine
line = machine.line if machine is not None else None
if line is not None:
    ...

PEP 505 allows that logic to instead be written as:

if (line := sensor.machine?.line) is not None:
    ...

which is even more concise than the first formulation, but has the correct semantics shown in the second formulation (that is, it doesn’t incorrectly eat an AttributeError from a non-None sensor.machine value - those will escape as they should).

Edit: Putting the above another way:

getattr(obj, "attr", None) is effectively short for obj.attr if hasattr(obj, "attr") else None
When obj is known to be a member of a union type SomeTypeThatDefinesAttr|None, then obj.attr if hasattr(obj, "attr") else None has exactly the same effect as obj.attr if obj is not None else None (since the only way for attr to be missing is for obj to be None instead of an instance of SomeTypeThatDefinesAttr)
as a result, getattr(obj, "attr", None) is sometimes used as an abbreviation of obj.attr if obj is not None else None when obj is known to be a member of a union type SomeTypeThatDefinesAttr|None (since it avoids repeating the obj expression)

The benefit that PEP 505 offers in that situation is giving people a concise syntax for what they actually mean, so they’re less tempted to reach for an existing concise expression that doesn’t actually mean what they want to express, but is often close enough for practical purposes.
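
The three formulations can be put side by side in current Python. This is only a sketch: the function names are invented, and SimpleNamespace objects stand in for SomeTypeThatDefinesAttr and for a value of some unexpected type.

```python
from types import SimpleNamespace


def via_getattr(obj):
    return getattr(obj, "attr", None)


def via_hasattr(obj):
    return obj.attr if hasattr(obj, "attr") else None


def via_none_check(obj):
    # The meaning PEP 505 gives to obj?.attr
    return obj.attr if obj is not None else None


good = SimpleNamespace(attr=1)  # stands in for SomeTypeThatDefinesAttr
wrong = SimpleNamespace(other=2)  # a non-None value that lacks attr

for f in (via_getattr, via_hasattr, via_none_check):
    print(f.__name__, f(good), f(None))  # all three agree: 1 None

print(via_getattr(wrong))  # None: the mistake is hidden
try:
    via_none_check(wrong)
except AttributeError:
    print("AttributeError escapes")
```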

jcampbell05 · October 13, 2024, 1:15pm

So if I’ve understood the proposal correctly, it sounds like he wants to distinguish between the data missing because the key is missing (maybe API no longer follows the schema) and an explicit None which indicates the API respects the schema but is saying the data doesn’t exist

Python right now raises an error for each: a KeyError for a missing key and a TypeError when we get an explicit None

So in theory you can do this

try:
  email = data["user"]["email"]
except TypeError:
  # user explicitly not set
  email = None

# Otherwise not sure if data reliable so allow KeyError to throw

The problem I can see is that for more deeply nested types the TypeError isn’t useful: it just says that a NoneType isn’t subscriptable, without indicating the key. So even having a better error message such as “NoneType cannot be subscripted for key ‘email’” would be much more useful for his case

If the desire is to expand this beyond dicts to objects, we already have this in the form of AttributeError, which gives a much more useful error message

The only advantage this syntax gives is avoiding the need to pepper try/except blocks everywhere, but if we introduced it, it seems most people are after a way to get a reason for the None value if it’s due to an exception

The end user can then decide what to do with that reason. If this were built on a well-known protocol such as a Result type, then perhaps it would make code review easier

But we should at least improve the TypeError message from

TypeError: 'NoneType' object is not subscriptable

To something like

TypeError: 'NoneType' object cannot be subscripted for key "email"
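
Here is a runnable version of that try/except idea, as a sketch only (get_email is a made-up helper name). It separates "user is explicitly None" from "the user key is missing", and it also shows the limitation described above: the handler cannot tell which subscript hit None.

```python
def get_email(data):
    try:
        return data["user"]["email"]
    except TypeError:
        # data["user"] was explicitly None
        return None


print(get_email({"user": {"email": "a@example.com"}}))  # a@example.com
print(get_email({"user": None}))  # None

try:
    get_email({})  # schema violation: no "user" key at all
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'user'

# Caveat: the except clause cannot tell *which* subscript hit None.
print(get_email(None))  # None as well, although here data itself is None
```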

efimov-mikhail · October 13, 2024, 1:20pm

Totally agree. The first thought arising in my head when I read something like “print(a?.b)” is “How exactly does it work?”

Behavior is not obvious at first, and this is a huge issue for me. “Readability counts”!

flyinghyrax · October 13, 2024, 1:37pm

This matches my personal impressions as well. Languages with idiomatic errors-as-values often have syntactic support to avoid repeated nested pattern matching or binding, e.g.

Haskell has do notation
F# has computation expressions
Rust has the ? operator for error propagation

Using algebraic error handling types in languages that don’t intentionally accommodate them tends to be verbose or frustrating.

This seems fair to me. Although “null-coalescing” syntax has become more widespread, it works differently in different languages. Someone’s expectation of how such a feature works in Python would be based on their previous exposure in other languages, and they may assume they know how it works and be surprised.

Personally I think this points in favor of syntactic support over/beyond library support, because it would allow tools like linters and type checkers to catch incorrect usage. But I understand that is not a moderate position and would take significant community buy-in to be successful.

pf_moore · October 13, 2024, 2:19pm

The disadvantage is that while it’s concise, it’s not as clear. Explicit is better than implicit and all that. Even with the extensive explanation you gave, I still don’t know if the intention of obj?.attr is to return None if obj doesn’t have an attr attribute. Both of the answers “yes” and “no” have featured in various posts in this discussion.

(I know what PEP 505 says. What I don’t know is whether someone writing obj?.attr also knows what PEP 505 says).

mikeshardmind · October 13, 2024, 2:31pm

I somewhat think this actually points the opposite direction. If this is so likely to require linting just for syntax, it’s not the right syntax for a use case that has had people saying they want this for quick scripts.

I mocked up an example of something we could make available somewhere in the standard library. It uses a definitively more limited path syntax than any final proposal might have reason to support, and it does not currently track where in traversal or path parsing certain exceptions are raised, but it shows that this could be provided at relatively low maintenance burden within the standard library.

I did do a little bit to ensure the performance shouldn’t be terrible, but it’s worse than what I would do if we had syntactic macros.

Importantly, this differentiates between why failure happened if failure happens, but it does not track a specific point of failure, assuming that for this kind of thing, you care about the path you pointed at being there, the rest of the object is irrelevant.

This somewhat maps well to the deeply nested attribute from an API that returns objects without a guaranteed structure kinds of examples which were given earlier.

import enum
import operator
import re
from collections.abc import Callable
from functools import lru_cache
from typing import Any, Literal, TypeVar

ident_pattern = re.compile(r"([A-Za-z_][A-Za-z_0-9]*)")
numeric_pattern = re.compile(r"(\d+)")


class InvalidPath(Exception):
    pass


class InvalidAccess(Exception):
    pass


class _Sentinel(enum.Enum):
    MISSING = enum.auto()


MISSING = _Sentinel.MISSING

type Maybe[T] = T | Literal[_Sentinel.MISSING]


T = TypeVar("T")


@lru_cache(128)
def _minispec_to_handler(path: str, /, typ: type[T] = object) -> Callable[[Any], T]:
    traversal: list[Callable[[Any], Any]] = []

    if not path.startswith("$"):
        raise InvalidPath

    pos = 1

    while pos < len(path):
        match path[pos]:
            case ".":
                # match (not search): the identifier must start right after the dot
                if m := ident_pattern.match(path, pos + 1):
                    traversal.append(operator.attrgetter(m.group(1)))
                    # continue at the next separator, if any
                    pos = m.end(1)
                else:
                    raise InvalidPath
            case ":":
                pos += 1
                if pos >= len(path):
                    raise InvalidPath
                quote = path[pos]
                if quote in ('"', "'"):
                    close_quote_pos = path.find(quote, pos + 1)
                    if close_quote_pos < pos:
                        raise InvalidPath
                    traversal.append(operator.itemgetter(path[pos + 1 : close_quote_pos]))
                    pos = close_quote_pos + 1
                elif m := numeric_pattern.match(path, pos):
                    num = int(m.group(1))
                    traversal.append(operator.itemgetter(num))
                    # end of the digits, not the end of the whole path
                    pos = m.end(1)
                else:
                    raise InvalidPath
            case _:
                raise InvalidPath

    def handler(obj: Any, /) -> T:
        for t in traversal:
            try:
                obj = t(obj)
            except (TypeError, LookupError, AttributeError):
                raise InvalidAccess from None

        if isinstance(obj, typ):
            return obj

        raise TypeError

    return handler


def extract_value(
    obj: Any,
    path: str,
    *,
    typ: type[T] = object,
    default_value: Maybe[T] = MISSING,
) -> T:
    """
    mini spec for path
    # $ = object root
    # .IDENT = attribute access
    # :1 = integer getitem access
    # :'key' = string getitem access
    # :"key" = string getitem access
    # string getitem access does not support escapes

    typ: optional runtime type enforcement via isinstance
    default_value: optional default only to be used when traversing to a given path isn't possible

    Raises
    ------
    InvalidPath:
        path was invalid for the mini spec
    TypeError:
        a value existed at that path, but was not of the right type
    InvalidAccess:
        The path could not be followed for the given object and a default
        value was not provided

    example use

    >>> x = {'a': [1, 2, 3]}
    >>> extract_value(x, "$:'a':1")
    2
    >>> extract_value(x, "$:'a':1", typ=str)
    TypeError
    >>> extract_value(1, "$:'a':1")
    InvalidAccess
    >>> extract_value(1, "$:'a':1", default_value=0)
    0
    >>> extract_value(x, "$:'a':42", default_value=0)
    0
    """

    handler = _minispec_to_handler(path, typ)

    try:
        return handler(obj)
    except InvalidAccess:
        if default_value is not MISSING:
            return default_value
        raise

The path spec here is intentionally simplified. I don’t think this path spec is the right one for standard library inclusion, but I wanted the focus to be on the overall question of “can we reasonably do this without too much burden in the standard library?” Writing a better path format spec, generating the parsing needed for that, and writing better context-preserving error handling are all things that can be done if people agree this approach would address their needs, or at least be a workable improvement.

jamsamcam · October 13, 2024, 2:36pm

Could we at least improve this error message, as James suggests?

jamesdow21 · October 13, 2024, 6:01pm

Alyssa Coghlan:

Edit: Putting the above another way:

getattr(obj, "attr", None) is effectively short for obj.attr if hasattr(obj, "attr") else None

When obj is known to be a member of a union type SomeTypeThatDefinesAttr|None, then obj.attr if hasattr(obj, "attr") else None has exactly the same effect as obj.attr if obj is not None else None (since the only way for attr to be missing is for obj to be None instead of an instance of SomeTypeThatDefinesAttr)

as a result, getattr(obj, "attr", None) is sometimes used as an abbreviation of obj.attr if obj is not None else None when obj is known to be a member of a union type SomeTypeThatDefinesAttr|None (since it avoids repeating the obj expression)

The benefit that PEP 505 offers in that situation is giving people a concise syntax for what they actually mean, so they’re less tempted to reach for an existing concise expression that doesn’t actually mean what they want to express, but is often close enough for practical purposes.

I’ve realized that there is a key part of how I’ve been mentally “framing” and parsing each of these that is actually not stated in the PEP at all (and at a quick skim, the PEP seems to be rejecting outright).

When I’ve been reading obj?.attr, I think of it as obj? .attr not obj ?.attr

i.e. the question that I’m asking in the expression is “Is obj None?”, not “Does obj have attr?”

“Is obj None?” is equivalent to None if obj is None else obj.attr
whereas “Does obj have attr?” is equivalent to getattr(obj, "attr", None)

I have a rudimentary (at best) understanding of parsing grammars, but I think what I actually want is it to be a “unary postfix operator”, but the actual implementation is adding ?. and ?[...] “trailers” for weird grammar reasons that I don’t understand and just assumed must have been the easier way for internal implementation details.

There’s the unary postfix operator section under Rejected Ideas, but it’s mostly about returning a NoneQuestion type that implements dunder methods to return itself, rather than short-circuiting in the expression evaluation grammar.

It also addresses the problem of some other combining rules with expressions like:
What should x? + 1 mean?

My answer to that is that I would have it mean:
(None if x is None else x) + 1

Python lets me do that currently, but it’s pretty obvious that it’s risking a TypeError

In [1]: x = 1

In [2]: (None if x is None else x) + 1
Out[2]: 2

In [3]: x = None

In [4]: (None if x is None else x) + 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 (None if x is None else x) + 1

TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

edit: I actually had put a few different times in earlier draft comments (but then removed before posting since it didn’t feel relevant) that I might like to write

engineer = machine.line?.department.engineer?

as a visual indicator that I know line and engineer could be None, and I don’t care that the final ? there is a no-op and could be removed without affecting anything

Nineteendo · October 14, 2024, 6:41am

If we want, we can actually support whitespace in between: obj "?" "." attr.

efimov-mikhail · October 14, 2024, 7:42am

Is it really possible to make such a change?
For obj = None, the value of obj? would be None, and then None.attr would have to be evaluated. But that expression should raise an exception.

Or is “unary postfix operator” just a name, with only the “?.” and “?[]” pairs being allowed?

jcampbell05 · October 14, 2024, 8:08am

That already gives an AttributeError in Python

It’s just the error only says which attribute and not which object

Nineteendo · October 14, 2024, 10:51am

It’s just a name. You can define grammar with a single token or multiple.
Whitespace is allowed between tokens. obj? would be invalid syntax, but obj? .attr would be allowed. I don’t see a practical benefit of this though.
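
The existing behaviour this relies on can be checked with the ast module: Python already allows whitespace between the tokens of an attribute access. This is just a sketch of today's tokenization. The ? forms themselves cannot be parsed yet, so the last lines only confirm that they are currently a syntax error.

```python
import ast

compact = ast.dump(ast.parse("obj.attr"))
spaced = ast.dump(ast.parse("obj . attr"))
print(compact == spaced)  # True

# The hypothetical obj? .attr cannot be checked this way: ? is not valid Python today.
try:
    ast.parse("obj? .attr")
except SyntaxError:
    print("SyntaxError")
```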