Revisiting PEP 505

Lucas_Malor · December 19, 2024, 9:29pm

Just my two cents: IMHO ?. is very useful even if only for checking None.

Consider for example:

liters = fridge?.milk?.liters

This sort of lookup is IMHO very common, and currently you have to write

liters = None
if fridge is not None:
    milk = fridge.milk
    if milk is not None:
        liters = milk.liters

Silently swallowing AttributeError would be a bug magnet for this operator, and possibly a source of hours wasted trying to understand where the bug is.
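A minimal sketch of that concern, using today's getattr(obj, name, None) as a stand-in for an operator that would swallow AttributeError. The Fridge and Milk classes and the misspelled attribute `litres` are invented for illustration and are not part of any proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Milk:
    liters: float


@dataclass
class Fridge:
    milk: Optional[Milk] = None


fridge = Fridge(milk=Milk(liters=1.5))

# None-only check: a typo in the attribute name fails loudly.
try:
    liters_strict = fridge.milk.litres if fridge.milk is not None else None
except AttributeError as exc:
    strict_error = str(exc)

# Error-swallowing lookup: the same typo silently yields None,
# which cannot be told apart from "there is no milk".
liters_lenient = getattr(fridge.milk, "litres", None)
```

In the first form the typo surfaces at the line that contains it; in the second it only shows up later, wherever the unexpected None is finally used.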

Also take into consideration that Python is widely used by sysops, and sysops can be forced to work on a machine where only vi is available: no IDE, no linters, no help from home.

About ?[, I’m neutral on allowing it to swallow KeyError, because in this case an accidental error is very unlikely.

guido · December 19, 2024, 9:59pm

I’m glad you’re coming around to “postfix” ? (though it’s not simply a postfix operator – it’s also a shortcut that stops evaluating following attribute/subscript accesses and calls in the same “primary”).

I would strongly object to the introduction of a new sentinel “undefined” though. One of the big conundrums I had when learning TypeScript for real was what the difference was between undefined, null and other special values. Effectively in our project we don’t use null at all, which is why I’ve come to see undefined as TypeScript’s None, and null as some historical wart.

BTW, one reason this works so well in TypeScript is that the type declares which fields are optional (may be missing) or may be undefined. This prevents the thing that Steve and Raymond are worried about – if you declare your list as a list[int], the type checker will produce an error if you are putting a None in it. (Of course, IIRC Steve and Raymond also don’t like static typing at all. So this may not alleviate their concerns.)

The type checker will also warn or error when you’re applying ? to a field that cannot be optional or None.

My big objection to having a helper for getting an attribute that requires you to spell the attribute name in quotes is that this mixes syntax and data. I see a.attr as a compile-time thing (easy to type-check) but getattr(a, "attr") as a runtime thing (requires the type checker to special-case getattr with a literal to be able to do the same check). Not to mention that it’s hard on the shift key.

mikeshardmind · December 19, 2024, 11:19pm

I’m going to focus on the traversal parts here, because I do think they are much more problematic than presented. I don’t particularly care for ?? and would prefer people stop using None in function results + gain late-bound defaults, but I’d also probably use ?? if given it.

While I can see an argument for ?. behaving as shown in typescript, where type checking is required to get any use out of it, I don’t think this works particularly well for developers in even just vanilla javascript [1], and I don’t think it would work well in python, with an optional type system that supports more than structural typing.

Unless you’re suggesting that static typing is required for a first-class experience with new features in python, this isn’t actually any different than getattr is, and I think this is actually an argument against the feature.

Even for those who like type checking and use it, there are existing places with type checking where python type checkers currently produce the wrong results due to poorly defined interactions between structural subtyping and nominal subtyping. There are even existing cases of optional members of the data model that type checkers don’t handle correctly yet. While I use static analysis where I can, it’s nowhere near good enough in python to rely upon to catch all type issues, and some of the largest issues right now are due to incorrect simplifications with structural typing.

The closest we can get right now to undefined in TS is in dicts (missing key-value pairs), and in a specific case with slotted classes that most people would consider a programming error to write intentionally. For dicts, we already have a means to do this with .get, and I’m not really convinced that the syntax is more intuitive and a win for brevity. Over several discussions, people have had various opinions about which errors should and shouldn’t propagate, and even when discussing a concrete proposal, there have been multiple misuses of the proposed operator in the examples people have written for the discussion.

The slotted class version still raises an error, but it’s as close as you can get for an instance of a class to the ts example, where it’s still a static member (inspect.getmembers_static agrees that it is at runtime as well), and where static analysis should apply.

>>> import inspect
>>> class Example:
...     __slots__ = ('a', 'b')
...
>>> 'a' in dict(inspect.getmembers_static(Example))
True
>>> Example().a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Example' object has no attribute 'a'

It seems to me that the people who see a use for it have a use that is largely rooted in handling json data without actually validating it and turning it into python objects. While I don’t want to just say “don’t do that then”, it does seem like we have a number of relatively mature options, both in the standard library and a few outside of it, that people could be using instead, both for just traversal (eg. glom) and also for robust structuring of data (eg. msgspec, pydantic, attrs + cattrs, mashumaro).

Is there a strong use case for this outside of handling json data? The only arguments I’ve seen for the traversal portions (.? and []?) involve externally sourced json data.

If it’s primarily this, I think we should point people to the existing libraries that help here, and consider adding something to the json module for jsonpath support. Possibly also something to the dataclasses module to parse json into dataclasses, and then put the rest of the focus on the less controversial ?? for a more immediate set of “wins”.
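To make the dataclasses idea concrete, here is a rough sketch of what such a helper could look like when written by hand today. The name from_json is hypothetical; nothing like it exists in the dataclasses module, and this version does no type validation.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Milk:
    liters: float
    brand: str = "unknown"


def from_json(cls, payload):
    """Hypothetical helper: build a dataclass from a JSON object,
    keeping only the declared fields and ignoring everything else."""
    data = json.loads(payload)
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in names})


# The undeclared "fat" key is dropped; the missing "brand" takes its default.
milk = from_json(Milk, '{"liters": 2, "fat": 3.5}')
```

A payload that lacks a required field fails with TypeError at the point where the data is received, rather than producing an object that has to be checked on every later access.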

[1] And the previously mentioned issue of continuing to pass around incomplete data is prevalent in js, whereas in many other languages people handle this at the point where they receive data and don’t have to constantly check.

elis.byberi · December 19, 2024, 11:23pm

I can see only 32 files where getattr(obj, 'attr', None) has been used. It’s worth noting that almost all of these usages are immediately followed by an if statement checking if the variable is None, or the expression itself is part of an if statement. Most of these usages could have been written as:

if 'attr' in obj:
    func(obj.attr)

…but that comes down to code style.

Also, I just checked my personal codebase, and None is used only for default argument values, where a mutable default argument value does not work.

guido · December 20, 2024, 12:39am

I’ve said my piece, now I’m off on vacation, maybe I’ll check in next year how it’s going. No hard feelings — in the end it’s the SC who need to be convinced, not me.

blhsing · December 20, 2024, 3:08am

In a perfect world, yes, I agree with you 100%, but we live in reality, where we (as in my team at work) usually don’t own the often proprietary and/or legacy APIs/datasets we have to interact with, and where we are often under time pressure to deliver, for example, a query tool that extracts only a tiny fraction of fields out of huge API responses/datasets. In those cases we don’t usually waste time writing a complete schema for the API responses/datasets when all we want are a select few fields.

Python is renowned for its flexibility, where the grammar and the data model are suited to both quick prototyping and sophisticated large projects, depending on the development speed vs. maintainability tradeoffs one is willing to make. Offering safe navigation operators may encourage “bad” API designs, but if there is little to no consequence from “bad” API designs, then maybe they shouldn’t be considered “bad” anymore, since safe navigation operators in Python would effectively nullify the consequences of such “bad” APIs. No time wasted in asking the vendors for a new version of the API, and no time wasted in writing verbose/less readable code to deal with optional fields in the API responses. Everybody wins and I don’t really see a loser here. Practicality beats purity, and perfect is the enemy of good IMHO.

My personal use cases aside, as shown in my examples above, CPython itself has a very flexible data model, where objects of the same “type” may or may not have specific attributes due to dynamic initializers (e.g. modules with different loaders may or may not have __file__), duck typing (e.g. “types” may or may not have __type_params__), object state (e.g. a module spec may have an _initializing attribute during initialization that other methods need to check), and backwards compatibility (e.g. objects may or may not have __reduce_ex__ in place of __reduce__).
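As a small illustration of the __file__ case, here is how the "attribute may or may not exist" situation is handled today with getattr and a default. The helper name module_file and the module name "bare" are made up for this sketch.

```python
import json
import types

# A module object created directly has no loader-provided __file__.
bare = types.ModuleType("bare")


def module_file(module):
    # Today's spelling: getattr with a default covers the absent attribute.
    return getattr(module, "__file__", None)


has_file = module_file(json)  # a path string for a module loaded from disk
no_file = module_file(bare)   # None, because the attribute does not exist
```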

Having safe navigation operators will help turn code that has to deal with those inevitable “bad” data models in our reality into much more readable and, as Guido pointed out, compiler-optimizable code.

mikeshardmind · December 20, 2024, 3:43am

Then by all means, just type a select few fields. The proposed existing solutions don’t require you to type an entire API to use them. For example, if you type only the fields you care about with msgspec, then only those parts of the response are even loaded from the json payload, saving you memory in the process too if these persist and are passed around. You should be able to at least specify the parts your application relies upon, right?

>>> import msgspec
>>> class Ex(msgspec.Struct):
...     a: int
...

>>> msgspec.json.decode(b'{"a":1,"b":2}', type=Ex)
Ex(a=1)

msgspec does this intentionally, as documented for flexibility and better compatibility when external apis add fields with schema evolution.

There’s stuff in msgspec for tagged and untagged unions, as well as optional fields. The other libraries I mentioned also have varying support for these things.

It’s not about perfectly modeling a potentially imperfect API outside of your control to change, it’s that there are reasonable, incredibly easy-to-use, flexible, and robust solutions that mean you don’t have unreliable data once it’s parsed.

Alternatively by using one of the traversal libraries, like glom, you can traverse to just the few fields you need.

blhsing · December 20, 2024, 4:16am

Michael H:

Then by all means, just type a select few fields. The proposed existing solutions don’t require you to type an entire API to use them. For example, if you type only the fields you care about with msgspec, then only those parts of the response are even loaded from the json payload, saving you memory in the process too if these persist and are passed around. You should be able to at least specify the parts your application relies upon, right?
>>> import msgspec
>>> class Ex(msgspec.Struct):
...     a: int
...

>>> msgspec.json.decode(b'{"a":1,"b":2}', type=Ex)
Ex(a=1)
msgspec does this intentionally, as documented for flexibility and better compatibility when external apis add fields with schema evolution.

Ah, I wasn’t aware of this library. The schema evolution feature of msgspec does solve our problem of dealing with partial datasets from an inconsistent API in a reasonably low-effort manner. Will switch to it from now on. Thanks!

I suppose the rest of my argument for the idea still stands though, that safe navigation operators can help deal with optional attributes in a flexible data model much more elegantly. And again, if we are to support safe navigation for attribute access we might as well make the syntax consistent for subscript access too, so at the end of the day I’m still in support of PEP-505 with Guido’s postfix generalization.

Nineteendo · December 20, 2024, 8:39am

It should be:

if hasattr(obj, 'attr'):
    func(obj.attr)
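A quick check of the difference, with a throwaway Config class invented for the example: hasattr() tests attribute lookup, while `in` asks the object about membership, which a plain class does not define.

```python
class Config:
    def __init__(self):
        self.attr = 42


obj = Config()

# hasattr() asks "does attribute lookup succeed?"
found = hasattr(obj, "attr")

# `in` needs __contains__, __iter__ or __getitem__; a plain class has none.
try:
    "attr" in obj
    in_error = None
except TypeError as exc:
    in_error = exc
```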

Note that not everyone is on the latest version, and the problem is also that we’re now passing garbage around.

elis.byberi · December 21, 2024, 12:49am

In their current usage, yes. I was referring to the case when working with optional data. I personally implement __contains__ whenever I implement __getattr__ and similar methods.

Yes, I wasn’t paying too much attention to the details; the user may use self.get(key, {}). That was just a simple demonstration. Other libraries mentioned in previous posts are more flexible and performant as well.

achimnol · December 21, 2024, 7:29am

With the None-aware operators or “missing”-aware operators in the context of API handlers, I’d like to be able to distinguish the following with hypothetical data[key]?:

del data[key]: the value is not present; the value is not changed
data[key] = None: the value is set to null; the value is deliberately set empty

When creating new records or in general contexts of function argument passing, it is okay to treat None as indicating some “default” value, as we do with typing.Optional.

When modifying existing records or in the context of API DTO design, we should be able to distinguish unchanged (missing) vs. deliberate-reset (set-to-null) cases. For this scenario, graphene has added an Undefined constant: graphql-python/graphene#1344.
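A minimal plain-Python sketch of that distinction, using a private sentinel object. The names MISSING and apply_patch are hypothetical; this is not graphene's Undefined and not part of any proposed operator.

```python
MISSING = object()  # hypothetical sentinel, not a built-in


def apply_patch(record, patch):
    """Return a copy of record updated from patch.

    Key absent from patch       -> field left unchanged.
    Key present with value None -> field deliberately reset to None.
    """
    updated = dict(record)
    for field in record:
        value = patch.get(field, MISSING)
        if value is MISSING:
            continue            # not mentioned: leave as is
        updated[field] = value  # includes an explicit None
    return updated


record = {"name": "milk", "note": "organic"}
unchanged = apply_patch(record, {"name": "oat milk"})  # note stays "organic"
reset = apply_patch(record, {"note": None})            # note becomes None
```

The sentinel is what keeps the two cases apart: patch.get(field) alone would return None for both.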

Whatever design decision is made, regardless of the addition of new constants like undefined, I believe this scenario should be taken into account. Maybe we could rename the title of PEP-505 to “missing-aware operators” to signify that, or make two separate designs for None-coalescing and graceful handling of missing keys/attributes.

barry-scott · December 21, 2024, 9:54am

I do not think that should be solved with these operators.
I suspect it would make them overly complex to understand.

You can already do the checks you describe, and should probably use existing code structures to make the semantics of missing vs. reset clear in the code.

adamsol · December 21, 2024, 1:44pm

I’d like to add my vote for None-aware operators dealing solely with None values:

a ?? b  # equivalent to `a if a is not None else b`
a?.b    # equivalent to `__tmp.b if (__tmp := a) is not None else None`
a?[b]   # equivalent to `__tmp[b] if (__tmp := a) is not None else None`

This has always felt to me like one of the most obviously missing features in Python, especially after using languages like JavaScript or C#. These patterns are quite common, and repeatedly writing (or reading) is not None else becomes tedious.

I was surprised, while reading this thread, to find suggestions of much more complicated semantics involving intercepting AttributeErrors or KeyErrors to achieve similar functionality, or to handle additional cases of dictionary traversal. While it’s true that, even with the None-aware operators as specified above, Python will not allow for arbitrary object or dictionary traversal as easily as JavaScript, this is a result of a different language design (exceptions vs returning undefined), which should not be overridden by these operators. So:

a?.b should still raise an exception if b is not an attribute of a (and a is not None).
a['x']?.b and a?['x'] should raise an exception if 'x' isn’t a key in a.
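These None-only semantics can be tried out today by writing the expansions by hand. This is only a sketch of the equivalences listed above; the variable names are arbitrary, and _tmp stands in for the hidden temporary.

```python
from types import SimpleNamespace

a = SimpleNamespace(b=1)
nothing = None

# a ?? b
coalesced = nothing if nothing is not None else "fallback"

# a?.b with a present, then with a being None
present = _tmp.b if (_tmp := a) is not None else None
absent = _tmp.b if (_tmp := nothing) is not None else None

# a?.c where a is not None but has no attribute c: still an error
try:
    _tmp.c if (_tmp := a) is not None else None
    attr_raised = False
except AttributeError:
    attr_raised = True

# a['x']?.b where 'x' is not a key: still an error
d = {}
try:
    _tmp.b if (_tmp := d["x"]) is not None else None
    key_raised = False
except KeyError:
    key_raised = True
```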

We will still have a.get('x')?.get('y') for dictionary traversal. One could also use defaultdict(lambda: None), which would then enable making use of the ?[] operator: a['x']?['y'].
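For comparison, here is roughly what both approaches look like without the operators. It is only a sketch with made-up keys; note that a defaultdict stores the key whenever a missing one is looked up, which may or may not be acceptable.

```python
from collections import defaultdict

plain = {"x": {"y": 1}}

# .get() chain today: each step needs its own None guard.
inner = plain.get("x")
y_value = inner.get("y") if inner is not None else None

inner = plain.get("nope")
y_missing = inner.get("y") if inner is not None else None

# defaultdict(lambda: None): subscripting a missing key gives None
# instead of raising KeyError...
dd = defaultdict(lambda: None, plain)
looked_up = dd["nope"]

# ...but the lookup also inserts the key as a side effect.
stored = "nope" in dd
```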

As for object traversal, PEP 505 already explicitly states that the new operators are “intended to aid traversal of partially populated hierarchical data structures, not for traversal of arbitrary class hierarchies”, which was also the reason for not including ?() (see PEP 505 – None-aware operators | peps.python.org).

barry-scott · December 21, 2024, 2:17pm

I would use None-aware operators as you describe, and I have old code that would have benefited from them.

I fully agree with you that dealing with use cases that involve eating exceptions should not be included. I fear it would be a source of bugs and misunderstanding.

Lucas_Malor · December 21, 2024, 7:04pm

I have to agree with you. I said I was neutral about ?[ (or []?) suppressing KeyError. But if so, why should it not also suppress IndexError? And would this code be legal?

myval = d[key]?

I think we should consider carefully the implications of such a behavior.

guido · December 21, 2024, 7:40pm

My answer (as I wrote when I proposed this first) would be yes and yes – suppress IndexError (if we can’t LBYL it) and allow that example.

ambv · December 21, 2024, 10:03pm

Matt, your post is hidden because it breaks community guidelines:

never simply copy and paste code or answers generated by LLMs

ncoghlan · December 22, 2024, 2:41am

Splitting a None-coalescing proposal from safe navigation feels like a good idea to me. ? is already de facto reserved for this conceptual area anyway, so accepting such a proposal would just make that official, and a dedicated shorthand for lhs if lhs is not None else rhs would help eliminate some dubious uses of lhs or rhs in the same way that conditional expressions themselves eliminated dubious uses of c and a or b.
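A short illustration of both "dubious uses", with invented function names. The `or` form misfires whenever the left-hand side is a legitimate falsy value such as 0, and the pre-conditional-expression idiom `c and a or b` misfires whenever a is falsy.

```python
def timeout_with_or(timeout=None):
    return timeout or 30  # treats 0 like "not given"


def timeout_with_none_check(timeout=None):
    return timeout if timeout is not None else 30  # what ?? would spell


def pick_old(c, a, b):
    return c and a or b  # falls through to b when a is falsy


def pick_new(c, a, b):
    return a if c else b


or_result = timeout_with_or(0)            # 30, the explicit 0 is lost
none_result = timeout_with_none_check(0)  # 0
old_result = pick_old(True, 0, 5)         # 5, although c is true
new_result = pick_new(True, 0, 5)         # 0
```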

Safe navigation, by contrast, suffers from a fundamental ambiguity problem in Python (specifically, whether it implicitly handles exceptions or not) that doesn’t exist in JavaScript. In C#, the null-propagating operators do not suppress exceptions, so it would be surprising to at least some users if Python made the opposite choice.

This essential ambiguity is the origin of the “query expression” concept in Safe navigation operators by way of expression result queries (thread previously linked near the start of this discussion).

Edit: I forgot to add that I’ve been doing a fair bit of unvalidated JSON processing lately, and dict pattern matching has been entirely up to the task in the handful of cases where I might otherwise have wished for safe navigation support.

achimnol · December 22, 2024, 6:37am

Yes, as others mentioned, it would be better to handle the “missing” value cases in a different (distinct) way. As a code-reviewer more than code-writer these days, I’d like to have clear, defined semantics without ambiguity.

JadenCorr · December 22, 2024, 4:44pm

Totally agree with this distinction between “attribute is None” and “it’s not an attribute of the object”.

I would really get a great improvement in my work from this operator. Chains of a.b.c if a is not None and a.b is not None else default are really annoying and make the code uglier and less readable.

On the other hand, I really don’t want errors to be silenced: if c is not an attribute of an existing b, it must be an AttributeError, the same way it works in the current version of python.