Revisiting PEP 505

hugovk · December 19, 2024, 4:27pm

Here’s @steve.dower explaining at EuroPython 2022 why he withdrew PEP 505:

The reason I eventually kind of withdrew that is I had a really interesting discussion with Raymond Hettinger at one of our core dev sprints where he basically laid out: the features that go into the language actually change the way people program. And it has far more of a follow-on effect than just how that particular thing gets written. It’s going to affect how the rest of the code is written.

So for example, right now if you’re going to append a value into a list, you may check that you’re not appending None and you’ll say, “if I have a good value I’ll put it in the list”. But if you know that when you go to get values out of the list, you’re going to None coalesce everything, you’re not going to be worried about that.

And so you’ll happily throw, you know, garbage values into a list, and then they get carried around, passed around and show up all over the place and they may get serialised and pickled and all sorts of stuff. Because when you come to use it that’s when you know you’re going to handle it.

And basically I had this conversation and I thought about it some more and I’m like, I don’t want to encourage that kind of coding in Python. I think being able to trust the value that you have been given, wherever it came from, is kind of a core part of Python and we don’t defensively double check everything before we use it. Because we have the code laid out in a way that we check it when we first get it and then use it and that gives us nice, straight-line readable code that doesn’t have conditions and checks and exception handling all over the place, apart from a few specific things we know about.

So I didn’t like the idea of: if I get this feature in everyone’s going to start putting Nones in lists everywhere, and instead of [1, 2, 3] we’ll have [1, None, None, None, 2, None, None, None, None, None, 3]. So I didn’t like the implication of what coding might turn into with that feature there.

noahbkim · December 19, 2024, 4:42pm

First, thanks for mentioning this. I hadn’t seen the talk and I appreciate the perspective. This is just a response to Steve’s point.

I’m nowhere near getting invited to give a talk at EuroPython, but this comes off to me as nothing more than a slippery slope argument. The same reasoning could be used to argue for null coalescing, e.g. the inclusion of this feature will stop people from writing a or b and get them to use the safer a ?? b, which is a great thing!

More broadly, I think his point misses the fact that in addition to controlling how the feature is implemented, we also control how it is presented and taught. I think the ?? and ??= operators are at least solidly defensible in this regard.
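
To make the "a or b" concern concrete, here is a minimal sketch in today's Python. The function names and the port example are made up for illustration; the second function spells out what a ?? b is proposed to mean.

```python
def port_with_or(configured):
    # `or` falls back on ANY falsy value, including a legitimate 0.
    return configured or 8080


def port_with_none_check(configured):
    # Spelled-out equivalent of the proposed `configured ?? 8080`:
    # only None triggers the fallback.
    return configured if configured is not None else 8080


print(port_with_or(0))             # 8080 (the explicit 0 is lost)
print(port_with_none_check(0))     # 0
print(port_with_none_check(None))  # 8080
```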

noahbkim · December 19, 2024, 5:24pm

I’d like to float another possible (though perhaps too radical) path to facilitating safe navigation with my preferred specification for ?. (where foo?.bar is identical to foo.bar if foo is not None else None and foo["bar"] works similarly): adding .get() to list and tuple.

Traversing optional JSON paths becomes as simple as using .get as needed:

response.get("users")?.get(0)?["username"]

You could then implement safedict and safelist, which return None instead of raising KeyError and IndexError respectively to achieve the original syntax (exercise left to the reader):

response = safedict(response)
response["users"]?[0]?["username"]

You could just as easily emulate safe attribute traversal by implementing a similar wrapper where __getattr__ returns None. This way users who want error suppression can opt in without forcing users who don’t to participate.
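
One possible take on the "exercise left to the reader", as a rough sketch only. The names safedict, safelist and the helper _wrap are hypothetical, and since ?[ does not exist today the usage below spells out the None checks by hand.

```python
class safedict(dict):
    """dict whose [] returns None for a missing key instead of raising KeyError."""

    def __getitem__(self, key):
        return _wrap(dict.get(self, key))


class safelist(list):
    """list with a dict-like .get(); [] returns None for an out-of-range index."""

    def get(self, index, default=None):
        try:
            return _wrap(list.__getitem__(self, index))
        except IndexError:
            return default

    def __getitem__(self, index):
        return self.get(index)


def _wrap(value):
    # Hypothetical helper: wrap plain containers so nested lookups stay "safe".
    if type(value) is dict:
        return safedict(value)
    if type(value) is list:
        return safelist(value)
    return value


response = safedict({"users": [{"username": "ada"}]})
users = response["users"]
first = users.get(0) if users is not None else None
username = first["username"] if first is not None else None
print(username)  # ada
```

Note that every lookup copies the nested container when wrapping it, so this is only meant to show the shape of the idea, not an efficient implementation.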

Nineteendo · December 19, 2024, 5:29pm

Can we make this somehow work for an undefined constant? Like in JavaScript.

>>> d = defaultdict(lambda: undefined)
>>> d["a"]
undefined

Although in JavaScript you get undefined with safe traversal (and not null):

> d = null
> d?.a
undefined

Liz · December 19, 2024, 5:39pm

I think ?? makes a lot of sense. .? and []? don’t seem defensible to me.

The parallel people have made with js doesn’t feel right to me, because I’ve never had the problems people have with js objects in python. I don’t deal with objects that are pathologically bad in python. Python’s None feels much more like C’s null than js’s undefined.

Some people brought up Zig’s orelse. Zig has a C-compatible null and optional types. From this same lens, Zig’s definition of .? is to unwrap an optional you know can’t be null anymore. It’s actually short for orelse unreachable, and in safe compile modes, this results in a panic if it is null, not suppression.

I think late-bound defaults would help reduce the number of places people have None to handle, but I think value ?? default is still an improvement over default if value is None else value and that these aren’t mutually exclusive improvements.

steve.dower · December 19, 2024, 5:42pm

The alternative to it being a “slippery slope” is that it becomes an unused and unloved feature. We should want new syntax to influence the way people code, otherwise why are we going through the rigmarole of changing the language.

The match statement was added specifically to help guide users away from runtime type checks and towards structural matching. (Whether it’s achieved that yet is irrelevant.)

If you don’t care about your syntax proposal having an effect on how people code, then just say so, but I expect people will take the proposal less seriously in that case.

Alternatively, if you think it’ll have a different effect on how people will code (a different “slippery slope”, if you will), then describe it and convince people it’s a good idea.

A proposal to add syntax just for the sake of syntax is going to find opposition from all sorts of directions. The best example of that is this actual proposal in its various attempts over the years.

guido · December 19, 2024, 5:42pm

I’d like to go back to basics, trying to make my case.

Here’s a typical use case. Recently I work with these types of schemas in TypeScript all day. (I don’t write them, but I use and maintain code that depends on them).

Here’s a very simple example. I don’t present this to be criticized – please accept that this is a common style of describing things in our project, and I believe this style is common throughout the TypeScript (and JavaScript) world.

export type Event = {
    day: string;
    timeRange?: EventTimeRange;
    translatedDate?: string;
    description: string;
    location?: string;
    participants?: string[];
};
export type EventTimeRange = {
  startTime?: string;
  endTime?: string;
  duration?: string;
};

(The notation name?: type in TS means “field may be present, with this type, or field may be missing”. Attempts to access such optional fields if the field is missing return undefined, which is close to Python’s None in meaning, though undefined is also returned for cases where Python would raise e.g. AttributeError or KeyError.)

This data structure is common and it’s designed to be transmitted to JSON. The transformation to JSON omits fields that are missing from the in-memory data structure, for compactness (JSON is verbose, which makes it expensive to transmit; there’s a general culture to keep it short where possible).

So e.g. a minimal Event instance could be

{
  "day": "tomorrow",
  "description": "lunch"
}

Here, there is no location given, and the system can’t “fix” that by making one up. Ditto for the duration or time range, and the participants. (This might represent a calendar Event that the user intends to update at a later time.)

If I received that JSON blob in Python, I’d have a nested dict looking exactly like that. Today, if I wanted to look at the event’s start time (defaulting to None if not given), I’d have to write something like

start_time = None
if "timeRange" in event and "startTime" in event["timeRange"]:
    start_time = event["timeRange"]["startTime"]

Using the ? notation to mean what I’d like it to mean, I could write this as

start_time = event["timeRange"]?["startTime"]?

Here, THING[KEY]? must have two effects:

First, if THING has no key KEY, the expression evaluates to None.
Second, if the expression is None, subsequent [KEY], .ATTR, and (ARGUMENTS) operations are not evaluated and the result of the entire “primary” is None.

A “primary” is a Python grammatical concept, it always starts with an “atom” (a name, certain reserved keywords, or something parenthesized or bracketed) and is followed by zero or more “suffixes” (this I made up), and a suffix is either an attribute-taking (P.NAME), a subscription (P[KEY] or P[SLICE...]), or a call (P(ARGUMENTS)). Some grammar changes are needed to describe it as a chain rather than recursively. Possibly P[SLICE...] does not catch exceptions but only checks for None.

Detail that may be skipped on first reading: In e.g. (foo.bar).baz, the first primary is foo.bar and the power of ? if present would be limited to that – (foo?.bar).baz skips the .bar lookup if foo is None, but it would evaluate None.baz. However, in foo?.bar.baz, both the .bar and the .baz lookup are skipped if foo is None. This would require some tweaks to the grammar, but the scope of ? should be carefully limited.

There would also be a THING.NAME? operation that would work similarly for Python objects with optional fields (possibly created from JSON using e.g. https://github.com/nandoabreu/python-dict2dot, but possibly just due to some other legitimate design choice).

For calls, THING(ARGUMENTS)? would not catch exceptions from the call, but it would skip the rest of that primary, returning None right there.

PS. I’d prefer not to describe THING[KEY]? and THING.NAME? as catching exceptions – I would describe them as “if the key/attribute does not exist in THING the result is None rather than an exception.” This can be described formally as KEY in THING, and for attributes we can use getattr(THING, "NAME", None). We might eventually introduce new dunders so some types can implement the combined operation faster, but that’s not necessary in the first round. If those dunders existed, they could still raise KeyError, IndexError or AttributeError (from internal bugs), and those would not be caught. When not implemented, a default implementation that does catch exceptions might be used as a fallback (like hasattr()).
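
A rough emulation of the semantics described in this post, using plain functions in today's Python. The helper names item_or_none and attr_or_none are invented for illustration, the sketch assumes THING is a mapping (for sequences, KEY in THING tests values rather than indexes), and it passes None along through nested calls rather than truly skipping the rest of the primary.

```python
def item_or_none(thing, key):
    """Stand-in for the proposed THING[KEY]? on mappings."""
    if thing is None:
        # An earlier step already produced None, so the lookup is skipped.
        return None
    # Described in the post as a `KEY in THING` test, not as exception catching.
    return thing[key] if key in thing else None


def attr_or_none(thing, name):
    """Stand-in for the proposed THING.NAME?."""
    if thing is None:
        return None
    return getattr(thing, name, None)


event = {"day": "tomorrow", "description": "lunch"}

# Emulates: start_time = event["timeRange"]?["startTime"]?
start_time = item_or_none(item_or_none(event, "timeRange"), "startTime")
print(start_time)  # None
```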

noahbkim · December 19, 2024, 6:25pm

I appreciate you asking these questions because they’re helping me interrogate my motives a bit more carefully. What I want out of PEP 505 is just syntax–in the same way decorators are just syntax. It is my belief that, for as long as there is a need to write a if b is not None else c, the proposed ?., ??, and ??= operators will justify their existence with conciseness and readability (and be preferred by users for those reasons). In a minority of cases, I think they will also steer users away from or and its footguns, producing safer code. I don’t believe they will make users less attentive to the values they’re operating on, but instead more aware of the space of types they’re working with (although I worry this won’t be the case if we go the safe navigation route).

I’ll admit that this is heavily influenced by my own desire for the feature, though. Not to mention my various biases regarding typing, code style, etc.

Addendum: to be explicit about my conciseness and readability claim: I think there is significant value in being able to read code in a single pass from left-to-right. As close to English as a if b else c might be, it requires you to first examine/verify b is what you expect, then review/reason about a and c in context. This is exacerbated exponentially if you chain more than one if/else in a single expression. foo?.bar does not suffer from this even when chained.

steve.dower · December 19, 2024, 6:35pm

Close, but the need is to write a.b if a is not None else None, which is a code pattern that can be searched for to verify whether there is a need.

Arguably a.b if a is not None else c is another potential pattern (a?.b ?? c). Similarly getattr(a, "b", None) or getattr(a, "b", c) or getattr(a, "b", None) or c.

The range of constructs this feature solves is smaller than a if b else c because a is constrained to directly depend upon b. So as a result, the benefit of a solution that only handles the narrower case is smaller than a solution that handles a broader case.

All of which is still unrelated to the question that put me off: should we encourage this case in the first place? Why is a == None at all? Are we better off disallowing that? (I mostly only have personal opinions on this, though I will point out that despite C# adding ?. and friends, they still saw value in letting developers disallow null entirely via the type system. So you could argue that ?. wasn’t good enough to justify the underlying pattern of passing/returning null to functions that expect a value.)

cdce8p · December 19, 2024, 7:02pm

Why not use get for NotRequired keys? This doesn’t require catching any exceptions with ?.

start_time = event.get("timeRange")?.get("startTime")
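
For comparison, a small sketch of what the same lookup needs in today's Python without ?. (the sample event is the minimal one from Guido's post, and the intermediate variable name is arbitrary):

```python
event = {"day": "tomorrow", "description": "lunch"}

# event.get("timeRange").get("startTime") would raise AttributeError here,
# because the first .get() returns None for the missing key.
time_range = event.get("timeRange")
start_time = time_range.get("startTime") if time_range is not None else None
print(start_time)  # None
```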

noahbkim · December 19, 2024, 7:05pm

Ah shoot, you’re right. It should be:

a if a is not None else b becomes a ?? b, sometimes a ??= b in context
a.b if a is not None else None becomes a?.b
a.b if a is not None else c becomes a?.b ?? c

I do think the latter reads better in all cases, but I agree that we’re covering fewer cases than a if b is not None else c as I originally wrote.

Could you clarify what you’re asking here? As Guido wrote above,

Are you saying that we should steer users away from using None in their own code and APIs? What would the alternative be? I would venture to say that None as an orthogonal, sentinel value is canonical at this point and for good reason: it’s descriptive, efficient, and the only value of its type, simplifying typing. I’m not sure what other path there might be without the introduction of some zero-cost Option type.

steve.dower · December 19, 2024, 7:54pm

Look into where None comes from. The most common cases are that it’s passed in as an argument, or that it’s returned from a function.

Now ask why it’s been passed in. For the argument case, it’s probably to indicate “don’t care” or “don’t use”, and for the return case it’s to indicate “no result”.

So the alternatives for arguments would be to not require users provide parameters that they don’t care about, or include meaningful defaults. This isn’t always the best design, and has technical issues, which have been raised above. I personally use None as a default argument value often, but try to replace it with a no-op instance as quickly as possible, so at most I’d use ?? once per argument.
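
A minimal sketch of the "replace None with a no-op instance as quickly as possible" style described above. The reporter classes and the process function are hypothetical examples, not taken from any real API.

```python
class NullReporter:
    """No-op stand-in, so the rest of the code never has to check for None."""

    def report(self, message):
        pass


class ListReporter:
    """Collects messages in a list (only here to make the example observable)."""

    def __init__(self):
        self.messages = []

    def report(self, message):
        self.messages.append(message)


def process(items, reporter=None):
    # The single None check for this argument; with the proposal this
    # could be written as `reporter = reporter ?? NullReporter()`.
    if reporter is None:
        reporter = NullReporter()

    total = 0
    for item in items:
        reporter.report(f"adding {item}")  # no further None checks needed
        total += item
    return total


print(process([1, 2, 3]))  # 6
```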

The alternative for “no result” return values are exceptions or default arguments (i.e. the value to return when there’s no result). You may end up with None coming out because you passed None in, but that’s your own fault.

Once you mostly have set up your code to not return None as if it’s a valid result, or to not be easier to pass None than to use an appropriate API, I think you’ll find that you are so rarely doing None checks that the syntax here doesn’t add much value.

Now, it’s all different if you’re trying for exception-free traversal. I’m not referring to that case at all with this analysis. But in that case, the semantics are so much less clear that it’s going to be really hard to choose the set that make sense when you read them (as in, you can very easily grasp the intent, even if the actual behaviour varies slightly) and also have it be easy to discover when you’ve misused them. Plus I think there are better, non-syntax ways to do traversal.

noahbkim · December 19, 2024, 8:02pm

Setting aside the technical difficulty of implementing it this way, I’m starting to come around to this idea of a postfix ?. Especially given that linters could be made to warn about nonexistent attribute accesses in typed scenarios (which is all I really care about in that regard). Though I still think it would be best as a separate PEP.

A pipe dream: it would be particularly cool if this design could preserve explicit None values by using some intermediate, hidden undefined marker. Ignoring the fact that this example is not compliant with your spec:

# The proposed behavior makes these two cases indistinguishable
foo = {"timeRange": {}}
bar = {"timeRange": {"startTime": None}}
foo["timeRange"]?["startTime"]? is None
bar["timeRange"]?["startTime"]? is None

# If we somehow preserved the `undefined`-ness of our cursor as we evaluate, we could imagine an implementation in which explicitly coalescing would reveal an explicit `None` without the need to add a new sentinel to the language.
foo = {"timeRange": {}}
bar = {"timeRange": {"startTime": None}}
foo["timeRange"]?["startTime"]? ?? "foo" is "foo"
bar["timeRange"]?["startTime"]? ?? "foo" is None

Writing this out makes it look a bit silly because of the ? ??, though…

elis.byberi · December 19, 2024, 8:19pm

That’s the issue I’m addressing. While using ?. (or similar constructs) can help avoid some if statements, explicit checks for None values are still necessary. This approach doesn’t significantly reduce the need for if statements; it merely defers their use.

In my view, the concept of a safe operator primarily arises from the need to handle optional values from external data sources. It’s less applicable to optional variables declared within our code, where explicit if checks are inevitable for effective use. Declaring a variable as optional inherently requires a commitment to checking for None before use. For this reason, I avoid optional variables and prefer explicit if/else constructs.

However, when dealing with optional external data, the scenario is different. In such cases, treating them as a special data structure explicitly designed to handle optionality might be a cleaner and more reliable solution.

class SafeDict(dict):
    def __getitem__(self, key):
        return self.get(key, None)


# Example usage
s = SafeDict({
    "name": "Bob",
    "age": 30,
    "tasks": SafeDict({
        "read": SafeDict({})
    })
})

print(s["name"])  # Bob
print(s["age"])  # 30

print(s["tasks"]["read"]["book"])  # None

This approach works with existing code. The user only needs to use the safe dictionary data structure.
s["tasks"]["read"]["book"] is much easier to write and read than:
s["tasks"]?["read"]?["book"]?

noahbkim · December 19, 2024, 8:46pm

This is an interesting perspective and I appreciate your elaboration. Some initial thoughts:

The third, equally important case in my mind is composition of optional objects, which I think is a very reasonable pattern to reach for in an object-oriented language. Looking through the standard library there is plenty of code that falls into this category, for example: classes that cache attributes, classes with initially-unpopulated state, and classes with optionally-extensible behavior via some kind of strategy pattern. I can’t think of any alternatives besides type states, which are yet uncommon in standard library code.

Also, “I specifically want the default behavior”. Sometimes it is needlessly inefficient/unergonomic to express this any other way.

While I see the benefits of philosophy, I’ve never preferred it myself because I would rather avoid creating temporaries. I’m totally happy to propagate None through my code and I don’t think it’s something that users should be steered away from.

But why is it so bad to use None as an indication of no value? You seem to dislike it, but I don’t think the language as a whole should shy away from it. It’s perfectly-suited to cases where there is no orthogonal sentinel value of the type you’re expecting! That’s why the default default in e.g. dict.get is None!

Let’s set aside safe traversal. If I understand correctly, we are in essence arguing about whether we should encourage users to use None (if we are to encourage any paradigm at all). My personal feeling is yes: it is intuitive, efficient, and comes with niceties like being well-typed. But objectively, None is already ingrained in the language. Making it easier to work with benefits anyone who has to interact with the standard library (nearly everyone), not just people who prefer to use it in their own code.

noahbkim · December 19, 2024, 8:53pm

I think the point here is that the key "tasks" may not be present at all, in which case this SafeDict implementation doesn’t help us much.
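
A short sketch of that failure mode, reusing the SafeDict class from the post above with made-up sample data. It also shows a second gap: nested plain dicts are not wrapped automatically.

```python
class SafeDict(dict):
    def __getitem__(self, key):
        return self.get(key, None)


# Case 1: "tasks" is absent, so s["tasks"] is None and the next [] fails.
s = SafeDict({"name": "Bob"})
missing_key_error = None
try:
    s["tasks"]["read"]
except TypeError as exc:  # 'NoneType' object is not subscriptable
    missing_key_error = exc

# Case 2: nested values parsed from JSON are plain dicts, not SafeDicts.
t = SafeDict({"tasks": {"read": {}}})
plain_dict_error = None
try:
    t["tasks"]["read"]["book"]
except KeyError as exc:
    plain_dict_error = exc

print(type(missing_key_error).__name__)  # TypeError
print(type(plain_dict_error).__name__)   # KeyError
```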

elis.byberi · December 19, 2024, 8:56pm

Could you please elaborate?

pf_moore · December 19, 2024, 8:57pm

This is definitely a common scenario. I have to deal with similar data structures relatively often. So I can certainly appreciate the desire for having a better way of dealing with such data structures.

What I’m not clear on is why this needs to be syntax. First of all, the only use case I’ve encountered is this sort of unstructured nested dictionary coming from an external system. There’s not a broad spectrum of use cases here, but rather a single, relatively common, one. Secondly, this seems fairly easy to handle using a library. The glom library has already been mentioned here, and it handles this use case just fine. But maybe it’s more than we want here - if so, what would be wrong with a stdlib function^[1], something like this?

start_time = traverse(event, "timeRange", "startTime")

The arguments could be strings (representing [KEY]), numbers ([INDEX]), or strings starting with a dot (.KEY). That would seem to cover all the reasonable use cases (I can imagine arguments over the details, but IMO to be convincing they would have to come with a use case that we haven’t seen so far in this discussion).
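
A minimal sketch of what such a function might look like, under the argument conventions just described. No such function exists in the stdlib; the name traverse, its signature, and the choice to return None for anything missing are all assumptions for illustration.

```python
from types import SimpleNamespace


def traverse(obj, *path):
    """Follow path through obj, returning None as soon as a step is missing.

    Steps: ".name" means attribute access; any other string or an int
    means subscription ([KEY] or [INDEX]).
    """
    current = obj
    for step in path:
        if current is None:
            return None
        if isinstance(step, str) and step.startswith("."):
            current = getattr(current, step[1:], None)
        else:
            try:
                current = current[step]
            except (KeyError, IndexError):
                return None
    return current


event = {
    "day": "tomorrow",
    "timeRange": {"startTime": "12:00"},
    "participants": ["ann"],
}

print(traverse(event, "timeRange", "startTime"))  # 12:00
print(traverse(event, "location"))                # None
print(traverse(event, "participants", 0))         # ann
print(traverse(SimpleNamespace(event=event), ".event", "day"))  # tomorrow
```

One rough edge this convention already exposes: a dictionary key that itself starts with a dot cannot be addressed.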

We could even revisit the question of whether dedicated syntax was worth adding at a later date, once we’d had experience with the library function, and knew what the rough edges were, and what advantages we’d get from adding syntax.

[1] I don’t have a good intuition for where it would go - maybe collections?

steve.dower · December 19, 2024, 9:10pm

Then this is the philosophical difference that you ought to highlight. Or alternatively, you’re saying that the slippery slope that worries me is actually the one that you intend!

Either way, you don’t need to convince me. My position is more than adequately stated, and I don’t have veto powers, so it’s up to the rest of the community to find consensus on one side or the other.

cdce8p · December 19, 2024, 9:21pm

I like the idea. Especially since a “standard” option would allow for better ecosystem support. Language servers might suggest keys to use, type checkers could implement custom logic for it. That isn’t really feasible for library solutions like glom.

It only addresses one direction though, JSON → Python, not the return, which will still require if expressions that ?. could get rid of.

Edit: Overall, while I’d support it, I still think ?. would provide additional opportunities a simple traverse function doesn’t cover.