Introducing a Safe Navigation Operator in Python

jamesdow21 · October 12, 2024, 11:38pm

It sure will, which is doing 2 of the 3 requirements of the function wrong.

And I absolutely would handle them if I knew they were there. Do I actually need to write out an example of a property with a bug inside it to show that silencing an AttributeError is different from an `is not None` check?

It’s directly addressed both in the original PEP and in my post

If you want to short-circuit on both missing keys and keys whose value is None, you can do
artist_name = r.get("result")?.get("music")?.get("artist")?.get("name")

If you only want to short-circuit on keys whose value is None, you can do
artist_name = r["result"]?["music"]?["artist"]?["name"]
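For anyone who wants to compare the two spellings in today's Python, here is a rough sketch of what each chain would expand to. The function names and sample dictionaries are made up for illustration, and the expansion is my reading of the PEP's semantics rather than anything official.

```python
def artist_name_lenient(r):
    """Approximates r.get("result")?.get("music")?.get("artist")?.get("name")."""
    result = r.get("result")  # a missing key becomes None here
    music = None if result is None else result.get("music")
    artist = None if music is None else music.get("artist")
    return None if artist is None else artist.get("name")


def artist_name_strict(r):
    """Approximates r["result"]?["music"]?["artist"]?["name"]."""
    result = r["result"]  # a missing key still raises KeyError
    music = None if result is None else result["music"]
    artist = None if music is None else music["artist"]
    return None if artist is None else artist["name"]


# Hypothetical sample responses
full = {"result": {"music": {"artist": {"name": "Some Artist"}}}}
null_music = {"result": {"music": None}}
missing_music = {"result": {}}
```

Both functions return None for `null_music`, but only the lenient one returns None for `missing_music`; the strict one raises KeyError, because a missing key is not a None value.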

Fundamentally, both the ?. and ?[...] operators are not about silencing KeyErrors or AttributeErrors. All they are doing is checking if the left operand is None before doing either member access or index access.

The original PEP has a dedicated section under “Rejected Ideas” about why these should be “None-aware” operators and not “Exception-aware” operators.
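Since the difference keeps coming up, here is a minimal sketch of a property with a bug inside it. The class and function names are hypothetical and only exist for this illustration; the "None-aware" function is how I understand `machine?.label` would behave.

```python
class LabeledMachine:
    """Hypothetical class whose property contains a typo bug."""

    def __init__(self, name):
        self._name = name

    @property
    def label(self):
        # Bug: `_nmae` does not exist, so this raises AttributeError.
        return self._nmae.upper()


def label_silencing(machine):
    """Exception-aware: swallows any AttributeError, including the bug."""
    try:
        return machine.label
    except AttributeError:
        return None


def label_none_aware(machine):
    """None-aware: only checks whether the left operand is None."""
    return None if machine is None else machine.label
```

With `label_silencing`, both `None` and a perfectly valid `LabeledMachine("press")` give back None, so the typo goes unnoticed. With `label_none_aware`, passing `None` gives None, while the valid machine surfaces the AttributeError from the buggy property.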

I was originally trying to not get too focused on specific terms and just use the classic foo, bar, baz, etc., but seems like that ended up backfiring.

Timezone was also probably not my best choice when I was condensing down my original scenario, since it’s hard to convey succinctly (and I’m writing waaayyy too much in each post already) why there can be multiple optional associated timezones, but it was the first example I thought of with an obvious default of just sticking with UTC.

Let me try the example again using more realistic terms (from the manufacturing industry) which will hopefully show that the attribute chain really isn’t some abstract internal detail, while also demonstrating why properties/methods aren’t always helpful.

I’ll drop the timezone part, and just limit it to email addresses.

A sensor can record multiple data streams
That sensor might be attached to a particular machine at a manufacturing facility, but doesn’t have to be (could just be free-standing), e.g.:
- One attached to a machine is measuring Temperature and Vibration
- A free-standing one in a warehouse is measuring Temperature and Humidity
Most machines have a dedicated maintenance team, but some are simple and can just be fixed by any operator at the facility
Most maintenance teams have an email address for a distribution list that includes every maintenance tech on the team, but not every maintenance team has bothered to set one up
A machine can be part of a particular manufacturing line, but doesn’t have to be, e.g.:
- A case packer that takes widgets off a conveyor belt, puts them in a corrugated box, and then glues the box shut - belongs to Widget Manufacturing Line A
- A fork truck that gets driven anywhere around the manufacturing facility - no associated manufacturing line
Every manufacturing line is always part of a department
A department should have a dedicated engineer, but people quit jobs and they don’t always get immediately back-filled, so there might not be one currently assigned

(Branching off in a different direction from the beginning)

Every sensor records its data streams in a data historian server
A historian server might have a dedicated IT support person, but dedicated IT is not required
For large sites, there is a local data historian, which then aggregates all of the data to send to a central data historian
Smaller sites just send the data directly to a central historian
One of the business divisions decided that they actually wanted to aggregate all of the data for their local historians together into a business-specific historian server, before sending it on to a central historian
There are multiple central historian servers

That combines to form a forest, where the root of any particular tree is a central historian, there are an arbitrary number of levels of intermediate servers, then a level for a single sensor’s data streams, and the leaves are each individual data stream.

New example class definitions (for the code inclined)

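from __future__ import annotations  # lets annotations refer to classes defined later

import collections.abc

import pandas as pd

class Sensor: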
    streams: DataStreams
    machine: Machine | None

class DataStreams(collections.abc.Mapping[str, "DataStream"]):
    sensor: Sensor
    _streams: dict[str, DataStream]  # internal storage for the mapping
    historian: Historian

class DataStream:
    _parent: DataStreams
    name: str
    data: pd.DataFrame | None

class Machine:
    sensors: tuple[Sensor, ...]
    maintenance_team: MaintenanceTeam | None
    line: Line | None

class MaintenanceTeam:
    machines: tuple[Machine, ...]
    members: tuple[Person, ...]
    email: str | None

class Line:
    department: Department
    machines: tuple[Machine, ...]

class Department:
    lines: tuple[Line, ...]
    engineer: Person | None

class Person:
    email: str

class Historian:
    streams: tuple[DataStreams, ...]
    aggregator: Historian | None
    it_support: Person | None

Entity Relationship Diagram w/ Crow's foot notation (for the graphically inclined)

[Image: pep505_example_ER_diagram]

Finally to the point, I’m in a function doing something with a sensor and I want to find the email address for the department’s engineer if this particular sensor is attached to a machine that’s part of a manufacturing line. If it’s not there for any reason, that’s fine and I can move on.

Currently I would do that as:

if (
    (machine := sensor.machine) is not None
    and (line := machine.line) is not None
    and (engineer := line.department.engineer) is not None
):
    address = engineer.email
else:
    address = None

PEP 505 would allow that to instead just be

address = sensor.machine?.line?.department.engineer?.email

condensing 8 lines of code to just 1, and removing 3 unnecessary variable assignments.

It’s immediately obvious to me whose email address that is if it’s not None, I see all 3 ways it could end up being None, and I’ve made the conscious choice that I don’t care to distinguish between any of those 3.
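To make the 3 ways concrete, here is a trimmed-down, runnable sketch of the current spelling. It uses dataclasses and default values, which are assumptions of this sketch and not how the real classes are built, and `engineer_email` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    email: str


@dataclass
class Department:
    engineer: Optional[Person] = None


@dataclass
class Line:
    department: Department


@dataclass
class Machine:
    line: Optional[Line] = None


@dataclass
class Sensor:
    machine: Optional[Machine] = None


def engineer_email(sensor: Sensor) -> Optional[str]:
    """Current-Python equivalent of sensor.machine?.line?.department.engineer?.email."""
    if (
        (machine := sensor.machine) is not None
        and (line := machine.line) is not None
        and (engineer := line.department.engineer) is not None
    ):
        return engineer.email
    return None


# The 3 ways the result can be None, plus the fully populated case
free_standing = Sensor()
no_line = Sensor(Machine())
no_engineer = Sensor(Machine(Line(Department())))
complete = Sensor(Machine(Line(Department(Person("engineer@example.com")))))
```

The first three sensors all produce None from `engineer_email`, and only `complete` produces an address.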

I don’t even need to remember which relationships are optional, since I can start out writing it as

address = sensor.machine.line.department.engineer.email

and mypy will show 3 [union-attr] typing errors, reminding me to put the 3 ?'s where they’re needed.

It is certainly possible to add a property or a method to Sensor to get this email address, but this might be the only place in the entire code base that I ever want to go from a sensor to the department engineer’s email, so writing a dedicated method/property/function wouldn’t be worthwhile.

A generic Sensor.get_email() method is not straightforward, since the email address I want could be any of:

The email address for the department’s engineer
The distribution list for the maintenance team
The email address for a particular member of the maintenance team
The email address for an IT support person for the historian that the sensor directly sends its data to (or any other level walking up the historian aggregation tree)
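As a rough illustration of that last option, walking up the aggregation tree could look something like the sketch below. The helper names `historian_chain` and `first_it_support_email` are hypothetical, and the classes are cut down to just the two fields involved.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Person:
    email: str


@dataclass
class Historian:
    aggregator: Optional["Historian"] = None
    it_support: Optional[Person] = None


def historian_chain(historian: Historian) -> Iterator[Historian]:
    """Yield the historian and each aggregator above it, up to the central one."""
    current: Optional[Historian] = historian
    while current is not None:
        yield current
        current = current.aggregator


def first_it_support_email(historian: Historian) -> Optional[str]:
    """Return the email of the nearest IT support person, or None if there is none."""
    for level in historian_chain(historian):
        if level.it_support is not None:
            return level.it_support.email
    return None


# Hypothetical three-level chain: local -> business-specific -> central
central = Historian(it_support=Person("it@example.com"))
business = Historian(aggregator=central)
local = Historian(aggregator=business)
```

Here only the central historian has IT support, so `first_it_support_email(local)` has to climb two levels before it finds an address.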

Maybe the next suggestion would be to just have a Sensor.get_engineer() or a Machine.get_engineer() method instead.

But this is all just an incredibly simplified demonstration. There are only 9 classes included here, and only 1 place that there could be an engineer.

More realistically, there could be an engineer for each line in addition to the overall department engineer, as well as one on the maintenance team.

So even something that seems as simple as Machine.get_engineer() actually has 3 different possible places it could pick an engineer from and could be customized in multiple ways.

Do I only want a specific one of them?
Do I want to exclude one of them instead?
Do I want to customize the fallback order?

It all depends on the specific scenario. So I could add include_filter, exclude_filter, and fallback_order arguments to the Machine.get_engineer method.

I actually might want all of them that are not None instead of just 1, so I could also add a return_all boolean argument (more likely, just add a separate Machine.engineers property).

That’s quite a lot to keep in your head when you’re doing code review, vs.

engineer = (
    machine.maintenance_team?.engineer
    ?? machine.line?.engineer
    ?? machine.line?.department.engineer
)

(I’m aware that this is accessing machine.line twice, that feels worth it for the simpler syntax)

Same thing again, but only accessing `machine.line` once

engineer = (
    machine.maintenance_team?.engineer
    ?? (line := machine.line)?.engineer
    ?? line?.department.engineer
)

The current equivalent for that (without just silencing AttributeErrors) could be:

# have to add this explicit type hint to pass mypy strict type checking
engineer: Person | None
match machine:
    case Machine(maintenance_team=MaintenanceTeam(engineer=Person() as engineer)):
        pass
    case Machine(line=Line(engineer=Person() as engineer)):
        pass
    case Machine(line=Line(department=Department(engineer=Person() as engineer))):
        pass
    case _:
        engineer = None

or

engineer: Person | None = None
if (mt := machine.maintenance_team) is not None:
    engineer = mt.engineer

if engineer is None and (line := machine.line) is not None:
    # in this case, could have instead used
    # `engineer = line.engineer or line.department.engineer`
    # but a reminder that some of my classes are also collections
    # and I want to reliably distinguish None from an empty collection
    engineer = line.engineer if line.engineer is not None else line.department.engineer

This if version is tricky to change if I later realize that I want the fallback order to instead be:
Line Engineer → Maintenance Engineer → Department Engineer
For the pattern matching or PEP 505 versions, all I have to do is change the order of the cases or operands.
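On the comment in the if version about not using `or`: here is a small sketch of why `or` is not a safe stand-in for a None check when a class is also a collection. `Crew` and `coalesce` are hypothetical names made up for this example.

```python
class Crew:
    """Hypothetical collection-like class: falsy when it has no members."""

    def __init__(self, members=()):
        self.members = tuple(members)

    def __len__(self):
        return len(self.members)


def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


empty_crew = Crew()
backup_crew = Crew(["alice"])

# `or` tests truthiness, so the empty (but present) crew is skipped
picked_with_or = empty_crew or backup_crew

# an `is not None` test keeps the empty crew
picked_with_coalesce = coalesce(empty_crew, backup_crew)
```

Note that `coalesce` evaluates all of its arguments up front, whereas the proposed `??` would only evaluate the right-hand side when the left is None, so this helper is only an approximation.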

Bonus rearrangeable `if..elif..else` version

# please don't actually write it this way
if (
    (mt := machine.maintenance_team) is not None
    and (engineer := mt.engineer) is not None
):
    pass
elif (
    (line := machine.line) is not None
    and (engineer := line.engineer) is not None
):
    pass
elif (
    line is not None
    and (engineer := line.department.engineer) is not None
):
    pass
else:
    engineer = None

In the full reality, there are ~50 classes and hundreds of total fields with countless different scenarios where we want to traverse this graph of connected classes, checking for None values along the way.

Just saying “validate your data” doesn’t help me. The data is validated; it is acceptable and expected for optional attributes to be None.