PEP 661: Sentinel Values

encukou · October 15, 2024, 9:48am

Thank you for considering my concerns!
I think it’s a mistake, but you don’t need to convince me :)

To clarify my thinking: I’d treat existing uses as special cases, or backwards-compatible API with some mistakes to learn from.
Python implementations are forced to either provide a frame API, or special-case (e.g. re-implement or de-optimize) all stdlib’s uses of it – so, adding more is a burden. We can make their job easier in the future, as with the easier-to-implement _getframemodulename, but that doesn’t solve concerns about using CPython-specific API unnecessarily.
But again: this is to clarify where I’m coming from; no need to convince me.
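
For readers unfamiliar with the frame API in question, here is a minimal sketch of how a library might look up the name of the calling module. It assumes CPython's private `sys._getframemodulename` (available since 3.12) and falls back to `sys._getframe`; the helper name `calling_module_name` is hypothetical, and this is not necessarily how the PEP's reference implementation does it.

```python
import sys


def calling_module_name(default="__main__"):
    """Best-effort name of the module whose code called this helper."""
    # Depth 0 is this helper's own frame; depth 1 is whoever called it.
    getter = getattr(sys, "_getframemodulename", None)
    if getter is not None:  # private CPython API, 3.12+
        return getter(1) or default
    getframe = getattr(sys, "_getframe", None)
    if getframe is not None:  # older CPython-style frame API
        try:
            return getframe(1).f_globals.get("__name__", default)
        except ValueError:  # call stack is not deep enough
            pass
    # Implementations without any frame API end up here.
    return default


def probe():
    return calling_module_name()


print(probe())
```

Both branches rely on private, implementation-specific functions, which is the concern: an implementation without them can only return the default.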

It’s not ideal, but I don’t think they had a meeting yet.
They’ll want to know ASAP that there might be changes. Since you said you’d like to change the PEP, I left a comment on the CS issue.

steve.dower · October 15, 2024, 4:18pm

It’s a private function. But we’re allowed to use those in the stdlib.

h-vetinari · October 16, 2024, 5:37am

tomllib manages just fine with the double L. Personally I think sentinellib is much better than sentinelslib.

jamesdow21 · October 18, 2024, 5:41pm

Thank you Tal for all the work to get this PEP ready and it’s very exciting to see it has been submitted (fingers crossed that it will be accepted)^[1]

And also thank you to Jelle for catching the Missing vs Literal[Missing] issue (seems like it was just in time). I really appreciate the ergonomics benefit, and it fits exactly how I would expect to use it (i.e. as a drop-in replacement for None in an optional field: T | None → T | Missing)

A backport (that type checkers would be able to recognize and handle appropriately) would be very nice. Personally, typing_extensions is exactly where I would have naively looked first to find the backport, since the reason I would be reaching for the “official” backport ^[2] is to get type checker support.

On a completely separate note, I was looking at the reference implementation in the PEP and had a question about one of the details there (I think I know the answer, but wanted to double check).

It has (with comments I’ve added)

_registry = {}

class Sentinel:
    """Unique sentinel values."""

    def __new__(cls, name, repr=None, bool_value=True, module_name=None):
        ...
        # if there is an existing sentinel with this registry key,
        # return that one instead of making a new one
        sentinel = _registry.get(registry_key, None)
        if sentinel is not None:
            return sentinel
        # otherwise, continue on setting up the new sentinel
        ...

        # check _registry again?
        return _registry.setdefault(registry_key, sentinel)

I imagine this is for thread safety reasons since it prevents the possibility of a data race between two threads both constructing the same sentinel and trades one extra dict lookup (fast) for always doing the “full” setup (slow) before checking the _registry dict.

Is there some other benefit to this design that I’ve missed or does that reasoning cover it (more-or-less) completely?
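
To make the question concrete, here is a minimal sketch of the check-then-`setdefault` pattern. It is a simplified stand-in, not the PEP's reference implementation: the registry key format and the `make_concurrently` helper are assumptions, and it relies on `dict.setdefault` being effectively atomic, as it is in CPython for built-in key types.

```python
import threading

_registry = {}


class Sentinel:
    """Simplified stand-in for the registry behaviour discussed above."""

    def __new__(cls, name, module_name="__main__"):
        registry_key = f"{module_name}-{name}"  # assumed key format
        # Fast path: skip the setup entirely if the sentinel already exists.
        existing = _registry.get(registry_key)
        if existing is not None:
            return existing
        # Slow path: build a candidate object.
        sentinel = super().__new__(cls)
        sentinel._name = name
        sentinel._module_name = module_name
        # If another thread registered one in the meantime, return theirs
        # and drop our candidate; otherwise ours becomes the registered one.
        return _registry.setdefault(registry_key, sentinel)

    def __repr__(self):
        return f"<{self._name}>"


def make_concurrently(name, n_threads=8):
    """Construct the same sentinel from several threads at once."""
    results = []
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()  # release all threads together to provoke a race
        results.append(Sentinel(name))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


objs = make_concurrently("MISSING")
print(all(o is objs[0] for o in objs))
```

Whichever thread wins the race, every caller ends up holding the same object.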

[1] It’s been an especially exciting time lately since the three additions I’ve wanted most are this, dependency groups in pyproject.toml (just accepted! congrats to Stephen Rosen), and standardized lock files (looking like it could be right around the corner)
[2] instead of just dropping the reference implementation into my own personal utils module

taleinat · October 19, 2024, 8:36am

Regarding the reference implementation, you’ve got the reasoning covered 100%

techdragon · October 24, 2024, 8:07am

Not to raise anything major here… but the back and forth over whether sentinel objects should evaluate as true or false, having been settled with “you can pick”, has left out the important third option. There were a lot of comments to the effect that raising an error would be surprising… which is entirely fair if it were the default behaviour. But if I’m using a sentinel for a value on something that normally holds boolean data, the risk of mistakes is a lot higher. Sometimes it doesn’t matter and the default is fine; other times you have to know you got the sentinel, you have to be sure you check for the sentinel first and never make the logical mistake of doing otherwise, and it won’t always be obvious. Having an option to “not evaluate as a boolean” when you define your sentinel would be useful and provide added safety when you’re dealing with this.

Given how the proposed sentinel does practically everything anyone has asked it to do, it would be pretty irritating to have to constantly subclass it whenever I’m dealing with boolean values and want the extra safety to avoid potential problems down the line. We developers aren’t perfect, and the new hotness of AI code tools is especially imperfect… I can readily imagine a scenario where someone comes in, makes a small change to something, thinks “this would be more readable if I moved things around like this”, and introduces bugs that are difficult to test for, depending on what’s going on.

I’d like to respectfully propose that the bool_value argument, which unless I’ve misunderstood something defaults to True so as to preserve object-style semantics, be adjusted so that it accepts True, False and None, and that the __bool__ method be tweaked like this:

def __bool__(self):
    if self._bool_value is None:
        raise SuitablyNamedSubClassOfRuntimeError("simple explanation goes here")
    else:
        return self._bool_value

The default semantics remain unchanged, False remains a choice, and we would also have the option to use the unambiguous sentinel that cannot be accidentally interpreted as a boolean.
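
As a rough illustration of how that could behave end to end, here is a self-contained sketch. The `Sentinel` class below is a toy stand-in rather than the PEP's implementation, and the exception name `AmbiguousSentinelError` and the `describe` helper are hypothetical.

```python
class AmbiguousSentinelError(RuntimeError):
    """Hypothetical error for sentinels that refuse boolean evaluation."""


class Sentinel:
    """Toy sentinel supporting bool_value of True, False or None."""

    def __init__(self, name, bool_value=True):
        self._name = name
        self._bool_value = bool_value

    def __repr__(self):
        return f"<{self._name}>"

    def __bool__(self):
        if self._bool_value is None:
            raise AmbiguousSentinelError(
                f"{self!r} has no truth value; compare with 'is' instead"
            )
        return self._bool_value


UNSET = Sentinel("UNSET", bool_value=None)


def describe(flag=UNSET):
    # The identity check must come first; testing truthiness on UNSET raises.
    if flag is UNSET:
        return "not provided"
    return "on" if flag else "off"


print(describe(), describe(True), describe(False))
```

A refactor that reorders the checks in `describe` would then fail loudly the first time the sentinel reaches `if flag`, instead of silently taking one branch.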

Edit: Adding an extra thought…
I don’t want to come across as bikeshedding or anything. I’ve legitimately needed to disambiguate boolean data like this on multiple occasions, and I’ve also seen bugs where mistakes have been made with this, even made a few myself over the years when the test coverage isn’t good enough to catch them… it just seems like such a big win in terms of having the power to “explicitly convey programmer intent” for such a small change.

skirpichev · October 25, 2024, 5:04am

Perhaps this is too late to ask, but it seems the PEP addresses pure-Python use cases in the stdlib. What about C extensions?

dg-pb · October 25, 2024, 6:54am

These are just my initial thoughts, but I think there is a different/separate path for C extensions.

I.e. singleton sentinels. Such as None.
These are a bit different.
They have their own type.
Allow custom methods.

I think it would be good to implement a standard template for singletons, with the sentinel being a specialisation of it.

Ideally, this could be used to replace all existing sentinels in C.

taleinat · October 25, 2024, 7:32am

Hi Sam, thanks for adding to the conversation!

I think that cases like this are quite rare. There is one example of this in Python itself: NotImplemented, for which boolean evaluation is deprecated and raises a warning (this warning was introduced in Python 3.9: PR, issue). However, the reasons for that are extremely specific.
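
For anyone who wants to see what their interpreter does, here is a small probe. It is a sketch that assumes nothing beyond the standard `warnings` module; on 3.9 through 3.13 it should report the deprecation warning, and on newer interpreters the evaluation is, as far as I know, an outright `TypeError`, so both outcomes are handled. The function name is made up for illustration.

```python
import warnings


def probe_notimplemented_truthiness():
    """Report how this interpreter treats bool(NotImplemented)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is recorded
        try:
            bool(NotImplemented)
        except TypeError:
            return "error"
    if any(issubclass(w.category, DeprecationWarning) for w in caught):
        return "warning"
    return "plain"


print(probe_notimplemented_truthiness())
```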

The proposal now prohibits sub-classing, so that would not be possible.

Thanks for the suggestion. Indeed, I had considered that too. I’ve so far avoided adding this, thinking that the need would likely be rare, and that adding this in the future would be easy, whereas removing features which add complexity is nearly impossible.

Also, several developers have commented that the implementation could be simpler than it currently is, by being even more opinionated, so there are opinions either way.

In that vein, I’ve even considered making boolean evaluation always raise an exception, to make comparisons only possible via identity (is), but I think that might be too extreme and preclude use in place of many existing sentinels.

Still, I’ll consider this further.

taleinat · October 25, 2024, 7:33am

Python modules implemented in C would also be free to import the module and use such sentinels.

taleinat · October 25, 2024, 7:34am

Thanks for your thoughts, but that is not currently considered a goal of this PEP.

skirpichev · October 25, 2024, 8:43am

That seems possible, but has some performance drawbacks. Though I think this should mostly affect module import timings.

On the other hand, most (all?) possible use cases in C extensions currently use multiple function signatures instead of some poor man’s singleton idiom. functools.reduce is an example. The C version has multiple signatures:

reduce(function, iterable, /)
reduce(function, iterable, initial, /)

while the pure-Python fallback uses “a common idiom”:

_initial_missing = object()

def reduce(function, sequence, initial=_initial_missing):
    ...
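
For reference, a complete version of that idiom might look like the sketch below. It follows the documented behaviour of `functools.reduce`, but it is written for illustration and is not copied from the stdlib source.

```python
import operator

_initial_missing = object()  # private marker meaning "no initial value given"


def reduce(function, sequence, initial=_initial_missing):
    it = iter(sequence)
    if initial is _initial_missing:
        # No initial value: seed the accumulator with the first element.
        try:
            value = next(it)
        except StopIteration:
            raise TypeError(
                "reduce() of empty iterable with no initial value"
            ) from None
    else:
        value = initial
    for element in it:
        value = function(value, element)
    return value


print(reduce(operator.add, [1, 2, 3]))  # 6
print(reduce(operator.add, [], None))   # None is a legitimate initial value
```

The second call shows why `None` cannot serve as the default here: a caller may pass `None` as a real initial value.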

I think that support for overloading functions and methods is an alternative to this PEP. E.g. with typing.overload:

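from typing import overload

@overload
def reduce(func, lst):
    ...
@overload
def reduce(func, lst, init):
    ...
def reduce(*args):
    if len(args) == 2:
        ...
    elif len(args) == 3:
        ...
    else:
        # a wrong argument count is conventionally a TypeError
        raise TypeError("reduce expected 2 or 3 arguments")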
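
Filled in, the overload-based alternative could look roughly like this. It is a sketch under assumptions: the type annotations and error messages are mine, and the dispatch on `len(args)` is one possible way to complete the elided branches.

```python
import operator
from typing import Any, Callable, Iterable, TypeVar, overload

T = TypeVar("T")
S = TypeVar("S")


@overload
def reduce(func: Callable[[T, T], T], lst: Iterable[T], /) -> T: ...
@overload
def reduce(func: Callable[[S, T], S], lst: Iterable[T], init: S, /) -> S: ...
def reduce(*args: Any) -> Any:
    # No sentinel needed: the argument count says whether init was passed.
    if len(args) == 2:
        func, lst = args
        it = iter(lst)
        try:
            value = next(it)
        except StopIteration:
            raise TypeError(
                "reduce() of empty iterable with no initial value"
            ) from None
    elif len(args) == 3:
        func, lst, value = args
        it = iter(lst)
    else:
        raise TypeError(f"reduce expected 2 or 3 arguments, got {len(args)}")
    for element in it:
        value = func(value, element)
    return value


print(reduce(operator.add, [1, 2, 3]))      # 6
print(reduce(operator.add, [1, 2, 3], 10))  # 16
```

The trade-off is that the implementation signature becomes `*args`, so keyword arguments and runtime introspection of parameter names are lost.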

steve.dower · October 25, 2024, 8:54am

It’s just rendered that way in documentation for readability. In reality, the C version is reduce(*args, **kwargs) and then the function “body” will check how many arguments were passed. There’s no concept of default arguments here, only missing/unspecified arguments.

(Argument Clinic generates a function body for you, so it handles the default value substitution. Some of the other native calling conventions in CPython differ slightly, but fundamentally there’s no common code to handle every single set of arguments like there is for Python functions.)

skirpichev · October 25, 2024, 9:19am

Sure, I realize that. Python has no real concept of multiple signatures, e.g. they are not available for introspection with the inspect module. But logically, reduce() in the C version has multiple signatures (for either 2 or 3 arguments), regardless of how it actually implements argument handling.

steve.dower · October 25, 2024, 9:44am

If you want to argue it that way, you should also argue that def reduce(function, sequence, initial=None) has multiple signatures. There’s no difference to your argument.

But the core point is that C doesn’t require sentinels for optional arguments, because it doesn’t require missing arguments to have a value. Python functions require a value for optional arguments.

beauxq · October 25, 2024, 12:23pm

This looks to me like a misunderstanding if you think this is rare.
When I search typeshed stdlib for “bool | None”, I see 381 results.
Any time you have a bool | None, it’s possible to accidentally interpret the None as bool.
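
A minimal sketch of the kind of mistake meant here, using made-up function names:

```python
from typing import Optional


def describe_buggy(verbose: Optional[bool] = None) -> str:
    # Bug: None ("not specified") is falsy, so it is treated like False.
    if not verbose:
        return "quiet"
    return "verbose"


def describe_checked(verbose: Optional[bool] = None) -> str:
    # Handle the "not specified" case before looking at truthiness.
    if verbose is None:
        return "inherit from config"
    return "verbose" if verbose else "quiet"


print(describe_buggy(None))    # quiet
print(describe_checked(None))  # inherit from config
```

Nothing in the language flags the first version; it only goes wrong when "not specified" and `False` are supposed to mean different things.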

sirosen · October 25, 2024, 1:59pm

I don’t think it’s correct to extrapolate that each of those cases matches the request for a sentinel which raises an exception if treated as a bool.
A better point of comparison is how much code out there today implements a custom sentinel with this behavior.

The reason that I requested bool values of false is that I know of several cases of sentinels in libraries I maintain with that behavior. Having matching behavior from the stdlib will allow me to replace those, which I otherwise would not be able to do.

There was an earlier discussion in which some devs expressed the opinion that “can be converted to a bool” is part of the definition of a well behaved object, and that violations of that rule are surprising and strange. I have no opinion on that, but I think the initial PEP should defer any discussion of such a feature, since it could put the whole proposal at higher risk for rejection.

beauxq · October 25, 2024, 2:56pm

Every case of bool | None has the danger of accidentally interpreting None as bool.
This is not the same as saying that all of those should change to a sentinel that raises an exception. Just that there is some danger in ALL of them, so the option of using a sentinel that raises an exception has some value in ALL of them.

Your suggested “better point of comparison” fails to take into account all of the cases where people haven’t yet considered the danger, or haven’t been bitten by it yet. So that is not a better point of comparison.

sirosen · October 25, 2024, 4:47pm

An alternative statement of the same facts: every case of bool | None also has the feature that you can interpret the value as truthy/falsy if that’s appropriate.

Given that bool | None is not about to be removed from the language, the danger of misuse will remain present. Sentinel behaviors will only impact developers who are experienced enough to consider whether or not None is appropriate to their usage and who are interested in creating dedicated sentinel values.

zuo · October 29, 2024, 5:33pm

Thanks for proposing and designing this new exciting feature!

But the key difference between Enum+namedtuple and the proposed Sentinel is that the latter relies on the module name to determine the identity of an instance obtained by calling the Sentinel() constructor, which gets the instance from the global registry or creates a new one and places it in that registry.

I believe such mechanisms should not rely on a procedure that allows for any uncertainty.

Also, in the case of Sentinel(), I don’t find the “get-existing-or-create” behavior appealing. Personally, I’d always refer to one explicitly created variable anyway, importing it if necessary; that would be “boring”, yet reliable, explicit and resistant to any refactoring-introduced mess (especially, when code is moved from module to module).
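
To illustrate the refactoring hazard, here is a toy get-existing-or-create registry. It is a sketch, not the PEP's reference implementation: the module name is passed explicitly instead of being inferred from the caller, and the module names are invented.

```python
_registry = {}


class Sentinel:
    """Toy get-existing-or-create sentinel keyed by (module_name, name)."""

    def __new__(cls, name, module_name):
        key = (module_name, name)
        existing = _registry.get(key)
        if existing is not None:
            return existing
        self = super().__new__(cls)
        self._name = name
        self._module_name = module_name
        return _registry.setdefault(key, self)

    def __repr__(self):
        return f"<{self._name}>"


# Two call sites in the same module silently share one object.
a = Sentinel("MISSING", module_name="pkg.config")
b = Sentinel("MISSING", module_name="pkg.config")
# After one call site moves to another module, the same line of code
# produces a different object that still looks identical when printed.
c = Sentinel("MISSING", module_name="pkg.settings")

print(a is b, a is c, repr(a) == repr(c))  # True False True
```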

It’s probably quite late to change the design; yet, what I believe would be worth considering is:

- using the aforementioned global registry mechanism to just forbid the creation of more than one instance with the same key;
- making Sentinel()'s call signature something along the lines of:
```
(name,
 /,
 module_name,
 *,
 fullrepr=False, ...any other arguments...)
```
– where:
- module_name – is here required;
- fullrepr – if true, sentinel’s repr would be f"{module_name}.{name}"; if unspecified/false, sentinel’s repr would be f"<{name.split('.')[-1]}>";
- the tuple (name, module_name) would be the key in the registry.