PEP 661: Sentinel Values

ncoghlan · January 30, 2025, 11:52am

I think this is the key point here.

Defining __bool__ as raising an exception on sentinels keeps all of the design options open for future consideration:

never support truth checking on sentinels
be false like None
be true like object(), NotImplemented, or ...
allow users of the library to define the desired truth value when defining the sentinel

It does mean choosing the bikeshed colour for which exception to raise (TypeError was the first one that came to mind for me, but then I thought NotImplementedError could be a good option, since that would still be valid in a future where this behaviour is configurable).

Edit: I’ll note that any(...) and all(...) will trigger the exception, which would potentially cause problems for sentinels-as-return-values use cases. However, even for those use cases, truth value checking only works if the sentinel’s truth value is reliably different from the truth values of all other possible results, and the simplest way to be 100% certain of that is to use an identity check.
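As a rough illustration of the trade-off, here is a minimal sketch. RaisingSentinel is a made-up class for this example (it is not the PEP's API), and TypeError is just one of the candidate exceptions mentioned above:

```python
class RaisingSentinel:
    """Hypothetical sentinel whose truth value is deliberately undefined."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name

    def __bool__(self):
        # Assumption: TypeError; the thread has not settled on an exception type.
        raise TypeError(f"{self._name} has no truth value; compare with 'is'")


MISSING = RaisingSentinel("MISSING")
results = [0, MISSING, "spam"]

# Identity checks never call __bool__, so they always work.
found = [item is MISSING for item in results]

# any() calls bool() on each element until one is true. 0 is false,
# so any() moves on to MISSING and the exception propagates.
try:
    any(results)
    any_raised = False
except TypeError:
    any_raised = True
```

Wrapping the check as any(item is MISSING for item in results) avoids the exception, because only the results of the identity checks are truth-tested.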

bwoodsend · January 30, 2025, 1:38pm

For what it’s worth, numpy arrays are (mostly) unboolable and we’ve been happily tossing those around into Python’s various containers for years [1], so I’d say that numpy has shown this (understandable) caution to be unwarranted.

[1] The only times I ever ran afoul of its unboolable-isms were when not complaining would have led to a logic bug.

ImogenBits · January 30, 2025, 1:55pm

Oh yeah, you’re totally right. In that case I’ve fully changed my mind and also side with the always raising option.

kapinga · January 30, 2025, 4:22pm

My primary use of Python is for data analysis, so I regularly work with numpy arrays and/or DataFrames, which raise if __bool__ is called. In my experience, the only times I run into that exception are when I’ve mis-coded something and accidentally provided an array when I meant to provide a scalar value. In those instances, I am glad to have the raise occur, so I can fix the code (rather than have all arrays silently evaluate as True).

I’ve never encountered a time where a std lib collection implicitly checked the truthiness of such structures (although I can’t say that I’ve knowingly used all of them).

I tend to agree, although I am still struggling to think of a case where it would be obviously beneficial for a sentinel to be “Truthy.” The main argument in favor of this seems to just be “this should replace using object() for sentinels, and object() is truthy.” That’s fair enough, but we’re proposing a new API - this is the chance to change that behavior.

My first vote (which likely doesn’t mean much) would be to have __bool__ raise, as this allows users switching to this API to catch instances where their sentinel’s truthiness was evaluated (perhaps by mistake) and force a change to the surrounding code.

If the SC/community feels that it’s not appropriate to have a type that raises on __bool__ in the std lib, then giving the user a choice is the next best option. Drop in replacements for object() would then use a Truthy sentinel, while newly created “None alternatives” (e.g. MISSING) would likely favor using a Falsey sentinel.
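A minimal sketch of what “giving the user a choice” could look like. The Sentinel class and its truth parameter are hypothetical names for this example, not something the PEP specifies:

```python
class Sentinel:
    """Hypothetical sentinel; `truth` may be True, False, or None (raise)."""

    def __init__(self, name, truth=None):
        self._name = name
        self._truth = truth

    def __repr__(self):
        return self._name

    def __bool__(self):
        if self._truth is None:
            raise TypeError(f"truth value of {self._name} is undefined")
        return self._truth


LEGACY = Sentinel("LEGACY", truth=True)     # drop-in replacement for object()
MISSING = Sentinel("MISSING", truth=False)  # a falsey "None alternative"
STRICT = Sentinel("STRICT")                 # raises whenever bool() is called
```

Defaulting truth to None (raise) is an assumption made here to match my first preference; a real API could just as well default to truthy.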

bwoodsend · January 30, 2025, 7:33pm

… although it occurs to me that, since numpy arrays are also unhashable, they’ve never been exposed to being in a set or a dict’s keys or such like. But unless someone can think of a scenario where some hash-based container would index objects internally factoring in their truthiness, I can’t see this being significant?

Monarch · January 30, 2025, 9:14pm

The more I think about it, the more I prefer raising an exception in __bool__. It’s the more correct behaviour, with the only downside being that it won’t replace as many of the existing object() sentinels. I think a newer API being more correct by default should take precedence over compatibility with existing solutions. It might even help uncover some bugs after people replace their object() sentinels with the new ones and get an appropriate error where a check was wrongly relying on __bool__ instead of is.

oscarbenjamin · January 30, 2025, 9:50pm

Dicts and sets use __hash__ and __eq__. They don’t use __bool__.

If you want a real life example of a type that is hashable and whose __bool__ raises an exception then SymPy’s Boolean type is there:

In [4]: from sympy.abc import x, y

In [5]: e = x + y > 2

In [6]: e
Out[6]: x + y > 2

In [7]: bool(e)
...
TypeError: cannot determine truth value of Relational: x + y > 2

This raises an exception because it could be true or false depending on the values of the symbols x and y.

There is no problem putting these expressions into sets or dicts:

In [8]: {x > 1, y > 3}
Out[8]: {x > 1, y > 3}

In [9]: (x > 1) in {x > 1, y > 3}
Out[9]: True
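The same thing can be checked without SymPy. This is a standard-library-only sketch; Unboolable is a made-up class that keeps the default identity-based __hash__ and __eq__:

```python
class Unboolable:
    """Hashable object whose bool() raises (hypothetical example class)."""

    def __bool__(self):
        raise TypeError("truth value is undefined")


a, b = Unboolable(), Unboolable()

members = {a, b}        # set insertion goes through __hash__ and __eq__
lookup = {a: "first"}   # so does dict key storage and lookup

in_set = a in members
value = lookup[a]
```

None of these operations call __bool__ on the stored objects, so the exception is never triggered.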

techdragon · February 6, 2025, 3:20pm

I think this has an extremely easy answer… Since Sentinels will be new to the standard library, give them a new standard library exception to raise… my personal suggestion would be raising something like ‘SentinelComparisonError’ which is either a subclass of ValueError, or a subclass of SentinelError which could potentially be a subclass of ValueError.

Since existing code using sentinels will likely (in my experience) be using existing custom exceptions, and code adopting sentinels once they are in the standard library will have no established exception yet, it seems most straightforward to define a new exception to go with the new functionality. The bikeshedding here feels like it should only be about whether “ValueError” is appropriate as a parent class, due to potential interaction with existing code.
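To make the parent-class question concrete, here is a small sketch. Both exception names are only my suggestions and exist nowhere in the standard library, and _Unset is a throwaway class for the example:

```python
class SentinelError(ValueError):
    """Hypothetical base class for sentinel-related errors."""


class SentinelComparisonError(SentinelError):
    """Hypothetical error for using a sentinel in a boolean context."""


class _Unset:
    def __bool__(self):
        raise SentinelComparisonError("UNSET cannot be used as a boolean")


UNSET = _Unset()

# A handler written long ago for ValueError also catches the new error.
# That is the interaction the choice of parent class has to account for.
try:
    bool(UNSET)
except ValueError as exc:
    caught = type(exc).__name__
```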

techdragon · February 6, 2025, 3:41pm

This is the core of my argument for needing some ability to define a sentinel that errors on Boolean comparisons.

There is a significant body of Python code that functions by passing kwargs dicts, and while it’s easy enough for a developer to introspect the call chains and decide if None, object(), etc is best… the problem is that they aren’t in control of third party code: a point update could restructure some comparisons assuming nothing is different and introduce inadvertent changes to what would happen with None/False/True comparing values…

Since this will be new to the standard library, let’s add the minimal extra complexity to allow choosing or make the explicit decision to raise the exception. The effort to keep it “minimal” feels like trying to decide if a sheet of paper is too thick… technically a valid decision… and understandable given the significance of trying to get new code into the standard library… but it feels like a somewhat foolhardy line of thinking given how, with or without the options, the code will still be extremely minimal and have limited impact on existing code compared to many successful new additions to core Python in recent years.

Lucas_Malor · February 6, 2025, 7:45pm

My impression is that all the fear about misuse of sentinels is highly overrated. AFAIK sentinels are used by seasoned programmers, who know very well that they have to use is. Can someone search real code to see how often sentinels are misused?

That said, I see no real use case for doing if _sentinel, so for me new sentinels can also cause a segfault on __bool__. Just please add a meaningful error message if you really want to have an exception.

jpgoldberg · February 20, 2025, 4:38pm

I realize that I am four years late for this discussion, but I only learned the term “sentinel” and about PEP 661 a few days ago. I wish to voice my support for PEP 661 and don’t know where else to do so.

I have found myself using (or wanting to use) None in four different ways, three of which are sentinels.

None as none. That is a non-sentinel value. This is usually for specifying thresholds or limits for things like iteration cutoffs or timeouts, and such. These have default values that are not None, but for which None is a sensible value.
None for Not Given, as discussed above and in the PEP
None for Missing, as discussed above and in the PEP
None for Not Yet Computed. I probably should just be using cached_property instead, but I didn’t know about that at the time.

In each case if I find some conflicting use of None within some context, I can work around it easily enough, but adopting PEP 661 would mean that I (and others) wouldn’t have to find such work-arounds. More importantly it would allow me (and others) to actually have our code say what we mean.
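A small sketch of the first two uses colliding. NOT_GIVEN, DEFAULT_TIMEOUT and resolve_timeout are names invented for this example, and a plain object() stands in for a PEP 661 sentinel:

```python
NOT_GIVEN = object()      # stand-in for a proper sentinel
DEFAULT_TIMEOUT = 30.0    # hypothetical default, in seconds


def resolve_timeout(timeout=NOT_GIVEN):
    """Return the effective timeout; None is a real value meaning 'no limit'."""
    if timeout is NOT_GIVEN:
        return DEFAULT_TIMEOUT
    return timeout
```

Here resolve_timeout() falls back to the default, while resolve_timeout(None) keeps None as a meaningful value, which would be impossible if None were also the “not given” marker.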

barry · February 27, 2025, 10:51pm

The Python Steering Council recently discussed PEP 661, Sentinel Values. This post is on behalf of the SC.

We’d like to thank Tal for writing this PEP and leading a fruitful discussion on DPO. Although we are not yet pronouncing on the PEP, the SC is generally supportive of the idea of adding official sentinel values to Python.

As the use of object() for unique sentinels shows, the motivation for sentinels is clear, and we agree with the rationale provided in the PEP. We think that sentinels should be at least as easy to use as object(), while providing clear benefits over that simple and existing idiom, and we agree the implementation should be as simple as possible to achieve these goals.

Here is some additional feedback on the PEP. We’re not implying that acceptance of the PEP is conditional on accepting our recommendations, but we want to give you a clear idea of what we’re thinking, and we’d like to encourage you to take advantage of Steering Council office hours if you’d like to discuss our feedback in person.

The PEP proposes that sentinels within the same module are unique, and implements a global registry to ensure this. We think that this is “magical” behavior that isn’t supported strongly enough by the rationale and motivations in the PEP. Why is this implicit behavior necessary? Wouldn’t defining a module global sentinel explicitly and then referring to it by variable name be sufficient? We suggest that removing this behavior would keep the implementation simpler while still achieving the goals of the PEP.

Along those same lines, the PEP proposes to auto-discover the module that the sentinel is defined in, while still providing an optional module_name argument to the constructor for cases where the auto-discovery can’t or doesn’t work. While this parallels the design of Enum and namedtuple, we aren’t convinced that this is useful for sentinels, especially if the global registry feature is removed as suggested above. We think a single name argument, which would be used directly in the repr, should be sufficient to achieve the goals of the PEP. If the user wanted to include the module name in that argument, it’s easy enough to do so explicitly.

We think it’s fine for sentinels to evaluate as truthy in boolean contexts.

The PEP proposes a new module called sentinellib which will contain a Sentinel class. We note however that in the section titled “Typing”, the examples use a module called sentinels. In either case, we think that to improve adoption by making sentinels as easy to use as the current object() idiom, a builtin type / callable called sentinel should be provided. With a simplified implementation, it may not even be necessary to implement sentinels as Python code living in a new standard library module (e.g. it could be implemented in C).
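To make the suggested shape concrete, here is a rough pure-Python sketch: a single name argument used directly in the repr, no registry, and default (truthy) object semantics. The class below is only a stand-in written for illustration; the name sentinel follows the suggestion above, and first is a hypothetical helper.

```python
class sentinel:
    """Illustrative stand-in: one name argument, no registry, no module lookup."""

    __slots__ = ("_name",)

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name


MISSING = sentinel("MISSING")
QUALIFIED = sentinel("mypackage.MISSING")  # module name included by hand


def first(iterable, default=MISSING):
    """Return the first item of `iterable`, or `default` if it is empty."""
    for item in iterable:
        return item
    return default


# Without a registry, two sentinels created with the same name are distinct.
distinct = sentinel("MISSING") is not MISSING
```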

Thanks again Tal, and we’re looking forward to further discussions.

taleinat · March 1, 2025, 10:01am

Regarding the registry, I took some time to recall how I arrived at that design after many iterations, and realized that there is one main reason: Sentinel objects are often not defined on the module top-level, but I’d still like them to “just work” in those cases.

Searching online Python code for object(), e.g. on GitHub, reveals that a significant portion of its use is inline in function/method signatures, and another significant portion is in code inside functions/methods. In the latter case especially, it is often not easily possible to move the definition to the module top-level. Sometimes, these are intended to be different in every function call / loop iteration.

To illustrate, here’s an example from the stdlib heapq module:

def nsmallest(n, iterable, key=None):
    """Find the n smallest elements in a dataset.

    Equivalent to:  sorted(iterable, key=key)[:n]
    """

    # Short-cut for n==1 is to use min()
    if n == 1:
        it = iter(iterable)
        sentinel = object()
        result = min(it, default=sentinel, key=key)
        return [] if result is sentinel else [result]

    ...

I think that if we add a sentinel to the stdlib, and even more so if we add a builtin, this will be seen as encouragement to use it in such cases. I feel strongly that it should be something that can be used simply and with confidence, something that will “just work” without weird caveats and edge-cases. The registry is the best approach I’ve found to achieve that.

In my mind, the suggestion to rely on being defined in a module top-level for copying/unpickling to work as expected means unnecessarily leaving room for things to break subtly.

(Side-note: a real implementation with a registry would likely use weakrefs to allow such objects to be garbage-collected.)
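For reference, this is the kind of subtle breakage I mean, shown with a plain object() sentinel and nothing from the PEP:

```python
import copy
import pickle

local_sentinel = object()

# Both round trips build a brand new object, so later "is" checks stop matching.
unpickled = pickle.loads(pickle.dumps(local_sentinel))
copied = copy.deepcopy(local_sentinel)

# Singletons that pickle by name, such as Ellipsis, keep their identity.
ellipsis_back = pickle.loads(pickle.dumps(...))
```

After this runs, neither unpickled nor copied is the original local_sentinel, while ellipsis_back is still Ellipsis.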

Jelle · March 1, 2025, 3:48pm

Is it likely that you need to both declare the sentinel in a local scope, and be able to pickle it?

As an analogy, function objects can also be created in local scopes, but if so they cannot be pickled successfully. I think it would make sense for sentinels to work the same way.
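A quick sketch of the function behaviour I mean; make_adder is just an example name:

```python
import pickle


def make_adder(n):
    def add(x):  # defined in a local scope
        return x + n
    return add


try:
    pickle.dumps(make_adder(1))
    local_pickles = True
except (AttributeError, pickle.PicklingError):
    # Which of the two is raised depends on the pickle implementation in use.
    local_pickles = False

# A function reachable by name from its module pickles by reference.
same_len = pickle.loads(pickle.dumps(len)) is len
```

Sentinels created in a local scope could fail in the same way, while sentinels reachable by a module-level name would round-trip by reference.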

barry · March 1, 2025, 5:05pm

Speaking not from the SC, that’s my question as well. Rewriting the example:

def nsmallest(n, iterable, key=None):
    """Find the n smallest elements in a dataset.

    Equivalent to:  sorted(iterable, key=key)[:n]
    """
    # Short-cut for n==1 is to use min()
    if n == 1:
        it = iter(iterable)
        missing = sentinel('nsmallest')
        result = min(it, default=missing, key=key)
        return [] if result is missing else [result]

In this case, missing doesn’t need to be pickled, right? What am I missing [1]?

[1] Pun intended!

pf_moore · March 1, 2025, 5:39pm

Also, presumably a registry could be added at a later date if it turned out to be needed, couldn’t it?

avylove · March 2, 2025, 10:19pm

I’m not sure the registry would add much value, and it would be a negative for me. When I use sentinels, I put them at the top of the module or, if needed in multiple modules, in a package utility module. It makes it very easy to know their scope. With a registry, the scope is no longer obvious. Not only would this require additional mental processing for a user, it would make adding static checking more complex.

barry · March 14, 2025, 3:22pm

Hi @taleinat - the SC would really like to move forward with the PEP; this is a friendly ping that if you’d like to discuss the feedback in detail and F2F, please do schedule an office hours visit with us!

Note that the schedule is a little discombobulated at the moment while our members are differently saving [1] daylight, but that should settle down by the end of March.

[1] Or not!

dg-pb · March 14, 2025, 5:20pm

To me the question is whether identity has to be kept on serialization.

If no, then the mental model is very clear:

“It behaves in an almost identical manner to object(), except for __repr__ (and maybe a customizable truth value).”

If yes, then it is a bit of an issue as it either:
a) Adds a fair amount of complexity
b) Needs a different spec

As for (b), the simplest way to achieve it that I have found is:

class NULL(Sentinel):
    """Multiline doc for free"""
    # False / True / None (NotImplementedError)
    truth = True

# Sentinel is just convenience baseclass
class Sentinel(metaclass=SentinelMeta):
    pass

# Thus, it could alternatively be:
class NULL(metaclass=SentinelMeta):
    ...

I quite like it personally as it keeps things simple, offers all basic features including serialization identity retainment (when defined at module level) without “black magic” and implementation is 12 lines long.

So I use above in places where I was using object() before.

Serialization identity retainment allows it to be used for few more cases, but I am not sure if that is needed. If not, then the above is inferior to this PEP as class creation is much more costly than instantiation.
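For readers who want something runnable, here is one guess at what a metaclass for the class-based approach could look like. This is not my actual 12-line implementation; the default truth value and the ban on instantiation are assumptions made for the sketch:

```python
import copy


class SentinelMeta(type):
    """Hypothetical metaclass: each class created with it *is* the sentinel."""

    def __repr__(cls):
        return cls.__name__

    def __bool__(cls):
        truth = getattr(cls, "truth", True)  # assumed default: truthy, like object()
        if truth is None:
            raise NotImplementedError(f"truth value of {cls.__name__} is undefined")
        return truth

    def __call__(cls, *args, **kwargs):
        # Assumption: the class is the sentinel, so instances make no sense.
        raise TypeError(f"{cls.__name__} is a sentinel and cannot be instantiated")


class Sentinel(metaclass=SentinelMeta):
    """Convenience base class."""


class NULL(Sentinel):
    """Multiline doc for free"""
    truth = False


class UNDEFINED(Sentinel):
    truth = None  # bool(UNDEFINED) raises NotImplementedError


same_after_copy = copy.copy(NULL) is NULL  # copy treats classes as atomic
```

Because classes are pickled by reference to their qualified name, a sentinel class defined at module level also comes back as the same object after unpickling.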

Thus, maybe it would be simpler to keep this in line with object() with few additional basic features:

__repr__
customizable truth value via truth arg (maybe)
customizable __doc__ via doc argument (maybe)
Shorthand Sentinel.NAME instantiation. (maybe)

And leave all complexities for a separate concept, which (among other benefits) would allow users to define Sentinels for global/framework wide usage. As per Singletonobject.c and unification of `singletons` and `singlenels` - #7 by Alex-Wasowicz

emily · March 20, 2025, 5:26pm

@taleinat The SC’s office hours are now updated and available to book again