Sentinel values in the stdlib

As far as I can tell, the main advantage is that it would be simpler.

The main disadvantage is that it would be less obviously correct in various edge cases. A nice thing about private, single-use sentinel values is that they are easy to reason about and use with confidence.

Another nice property of one-off sentinels is that they can be given names which are meaningful in their specific context.

With cheap, easy, readable single-use sentinels, one could define a specific sentinel value for each relevant function argument, rather than being limited to one (or a few) per module. So I’m not sure that “such is life”.

My point is: there will always be places a particular sentinel can’t be used. No sense having more of those places than you reasonably need. The worst case would be a Python-wide sentinel for (say) MISSING.

I voted for both “idiom with custom repr” and “public class”. I am thinking specifically of a class in types with a name parameter for the repr, plus whatever other code would be useful:
import types; sentinel = types._Sentinel('skip')

I use sentinels to tell workers to stop in my multiprocessing library, passing them via a queue. In order for that to work, I have all instances of a class compare equal.

class _Sentinel:
    def __eq__(self, other):
        return isinstance(other, self.__class__)

sentinel = _Sentinel()

This means that sentinel checks use equality (==) instead of identity (is). That is fine for my use case with iter(queue.get, sentinel).
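For context, a minimal sketch of how that looks on the worker side (worker and process are placeholder names, not real code): because the sentinel gets pickled on its way through the multiprocessing queue, the worker receives a different instance, so an identity check would fail, while the equality check above still matches.

def worker(queue):
    # iter() keeps calling queue.get() and stops as soon as the returned
    # item compares equal to the sentinel
    for item in iter(queue.get, sentinel):
        process(item)  # placeholder for the real work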

Yes, simpler, and more readable: “There should be one— and preferably only one — obvious way to do it.”

The advantage I see is that every time someone sees a function signature with “MISSING” (or whatever it’s called) it means the same thing. And I’m not doing much with typing, but it strikes me that it would be helpful there, too.

Note that I do not advocate any kind of enforcement – if you have a compelling reason to make a custom sentinel, then by all means do so.

Anyway, I certainly agree that a standard sentinel or two is less compelling than a standard way to make a custom sentinel.

Thinking about it more, I think dataclasses._MISSING_TYPE is an unusual special case.

If there were a standard MISSING sentinel, then a signature like:

def fun(x=MISSING):

would clearly mean what it means :slight_smile: and like the currently common use of None for this, calling the function with:

fun(MISSING)

would have the same effect as not passing that parameter.

In common cases, using None can be problematic, because None has many uses other than marking a missing value, so one might want to make the distinction.
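A hypothetical illustration (none of these names come from the stdlib): a parameter for which None is itself a meaningful value, so only a separate sentinel can distinguish “not passed” from “explicitly None”.

DEFAULT_TIMEOUT = 30.0
MISSING = object()  # module-private sentinel

def configure(timeout=MISSING):
    # None is a legitimate value here ("disable the timeout entirely"),
    # so it cannot double as the "argument was not passed" marker.
    if timeout is MISSING:
        timeout = DEFAULT_TIMEOUT
    return timeout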

But dataclasses.field is unique – it is used to build a function signature itself – so you may need to pass through a “standard” MISSING, and thus need a custom Sentinel in that case.

I’m having trouble coming up with that use case, but I’m not very imaginative.

Again, I am by no means suggesting that a standard MISSING should be the only one anyone uses. As for the original reason for this thread: dataclasses._MISSING_TYPE is a good case for a nicer repr:

Signature:
dataclasses.field(
    *,
    default=<dataclasses._MISSING_TYPE object at 0x7f972f956dc0>,

That is not pretty :frowning:

“MISSING” to me sounds like an error.

“This parameter has been deliberately left unspecified” is better (or perhaps a shorter version that implies the same :wink: )

I’ve used sentinel = object() in my code, and yet I’m hesitant on the proposal.

  1. Common use could lead to anti-patterns, for example the idea that the code in question tries to infer “who” called it. Some logging libraries inspect the stack to determine the call site; while that works in naive cases, it prevents wrapping the functions in those libraries.

  2. There will be a minor gotcha if someone reloads the module: the id of the object, and possibly its type, will differ from generation to generation, something that calling code is not aware of.

  3. Should there be a common “UNSET” for everything, or one for the stdlib (but not user code), or one per module, or one per function? It’s something that developers will have to remember, for cases like this:

# module a
def foo(x, y=UNSET): ...

# module b
def bar(baz, qux=UNSET):
    ...
    # if b's UNSET is not the same object as a's UNSET,
    # foo() will treat qux as an ordinary value
    foo(42, qux)

OT: perhaps Python needs to gain a Ruby-like or ES6-like symbol type?


Heh. I did exactly the same thing. I wonder how many others reinvented this particular wheel.

The next step would be to change the dict update and function argument assignment mechanics. Thus

>>> def defaulted(foo=123, bar=42):
...     print(foo, bar)
... 
>>> defaulted(NotGiven, bar=NotGiven)
123 42
>>> 

I’d love to get rid of the incessant

if foo is NotGiven:
    foo = 123

boilerplate code, which has the additional disadvantage that it hides the default values from introspection.
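A small illustration of that introspection point (the function here is made up): inspect.signature shows the opaque sentinel, not the effective default.

import inspect

NotGiven = object()

def f(foo=NotGiven):
    if foo is NotGiven:
        foo = 123
    return foo

# The effective default (123) is invisible to introspection tools:
print(inspect.signature(f))  # (foo=<object object at 0x...>)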

Thanks for bringing up these issues! They haven’t been mentioned in these conversations before.

In the stdlib, logging, namedtuple and Enum all inspect the stack frame to determine the call site. Is that a significant problem?

Some sentinel implementations don’t have this issue though, e.g. one based on a metaclass:

class NotGivenType(metaclass=SentinelMeta):
    name = 'NotGiven'
NotGiven = NotGivenType()
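(For illustration, here is a minimal sketch of what such a SentinelMeta could look like; the caching behaviour is my assumption, not the PEP’s reference implementation. Because the metaclass keeps one instance per sentinel name, and lives in a module that is not itself reloaded, re-executing the class statement after a reload hands back the same object.)

class SentinelMeta(type):
    # one shared instance per sentinel name, stored on the metaclass,
    # so it survives reloads of the module that defines the sentinel
    _instances = {}

    def __call__(cls, *args, **kwargs):
        name = getattr(cls, 'name', cls.__name__)
        if name not in SentinelMeta._instances:
            SentinelMeta._instances[name] = super().__call__(*args, **kwargs)
        return SentinelMeta._instances[name]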

With reload() no longer being a built-in, and being known to cause various issues such as this one, is handling it gracefully a significant consideration?

WARNING: Wall of text thing incoming!

TL;DR
  • I think that the proposed specification will lead to unnecessarily long code that will just be unwanted boilerplate for users
  • I personally want proper typing support, so I currently use a single-value Enum for sentinels rather than object()
  • I think that having the sentinel class be both the type and the actual sentinel value would make it simpler to use
  • I think that preferably both the simple factory and the class-based syntax should be supported in some way
  • From what I can tell, proper support from type checkers will require changes to them regardless of the chosen path

The nice thing about NotGiven = object() is that it’s short, I feel like the currently proposed sentinel decorator still results in a lot of boilerplate (especially when you consider that most code styles would require more whitespace than shown in the PEP).

But object() plays badly as a sentinel when it comes to type checkers, so I usually have to resort to using an Enum:

import enum
from typing import Union

# regular class syntax can be used here but it's longer,
# which IMO makes the whole thing more distracting
_MissingType = enum.Enum("MissingType", (("MISSING", "MISSING"),))
MISSING = _MissingType.MISSING


def func(arg: Union[int, _MissingType] = MISSING) -> None:
    if arg is MISSING:
        print("arg is missing!")
    else:
        reveal_type(arg)  # should be int
        print(f"arg + 5 = {arg+5}")

I think that other than the automated __repr__ (and the benefit of it possibly becoming a widely used standard), sentinel doesn’t buy you much, and I feel it could buy you more.
For one, it’s IMO not that nice to use, because it requires you to define both the type and an instance of it, which could be prone to errors (perhaps I’m overstating a bit, though). Personally, I think that cls.__new__ = lambda cls: cls would work alright here and would greatly simplify this for the user.
It would require special support from type checkers (they would need to allow putting the sentinel class in Literal[], or to let it directly act as a final, non-subclassable type usable in annotations), but so does the current proposal as far as I can tell.
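(To make that suggestion concrete, a rough sketch; the names and the Type[...] annotation are just my guesses, not the PEP’s design. Calling the class returns the class itself, so a single name serves as both the type and the sentinel value.)

from typing import Type, Union

class MISSING:
    def __new__(cls):
        # calling the class hands back the class object itself,
        # so MISSING is both the type and the single sentinel value
        return cls


def func(arg: Union[int, Type[MISSING]] = MISSING) -> None:
    if arg is MISSING:
        print("arg is missing!")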

Another thing which was brought up before already, but I want to reiterate it – I think it would be nicer if sentinel could be used both as a simple factory:

# perhaps there could be some sensible default?
MISSING = sentinel("MISSING", "<MISSING>")


# MISSING could be used as a proper type, or be used in `Literal`,
# I don't really know which option is better, Enum supports the latter
def func(arg: Union[int, Literal[MISSING]] = MISSING) -> None:
    if arg is MISSING:
        print("arg is missing!")
    else:
        reveal_type(arg)  # should be int
        print(f"arg + 5 = {arg+5}")

As well as be subclassable if the user needs more from it:

class MISSING(sentinel):
    def __getstate__(self):
        ...

    def __setstate__(self, state):
        ...


# the rest of code the same as in the simple factory example above

A similar pattern is used for typing.NamedTuple and typing.TypedDict, as well as enum.Enum.
According to the PEP’s rejected ideas, it was preferred to avoid this, but I feel like it shouldn’t be rejected that quickly. IMO, given the existing examples of such usage in Python, users could pick up this interface somewhat naturally, and it would perhaps be easier to teach to those who already know how NamedTuple and/or Enum work.

There are probably other ways of achieving the factory + subclass behavior that wouldn’t require a metaclass; one would be to allow using @sentinel as a decorator while also letting it act as a factory. There are probably other options too, and they might be worth exploring.

// TL;DR can be seen at the top

For those wondering what “proposed specification” and “the PEP” are that Jakub is referring to, I put up an early draft of a PEP on a branch in the PEPs repo (Note: This link will break once the temporary branch is deleted.)

I wrote it to summarize the discussions, organize my thoughts and explore the options. I stopped working on it late at night and sent a link to a few people to get some opinions. I intentionally didn’t make it public yet, but the cat is out of the proverbial bag now…

I was also thinking that. I’ve got a reasonable working implementation of MISSING = sentinel("MISSING"), but one which generates a class and an instance rather than just using a class object as you suggest. AFAICT, the only drawbacks of this are using the same stack-frame-inspection mechanism as Enum, logging and namedtuple to find the module where the sentinel is defined, and using a naming hack instead of trying to set the class as a module attribute. I also didn’t like that the type isn’t directly available and would potentially have an awkward name, but being able to use Literal[MISSING] possibly makes that irrelevant.

I’m still not sure what would be preferable. (I originally intended to first think about it some more and get thoughts from a few initial reviewers.)

For those wondering, here is the (simplified) code:

import sys
if hasattr(sys, '_getframe'):
    get_parent_frame = lambda: sys._getframe(2)
else:
    def get_parent_frame():
        """Return the frame object for the caller's parent stack frame."""
        try:
            raise Exception
        except Exception:
            return sys.exc_info()[2].tb_frame.f_back.f_back

def sentinel(name):
    """Create a unique sentinel object."""
    repr_ = f'<{name}>'

    # This is a hack to get copying and unpickling to work without setting the
    # class as a module attribute.
    class_name = f'{name}.__class__'
    class_namespace = {
        '__repr__': lambda self: repr_,
    }
    cls = type(class_name, (), class_namespace)

    # For pickling to work, the __module__ variable needs to be set to the 
    # name of the module where the sentinel is defined.
    try:
        module = get_parent_frame().f_globals.get('__name__', '__main__')
    except (AttributeError, ValueError):
        pass
    else:
        cls.__module__ = module

    sentinel = cls()
    cls.__new__ = lambda self: sentinel

    return sentinel
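For completeness, usage looks roughly like this (assuming the factory is called at module level, so that pickling can find the MISSING name, as the naming-hack comment above describes):

import copy, pickle

MISSING = sentinel('MISSING')

print(repr(MISSING))                                   # <MISSING>
print(copy.deepcopy(MISSING) is MISSING)               # True
print(pickle.loads(pickle.dumps(MISSING)) is MISSING)  # True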

Ahh, sorry about that, someone made a PR to that branch which triggered my GitHub notifications, and I (wrongly) assumed that I must have missed a mention of it in the topic.

No worries :slight_smile:

Ah, but now I see that typing.Literal currently can’t be used with such sentinels.

FYI, I’ve created a second version of the draft PEP, and put up a GitHub repo with the PEP and a reference implementation.


It’s been nearly two weeks, so I’ve closed the poll.

This is now PEP 661: Sentinel Values.

Further discussion should be conducted here:
