Sentinel values in the stdlib

Hi,

Following some discussion on the python-dev mailing list, I’d like to poll your preferences on how we define sentinel values for optional function arguments.

This is a minor detail, so ISTM the most important thing is that we reach a reasonable decision quickly, even if that decision is that nothing should be done.

To quickly recap: The question is how to implement a sentinel value for an optional function argument where None isn’t suitable. There are many of these in the stdlib, with some recently added examples used in the dataclasses module and in traceback.print_exception(). Many of these, especially those defined using the common SENTINEL = object() idiom, suffer from some drawbacks:

  1. Unclear and long repr which makes functions’ signatures long and hard to read
  2. Impossible to create a strict type signature for, due to having no dedicated type
  3. Get a new, distinct object after pickling and unpickling

The last of the above (surviving pickling) can perhaps be considered unimportant; see additional discussion on this related bpo issue.
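
To make the three drawbacks concrete, here is a minimal sketch. The function `get` and its parameters are made up for illustration and are not taken from the stdlib.

```python
import inspect
import pickle

SENTINEL = object()


def get(key, default=SENTINEL):
    """Toy function: report whether the caller supplied a default."""
    return default is not SENTINEL


# 1. The repr embeds a memory address, so the signature is long and noisy.
signature_text = str(inspect.signature(get))
print(signature_text)  # something like (key, default=<object object at 0x...>)

# 2. The sentinel has no dedicated type; it is a plain `object`.
print(type(SENTINEL) is object)

# 3. A pickle round trip yields a new object, so `is` checks stop working.
restored = pickle.loads(pickle.dumps(SENTINEL))
print(restored is SENTINEL)
```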

POLL

From now on, without necessarily changing existing stdlib code…

How should we define user-visible sentinel values for optional function arguments in the stdlib?

  • Consistent use of an idiom: Minimal dedicated class
    class _SENTINEL: pass
    SENTINEL = _SENTINEL()
    
  • Consistent use of an idiom: Minimal dedicated class with custom repr
    class _SENTINEL:
        def __repr__(self):
            return f'{self.__class__.__module__}.SENTINEL'
            # or: return '<SENTINEL>'
    SENTINEL = _SENTINEL()
    
  • Consistent use of an idiom: a single-value enum
    class _SENTINEL(Enum):
        SENTINEL = 'SENTINEL'
    SENTINEL = _SENTINEL.SENTINEL
    
  • Consistent use of a new, dedicated sentinel factory / class / meta-class
    from sentinels import make_sentinel
    SENTINEL = make_sentinel('SENTINEL')
    
  • Consistent use of a new, dedicated sentinel factory / class / meta-class, also made publicly available in the stdlib
  • Consistent use of Ellipsis (a.k.a. ...)
  • Consistent use of a single, new, additional sentinel value (e.g. Sentinel)
  • Something else
  • The status-quo is fine / there’s no need for consistency in this
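
For reference, a factory like the `make_sentinel` option could look roughly like the sketch below. This is only an assumption about its shape: no such function exists in the stdlib, and the `repr_text` parameter and the generated type name are invented here.

```python
def make_sentinel(name, repr_text=None):
    """Hypothetical factory: build a dedicated type and return its one instance."""
    if repr_text is None:
        repr_text = f"<{name}>"

    def __repr__(self):
        return repr_text

    # Each sentinel gets its own class, so it can be told apart by type.
    sentinel_type = type(f"_{name}Type", (), {"__slots__": (), "__repr__": __repr__})
    return sentinel_type()


SENTINEL = make_sentinel("SENTINEL")
print(repr(SENTINEL))           # <SENTINEL>
print(type(SENTINEL).__name__)  # _SENTINELType
```

This addresses the repr and dedicated-type drawbacks; surviving pickling would need extra work that is left out of this sketch.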

I have long been using a singleton called NotGiven (defined in my mx.Tools) for keyword arguments without a reasonable default value. This is similar to None and NotImplemented we already have.

Not sure if this has been raised, but it’d be nice to have some way to import the return type of make_sentinel (if that route is chosen). Type hinting would be difficult otherwise.

Indeed, this is a good point! If we go this route we should make that reasonably nice.

(There are several ways to achieve that, but let’s hold back on bike-shedding for now.)

Exactly – and I’m sure you’re not the only one. I think it would be a good idea to have one or more standard sentinels in a common place in the stdlib: NotGiven or MISSING at least.

They wouldn’t need to have special status in the language like None, and no one would have to use them, but if you have multiple libraries all defining something similar, it just makes sense to have a common way to do it.

I don’t think we should have a stdlib-wide MISSING value. What if you wanted a dataclass with a field whose default value is some other module’s MISSING value? What’s the advantage of making this stdlib-wide?
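
A small sketch of the problem, using the existing `dataclasses.MISSING`. The class name `Example` is made up. Passing `MISSING` as the default is indistinguishable from passing no default at all, so the field cannot actually hold that value as its default:

```python
import dataclasses


@dataclasses.dataclass
class Example:
    # Attempt to make MISSING itself the default value of the field.
    value: object = dataclasses.field(default=dataclasses.MISSING)


# dataclasses treats the field as having no default at all.
field_default = dataclasses.fields(Example)[0].default
print(field_default is dataclasses.MISSING)

try:
    Example()
    accepted = True
except TypeError:
    # `value` ended up as a required argument.
    accepted = False
print(accepted)
```

With a single stdlib-wide MISSING, every function using it would have this same blind spot for that one shared value.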

Of course I know that a module’s own MISSING couldn’t be used as a value where that sentinel could be expected, but such is life. I think limiting a cross-module version of that problem is a good thing.

As far as I can tell, the main advantage is that it would be simpler.

The main disadvantage is that it would be less obviously correct in various edge cases. A nice thing about private, single-use sentinel values is that they are easy to reason about and use with confidence.

Another nice property of one-off sentinels is that they can be given names which are meaningful in their specific context.

With cheap, easy, readable single-use sentinels, one could define a specific sentinel value for each relevant function argument, rather than being limited to one (or a few) per module. So I’m not sure that “such is life”.

My point is: there will always be places a particular sentinel can’t be used. No sense having more of those places than you reasonably need. The worst case would be a Python-wide sentinel for (say) MISSING.

I voted for both the idiom with a custom repr and the public class. I am thinking specifically of a class in types with a name parameter for the repr and whatever other code would be useful.
import types; sentinel = types._Sentinel('skip')

I use sentinels for indicating workers to stop in my multiprocessing library, passed via a queue. In order for that to work, I have all instances of a class compare equal.

class _Sentinel:
    def __eq__(self, other):
        return isinstance(other, self.__class__)

sentinel = _Sentinel()

This means that sentinel checks use equality == instead of identity is. This is fine for my use case with iter(queue.get, sentinel).

Yes, simpler, and more readable: “There should be one— and preferably only one — obvious way to do it.”

The advantage I see is that every time someone sees a function signature with “MISSING” (or whatever it’s called) it means the same thing. And I’m not doing much with typing, but it strikes me that it would be helpful there, too.

Note that I do not advocate any kind of enforcement – if you have a compelling reason to make a custom sentinel, then by all means do so.

Anyway, I certainly agree that a standard sentinel or two is less compelling than a standard way to make a custom sentinel.

Thinking more, I think dataclasses._MISSING_TYPE is an unusual special case.

If there were a standard MISSING sentinel, then a signature like:

def fun(x=MISSING):

would clearly mean what it means :slight_smile: and like the currently common use of None for this, calling the function with:

fun(MISSING)

would have the same effect as not passing that parameter.

In the common cases, using None can be problematic, because None has many uses other than specifying a missing value, so one might want to make a distinction.
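
As a rough sketch of that behaviour: the `MISSING` defined here is a local stand-in, since no standard one exists, and `fun` is a toy.

```python
class _MissingType:
    def __repr__(self):
        return "MISSING"


MISSING = _MissingType()  # stand-in for a hypothetical standard sentinel


def fun(x=MISSING):
    if x is MISSING:
        return "x was not given"
    return f"x = {x!r}"


print(fun())         # x was not given
print(fun(MISSING))  # x was not given (same as omitting the argument)
print(fun(None))     # x = None (None remains an ordinary value)
```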

But dataclasses.field is unique – it is used to build a function signature itself – so you may need to pass through a “standard” MISSING, and thus need a custom Sentinel in that case.

I’m having trouble coming up with that use case, but I’m not very imaginative.

Again, I am by no means suggesting that a standard MISSING should be the only one anyone uses. As for the original reason for this thread: dataclasses._MISSING_TYPE is a good case for a nicer repr:

Signature:
dataclasses.field(
    *,
    default=<dataclasses._MISSING_TYPE object at 0x7f972f956dc0>,

That is not pretty :frowning:

“MISSING” to me sounds like an error.

“This parameter has been deliberately left unspecified” is better (or perhaps a shorter version that implies the same :wink: )

I’ve used sentinel = object() in my code, and yet I’m hesitant on the proposal.

  1. Common use could lead to anti-patterns, for example, the idea that the code in question tries to infer “who” called it. Some logging libraries inspect the stack to determine the call site, and while that works in naive cases, it prevents wrapping the functions in these libraries.

  2. There will be a minor gotcha if someone reloads the module: the id of the object, or possibly its type, will differ from generation to generation, something the calling code is not aware of.

  3. Should there be a common “UNSET” for all, or one for stdlib (but not user code), or one per module or one per function? It’s something that the developers will have to remember, for cases like this:

# module a
def foo(x, y=UNSET): ...

# module b
def bar(baz, qux=UNSET):
    ...
    foo(42, qux)
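
Here is a sketch of what goes wrong in that example when each module defines its own sentinel. Both "modules" are collapsed into one file, and the names `A_UNSET`, `B_UNSET`, `bar_shared` and `bar_own` are invented for illustration.

```python
class _Unset:
    def __init__(self, owner):
        self.owner = owner

    def __repr__(self):
        return f"<{self.owner}.UNSET>"


A_UNSET = _Unset("a")  # sentinel owned by "module a"
B_UNSET = _Unset("b")  # sentinel owned by "module b"


# module a
def foo(x, y=A_UNSET):
    return "y not given" if y is A_UNSET else f"y = {y!r}"


# module b, reusing module a's sentinel: the pass-through works
def bar_shared(baz, qux=A_UNSET):
    return foo(42, qux)


# module b, with its own sentinel: foo sees it as a real value
def bar_own(baz, qux=B_UNSET):
    return foo(42, qux)


print(bar_shared(1))  # y not given
print(bar_own(1))     # y = <b.UNSET>
```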

OT: perhaps Python needs to gain a Ruby-like or ES6-like symbol type?

Heh. I did exactly the same thing. I wonder how many others reinvented this particular wheel.

The next step would be to change the dict update and function argument assignment mechanics. Thus

>>> def defaulted(foo=123, bar=42):
...     print(foo, bar)
... 
>>> defaulted(NotGiven, bar=NotGiven)
123 42
>>> 

I’d love to get rid of the incessant

if foo is NotGiven:
    foo = 123

boilerplate code, which has the additional disadvantage that it hides the default values from introspection.
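
The introspection point can be seen with inspect.signature. This is a sketch with a local `NotGiven` and two toy functions:

```python
import inspect


class _NotGivenType:
    def __repr__(self):
        return "NotGiven"


NotGiven = _NotGivenType()  # local stand-in for the singleton


def with_sentinel(foo=NotGiven):
    if foo is NotGiven:
        foo = 123  # the real default is buried in the body
    return foo


def with_plain_default(foo=123):
    return foo


# Both functions behave the same when called without arguments...
print(with_sentinel(), with_plain_default())

# ...but only one of them exposes the real default to introspection.
print(inspect.signature(with_sentinel))       # (foo=NotGiven)
print(inspect.signature(with_plain_default))  # (foo=123)
```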

Thanks for bringing up these issues! They haven’t been mentioned in these conversations before.

In the stdlib, logging, namedtuple and Enum all inspect the stack frame to determine the call site. Is that a significant problem?

Some sentinel implementations don’t have this issue though, e.g. a meta-class:

class NotGivenType(metaclass=SentinelMeta):
    name = 'NotGiven'
NotGiven = NotGivenType()

With reload() no longer being a built-in, and with reload() being known to cause various issues such as this, is gracefully handling that a significant consideration?
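
For anyone who wants to see the reload gotcha concretely, here is a rough sketch. It writes a throwaway module to a temporary directory; the module name `reload_demo_mod` is made up.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway module defining a sentinel with the object() idiom.
    Path(tmp, "reload_demo_mod.py").write_text("UNSET = object()\n")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        import reload_demo_mod

        old = reload_demo_mod.UNSET      # what earlier callers captured
        importlib.reload(reload_demo_mod)
        new = reload_demo_mod.UNSET      # what the module uses after reload
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("reload_demo_mod", None)

# The module body ran again, so a fresh object() was created.
same_object = old is new
print(same_object)  # False
```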

WARNING: Wall of text thing incoming!

TL;DR
  • I think that the proposed specification will cause unnecessarily long code that will just be unwanted boilerplate for the users
  • I personally want proper typing support so I currently use single-value Enum for sentinels over object()
  • I think that having the sentinel class be both the type and the actual sentinel value would make it simpler to use
  • I think that preferably both simple factory and class-based syntax should be supported in some way
  • From what I can tell, the proper support from type checkers will require changes to them regardless of the chosen path

The nice thing about NotGiven = object() is that it’s short. I feel like the currently proposed sentinel decorator still results in a lot of boilerplate (especially when you consider that most code styles would require more whitespace than shown in the PEP).

But, object() plays badly as a sentinel when it comes to type checkers so I usually have to resort to the usage of Enum:

import enum
from typing import Union

# regular class syntax can be used here but it's longer
# which IMO makes the whole thing more distracting
_MissingType = enum.Enum("MissingType", (("MISSING", "MISSING"),))
MISSING = _MissingType.MISSING


def func(arg: Union[int, _MissingType] = MISSING) -> None:
    if arg is MISSING:
        print("arg is missing!")
    else:
        reveal_type(arg)  # should be int
        print(f"arg + 5 = {arg+5}")

I think that other than the automated __repr__ (and the benefit of it possibly becoming widely used standard), sentinel doesn’t buy you much and I feel it could buy you more.
For one, it’s IMO not that nice to use because it requires you to define both the type and instance of it and that could be prone to errors (perhaps I’m overstating a bit though). Personally, I think that cls.__new__ = lambda cls: cls would work alright here and would greatly simplify this for the user.
It would require special support from the type checkers (they would need to allow a sentinel class inside Literal[], or let it act directly as a final, non-subclassable type that can be used in type annotations), but so does the current proposal as far as I can tell.
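
A minimal sketch of the "class is its own sentinel value" idea, written with a regular `__new__` method instead of the lambda. This is only my illustration of the idea, not what the draft PEP specifies.

```python
class MISSING:
    """The class object itself serves as the sentinel value."""

    def __new__(cls):
        # "Instantiating" the class just hands back the class.
        return cls


def func(arg=MISSING):
    if arg is MISSING:
        return "arg is missing!"
    return arg + 5


print(MISSING() is MISSING)  # True
print(func())                # arg is missing!
print(func(1))               # 6
```

There is only one name to define and import, but the repr is still the default class repr unless a metaclass is added.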

Another thing that was brought up before already, but I want to restate it: I think it would be nicer if sentinel could be used both as a simple factory:

# perhaps there could be some sensible default?
MISSING = sentinel("MISSING", "<MISSING>")


# MISSING could be used as a proper type, or be used in `Literal`,
# I don't really know which option is better, Enum supports the latter
def func(arg: Union[int, Literal[MISSING]] = MISSING) -> None:
    if arg is MISSING:
        print("arg is missing!")
    else:
        reveal_type(arg)  # should be int
        print(f"arg + 5 = {arg+5}")

As well as be subclassable if the user needs more from it:

class MISSING(sentinel):
    def __getstate__(self):
        ...

    def __setstate__(self, state):
        ...


# the rest of code the same as in the simple factory example above

A similar pattern is used by typing.NamedTuple, typing.TypedDict and enum.Enum.
According to the PEP’s rejected ideas, it was preferred to avoid this, but I feel like it shouldn’t be rejected that quickly. IMO, given the existing examples of such usage in Python, users could learn this interface fairly naturally, and it would perhaps be easier to teach to users who already know how NamedTuple and/or Enum work.

There are probably other ways of achieving factory + subclass behavior that wouldn’t require a metaclass. One way would be to allow @sentinel to be used as a decorator while also allowing it to act as a factory. There are probably other options too, and they might be worth exploring.

// TL;DR can be seen at the top

For those wondering what “proposed specification” and “the PEP” are that Jakub is referring to, I put up an early draft of a PEP on a branch in the PEPs repo. (Note: This link will break once the temporary branch is deleted.)

I wrote it to summarize the discussions, organize my thoughts and explore the options. I stopped working on it late at night and sent a link to a few people to get some opinions. I intentionally didn’t make it public yet, but the cat is out of the proverbial bag now…