I’ve had time to think on this since I need to pickle sentinel objects within my own projects. I’ve been experimenting on the typing-extensions implementation.
I ended up with solutions to handle missing and anonymous sentinels, but now I feel like most of these solutions were overengineered and can be ignored. They handle anonymous sentinels (and sentinels becoming anonymous due to their original definitions going missing) by preserving their data instead of crashing:
Importing sentinels manually
First, a function which fetches the object at module_name.name. It uses importlib.import_module to get the module, then operator.attrgetter to support a qualified name. This was as concise as I could get it.
import importlib
import operator
from typing import Any

def _import_sentinel(name: str, module_name: str, /) -> Any:
    """Return object `name` imported from `module_name`, otherwise return None."""
    try:
        module = importlib.import_module(module_name)
        return operator.attrgetter(name)(module)
    except ImportError:
        return None  # When module_name does not exist
    except AttributeError:
        return None  # When module_name.name does not exist
Ironically, None is not the best return value here, but a plain object() sentinel could work, or these exceptions can be handled inline. This also returns anything, even if the object at module_name.name isn’t a Sentinel. I think this behavior could be beneficial because it allows for forward compatibility between third-party sentinel types.
Typical usage of sentinels involves the sentinel being defined once at import-time and then reusing that single defined object for all cases, but any unpickling happens at run-time, so there needs to be a way to fetch an existing sentinel’s identity using a function. Using a registry works for this and the above _import_sentinel function also works, but these fulfill slightly different needs.
_import_sentinel fails if the object doesn’t yet exist, including when the sentinel instance is being initially created. This means two Sentinels with the same name and module_name can have different identities. So _import_sentinel has problems on its own.
A registry always works for ensuring that Sentinels with the same name and module_name have the same identity, but it does not ensure that the identity of the returned object is the same as the one at module_name.name. If module_name.name was pickled and then later replaced by a third-party sentinel, then the identity of objects will be split in two: the run-time identity (created by pickle) and the import-time identity.
A registry also handles cases where the sentinel definition has gone missing or is anonymous.
So my suggestion with _import_sentinel is to add it to Sentinel.__new__ including the registry so that a sentinel of a different type at module_name.name can be registered as the identity.
Strict keyword to verify anonymous sentinels
Sentinel.__new__ can now return an object with a type other than Sentinel. This can cause multiple obvious problems. My solution to that is a strict keyword. This defaults to True and adds a runtime check to test that the object with the returned identity matches what would normally be returned from the parameters (including a theoretical bool= or repr=) given to Sentinel.
class UNSET(): ...
UNSET = Sentinel("UNSET", strict=True) # TypeError: object <class> at module_name.UNSET is not a Sentinel type
assert Sentinel("UNSET", strict=False) is UNSET # Okay, registers and returns UNSET class as the sentinel identity
Sentinel("UNSET", strict=True) # TypeError because the registered object was not a Sentinel type and this is always checked when strict=True
MISSING = Sentinel("MISSING", bool=NotImplemented)
Sentinel("MISSING", bool=False, strict=True) # TypeError: can not redefine 'bool' in existing Sentinel
Note that strict=True enforces the return of a Sentinel type but strict=False returns Any.
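To make the lookup order and the strict check concrete, here is a rough sketch under several assumptions: _registry and _lookup are hypothetical helpers, None doubles as the miss marker for brevity, and a stand-in module named demo_mod is created with types.ModuleType so the example does not depend on a real package. Only the type check is shown, not the parameter comparison.

```python
import importlib
import operator
import sys
import types
from typing import Any

_registry: dict[tuple[str, str], Any] = {}  # hypothetical registry


def _lookup(name: str, module_name: str) -> Any:
    """Return the object at module_name.name, or None if it cannot be found."""
    try:
        return operator.attrgetter(name)(importlib.import_module(module_name))
    except (ImportError, AttributeError):
        return None


class Sentinel:
    def __new__(cls, name: str, module_name: str, *, strict: bool = True) -> Any:
        key = (module_name, name)
        # Prefer a registered identity, then whatever is importable.
        found = _registry.get(key)
        if found is None:
            found = _lookup(name, module_name)
        if found is None:
            found = super().__new__(cls)
            found._name = name
            found._module_name = module_name
        elif strict and not isinstance(found, Sentinel):
            raise TypeError(
                f"object {found!r} at {module_name}.{name} is not a Sentinel type"
            )
        return _registry.setdefault(key, found)


# Stand-in module holding a third-party style sentinel (a bare class).
class UNSET: ...

demo_mod = types.ModuleType("demo_mod")
demo_mod.UNSET = UNSET
sys.modules["demo_mod"] = demo_mod

try:
    Sentinel("UNSET", "demo_mod")  # strict defaults to True
except TypeError:
    pass  # The object at demo_mod.UNSET is not a Sentinel
assert Sentinel("UNSET", "demo_mod", strict=False) is UNSET
```

With strict=False the foreign object is registered as the identity, so a later strict=True call for the same key still raises, matching the behavior described above.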
Pickling anonymous sentinels with parameters
With all of that established, here is my proposed reduce function so far. It uses a custom unpickle function which takes the module/name of the sentinel as normal but also has an options dictionary which holds any given parameters for that sentinel.
class Sentinel:
    ...

    def __reduce__(self):
        """Record where this sentinel is defined and its current parameters."""
        options = {}  # Would hold parameters such as a theoretical bool= or repr=
        return (
            _unpickle_sentinel,
            (
                self._name,
                self._module_name,
                options,
            ),
        )


def _unpickle_sentinel(
    name: str,
    module_name: str,
    options: dict[str, Any],
    /,
) -> Any:
    """Unpickle Sentinel at 'module_name.name'."""
    # options is currently unused here
    return Sentinel(name, module_name, strict=False)
Alternatively, throw most of that out and use pickle’s singleton functionality for __reduce__. This was mentioned a lot but no examples were given, and the library documentation for pickle doesn’t explain how modules are handled, so I had to look at the source code to be sure (then I also tested this). Pickle looks for a __module__ attribute from the instance to determine which module the object was defined in. It is extremely simple in practice:
class Sentinel:
    ...

    @property  # Or assign to self.__module__ directly
    def __module__(self) -> str:
        """Return the module this instance was defined for."""
        return self._module_name

    def __reduce__(self) -> str:
        """Reduce this instance to a singleton."""
        return self._name  # self.__module__ is used here
While this version doesn’t handle as many edge cases, it is much simpler and will work for the typical use cases while still being forward compatible with any future methods of pickling or defining the object. Looking at them, I prefer this reduce method due to it behaving much more predictably compared to my overengineered alternative. A sentinel going missing has the same problems and solutions as any other pickled singleton going missing, so it’s less of an issue because the workarounds for missing singletons are well known.
As long as the registry is kept then anonymous sentinels still work, but will raise pickle.PicklingError on any attempt to pickle them.
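A small round-trip sketch of the string-returning __reduce__, assuming __module__ is assigned directly on the instance. The demo_sentinels module is a stand-in created with types.ModuleType so the example does not depend on how the file is run:

```python
import pickle
import sys
import types


class Sentinel:
    def __init__(self, name: str, module_name: str) -> None:
        self._name = name
        self.__module__ = module_name  # pickle reads this to locate the object

    def __repr__(self) -> str:
        return self._name

    def __reduce__(self) -> str:
        return self._name  # A string tells pickle to treat this as a global


# Stand-in module; in real code this would be the module defining the sentinel.
demo = types.ModuleType("demo_sentinels")
sys.modules["demo_sentinels"] = demo
demo.MISSING = Sentinel("MISSING", "demo_sentinels")

restored = pickle.loads(pickle.dumps(demo.MISSING))
assert restored is demo.MISSING  # Identity survives the round trip

anonymous = Sentinel("ANON", "demo_sentinels")  # Never assigned to demo.ANON
try:
    pickle.dumps(anonymous)
except pickle.PicklingError:
    pass  # pickle cannot find demo_sentinels.ANON
else:
    raise AssertionError("expected PicklingError for an anonymous sentinel")
```

Pickle stores only the module and name, then checks at dump time that the name resolves back to the same object, which is why the anonymous case fails early instead of producing an unloadable pickle.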
Another option is to drop support for anonymous sentinels which will simplify the current implementation dramatically:
import sys

class Sentinel:
    def __init__(self, name: str, module_name: str | None = None) -> None:
        self._name = name
        if module_name is None:
            module_name = sys._getframemodulename(1)
        if module_name is None:
            module_name = __name__
        self.__module__ = module_name

    def __repr__(self) -> str:
        return self._name

    def __reduce__(self) -> str:
        return self._name
But I don’t think the implementation is the issue with anonymous sentinels, it’s the syntax. It’s easy to not qualify the name of an anonymous sentinel in a scope and that could lead to name clashes or even clashes with existing top-level names within the module. There are run-time costs to catch only some of these mistakes.