Adding a `contextlib.ModificationContext` base class

(Extracted from the umask thread)

In discussing the possibility of adding a stdlib context manager to support applying and reverting temporary umask changes in a process, we identified a common implementation pattern across the following contextlib context managers:

  • contextlib.redirect_stdout
  • contextlib.redirect_stderr
  • contextlib.chdir

The two stream redirection CMs already share a common base class (contextlib._RedirectStream), but contextlib.chdir is implemented independently.

The current implementation of contextlib.chdir looks like this:

class chdir(AbstractContextManager):
    """Non thread-safe context manager to change the current working directory."""

    def __init__(self, path):
        self.path = path
        self._old_cwd = []

    def __enter__(self):
        self._old_cwd.append(os.getcwd())
        os.chdir(self.path)

    def __exit__(self, *excinfo):
        os.chdir(self._old_cwd.pop())

The key pieces of the pattern are:

  • on entry, the existing value is stored, while the new value is applied
  • on exit, the previous value (stored on entry) is reapplied
  • old values are stored in a list to make the CM re-entrant
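
For instance, the temporary umask change that prompted this thread could follow the same pattern directly. This is only a sketch: the class name `umask` and its `mask` attribute are placeholders rather than an existing contextlib API, and it relies on os.umask() returning the previous mask when it sets a new one.

```python
import os
from contextlib import AbstractContextManager


class umask(AbstractContextManager):
    """Non thread-safe context manager to temporarily change the process umask."""

    def __init__(self, mask):
        self.mask = mask
        self._old_masks = []  # one saved mask per active entry

    def __enter__(self):
        # os.umask() applies the new mask and returns the one it replaced.
        self._old_masks.append(os.umask(self.mask))

    def __exit__(self, *excinfo):
        os.umask(self._old_masks.pop())


with umask(0o077):
    pass  # files created here would only be accessible to the current user
```

Because each entry pushes its own saved mask, the same instance can be entered again while it is already active.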

Additionally, while chdir doesn’t currently return anything from __enter__, the stream redirection CMs return the new stream.

While this pattern isn’t hard to use once you’re aware of it, there’s no obvious path to learning it. We could add it to the contextlib documentation purely as a recipe, but I think we can go a step further and offer an abstract base class that makes implementing such contexts even easier.

Proposed API (combining my preferred naming with @jb2170’s suggested public API from the other thread):

class ModificationContext(ContextDecorator, AbstractContextManager):
    """Context manager ABC to change a target value on entry and revert it on exit.

    Reentrant and reusable as both a context manager and function decorator.
    """

    def __init__(self, value):
        self._applied_value = value
        self._previous_values = []

    @property
    def applied_value(self):
        return self._applied_value

    def __repr__(self):
        return f"{type(self).__qualname__}({self._applied_value!r})"

    def __enter__(self):
        self._previous_values.append(self.apply())
        return self._applied_value

    def __exit__(self, *exc_info):
        self.revert(self._previous_values.pop())

    @abstractmethod
    def apply(self):
        """Apply the change and report the previous value to be restored."""
        raise NotImplementedError

    @abstractmethod
    def revert(self, previous_value):
        """Revert the change, restoring the given previous value."""
        raise NotImplementedError

Given that base class, contextlib.chdir would become:

class chdir(ModificationContext):
    """Non thread-safe context manager to change the current working directory."""

    @property
    def path(self):
        return self._applied_value

    def apply(self):
        previous_path = os.getcwd()
        os.chdir(self._applied_value)
        return previous_path

    def revert(self, previous_cwd):
        os.chdir(previous_cwd)

The following example recipe should also be added to the documentation to illustrate storing extra state on a modification context (whether or not to add it to contextlib as a simpler, non-test-specific alternative to unittest.mock.patch can be a separate discussion):

class replace_attr(ModificationContext):
    """Non thread-safe context manager to replace an attribute on the given target."""

    def __init__(self, target, attr, value):
        self._target = target
        self._attr = attr
        super().__init__(value)

    def __repr__(self):
        return f"{type(self).__qualname__}({self._target!r}, {self._attr!r}, {self._applied_value!r})"

    def apply(self):
        previous_value = getattr(self._target, self._attr)
        setattr(self._target, self._attr, self._applied_value)
        return previous_value

    def revert(self, previous_value):
        setattr(self._target, self._attr, previous_value)
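
To show how the recipe would behave in use, here is a self-contained sketch. The base class and `replace_attr` are condensed copies of the proposal above (neither exists in the stdlib), and the `config` object and `debug_on` name are made up for illustration.

```python
from abc import abstractmethod
from contextlib import AbstractContextManager, ContextDecorator
from types import SimpleNamespace


class ModificationContext(ContextDecorator, AbstractContextManager):
    """Condensed copy of the proposed base class (not an existing stdlib API)."""

    def __init__(self, value):
        self._applied_value = value
        self._previous_values = []

    def __enter__(self):
        self._previous_values.append(self.apply())
        return self._applied_value

    def __exit__(self, *exc_info):
        self.revert(self._previous_values.pop())

    @abstractmethod
    def apply(self):
        raise NotImplementedError

    @abstractmethod
    def revert(self, previous_value):
        raise NotImplementedError


class replace_attr(ModificationContext):
    """The recipe above, minus __repr__."""

    def __init__(self, target, attr, value):
        self._target = target
        self._attr = attr
        super().__init__(value)

    def apply(self):
        previous_value = getattr(self._target, self._attr)
        setattr(self._target, self._attr, self._applied_value)
        return previous_value

    def revert(self, previous_value):
        setattr(self._target, self._attr, previous_value)


config = SimpleNamespace(debug=False)  # hypothetical settings object
debug_on = replace_attr(config, "debug", True)

with debug_on as applied:
    print(applied, config.debug)  # True True
    config.debug = "verbose"      # unrelated change made inside the block
    with debug_on:                # re-entering the same instance
        print(config.debug)       # True
    print(config.debug)           # verbose
print(config.debug)               # False


@debug_on                         # decorator form, via ContextDecorator
def is_debugging():
    return config.debug


print(is_debugging(), config.debug)  # True False
```

Each exit restores whatever value was in place at the matching entry, which is why the inner block hands back "verbose" rather than the original False.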

Additional notes:

  • The kinds of changes this API is intended to apply to shouldn’t require blocking IO operations, so I’m not proposing to add an asynchronous variant of this API. We can always add AsyncModificationContext later if someone demonstrates the need.

  • In my original proposal, the public apply() and revert() methods exactly mirrored __enter__() and __exit__(), with subclasses implementing private _apply() and _revert() functions. @jb2170’s API change was to make the subclass methods the public apply() and revert() methods themselves. I decided this made for a more flexible API than my version, since API consumers can decide for themselves whether to use the native context management support (either directly or via ExitStack), or some other mechanism of their own for passing state between the apply() and revert() calls.

  • There were a few potential names suggested for this base class in the previous thread, but I’m going to play the “contextlib co-maintainer” card on this particular name (there are enough defensible candidates that I don’t think true general consensus is a likely possibility, but a combination of maintainer fiat + “Eh, that’s good enough” consensus seems achievable).

    ModificationContext is broad enough to cover pretty much any plausible use case (unlike SubstitutionContext, which is sometimes stretching the terminology for in-place mutation operations), but not so broad as to almost certainly be confusing (unlike ChangeContext, which has that problem due to “change” being a synonym for both “modify” and “modification”, not just the latter). We also don’t need the Abstract prefix here - AbstractContextManager only has the prefix to distinguish it by more than letter casing from the contextmanager generator decorator.


Working on a branch for this as I type :eyes:

Ah that’s good to know. The class name was getting a little long :sweat_smile:


I think this will only work for values that are truly global, such as the umask and pwd. However, I often find myself writing context managers that change some setting that is thread-local. If we want this to be a generally reusable class, maybe we need some kind of support for storing the previous state in a ContextVar?
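
One way such ContextVar-backed storage could look is sketched below. Everything here is an assumption made for illustration: the `set_contextvar` class and the `precision` variable are hypothetical, and the sketch swaps the saved-value list for a list of the tokens that ContextVar.set() returns.

```python
import asyncio
import contextvars
from contextlib import AbstractContextManager

# Hypothetical task-local setting, used only for illustration.
precision = contextvars.ContextVar("precision", default=2)


class set_contextvar(AbstractContextManager):
    """Set a ContextVar on entry and restore the prior value on exit."""

    def __init__(self, var, value):
        self._var = var
        self._value = value
        self._tokens = []  # one reset token per active entry

    def __enter__(self):
        self._tokens.append(self._var.set(self._value))
        return self._value

    def __exit__(self, *exc_info):
        self._var.reset(self._tokens.pop())


async def worker(value):
    # Each task builds its own CM instance, so the token list is never shared.
    with set_contextvar(precision, value):
        await asyncio.sleep(0)  # give the other task a chance to run
        return precision.get()


async def main():
    return await asyncio.gather(worker(3), worker(7))


results = asyncio.run(main())
print(results)  # [3, 7]
```

The token list on the instance is still plain shared state, so a single instance should not be entered from several tasks or threads at once.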

As long as the CM instance itself remains local to the thread or coroutine, the unwinding won’t get confused. That limitation already exists when you implement this pattern directly.

It does mean the base class should carry the same “This is not thread-safe” caveat that the other CMs in contextlib already have.

contextlib.chdir’s existing documentation acknowledges its ‘non parallel-safe’ behaviour, and I’ve made sure to copy that over to the (pending) shutil section for if/when umask is added. :slight_smile:

I haven’t done any work with contextvars sorry (other than briefly seeing them in Flask) so I’d just be guessing.

I’m about to head off for the night, just thought I’d give an update: I’ve created an Issue on GitHub for this ModificationContext ABC, and here’s what it looks like WIP.

I’ve a few questions about the class such as:

  • Should it inherit from ContextDecorator?
  • Should _apply and _revert begin with an underscore as we donā€™t expect the user to call them from outside the class, or should they be ā€˜publicā€™ to indicate beyond the docstrings that they need implementing?

I guess since ModificationContext needs to be subclassed anyway, folks can opt in to the ContextDecorator behaviour if they want it, and leave it out if they don’t. So you’re right, probably better to inherit just from AbstractContextManager, and let subclasses make that behavioural choice.

In regards to making the apply and revert APIs public, it now feels to me that this is a case where “consenting adults” applies: yes, by making them public, we provide opportunities for people to mess up using them (by failing to restore the returned previous value later), but we also provide opportunities for people to use the class in more ways than just the basic behaviour.

For example, defining a modification context that temporarily reverts a change without messing up its internal state management:

class revert_modification(ModificationContext):

    def __init__(self, reverted_cm):
        self._reverted_cm = reverted_cm
        super().__init__(reverted_cm.previous_value) # See note below

    def apply(self):
        self._reverted_cm.revert(self._applied_value)
        return self._reverted_cm.applied_value

    def revert(self, _previous_value):
        self._reverted_cm.apply()

This example would need an addition to the public ModificationContext API though:

    @property
    def previous_value(self):
        previous_values = self._previous_values
        if not previous_values:
            raise RuntimeError(f"No previous value currently recorded in {self!r}")
        return previous_values[-1]

Such an addition would raise the question of whether we should drop the argument to revert() in favour of subclasses calling self.previous_value, though (the same way they use self._applied_value when applying the change).

Alternatively, subclassing consistency could be obtained by passing the value to be set to both subclass APIs, so subclasses never have to go poking at the base class storage. apply() and revert() would still be defined as instance methods though, since some modifications may involve additional state (like the CM reversion and attribute setting examples).

class revert_modification(ModificationContext):

    def __init__(self, reverted_cm):
        self._reverted_cm = reverted_cm
        super().__init__(reverted_cm.previous_value)

    def apply(self, applied_value):
        self._reverted_cm.revert(applied_value)
        return self._reverted_cm.applied_value

    def revert(self, previous_value):
        self._reverted_cm.apply(previous_value)

The symmetry and simplicity of this last variant is definitely appealing, since subclass implementers don’t need to know anything about how the base class stores anything, just:

  • pass the value to be passed to apply to the base class initializer
  • return the value to be passed to revert from apply
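
A base class for this symmetric variant might look roughly like the sketch below. It is an illustration of the idea rather than a settled API: the only change from the earlier proposal is that __enter__ hands the stored value to apply(), and the `chdir` subclass is included just to exercise it.

```python
import os
import tempfile
from abc import abstractmethod
from contextlib import AbstractContextManager


class ModificationContext(AbstractContextManager):
    """Sketch of the symmetric variant: both hooks receive the value to set."""

    def __init__(self, value):
        self._applied_value = value
        self._previous_values = []

    def __enter__(self):
        # The base class passes the value in; subclasses never read its storage.
        self._previous_values.append(self.apply(self._applied_value))
        return self._applied_value

    def __exit__(self, *exc_info):
        self.revert(self._previous_values.pop())

    @abstractmethod
    def apply(self, applied_value):
        """Apply the given value and return the value to restore later."""
        raise NotImplementedError

    @abstractmethod
    def revert(self, previous_value):
        """Restore the given previous value."""
        raise NotImplementedError


class chdir(ModificationContext):
    def apply(self, path):
        previous_cwd = os.getcwd()
        os.chdir(path)
        return previous_cwd

    def revert(self, previous_cwd):
        os.chdir(previous_cwd)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with chdir(tmp):
        print(os.getcwd())       # somewhere inside the temporary directory
    print(os.getcwd() == start)  # True
```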

I wonder:
Should ModificationContext be generic over the type of its value, so class chdir(ModificationContext[pathlib.Path]): … ?
Otherwise I think we cannot properly tell typing tools the type of the changed value.


Yes, I expect typeshed will type it as a generic.
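
As a rough idea of what a generic version could look like if the annotations were written inline, here is a sketch. It is an assumption about one possible shape, not what typeshed would actually contain, and the type variable name `T` is arbitrary.

```python
import os
import pathlib
import tempfile
from abc import abstractmethod
from contextlib import AbstractContextManager
from typing import Generic, TypeVar

T = TypeVar("T")


class ModificationContext(AbstractContextManager, Generic[T]):
    """Proposed base class, parameterised over the type of the managed value."""

    def __init__(self, value: T) -> None:
        self._applied_value: T = value
        self._previous_values: list[T] = []

    def __enter__(self) -> T:
        self._previous_values.append(self.apply())
        return self._applied_value

    def __exit__(self, *exc_info: object) -> None:
        self.revert(self._previous_values.pop())

    @abstractmethod
    def apply(self) -> T:
        raise NotImplementedError

    @abstractmethod
    def revert(self, previous_value: T) -> None:
        raise NotImplementedError


class chdir(ModificationContext[pathlib.Path]):
    def apply(self) -> pathlib.Path:
        previous = pathlib.Path.cwd()
        os.chdir(self._applied_value)
        return previous

    def revert(self, previous_value: pathlib.Path) -> None:
        os.chdir(previous_value)


with tempfile.TemporaryDirectory() as tmp:
    with chdir(pathlib.Path(tmp)) as path:
        print(type(path).__name__)  # a concrete pathlib.Path subclass
```

With this shape, a type checker can infer that the `as` target of `chdir(...)` is a pathlib.Path.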

Bringing over some highlights from @jb2170’s post at https://discuss.python.org/t/add-umask-to-contextlib/75809/65:

  • modification context (as proposed) can only model outright target replacement; it cannot model incremental deltas that apply a change to the previously set value rather than substituting a new value directly
  • making the base class sophisticated enough to model both scenarios will make it harder to use than the “append restoration values to a storage list” idiom that we’re aiming to simplify
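
To make the incremental case concrete, here is a small sketch using the storage-list idiom directly. The `Printer` class and the `indented` context manager are hypothetical names invented for this example.

```python
from contextlib import AbstractContextManager


class Printer:
    """Hypothetical target whose indentation level nested blocks adjust."""

    def __init__(self):
        self.indent = 0


class indented(AbstractContextManager):
    """Increase the target's indent by a delta, restoring the old level on exit."""

    def __init__(self, printer, delta):
        self._printer = printer
        self._delta = delta
        self._previous_levels = []

    def __enter__(self):
        previous = self._printer.indent
        self._previous_levels.append(previous)
        # The value applied depends on the current state, so it cannot be
        # fixed at the time the context manager is constructed.
        self._printer.indent = previous + self._delta
        return self._printer.indent

    def __exit__(self, *exc_info):
        self._printer.indent = self._previous_levels.pop()


printer = Printer()
deeper = indented(printer, 4)
with deeper:
    with deeper:
        print(printer.indent)  # 8
    print(printer.indent)      # 4
print(printer.indent)          # 0
```

The applied value differs on every entry here, which is the part a base class built around a single stored `_applied_value` has no place for.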

These are good points, which suggests that my first idea (describing the implementation pattern in the recipe section of the contextlib documentation) is a better way to go than trying to provide a general purpose abstraction.

How about we make it a generator decorator à la contextmanager so it’s easier to use than subclassing an abstract class?

The example below omits any error handling for brevity:

class _ModificationContext:
    def __init__(self, func, value):
        self._func = func
        self._applied_value = value
        self._previous_generators = []

    def __enter__(self):
        generator = self._func(self._applied_value)
        self._previous_generators.append(generator)
        return next(generator)

    def __exit__(self, *_):
        next(self._previous_generators.pop(), None)

def modification_context(func):
    def wrapper(value):
        return _ModificationContext(func, value)
    return wrapper

so that:

@modification_context
def chdir(path):
    previous_path = os.getcwd()
    os.chdir(path)
    yield previous_path
    os.chdir(previous_path)

chroot = chdir('/')
chtmp = chdir('/tmp')
os.chdir('/etc')
print(os.getcwd())
with chroot:
    print(os.getcwd())
    os.chdir('/usr')
    print(os.getcwd())
    with chtmp:
        print(os.getcwd())
        with chroot:
            print(os.getcwd())
        print(os.getcwd())
    print(os.getcwd())
print(os.getcwd())

outputs:

/etc
/
/usr
/tmp
/
/tmp
/usr
/etc


Or a slightly more helpful version that uses a yield expression to retrieve the previous value, which still supports the original usage:

@modification_context
def chdir(path):
    previous_path = os.getcwd()
    os.chdir(path)
    os.chdir((yield previous_path))

where _ModificationContext is modified to:

class _ModificationContext:
    def __init__(self, func, value):
        self._func = func
        self._applied_value = value
        self._exit_handlers = []

    def __enter__(self):
        generator = self._func(self._applied_value)
        value = next(generator)
        self._exit_handlers.append(lambda: generator.send(value))
        return value

    def __exit__(self, *_):
        with suppress(StopIteration):
            self._exit_handlers.pop()()


This second version shouldn’t be necessary though, since the previous value should always be readily available in the same scope of the decorated generator function, as demonstrated in the first example.

I am for this, but also for including a “thread and concurrency safe” version using ContextVars.

It may not work out of the box for every resource (cwd being a “hard” one), but for some others it could be rather feasible.

For example, I wrote a task-aware redirection of stdout a few days ago.
I didn’t need to reimplement contextlib.redirect_stdout there; rather, I just built a stream-proxy class on top of it. (This was for a Stack Overflow answer: “Suppress stdout of a coroutine without affecting other coroutines in python”.)

Existing modification contexts in the standard library either yield None (chdir) or the applied value (redirect_stdout and friends). The previous value isn’t made readily available to the caller in either case.

We usually don’t need the return value of __enter__, since the caller usually already knows the applied value, but yes, we can make __enter__ return self._applied_value instead, just like your implementation does.

When I said “the previous value should always be readily available” I meant that the generator function decorated by @modification_context should always maintain a reference to the previous value with which it can perform restoration after resuming from yield, such as the previous_path variable in my example for chdir:

@modification_context
def chdir(path):
    previous_path = os.getcwd()
    os.chdir(path)
    yield previous_path
    os.chdir(previous_path)

or for redirect_stdout, the old_target variable:

@modification_context
def redirect_stdout(new_target):
    old_target = sys.stdout
    sys.stdout = new_target
    yield old_target
    sys.stdout = old_target

so the previous value is readily available in both cases as you see, but for sure we can go with the second version using the yield expression just for extra convenience.

Right, my point was that none of the existing stdlib CMs make that saved-for-restoration value externally accessible (and I’m not aware of us receiving any requests for them to do so), so there’s no compelling reason to give examples that do that. Any examples should behave the same way the existing stream redirection CMs do, as in:

@modification_context
def chdir(path):
    previous_path = os.getcwd()
    os.chdir(path)
    yield path  # Note: NOT previous_path
    os.chdir(previous_path)

@modification_context
def redirect_stdout(new_target):
    old_target = sys.stdout
    sys.stdout = new_target
    yield new_target # Note: NOT old_target
    sys.stdout = old_target

That said, I think this variant is getting too far into the territory of hiding flow control, even more so than the subclass based API. The behaviour of both of the above examples gets substantially more obvious when just using contextmanager directly:

@contextmanager
def chdir(path):
    previous_path = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous_path)

@contextmanager
def redirect_stdout(new_target):
    old_target = sys.stdout
    sys.stdout = new_target
    try:
        yield new_target
    finally:
        sys.stdout = old_target

Yeah that does look more consistent with the current behavior. As you see I didn’t think much about this part since I don’t think anyone is using the return value of __enter__ for chdir or redirect_stdout when the caller is the one giving the context manager the new value (so it must have the new value already and doesn’t need it returned).

But yes we can make the decorated generator function yield the given new value instead like in your example, and the aforementioned @modification_context decorator will work just as well. It’s really the generator function that’s responsible for reverting to the previous value in the same scope anyway.

But @contextmanager isn’t reentrant, which I believe is the whole point of your modification context. I’m just trying to simplify the usage of your idea with the convenience of @contextmanager while keeping the context manager reentrant.

The impact of re-entrancy on modification contexts is subtler than that.

The contextmanager decorator deals with it by raising an exception if you try to re-enter the CMs it creates, forcing you to make a new one each time. (Their interaction with ContextDecorator then uses a private API that lets __call__ recreate the CM each time.)
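
A quick way to see that behaviour is the sketch below. The `settings` dict and the `verbose` helper are invented for the example, and the exception is caught broadly because the exact type raised is not something this thread pins down.

```python
from contextlib import contextmanager

settings = {"verbose": False}  # hypothetical process-wide setting


@contextmanager
def verbose(flag):
    previous = settings["verbose"]
    settings["verbose"] = flag
    try:
        yield flag
    finally:
        settings["verbose"] = previous


cm = verbose(True)
error = None
try:
    with cm:
        with cm:  # attempt to re-enter the same instance
            pass
except Exception as exc:
    error = exc

print(type(error).__name__)  # name of whichever exception was raised
print(settings["verbose"])   # False
```

Calling verbose(True) again for the inner block creates a fresh CM each time and works as expected.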

Files (and similar resource management CMs) make re-entry have no effect, but clean up the resource on the first exit (and then the additional exits also have no effect).

Finally, the list based CMs avoid the runtime overhead of using contextmanager, but that then means they have to come up with their own way of handling the re-entrancy problem. Since the process state manipulation CMs weren’t thread-or-coroutine safe anyway, the simplest way of dealing with that was to push-or-pop values from a list.

The one thing you do NOT want to implement is a class based CM that stores a single value to restore, but doesn’t prevent re-entry, as that’s a recipe for folks accidentally losing the original value that should have been restored when they incorrectly re-enter a single instance of the CM.
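
The failure mode is easy to reproduce. In this deliberately broken sketch (the `broken_replace` name and the `settings` dict are made up for the example), the second entry overwrites the only saved value:

```python
class broken_replace:
    """Anti-pattern: keeps a single saved value but does not block re-entry."""

    def __init__(self, target, key, value):
        self._target = target
        self._key = key
        self._value = value

    def __enter__(self):
        # Overwritten on every entry, so a nested entry loses the original.
        self._previous = self._target[self._key]
        self._target[self._key] = self._value
        return self._value

    def __exit__(self, *exc_info):
        self._target[self._key] = self._previous


settings = {"level": "info"}  # hypothetical setting
cm = broken_replace(settings, "level", "debug")
with cm:
    with cm:  # incorrect re-entry of the same instance
        pass
print(settings["level"])  # debug (the original "info" was never restored)
```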

So the re-entrancy in the class based CMs was more a matter of having to deal with the re-entrancy question somehow, and “make it work” being the chosen option over “make it fail”. Both are valid options, and contextmanager already provides the second one.