Add `umask` to `contextlib`

Intro

A umask context manager is useful for when one requires files created by a section of a program to have a restricted set of permissions. For example it’s no good to

$ touch new-secret-file
$ chmod g=,o= new-secret-file

with a default umask of 0o022 active, as there’s a race condition present: anyone has read-access to new-secret-file before the chmod takes place. Therefore setting the umask to 0o077 before this whole block of code, and afterwards unconditionally restoring the umask back to whatever it was before (eg 0o022), is the correct practice (and renders the chmod-ing of each individual file unnecessary). This makes for a nice context manager.

I find myself writing / importing my own umask context manager in several projects and it would be nice to have it as a standard library import. I’ve seen other people do the same. A naive implementation is

@contextmanager
def umask(mask):
    umask_old = os.umask(mask)
    try:
        yield
    finally:
        os.umask(umask_old)

Implementation

But wait, we can make it even better! We make umask a class (like chdir is), rather than just a @contextmanager decorated generator. This allows a umask instance itself to be passed around as a re-usable object, rather than passing around the argument(s) to construct a new one each time it’s required. The proposed implementation is as follows:

class umask(AbstractContextManager):
    """Non thread-safe context manager to change the process's umask."""

    def __init__(self, mask):
        self.mask = mask
        self._old_mask = []

    def __enter__(self):
        self._old_mask.append(os.umask(self.mask))

    def __exit__(self, *excinfo):
        os.umask(self._old_mask.pop())

Remarks, Quirks, and Teaching

The style of having a list of _old_mask makes the context manager reentrant-safe as has been discussed here for chdir. I hadn’t originally considered this, and it’s a good example of why joining a discussion site like this is so useful: instead of rushing an implementation, first look at previous discussions!

As mentioned several times previously, this is in the same vein as the chdir context manager which was eventually added in 2021 per bpo-25625. The degree of contaminating side effects of umask(2) on sibling threads, or asyncio sibling tasks, is the same as that of chdir(2), and any objections / arguments seem to have already been discussed. umask(2) is a process-wide effect, and so long as this is sufficiently acknowledged in the contextlib documentation, contextlib.umask seems as acceptable as contextlib.chdir is.

The os import is already present in contextlib so there’s no more modules to be imported, which is nice.

11 Likes

+1

Why _old_mask is a list? I see also in chdir is the same:

Because you could nest usage of the CM. It’s normal to construct and use a CM in one shot, but it’s theoretically possible to do this:

dir = contextlib.chdir("/tmp")
with dir:
    with dir:
        pass
3 Likes

Hi Lucas,

The use of a list _old_cwd is making the context manager reentrant-safe.

Consider a poorly implemented bad_chdir() context manager as below:

import os
from contextlib import AbstractContextManager

class bad_chdir(AbstractContextManager):
    def __init__(self, path):
        self.path = path
        self._old_cwd = None

    def __enter__(self):
        self._old_cwd = os.getcwd()
        os.chdir(self.path)

    def __exit__(self, *excinfo):
        os.chdir(self._old_cwd)

Then re-entering a bad_chdir instance multiple times overwrites the _old_cwd variable, leading to incorrect return chdir-ing. An example is given below:

etc = bad_chdir("/etc")
usr = bad_chdir("/usr")

with etc:
    # etc._old_cwd = "/my/repo/dir"
    # cwd /etc
    with usr:
        # usr._old_cwd = "/etc"
        # cwd /usr
        with etc:
            # etc._old_cwd = "/usr"; uh oh, it got overridden!
            # cwd /etc
            pass
        # cwd /usr
    # cwd /etc
# cwd /usr; uh oh, now in the wrong directory!
# we should be in "/my/repo/dir"

To counter this, we use a list of _old_cwd as a stack of cwds that we push-onto and pop-off-of when entering and exiting the context manager.

Mutatis mutandis the same applies to umask.

2 Likes

The idea of the context manager is reasonable, but I don’t like the idea of adding even more shutil-like functionality to contextlib.

I know there’s precedent in chdir (which in turn cited the stream redirection CMs as precedent), but it’s still a big conceptual step from “tools for making and manipulating context managers” to “context managers for manipulating the current operating system process state”.

Perhaps this would be a good opportunity to change course and start adding these kinds of process state manipulation CMs to shutil instead of contextlib?

14 Likes

Hi Alyssa,

I’ve had a browse through man 2 syscalls to see if any more context manager-worthy situations appear, and none of them stand out as too pressing to me:

  • fchdir: effectively already implemented: as of Python 3.3 chdir (and other os functions like chmod and chown) accept file descriptor ints, in which case the f- prefixed version (fchdir, fchmod, fchown) is called for the user automatically. So one can simply use the chdir context manager.

  • set{,e,re,res}{u,g}id and friends: irrelevant / irreversible / too specific: the setuid bit doesn’t work for scripts, whose shebang causes the dropping of escalated privileges, so that’s not needed. And as for scripts which start off running as root, and later drop down to a normal user’s id, they do so in a non-reversible way, so a context manager is not needed.

  • flock / fcntl locks: too platform specific: Locking is a very permuted area, and use cases vary wildly as to whether one is using an exclusive lock, shared lock, whether it should be reentrant safe to avoid deadlocks… It’s not worth the headache.

So for the minute I think there won’t be any more os-mutating-feature-creep in contextlib. shutil doesn’t immediately stand out to me as the correct module for a umask context manager, and given that chdir and umask are siblings in their implementation it seems unnecessary to separate them. I can appreciate what you mean though, contextlib being about how to implement a context manager, with base classes / decorators, and maybe not being a place for a collection of implementations. We do however already have the os import in contextlib.py as I mentioned, so it’s not like there’s any extra overhead. If anything, adding umask to shutil would add more overhead as it currently doesn’t import contextlib.

So I’d suggest we add umask to contextlib? Let me know.

Happy new year from GMT+0 !

I second the notion that this would be better housed in os or shutil. (I assume shutil to avoid impacting interpreter startup time and allow the implementation to use contextlib`.)

I don’t think it has anything to do with implementing context managers, so I don’t see any strong argument for putting it in contextlib. IMO the chdir one is misplaced and that we shouldn’t follow that precedent.

All that said, I’m enthusiastically in favor of this addition! I’ve written my own versions many times over the years, and it will be nice to stop doing that.

(And right back at ya, Happy New Year!)

8 Likes

Browsing os — Miscellaneous operating system interfaces — Python 3.13.1 documentation rather than the syscall list itself may give a better sense of the potential feature creep that bothers me. It may be the case that all the current os.set* APIs for process groups and namespaces are a one way trip with no way to reverse the changes made, but I’m not sure of that, and even if it is the case right now, I’m not sure it will be the case forever (since OS APIs evolve over time, and sometimes quite rapidly in the case of Linux).

The reason shutil makes sense as a location for APIs like umask (and chdir) is in the name: this is a module for utilities related to providing shell equivalent functionality to Python scripts. Where accessing os is akin to making libc calls (or even making system calls directly), using shutil is more like calling a command in a shell (in this case, similar to the umask command, but with the ergonomics improved relative to the existing os.umask interface).

Adding contextlib.chdir was a genuine design mistake that missed a relevant technical difference with the stream redirection context managers: the latter do not affect the operating system process state, they only manipulate the Python level sys module attributes. Even with that technical difference, the stream redirection CMs themselves may still have ended up in shutil if they hadn’t been given as examples in the original PEP (so my recollection is that we didn’t think too hard at the time about whether contextlib itself was really the right home for them).

I’m genuinely not sure if the hassle of aliasing those existing APIs in shutil would be worthwhile, but I am convinced that we should avoid digging the design hole we have created for ourselves any deeper.

8 Likes

Mostly OT, but I would want to do it only if coupled with a clear timeline to deprecate and remove the old ones.

I wonder if we can find a new general approach for stdlib deprecations, based on how the user community has shifted in recent years. The threads about SyntaxWarning made me feel like somehow the user facing presentation of changes is getting garbled.
I’m trying to think if the PEP process can be used in a way similar to “dead batteries” to collect a large batch of cleanup changes, renames, incorrect defaults, etc, and then land them methodically in 3 or 4 releases. If I can figure out what my idea even is (!), I’ll start a pre-PEP thread.

Another reason for not adding umask to contextlib is that it’s OS-specific. There’s no concept of umask on Windows. The os, and to a lesser extent shutil, modules are where people expect to find os-specific functions, not in an otherwise portable module like contextlib.

8 Likes

Okay there’s quite a strong consensus within the likes and comments in this thread that shutil seems like the correct place, so I concede / agree. I’ll begin drafting the docs for umask under shutil, not contextlib.

I am convinced that we should avoid digging the design hole we have created for ourselves any deeper

So the feature has already ‘crept’ so to speak lol! contextlib.chdir is indeed the only context manager in contextlib that uses the os import, making it a one-off mis-step that can be ‘nipped in the bud’ now. Therefore perhaps it’s worth making another PR or discussion parallel to this ‘Add umask context manager to shutil’ one:

‘Move contextlib.chdir to shutil’: This would remove the os import from contextlib and add from contextlib import AbstractContextManager to shutil which keeps import-overhead level. It also keeps chdir and umask together, and sets a precedent of where possible future process-state-manipulating context managers belong.

contextlib’s chdir could then be deprecated by replacing it with something along the lines of

# import os   # not needed any more

def chdir(path):
    from shutil import chdir as chdir_shutil
    # print deprecation warning to stderr
    return chdir_shutil(path)

or whatever the formal process of re-housing stdlib content is :slight_smile:

2 Likes

If this umask context manager is added to shutil it would make sense to alias the existing ones into it, even without a deprecation timeline. It would of course be better if coupled with deprecation.

1 Like

Isn’t that what the “opener” argument of open() is for?

import os

# Custom opener function to set 0o600 permissions
def user_only(path, flags):
    return os.open(path, flags, mode=0o600)

# Open a file for writing using the custom opener
with open("secret_file.txt", mode="w", opener=user_only) as f:
    f.write("don't tell")

No. The mode in os.open(..., mode) gets masked out by umask by the the kernel. eg if the current umask were 0o777, then no matter what mode is passed to os.open the file will be created with no perms.

os.umask(0o777)

def custom_opener(path, flags):
    return os.open(path, flags, mode = 0o644)

with open("foo", "w", opener = custom_opener):
    pass

# ---------- foo

Also passing around a custom opener to every open call would be impractical, and pathlib.Path.open doesn’t support that, hence the need for a context.

2 Likes

That was my initial thought as well, but it turns out Microsoft implemented a basic version of it for C compatibility reasons, so Python’s own os.umask is cross platform.

I’m not entirely sure what it does (especially on NTFS), but it seems the capability exists.

This is the origin of my uncertainty:

  • I’m entirely convinced that the churn imposed by formal programmatic deprecation of the misplaced names isn’t worth it. Folks aren’t going to fail at writing Python programs in general because a handful of convenient context managers aren’t in the same place as their newer counterparts, so the usual motivation of making the language easier to learn is pretty weak in this case.
  • Taking the “zero overhead indefinite aliasing” approach (where shutil imports the implementations from contextlib) means that users of the APIs will need to choose their import location based on the minimum version of Python they want to support. Aliasing in the other direction (via lazy imports in contextlib) imposes a slight runtime overhead no matter what so we’d only do it that way if we were injecting runtime deprecation warnings.

Writing out the second option makes me more OK in using that “soft deprecation” approach for this situation, though. We could just mention in the shutil documentation for the relocated APIs that versions prior to 3.14 can be supported by importing the helpers from contextlib instead (and drop that note once 3.14 is no longer supported), in addition to keeping short notes in the contextlib documentation for each of the relocated APIs.

1 Like

Okay for ‘add umask to shutil’ I’ve added the implementation, tests, and documentation on this branch on GitHub here. Let me know what you think / if I should open a draft PR at this time for any tweaks.

Further on from that for ‘move chdir from contextlib to shutil’ I’ve created a branch here. This includes:

  • move chdir from contextlib.py to shutil.py

  • add a compatibility chdir function to contextlib.py in its place, with a deprecation warning

  • move existing tests for chdir from test_contextlib to test_shutil,

  • add a test to test_contextlib to make sure the compatibility contextlib.chdir function still works

  • update documentation for shutil and contextlib, mentioning the moving of chdir

May this latter branch warrant another discussion thread to follow on, since the title of this thread only indicates adding new content, not moving existing content? Let’s take it one step at a time :smiley:

3 Likes

I think that it was a mistake to make contextlib.chdir() reusable. The contextmanager generator implementation is simpler, and reusability is not usually needed for such kind of context managers. It is still not thread-safe and not interleaving-safe, so it does not solve much problems. It created a bad precedence.

For the case when you need a reusable context manager, we could provide a general helper which creates a context manager on-fly for every enter.

cm = contextlib.reusable(contextlib.chdir, path)
with cm:
    with cm:
        ...

And I think that contextlib should be a home for such helpers which allow you to easily create new context managers with different properties. For example, it would be useful to add helper to make an asynchronous context manager from synchronous context manager and helper which makes __enter__ and __exit__ thread-safe.

12 Likes

Your use case in the original post was creating a 600 file with an umask of 022. A 777 umask isn’t a reasonable configuration. The opener is more flexible: I typically use it to set both the permissions and file group. I haven’t found passing around a custom opener impractical, myself. You could use functools if you’d like:

i

import os
import functools

def _adm_open(path, flags):
    """
    Custom opener that sets file permissions to 0o640 and changes the group to gid=4.
    """
    fd = os.open(path, flags, mode=0o640)  # Open the file with specified permissions
    os.fchown(fd, -1, 4)  # Change the group of the file to gid=4 (-1 means user ID unchanged)
    return fd

# Create a partial function for opening files with the custom opener
admin_open = functools.partial( open, opener=_adm_open)

# Example usage
with admin_open('example.txt', 'w') as f:
    f.write('Hello, world!')

Again, there’s a race condition between calling os.open and os.fchown in which the old group has read access to the file for a moment. This is a security vulnerability and the incorrect way to do this. Perhaps one would be lucky and the default group id before chowning the file is also the user’s group, but that’s not for certain. That _adm_open should open(..., mode=0o600), chown, fchmod(..., mode=0o640) if the idea is to gift a sensitive file to the correct group.

My example was using a temporary umask of 0o077 to create a file with perms 0o600. A umask context manager is useful to avoid open, chmod race conditions / vulnerabilities, and also means no chmods or mode=...s are necessary within the context manager’s body, leaving code agnostic. This seems like the correct approach.

1 Like

There’s a few nice reasons for reusable / reentrant context managers, most of all:

  • If the context manager needs 20 arguments to construct it, ctx = MyContext(age=42, name="foo", dry_run=True, ...) it sure is easier to pass around a reusable ctx object than all the 20 arguments required to construct a single-use @contextmanager decorated try-yield-finally version every time it’s needed.
  • The user therefore passes around the instance / concept of the context itself, rather than passing around the arguments to create the context. This seems more Pythonic.
  • If one tried to pass around a @contextmanager context manager it breaks when trying to reenter:
@contextmanager
def dummy():
    try:
        yield
    finally:
        pass

ctx = dummy()
with ctx:
    with ctx:
        pass # error

Citation needed. It doesn’t hurt at all to add reentrant behavior, and since chdir() has already been deployed as reentrant safe since Python 3.11, we can’t break compatibility now, as we must assume that there are users out there already relying upon this behaviour. Let’s make umask the same.

I’ve had a think about this and I can think of a few types of reentrant context managers, and TLDR it would confuse the developer too much and is more a pain to refactor than monolithic __enter__ and __exit__ methods.

  • Firstly is an AbstractIdempotentReentrantContextManager, which peforms an action upon the first entry, and last exit, and otherwise does nothing. This would be useful say for acquiring a file lock without worrying about platform-dependent deadlocking if one tried to call flock again. Or a greeter, that should only say “Hello” and “Goodbye” once.
from contextlib import AbstractContextManager

class AbstractIdempotentReentrantContextManager(AbstractContextManager):
    def __init__(self):
        self.entry_depth = 0

    def __enter__(self):
        if self.entry_depth == 0:
            self.on_first_enter()
        self.entry_depth += 1
        return super().__enter__()

    def __exit__(self, *excinfo):
        self.entry_depth -= 1
        if self.entry_depth == 0:
            self.on_last_exit()
        return super().__exit__(*excinfo)

    def on_first_enter(self):
        pass

    def on_last_exit(self):
        pass

class GreetMeOnce(AbstractIdempotentReentrantContextManager):
    def on_first_enter(self):
        print("Hello")

    def on_last_exit(self):
        print("Goodbye")

ctx = GreetMeOnce()
with ctx:
    # Hello
    with ctx:
        pass
    # Goodbye

This is a bit like a C++ shared pointer, or Java reference counting? The only problem is one can do

with ctx:
    pass
    # Hello
    # Goodbye

with ctx:
    pass
    # Hello
    # Goodbye

which causes the context manager to call on_first_enter again. This may be undesired for some usecases, and so

  • An AbstractOneshotReentrantContextManager may be desirable
class AbstractOneshotReentrantContextManager(AbstractContextManager):
    def __init__(self):
        self.started = False
        self.finished = False

    def __enter__(self):
        if not self.started:
            self.on_first_enter()
            self.started = True
        return super().__enter__()

    def __exit__(self, *excinfo):
        if not self.finished:
            self.on_last_exit()
            self.finished = True
        return super().__exit__(*excinfo)

    def on_first_enter(self):
        pass

    def on_last_exit(self):
        pass

class NoReallyGreetMeOnlyOnce(AbstractOneshotReentrantContextManager):
    def on_first_enter(self):
        print("Hello")

    def on_last_exit(self):
        print("Goodbye")

ctx = NoReallyGreetMeOnlyOnce()
with ctx:
    # Hello
    with ctx:
        pass
    # Goodbye

with ctx:
    pass
  • And then there’s the stack based context managers which need to do something every time
class StackableReentrantContextManager(AbstractContextManager):
    def __init__(self, state):
        self.state = state
        self._old_states = []

    def __enter__(self):
        self._old_states.append(self.on_enter())

    def on_enter(self):
        raise NotImplementedError

    def on_exit(self):
        raise NotImplementedError

    def __exit__(self, *excinfo):
        self.on_exit(self._old_states.pop())

class chdir(StackableReentrantContextManager):
    def on_enter(self):
        old_state = os.getcwd()
        os.chdir(self.state)
        return old_state

    def on_exit(self, old_state):
        os.chdir(old_state)

chdir_etc = chdir("/etc")
chdir_usr = chdir("/usr")

print(os.getcwd()) # home
with chdir_etc:
    print(os.getcwd()) # /etc
    with chdir_usr:
        print(os.getcwd()) # /usr
        with chdir_etc:
            print(os.getcwd()) # /etc
        print(os.getcwd()) # /usr
    print(os.getcwd()) # /etc
print(os.getcwd()) # home (this is the place to check reentrant mistakes)

What a mess. Java-tier class name lengths and C++ level fiddly-ness. Not worth it.

Oh, and then there’s making asyncio task-level context managers. I’m not going to even bother trying to implement that one, with error handling.

So there’s some food for thought, but it seems to me the abstract classes would be way too OTT :rofl:

Let’s stick with the reentrant safe version of umask I’ve implemented.

2 Likes