Add `umask` to `contextlib`

I think this is better off in third party. The behavior people are showing here doesn’t actually match what I would want.

With nested use, I’d want the mask to be as restrictive as if all umasks had been applied, because each level of nesting had its own reason to restrict.

5 Likes

Is there any reasonable way to enforce that in general?

Querying the current umask isn’t supported separately from changing it, and even if one API implemented that behaviour, anything else calling os.umask (or its C equivalent) would just be setting the umask directly rather than bitwise-ORing it with the current mask.
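Concretely, the only way to read the umask is the two-step set-and-restore dance (a sketch; note the two steps are not atomic):

import os

def read_umask():
    # There is no "get" API: reading the umask means setting it to a
    # throwaway value, then immediately setting it back.
    old = os.umask(0)
    os.umask(old)
    return old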

Depends on your definition of “reasonable”, I guess. If code convention requires that this be the one true way to change the umask, and you audit dependencies for it, you’re covered. I’d find it a little odd for libraries to be setting the umask to begin with; that’s usually an application choice that library code would enable, not make automatically. It’s actually weird to me that someone would need this to be nestable or reentrant, or even to revert on exit as a context manager, but from the perspective of what you’re conceptually asking Python to do with the context… yeah, I’d want the bitwise OR of all currently entered contexts.
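(For concreteness, a sketch of that OR semantics with raw os.umask calls:)

import os

# With an outer context restricting by 0o022 and an inner one by 0o077,
# the effective mask while both are entered is (original | 0o022 | 0o077).
original = os.umask(0)
os.umask(original | 0o022)      # enter outer context
outer = os.umask(0)
os.umask(outer | 0o077)         # enter inner context: 0o077 or stricter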

For my own projects I just set the umask to the least permissive value I need to write files with at startup and never change it again, because it’s there to prevent mistakes, not malice. (You can set more restrictive policies from outside the process if this is meant to be a hardening measure.)

1 Like

@erlendaasland and I just closed the associated GitHub issue for this proposal. Quoting my closure comment in full (with some links tweaked for the different forum):

Unfortunately, despite my initial enthusiasm for the idea, I’m coming to the same conclusion as @erlendaasland. Any addition to the standard library needs to address the mantra of “not every 3 line function needs to be a builtin” (or its far less snappy corollary, “not every 6 line class needs to be in the standard library”).

My initial enthusiasm was based on considering this as a way to influence the file creation of libraries and subprocesses where it isn’t feasible to alter their file creation mode directly, and it instead needs to be done indirectly.

However:

  • when it is possible to control the file and directory creation calls, setting the mode when creating the file is a better option than setting the process-level umask (see the sketch just after this list)
  • even on POSIX systems, if a directory has an ACL set, the process umask will be ignored, so this CM isn’t a guaranteed filter (the files being created need to be going into a directory without an ACL set)
  • even though os.umask is technically available on Windows, it isn’t a common approach to permissions management there (and we’re not even sure it works the same way it does elsewhere, since every NTFS folder has an ACL set, so the umask legitimately won’t have any effect)
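To illustrate the first point, a minimal sketch of passing the mode at creation time instead (filenames are illustrative; note the kernel still filters the requested mode through the current umask, so the new file receives mode & ~umask):

import os

# Explicit mode at creation beats a process-wide umask when you control
# the call site.
fd = os.open("private.txt", os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode=0o600)
os.close(fd)

os.mkdir("private_dir", mode=0o700)  # directory creation takes a mode too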

This makes the proposed context manager far more use case specific than I originally thought it would be, which means it more appropriately belongs in an application or domain specific utility library than it does in the generally cross-platform and cross-domain standard library.

I did find the discussion valuable (and I genuinely appreciate your patience @jb2170), but I think the only lasting stdlib impact is going to be the pattern extraction exercise that resulted in the contextlib.ModificationContext proposal.

5 Likes

A comment on the GitHub issue also points out how ACL masks are restrictive (&=, not =).

These two comments have made me re-think, and I agree with closing the issue: the required functionality would need two separate context managers, set_umask and increase_umask.

My initial use cases were:

  • setting the umask to a ‘higher value’, say 0o077 for private-file creation
  • setting the umask to 0o000 for public socket creation. I don’t think I mentioned this one but it was in my head all along.

But what if one wanted to further restrict only the ‘world’ permissions to 0, i.e. increase the umask to 0oXY7, whilst leaving the group field Y untouched? One could naively guess that the pre-existing umask is the default 0o022, so Y=2, making 0o027 seem sensible; but what if the pre-existing umask is 0o077? Setting the umask to 0o027 actually weakens it! Mutatis mutandis, setting X=0 merely assumes that the user field was 0 beforehand too.
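A minimal demonstration of that pitfall with raw os.umask calls (values as in the paragraph above):

import os

os.umask(0o077)              # strict pre-existing umask
old = os.umask(0o027)        # naive "set": the group bits just got weaker
os.umask(old | 0o027)        # restrict-only: 0o077 | 0o027 == 0o077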

So two separate context managers would be required.
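(Both classes below assume the ModificationContext base class from the linked proposal. As a rough sketch of the assumed interface, which may well differ from the real proposal:)

import os

class ModificationContext:
    """Assumed interface: __init__ stores the target value; __enter__
    calls _apply (which returns the previous state); __exit__ hands
    that state back to _revert."""

    def __init__(self, value):
        self._applied_value = value

    def __enter__(self):
        self._previous_value = self._apply()
        return self

    def __exit__(self, *exc_info):
        self._revert(self._previous_value)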

class set_umask(ModificationContext):
    """Non thread-safe context manager to set the process's umask."""

    def _apply(self):
        # os.umask returns the previous mask, which the base class
        # stores so _revert can restore it on exit.
        return os.umask(self._applied_value)

    def _revert(self, previous_value):
        os.umask(previous_value)

class increase_umask(ModificationContext):
    """Non thread-safe context manager to increase (restrict) the process's umask."""

    def _apply(self):
        # Reading the umask requires setting it: two non-atomic steps
        # (see the race-condition discussion below).
        old_mask = os.umask(0)
        os.umask(self._applied_value | old_mask)
        return old_mask

    def _revert(self, previous_value):
        os.umask(previous_value)

set_umask cannot provide the monotonic-restriction behaviour the comments point out: &= on the resulting permissions, which for the mask itself means |=.

increase_umask cannot perform the loosening behaviour, a plain =, as would be needed e.g. for creating public sockets.

Both are required for specific use cases, and noobs may well pick the wrong context manager; we could no longer say in the documentation that ‘this is a simple wrapper around umask’.
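For concreteness, the intended split in usage (the socket path and filenames here are purely illustrative):

import socket

with increase_umask(0o077):            # private files: only ever tighten
    with open("secret.txt", "w") as f:
        f.write("owner-only\n")

with set_umask(0o000):                 # public socket: deliberately loosen
    s = socket.socket(socket.AF_UNIX)
    s.bind("/tmp/public.sock")         # (fails if the path exists; sketch only)

A newcomer reaching for set_umask inside a library could silently weaken a stricter application-wide mask, which is exactly the confusion described above.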

Further technical details, as man 2 umask acknowledges:

It is impossible to use umask() to fetch a process’s umask without at the same time changing it. A second call to umask() would then be needed to restore the umask. The nonatomicity of these two steps provides the potential for races in multithreaded programs.

If a KeyboardInterrupt occurred between the umask calls in increase_umask._apply, that seems like a mess too! That would need to be acknowledged.
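One partial mitigation, sketched below: block SIGINT around the pair of calls (POSIX-only, main-thread-only, and it does nothing about the multithreading race):

import os
import signal

def read_umask_uninterruptibly():
    # Block SIGINT so a Ctrl-C can't land between the two umask() calls
    # and leave the process stuck with umask 0o000.
    old_sigmask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGINT})
    try:
        old = os.umask(0)
        os.umask(old)
        return old
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_sigmask)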

These two belong in a separate PyPI package, not in the stdlib, which I’ll make myself :slight_smile:

As for the addition of ModificationContext to contextlib, I’d be against it:

  • Since no shutil.umask is being added, and ModificationContext is only used by contextlib internally, is it worth exposing another ABC to save a few lines of code?
  • Noobs may say ‘I thought all context managers modified something to an extent?’
  • It may spread the same set-versus-increase ambiguity as umask; I have the following in one of my codebases:
from contextlib import contextmanager
import fcntl

@contextmanager
def flocked(f, operation=fcntl.LOCK_EX):
    """Context manager for locking a file via `flock`."""
    # Acquire the requested lock; always unlock on exit.
    fcntl.flock(f, operation)
    try:
        yield
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)
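Typical usage (filename illustrative):

with open("state.log", "a") as f:
    with flocked(f):                 # exclusive lock by default
        f.write("...\n")
    # lock released here via LOCK_UN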

If it were refactored into a ModificationContext, should flocked set the lock to operation, or increase it to operation? If one called flocked(f, fcntl.LOCK_SH), should that replace an exclusive lock with a shared one, or do nothing if an exclusive lock is already held, so that flocked stays monotonically increasing in the strictness of its effect? (N.B. in reality I only do file locking in one place; anything else is asking for trouble. This is just for argument’s sake, to illustrate the emerging pattern.)

ModificationContext can’t implicitly convey whether the context performs an &= restriction or an = substitution. The same ambiguity would occur for a replace_attr CM to an extent too: say one wanted to replace a class’s timeout attribute (in ms) with timeout + 5000. That case is ‘restriction-like’, but += rather than &=.
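A sketch of that ‘incremental’ flavour as a plain context manager (bumped_timeout and the timeout attribute are hypothetical):

from contextlib import contextmanager

@contextmanager
def bumped_timeout(obj, extra_ms=5000):
    # Relative modification (+=): adjust by a delta, restore on exit.
    old = obj.timeout
    obj.timeout = old + extra_ms
    try:
        yield obj
    finally:
        obj.timeout = old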

ModificationContext
|
`- IncrementalContext  # relatively set value / attribute;
|                      # `+=`, `-=`, `&=`, `|=`,
|                      # aka RestrictionContext
|
`- SubstitutionContext # absolutely set value / attribute;
                       # `=`
                       # aka replace_attribute

Not worth the clutter, and it’s beginner-unfriendly to make them learn all this. “Just give me my @contextmanager!”

1 Like

So after this umask thread, the chdir one, the ModificationContext one, and (not yet) a sete{u,g}id one :skull:, I suggest ‘status quo ante bellum disputationem’ (roughly: the state of affairs before this war of words).

contextlib.chdir remains the odd-one-out, and there’s no reason to touch it if nothing else is being added, as you pointed out at the end of the chdir thread.

I’ve made a separate GitHub repo / PyPI package called ShellContexts.

This has been a long discussion, but it’s been well worth having. It’s somewhere to check against in future discussions about context manager additions. And it’s been a learning experience :slight_smile:

Taking it slowly, even when just debating class names, has proven very wise, as patterns and ABCs have emerged as time has gone on. Had we rushed, there might have been oversights and/or missed opportunities. I’d like to thank everyone who contributed to these threads.

Most of all I’d like to thank @ncoghlan for being the arbiter, maintainer of contextlib, and for donating a lot of her time! :smiley:

10 Likes

For the sake of completeness:

Securely opening a file via Python probably requires using the opener option of open() with the dir_fd parameter of os.open(path, flags, mode=0o777, *, dir_fd=None).

I think this because a deep dive into the C open(2) man page turned up the following:

Rationale for openat() and other directory file descriptor APIs

openat() and the other system calls and library functions that take a directory file descriptor argument (i.e., execveat(2), faccessat(2), fanotify_mark(2), fchmodat(2), fchownat(2), fspick(2), fstatat(2), futimesat(2), linkat(2), mkdirat(2), mknodat(2), mount_setattr(2), move_mount(2), name_to_handle_at(2), open_tree(2), openat2(2), readlinkat(2), renameat(2), renameat2(2), statx(2), symlinkat(2), unlinkat(2), utimensat(2), mkfifoat(3), and scandirat(3)) address two problems with the older interfaces that preceded them. Here, the explanation is in terms of the openat() call, but the rationale is analogous for the other interfaces.

First, openat() allows an application to avoid race conditions that could occur when using open() to open files in directories other than the current working directory. These race conditions result from the fact that some component of the directory prefix given to open() could be changed in parallel with the call to open(). Suppose, for example, that we wish to create the file dir1/dir2/xxx.dep if the file dir1/dir2/xxx exists. The problem is that between the existence check and the file-creation step, dir1 or dir2 (which might be symbolic links) could be modified to point to a different location. Such races can be avoided by opening a file descriptor for the target directory, and then specifying that file descriptor as the dirfd argument of (say) fstatat(2) and openat().

The use of the dirfd file descriptor also has other benefits:

  • the file descriptor is a stable reference to the directory, even if the directory is renamed; and
  • the open file descriptor prevents the underlying filesystem from being dismounted, just as when a process has a current working directory on a filesystem.

Secondly, the release notes for Python 3.3 note:

I did not know any of this when the thread started. The security implications of the dir_fd argument are not mentioned in the Python docs.
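For illustration, a sketch of combining opener with dir_fd as described above (the directory path and filenames are hypothetical):

import os

def opener_for(dir_fd):
    # Resolve the relative path against a pinned directory fd, so a
    # renamed or symlinked path component can't redirect the open.
    def opener(path, flags):
        return os.open(path, flags, mode=0o600, dir_fd=dir_fd)
    return opener

dir_fd = os.open("/var/lib/myapp", os.O_RDONLY)
try:
    with open("data.txt", "w", opener=opener_for(dir_fd)) as f:
        f.write("...\n")
finally:
    os.close(dir_fd)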

1 Like