TemporaryFile contextmanager that allows creating a directory entry on success

In the current implementation, if the OS supports O_TMPFILE, TemporaryFile has a very convenient behaviour: it creates an unnamed temporary regular file. Such a file can be created in an arbitrary directory with the already existing “dir” param.

It is technically possible (see the open(2) Linux manual page, under O_TMPFILE) to link such an unnamed temporary file into the filesystem.

This turns out to be convenient quite often. If I need to atomically create a file with certain content at a certain path, the typical approach is:

  • Create a named temporary file in a temporary directory on the same filesystem;
  • Once the file content is written, flush and rename it (or create an additional hardlink at the target destination);
  • Periodically sweep the temporary directory and clean up leftover temporary files (e.g. left there if the process working on them crashed hard).
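
As a rough illustration of the first two steps, here is a minimal sketch. The helper name write_atomically and its tmp_dir parameter are made up for this example, and it assumes a POSIX system where tmp_dir and the target are on the same filesystem:

```python
import os
import tempfile


def write_atomically(target, data, tmp_dir):
    """Write bytes to `target` via a named temp file in `tmp_dir` plus a rename.

    Hypothetical helper: `tmp_dir` must be on the same filesystem as `target`,
    otherwise the rename fails instead of being atomic.
    """
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the content to disk before renaming
        os.replace(tmp_path, target)  # atomic rename on POSIX
    except BaseException:
        # An ordinary failure cleans up after itself; a hard crash would not
        # get here, which is why the periodic sweep is still needed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The third step (the sweep) is deliberately left out: it is the part that has no obvious generic implementation.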

This requires a bit of juggling, and it isn’t always easy, from the perspective of an app developer, to find a temporary directory on the same filesystem, let alone to think of a good approach to sweeping that directory.

O_TMPFILE solves these problems. The file is unnamed, and it can be created directly in the same directory as the destination file. Once an operation is successful, it is enough to flush the file and link it before closing, and we get atomic behaviour: the file either exists on the filesystem with its full content, or it doesn’t exist at all. If the process crashes the hard way, there’s nothing to clean up (“Anything written to the resulting file will be lost when the last file descriptor is closed, unless the file is given a name.”, straight from the man page).
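
For reference, a minimal low-level sketch of that sequence, bypassing tempfile entirely. The function name create_linked is invented for this example; it assumes Linux, a filesystem that supports O_TMPFILE, and a mounted /proc. The /proc/self/fd trick is the one shown in the open(2) man page. Passing dst_dir_fd is an assumption on my side to make sure os.link goes through linkat() with AT_SYMLINK_FOLLOW:

```python
import os


def create_linked(target, data):
    """Create `target` containing `data` from an unnamed O_TMPFILE file.

    Hypothetical helper, Linux only. Raises FileExistsError if `target`
    already exists, because a hard link never overwrites.
    """
    directory, name = os.path.split(os.path.abspath(target))
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Unnamed file created inside the destination directory. O_EXCL is
        # deliberately not passed, so the file may be given a name later.
        fd = os.open(directory, os.O_TMPFILE | os.O_WRONLY, 0o644)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
            # Give the file a name while the descriptor is still open.
            os.link(
                f"/proc/self/fd/{f.fileno()}",
                name,
                dst_dir_fd=dir_fd,
                follow_symlinks=True,
            )
    finally:
        os.close(dir_fd)
```

If anything fails before os.link, closing the descriptor is all the cleanup there is.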

The current Python TemporaryFile implementation, if O_TMPFILE is available, also passes the O_EXCL flag when opening; this prevents a user from doing the linking manually. The flag should be removed, but it seems to be there by accident, since some flags are being reused, so removing it shouldn’t be a problem (I’m opening a separate issue for that).

What I’m suggesting, though, is something a bit more advanced that enforces the use of a context manager; I’d add something like

tempfile.ConditionallyLinkedFile(link_to_if_successful: Path, mode='w+b', buffering=-1, encoding=None, newline=None, *, errors=None)

There are no dir, suffix, or prefix params, because “dir” would be the parent directory of link_to_if_successful.

The desired behaviour is that when the context manager exits cleanly the file is flushed and linked, and otherwise it is discarded.
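
To make the proposal concrete, here is a rough sketch of how such a context manager could behave. Nothing like this exists in tempfile today. The function name conditionally_linked_file is a placeholder, it only handles the O_TMPFILE case (Linux, with /proc mounted), and it has no fallback for other platforms:

```python
import contextlib
import os


@contextlib.contextmanager
def conditionally_linked_file(link_to_if_successful, mode="w+b", **open_kwargs):
    """Hypothetical sketch: link the file only if the with-body succeeds."""
    directory, name = os.path.split(os.path.abspath(link_to_if_successful))
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # "dir" is implied: the unnamed file lives in the target's parent.
        fd = os.open(directory, os.O_TMPFILE | os.O_RDWR, 0o644)
        with open(fd, mode, **open_kwargs) as f:
            yield f
            # Only reached when the with-body did not raise.
            f.flush()
            os.fsync(f.fileno())
            os.link(
                f"/proc/self/fd/{f.fileno()}",
                name,
                dst_dir_fd=dir_fd,
                follow_symlinks=True,
            )
    finally:
        os.close(dir_fd)
```

Used as `with conditionally_linked_file("/some/dir/report.bin") as f: f.write(...)`, the target either appears with its full content or never appears. An exception inside the block just closes the descriptor and the data is gone.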

Feedback requested:

  • Do you think that’s useful? It’s at least the 3rd time I need something like this in my programmer’s life;
  • Do you think naming is appropriate?
  • What to do on platforms that don’t support O_TMPFILE? How should we gracefully degrade this behaviour? Should we degrade it at all, or just raise an exception and let the user handle it, since it’s a very specific behaviour?
  • Anything else?

By Alan Franzoni via Discussions on Python.org at 14Sep2022 19:15:

In the current implementation, if the OS supports O_TMPFILE,
TemporaryFile has a very convenient behaviour: it creates an unnamed
temporary regular file. Such file can be created in an arbitrary
directory with the already_existing “dir” param.

It is technically possible ( see open(2) - Linux manual page for O_TMPFILE ) to link such unnamed temporary file in the filesystem.

Turns out, this is quite convenient, quite often; if I need to atomically create a file with a certain content in a certain path, the typical approach is:

  • Create a named temporary file in the same filesystem in a temporary directory;
  • Once the file content is written, flush and rename (or create an additional hardlink in the target destination)
  • Periodically, sweep the temporary directory and cleanup leftover temporary files (e.g. left there if the process working on them crashed hard)

This requires a bit of juggling, and it isn’t always easy, from the perspective of an app developer, to find a temporary directory which is mounted in the same filesystem, let alone think about a good approach to sweeping that directory.

Ignoring “crashed hard” for a moment, I always use the target directory
with the dir= parameter as you mention in your opening paragraph. This
solves the “it isn’t always easy, from the perspective of an app
developer, to find a temporary directory which is mounted in the same
filesystem” side of things.

Personally I’ve accepted the need to sweep old temp files as part of the
cost, because “crashed hard” is normally very rare (an ordinary exception
is not “crashed hard”), and I just use NamedTemporaryFile. I can see that
using O_TMPFILE would be nice for an unnamed temporary file where available.

I’ve got a personal context manager for rewriting a file using that
approach:

  • with NamedTemporaryFile
  • link the successfully generated temp file to the target on exit

and a similar one for directories (make an entire directory tree using
TemporaryDirectory and rename to the target official name on exit).
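
The file variant of that pattern might look roughly like the sketch below. This is a guess at the shape, not the actual code: the name linked_on_success is invented, and it uses os.link, which raises FileExistsError when the target already exists, so a real "rewrite" helper would need a rename or an unlink step instead.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def linked_on_success(target, mode="w+b"):
    """Hypothetical sketch: NamedTemporaryFile in the target's directory,
    hard-linked to `target` only when the with-body completes cleanly."""
    directory = os.path.dirname(os.path.abspath(target))
    with tempfile.NamedTemporaryFile(mode, dir=directory) as tmp:
        yield tmp
        # Only reached when the with-body did not raise.
        tmp.flush()
        os.fsync(tmp.fileno())
        os.link(tmp.name, target)
    # Leaving the inner with-block closes tmp and removes the temporary
    # name; after a successful link the data stays reachable via `target`.
```

Unlike the O_TMPFILE route, a hard crash between creation and exit leaves the temporary name behind, which is the sweeping cost mentioned above.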

So I certainly see the use case.

Cheers,
Cameron Simpson cs@cskk.id.au