pathlib.Path: add __format__() with some common escaping options

When using strings for paths, one can generate error or debugging messages with f"{pathname!r}" and get quoting and escaping for free. When using PosixPath, doing that will (correctly) produce something like "PosixPath('/etc/passwd')".

Unfortunately, chaining format conversions (e.g. f"{path!s!r}" or f"{path!sr}"), which would be a possible shortcut, is not allowed.

This hints at the possibility of Path-specific format options for escaping, which could cover common use cases besides tidy user messages:

  • shlex.quote equivalent (shlex.quote("'") → '\'\'"\'"\'\''), convenient when building shell constructs
  • ls equivalent (' becomes '"\'"'), convenient when mentioning filenames in messages
  • repr(str(path)) equivalent that always adds the quotes, convenient when one always wants quotes around filenames in messages

So, assuming the three examples above got the format specs :s, :q and :Q respectively, you could do things like:

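# Hypothetical: assumes the proposed Path.__format__ with :s, :q and :Q.
# `path` is a pathlib.Path and `log` is a logging.Logger.
import subprocess

cmd = ["open", path]
subprocess.run(cmd)
log.info("Ran command: %s", ' '.join(f"{c:s}" for c in cmd))
print(f"Opening file {path:q}...")
raise RuntimeError(f"Input file {path:Q} has the wrong size")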
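A minimal sketch of what such a method could do, written as a plain function so it runs without patching pathlib. It covers only :s (via shlex.quote(), which adds quotes only when needed) and :Q (via repr(), which always adds them); :q is left out because the exact rules ls applies are not reproduced here. The name format_path is hypothetical.

```python
import os
import shlex
from pathlib import PurePosixPath


def format_path(path, spec):
    """Hypothetical body for Path.__format__ (a sketch, not a pathlib API).

    's' -> shell-quoted with shlex.quote()
    'Q' -> repr() of the path string, so quotes are always present
    ''  -> the plain path string, as f"{path}" gives today
    """
    text = os.fspath(path)  # accepts str or any os.PathLike returning str
    if spec == "s":
        return shlex.quote(text)
    if spec == "Q":
        return repr(text)
    if spec == "":
        return text
    raise ValueError(f"unsupported format spec {spec!r} for a path")


cmd = ["open", PurePosixPath("/tmp/a b")]
print(" ".join(format_path(c, "s") for c in cmd))  # open '/tmp/a b'
print(f"Input file {format_path(cmd[1], 'Q')} has the wrong size")
```

Raising ValueError for unknown specs mirrors what str does for an unknown presentation type; a real implementation might choose differently.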

While there is nothing here we couldn’t do without, it seems like a way to encourage cleaner handling of the unexpected characters that can turn up in filesystem paths.


Do this maybe?

>>> import pathlib
>>> x = pathlib.Path('.bashrc')
>>> x
PosixPath('.bashrc')
>>> str(x)
'.bashrc'
>>> print(f'This is a path {str(x)} repr')
This is a path .bashrc repr
>>> print(f'This is a path {str(x)!r} repr')
This is a path '.bashrc' repr

I no longer rely on !r for putting quotes around things in messages. If you want quotes around your path in a string, put them there explicitly. The code remains just as simple, and the intent becomes obvious.

raise UsageError(f"The destination '{path}' must already exist.")

Neither of the other quoting options seems like it belongs in output strings. You’re typically displaying the string to a user, who will be confused by the extra escape characters (which are shell-specific, not Python-specific). They can always use the shell’s own quoting if they want to paste it: command 'path with spaces'. Python’s subprocess doesn’t need that quoting either; it already handles ["command", "path with spaces"] correctly.


Strings that contain quotes are the problem, and that is why you see people use repr: it handles these cases nicely.
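A small illustration of that point, using a made-up filename containing both an apostrophe and a newline (both legal on POSIX): hand-written quotes leave the message ambiguous, while repr() escapes both characters.

```python
from pathlib import PurePosixPath

# Hypothetical filename: an apostrophe and an embedded newline.
path = PurePosixPath("it's\nodd.txt")

explicit = f"The destination '{path}' must already exist."
via_repr = f"The destination {str(path)!r} must already exist."

print(explicit)  # spans two lines, and the quotes no longer pair up
print(via_repr)  # The destination "it's\nodd.txt" must already exist.
```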


Tagging @barneygale, the resident pathlib expert, for his take on this one.


Somehow I’ve avoided learning about __format__() all these years, so take this reply with a large pinch of salt…

I’m open to the idea of a formatting shortcut for repr(str(path)), which is a common incantation for logging or raising exceptions. For simple filenames it produces a user-friendly result like 'foo.txt', but for filenames involving e.g. newlines or quote characters it still produces something reasonable and unambiguous.

I’m less sold on a shortcut for shlex.quote() or the like. It’s too niche to belong in pathlib IMO.

Q: does this apply to pathlib or os.PathLike more generally? I wonder if we could add two presentation types for path-like objects, like this:

  • {path:s} calls os.fspath(path)
  • {path:r} calls repr(os.fspath(path))
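A rough sketch of those two presentation types, applied to any os.PathLike through a hypothetical wrapper class (pathlib has no such class). Letting other specs fall through to ordinary str formatting is just one possible choice, and the sketch assumes str rather than bytes paths.

```python
import os
from pathlib import PurePosixPath


class FormattablePath:
    """Hypothetical wrapper adding 's' and 'r' presentation types to a path-like."""

    def __init__(self, path):
        # os.fspath() rejects non-path-likes such as None with TypeError,
        # whereas str(None) would quietly produce 'None'.
        self._text = os.fspath(path)

    def __format__(self, spec):
        if spec == "s":
            return self._text            # same as os.fspath(path)
        if spec == "r":
            return repr(self._text)      # same as repr(os.fspath(path))
        return format(self._text, spec)  # e.g. width and alignment


p = FormattablePath(PurePosixPath("/etc/passwd"))
print(f"{p:s} {p:r}")  # /etc/passwd '/etc/passwd'
```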

We’d need to decide whether we want something like str and int formatting, which have a strict set of codes and nothing else, or something more like datetime’s formatting, with some replacement codes and everything else showing up in the output verbatim. Personally, I like datetime’s take on it, especially if the format string will be user-provided (say, in a logging configuration file).

Just spitballing here, but say you want to have %s be the string, %S be the repr, %p be the parent, and %P be the repr of the parent:

import os
from pathlib import Path, PurePath


# Obviously incomplete implementation.
def _path_format(self, spec):
    if not spec:
        return str(self)  # keep plain f"{path}" and "{}".format(path) working
    return (
        spec.replace("%s", os.fspath(self))
        .replace("%S", repr(os.fspath(self)))
        .replace("%p", os.fspath(self.parent))
        .replace("%P", repr(os.fspath(self.parent)))
    )


# Pay no attention to the man behind the curtain.
# Patching PurePath (rather than PosixPath) also covers WindowsPath.
PurePath.__format__ = _path_format

p = Path("/foo/bar")
print(f"{p:result %S directory %P}")

Would print “result '/foo/bar' directory '/foo'”.

Of course, in f-strings this is pointless; it’s in str.format calls that it becomes powerful.
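One thing chained str.replace() calls can trip over: a path that itself contains, say, %p gets rewritten by a later replace. A sketch that expands every directive in a single re.sub() pass avoids this. The directive letters follow the post; the %% escape and leaving unknown directives untouched are assumptions of this sketch, and expand_path_spec is a hypothetical name.

```python
import os
import re
from pathlib import PurePosixPath

# Directive letter -> function of the path (letters as proposed above).
_DIRECTIVES = {
    "s": lambda p: os.fspath(p),
    "S": lambda p: repr(os.fspath(p)),
    "p": lambda p: os.fspath(p.parent),
    "P": lambda p: repr(os.fspath(p.parent)),
    "%": lambda p: "%",  # assumed escape for a literal percent sign
}


def expand_path_spec(path, spec):
    """Expand %-directives in one pass; substituted text is never rescanned."""
    def replace(match):
        handler = _DIRECTIVES.get(match.group(1))
        # Unknown directives are copied through unchanged in this sketch.
        return handler(path) if handler else match.group(0)

    return re.sub(r"%(.)", replace, spec)


p = PurePosixPath("/tmp/100%p")
print(expand_path_spec(p, "result %S directory %P"))
# result '/tmp/100%p' directory '/tmp'
```

With the chained replace() version, the %p inside that path would itself be expanded. A __format__ built on this would simply return expand_path_spec(self, spec).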

Why not allow multiple format conversions (f"{path!s!r}")?

I did have the same problem with other types, like yarl.URL.


Yes, that’s another option (for another thread) to deal with some of the issues. But adding Path.__format__ might still be useful. I can see wanting to call repr(os.fspath(obj)) instead of repr(str(obj)) to catch things like None, where the second would succeed and the first wouldn’t.

Or rather, format(obj, "%s") would fail on None but succeed on a Path, assuming "%s" were valid for a Path object. It’s the same reason to use f"{obj:2d}" instead of f"{obj!s:>2}" if you know obj will always be an int.
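Both points can be checked against today’s built-ins, independent of the proposal. The helper describe() is only for illustration:

```python
import os


def describe(label, func):
    """Run func() and report either its result or the TypeError it raised."""
    try:
        return f"{label}: {func()!r}"
    except TypeError as exc:
        return f"{label}: TypeError ({exc})"


# str() accepts any object, so repr(str(None)) quietly yields "'None'".
print(describe("repr(str(None))", lambda: repr(str(None))))
# os.fspath() only accepts str, bytes and os.PathLike, so None is rejected.
print(describe("os.fspath(None)", lambda: os.fspath(None)))
# object.__format__ rejects any non-empty spec, so "2d" catches None...
print(describe('format(None, "2d")', lambda: format(None, "2d")))
# ...whereas "!s" converts None to text before the spec is applied.
print(describe('f"{None!s:>2}"', lambda: f"{None!s:>2}"))
print(describe('f"{7:2d}"', lambda: f"{7:2d}"))
```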


[…]

print(f"{p:result %S directory %P}")

The datetime format works that way to mirror C’s strftime. It’s useful in C, where concatenating strings is a pain, but in Python you can easily do:

print(f"result {p:S} directory {p:P}")
"result {0:S} directory {0:P}".format(p)
"result {p:S} directory {p:P}".format(p=p)