Add support for pathlib.Path.format()

The pathlib.Path classes aim to provide a replacement for str and os.path... when handling paths. With the templating capabilities of str.format(), one can define a path template such as p = "/a/b/c.{ext}" and later call p.format(ext="csv") or p.format(ext="json") to replace "{ext}" with the corresponding extension.

I find this kind of path templating convenient, and I believe a path object could easily be extended with a .format() method.

A simple implementation monkey-patching pathlib follows:

import pathlib

def _format(self, *args, **kwargs):
    """
    Return a formatted version of the path, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').
    """
    cls = self.__class__
    return cls(str(self).format(*args, **kwargs))

pathlib.PurePath.format = _format

And then we can do:

my_file_template = pathlib.Path("/{onedir}/{name}.{ext}")
my_file = my_file_template.format(onedir="dir1", name="base", ext="csv")
# PosixPath('/dir1/base.csv')

I’m writing here to find out:

  • If others find this feature useful
  • If there are drawbacks I’m not considering
  • If this feature is worth a pull request with the implementation.

Any feedback is appreciated.

Thanks and apologies if this is not the best place to discuss this.

To be honest, I don’t see a need for this to be built into pathlib. It’s rare that I’ve needed anything like this, but if I did, I’d just implement a standalone wrapper in my own code. It’s only a couple of lines.

I can see why you’d think “it would be neat if this was built in” but in reality, there’s a lot more to adding something to the stdlib.

Adding it to pathlib would need to worry about all sorts of edge cases like what the code should do about my_file_template.format(onedir="but/you/said/one/directory", name="../ha/fooled/you", ext="csv/../../betyoucantworkoutwherethisis.txt").

Nobody would ever do this in reality, and for a personal implementation, “it doesn’t matter” is a perfectly fine response. But stdlib code has to care (or we get CVEs raised!) and yet there’s no really good one-size-fits-all answer, so the debates go on endlessly and even if something gets decided, the code is way more complicated than the functionality warrants.

So -1 from me. Just write your own when you need it.
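
To make the edge case above concrete, here is a small sketch that feeds those same substitutions through plain str.format() and inspects the result. It uses PurePosixPath and posixpath.normpath so the behaviour is the same on any platform; that choice is an assumption made for illustration, not part of the proposal.

```python
import posixpath
from pathlib import PurePosixPath

template = "/{onedir}/{name}.{ext}"

# The substitutions from the post: each one smuggles in separators or "..".
formatted = template.format(
    onedir="but/you/said/one/directory",
    name="../ha/fooled/you",
    ext="csv/../../betyoucantworkoutwherethisis.txt",
)
result = PurePosixPath(formatted)

# pathlib keeps ".." components as-is; it never collapses them lexically.
print(".." in result.parts)

# The final name and suffix no longer come from the "name" and "ext" fields
# in the way the template suggests.
print(result.name, result.suffix)

# A purely lexical normalisation shows where the path would actually point.
print(posixpath.normpath(formatted))
```

The template reads as "one directory, one file name, one extension", yet the resulting path has many more components than that, which is the kind of ambiguity a stdlib implementation would have to take a position on.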

Thank you for your fast reply.

I assumed inputs to format() would be trusted, as is currently the case with strings. However, as you mention, it would open a can of worms of possible issues and additional requests, so it is not worth the effort of a pull request.

Thanks for your time!

Have you looked at the with_* methods of pathlib.Path? They allow you to do, afaict, exactly what you want.

(Writing on my phone so the following code is untested)

from pathlib import Path

a = Path("abc.txt")
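b = a.with_suffix(".py")  # Path("abc.py"); the suffix must include the leading dot
c = a.with_stem("c")  # Path("c.txt"); with_stem() needs Python 3.9+, whereas with_name("c") would drop the suffix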

These methods are probably my favorite part of pathlib, they make it super convenient to create a bunch of related file paths.

How is that an improvement over this?

my_file_template = "/{onedir}/{name}.{ext}"
my_file = pathlib.Path(my_file_template.format(onedir="dir1", name="base", ext="csv"))
# PosixPath('/dir1/base.csv')

The only difference is which line calls pathlib.Path()

The main advantage I see of adding a pathlib.Path.format() method is semantic. The method defines a general formatting/templating system for filesystem Path-like objects.

Having a well-defined format() method in Path-like objects would give other modules offering Path-compatible objects an incentive to support such a method as well. For instance, zipfile.Path() objects in the standard library are not so easily back-and-forth coercible to str. The proposed .format() method could be implemented in zipfile.Path objects to cover both the root zip file path and the inner path within the zip file.

The main drawback, as @pf_moore was saying, is that this templating method would eventually need additional validation to guard against path injection. That could be a nice-to-have feature, but from my point of view it is not a necessary requirement, as the current alternative (using strings for paths) does not offer any kind of validation against path injection either.

Semantically, pathlib.Path("/{onedir}/{name}.{ext}") is not a path.

If it is a path, what is its parent path? It is not pathlib.Path("/{onedir}"), because name and ext can contain path separators. Before substituting, you cannot even split it into components.

This depends on how we define Path.format() substitutions. If the definition has the same flexibility as str.format(), then you are correct. However, if the implementation of Path.format() allowed neither path separators nor .. in substitutions (which would be reasonable), then I believe pathlib.Path("/{onedir}/{name}.{ext}").parent would be pathlib.Path("/{onedir}") without ambiguity.

So I see that as a reason to consider a Path.format() method that differs from str.format().
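
A minimal sketch of what such a restricted substitution could look like follows. The helper name format_path and its exact rules are hypothetical: it assumes keyword-only fields, rejects empty values, values containing "/" or "\", and values equal to "." or "..", and it substitutes the already-validated string form of each value. It is not an existing pathlib API.

```python
from pathlib import PurePosixPath


def format_path(template, **fields):
    """Hypothetical helper: str.format()-style substitution with validation.

    Each value must be usable as (part of) a single path component.
    """
    clean = {}
    for key, value in fields.items():
        text = str(value)
        # Assumed rules: no empty values, no separators, no "." or "..".
        if not text or "/" in text or "\\" in text or text in (".", ".."):
            raise ValueError(f"invalid substitution for {key!r}: {text!r}")
        clean[key] = text
    # Substitute the validated strings and rebuild the same path class.
    return type(template)(str(template).format(**clean))


template = PurePosixPath("/{onedir}/{name}.{ext}")
print(format_path(template, onedir="dir1", name="base", ext="csv"))

try:
    format_path(template, onedir="dir1", name="../ha/fooled/you", ext="csv")
except ValueError as exc:
    print("rejected:", exc)
```

With these rules each field stays inside the component where the template placed it, so the number of components is known before substitution. Whether these are the right rules (for example on Windows, or for templates that join two fields in one component) is exactly the open question raised in the thread.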

I’d like to hear @barneygale’s opinion.

(Forgive the previous deleted post; my discourse-foo is weak tonight)

It’s an interesting idea. My initial impression is that with_name() and with_suffix() already cover the majority of use cases. They’re also fairly robust against the sorts of attacks @pf_moore identified: with_name() checks that the given name is non-empty, doesn’t contain path separators, and wouldn’t parse as a naked drive on Windows (i.e. p.with_name('c:') is an error); with_suffix() additionally checks that its argument starts with a dot and at least one more character. Users can also use relative_to() to remove (and then perhaps replace) a known prefix. Together, these features have always been sufficient for my needs. If there’s a way to express them more beautifully using format(), without losing validation or adding a complex implementation, I’d be interested to hear more, but I suspect that’s a very difficult bar to clear.
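
As a rough illustration of the checks mentioned above, the sketch below exercises with_name(), with_suffix() and relative_to() on a PurePosixPath. The example paths are made up, and the Windows drive case is left out because it depends on the path flavour.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/data/raw/report.txt")  # made-up example path

# Valid replacements of the final component and of its suffix.
renamed = p.with_name("summary.txt")
converted = p.with_suffix(".csv")
print(renamed, converted)

# Invalid arguments raise ValueError instead of silently building a path.
rejected = []
for method, arg in [(p.with_name, "../escape"), (p.with_name, ""), (p.with_suffix, "csv")]:
    try:
        method(arg)
    except ValueError:
        rejected.append(arg)
print(rejected)

# relative_to() strips a known prefix, which can then be replaced.
moved = PurePosixPath("/data/clean") / p.relative_to("/data/raw")
print(moved)
```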

Dunno if this is a good idea or not, but it could be possible to combine parents[idx] and the with_* methods to create a with_parent(idx, new_parent) that would behave something like:

a = Path("path/going/nowhere/fun.txt")
b = a.with_parent(0, "somewhere") # Path("path/going/somewhere/fun.txt")

Maybe this is closer to the suggestion in the OP?
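
For what it's worth, the with_parent() idea can be sketched as a standalone function. Everything here is hypothetical: the name, the meaning of idx (0 is the directory that directly contains the final component, matching the example above), and the validation rules, which are POSIX-only for brevity. It does not exist in pathlib.

```python
from pathlib import PurePosixPath


def with_parent(path, idx, new_parent):
    """Hypothetical helper: rename the idx-th enclosing directory of path."""
    # Assumed validation: a single, non-empty, non-special component.
    if not new_parent or "/" in new_parent or new_parent in (".", ".."):
        raise ValueError(f"invalid directory name: {new_parent!r}")
    parts = list(path.parts)
    first = 1 if path.anchor else 0  # never overwrite the root "/"
    pos = len(parts) - 2 - idx       # index of the directory to replace
    if idx < 0 or pos < first:
        raise IndexError(f"{path} has no parent directory at index {idx}")
    parts[pos] = new_parent
    return type(path)(*parts)


a = PurePosixPath("path/going/nowhere/fun.txt")
print(with_parent(a, 0, "somewhere"))
print(with_parent(a, 2, "road"))
```

Because each call replaces exactly one existing component, the number of components never changes, which avoids the ambiguity discussed earlier for free-form templates.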