Bikeshedding opportunity: help name a pathlib method

Background

As of Python 3.12.0 alpha 7, it’s possible to subclass from pathlib.PurePath and Path, plus their Posix- and Windows-specific stablemates.

A major use case for extending these classes is to implement embedded or remote filesystems - for example, a pathlib-like interface to .tar files or S3 buckets. This requires that underlying tarfile.TarFile or botocore.Resource objects to be shared between path objects, such as when such path objects are generated via path.parent or path.iterdir(). I’m looking to introduce an instance method that is called whenever such paths are created. The default implementation would look like this:

    def newpath(self, *pathsegments):
        """Construct a new path object from any number of path-like objects.
        Subclasses may override this method to customize how new path objects
        are created from methods like `iterdir()`.
        """
        return type(self)(*pathsegments)

As mentioned in the docstring, subclasses can customize this method to pass information around, e.g.:

class SessionPath(pathlib.Path):
    def __init__(self, *pathsegments, session_id):
        super().__init__(*pathsegments)
        self.session_id = session_id

    def newpath(self, *pathsegments):
        return type(self)(*pathsegments, session_id=self.session_id)

etc = SessionPath('/etc', session_id=42)
etc_hosts = etc / 'hosts'
print(etc_hosts.session_id)  # 42

The new method is mostly called internally (e.g. by the implementations of parent and iterdir()), but could also be called directly in user code:

from tarfile import TarFile, TarPath
readme_path = TarPath('README.txt', tarfile=TarFile('blah.tar.gz'))
license_path = readme_path.newpath('LICENSE.txt')

Question

What should the method be named? @AlexWaygood suggests newpath(). I quite like makepath().

Links

1 Like

I like both of the suggested names.

It kind of looks and feels like a classmethod to me, so from_path could also work. Is there a reason it’s not a classmethod?

It enables passing information from one path instance to another. In the SessionPath example above, that’s self.session_id. That would not be possible if it were a classmethod!

2 Likes

Ok, I read the OP more thoroughly and now I get it, thanks!

I hate to say it, but I don’t much like either newpath or makepath. In general I’m not a fan of the verbs do or make in method names; they are so bland and generic as to add no nutritious information to the name. And we don’t use new in Python, calling the class itself serves as a constructor.

Which leads me to my next point: this method is a constructor. The cited default implementation is literally equivalent to the constructor. The novel feature you propose is secretly propagating state information from one instance into the newly created instance. Explicit is better than implicit, so, how about we make it explicit? Add a context or state parameter to the base class constructors, then pull that context or state off the instance and pass it in explicitly. Or maybe, instead of passing in the state, you pass in the existing instance you want the new instance to be similar to, in which case the constructor argument would need a different name (sibling? prototype?).

[edit]
And isn’t this new method equivalent to instance.joinpath('/', *segments), at least on non-Windows?

4 Likes

An opaque “context” or “state” object doesn’t seem any more explicit. I’d prefer that subclasses are free to accept and store relevant stuff in their __init__() methods, e.g. session_id or tarfile in the examples above.

This I quite like, I’ll have a play, thanks! :slight_smile:

That would make relative paths absolute on non-Windows.

This was exactly my initial reaction as well, which makes me worry that this is an unintuitive design :confused:

I’d be interested to see an implementation building off @larry’s ideas, to see if it seems more intuitive :slight_smile:

1 Like

Spoiler alert: it’s way more intuitive. Always listen to Larry, folks! I’ll adjust the PR soon. We can then argue about “prototype” vs “sibling” vs “template” vs whatever :grin:

5 Likes

I’ve updated the PR to use Larry’s idea, and it’s much improved. In the PR I used “template” as the argument name, as it seems a tiny bit less jargon-y than “prototype”. OTOH it’s less precise and could perhaps be misunderstood as a format string, or something like that. I’m perfectly happy with either. Any thoughts/suggestions?

2 Likes

I later thought, maybe this could be a variant of copy or clone? You construct a clone, then overwrite the actual path but keep the rest. But I can’t think of any examples of specifically “clone and also modify as part of the same method call” in the standard library–all the clone and copy methods I know of don’t take arguments. So that would be novel. Anyway, if you’re happy with template (which wfm) I say go for it.

5 Likes

So, this creates a new instance of the same type, conceptually a modified copy of the original instance, where the (path) segments have changed, but everything else matches the original?

My standard rules directly tell me the name for such a thing: with_segments. (Similarly, a @classmethod creating the instance from scratch would be from_segments.)

3 Likes

I suppose there’s dataclasses.replace() which is along the same lines?

1 Like

Here’s a nonbinding straw poll to see how close we are to consensus. Open to all! Thanks :slight_smile:

  • New method: with_segments() / newpath() / clone() / etc
  • New initialiser argument: template / prototype / etc

0 voters

Maybe you could give us sample code, illustrating what each looks like? You said the template parameter to the constructor cleaned things up, so, let us see the before and after.

Here’s the SessionPath example from the original post, adjusted to use the new initialiser argument:

class SessionPath(pathlib.Path):
    def __init__(self, *pathsegments, template=None, session_id=None):
        super().__init__(*pathsegments, template=template)
        if template:
            self.session_id = template.session_id
        else:
            self.session_id = session_id

etc = SessionPath('/etc', session_id=42)
etc_hosts = etc / 'hosts'
print(etc_hosts.session_id)  # 42

And the adjusted TarPath example:

from tarfile import TarFile, TarPath
readme_path = TarPath('README.txt', tarfile=TarFile('blah.tar.gz'))
license_path = TarPath('LICENSE.txt', template=readme_path)
2 Likes

I like the initializer-argument approach as well. It seems very elegant. I worry, though, if it’s Pythonic - I struggle to think of a really clear precedent for doing things this way.

If this approach is chosen, I think prototype is a much more accurate parameter name than template. A “template” connotes something with semantically, deliberately blank or placeholder values that will be filled in; the template is not properly usable as-is. A “prototype”, on the other hand, connotes an example that will be copied or emulated with modifications, which is what we have here.

My only concern with “prototype” is that, in everyday usage, it tends to mean something like “an early sample or model built to test a concept or process”. E.g. “my prototype nuclear reactor is suspiciously warm”. I might be overthinking it!

3 Likes

Shouldn’t you also pass template to SessionPath’s parent’s initialiser as well? In case it’s used in diamond inheritance.

class ContextPath:
    def __init__(self, *p, template=None, context=None):
        super().__init__(*p, template)
        self.context = (template and template.context) or context

class SessionContextPath(SessionPath, ContextPath):
    pass

path = SessionContextPath()
path.session_id = 42
path.context = {"foo": "bar"}
child = path / "baz"
print(child.context)
1 Like

Yep good point - I’ve edited my post.

1 Like

I was specifically thinking of the Javascript concept of a “prototype”. But it isn’t directly analogous, and in any case Python isn’t Javascript, and the reason for the name may not be obvious to first-time readers of the API, and I don’t have a strong opinion about the right spelling of the name of the parameter.