Bikeshedding opportunity: help name a pathlib method

I think template= is a good choice in this case. Certainly better than prototype= IMO.

For my own curiosity, I checked through a couple more words and didn’t find anything better. Out of several discarded alternatives (see below), the only other long-shot candidate I could think of is mimic=, which is short, to the point and unburdened by other interpretations (like “string template”), but OTOH it’s a pretty advanced verb and we might not want to be that creative.

Some more alternatives considered, all inferior to template IMO:
  • adhere[_to]=
  • blueprint=
  • guide=
  • model=
  • outline=
  • pattern=
  • precursor=
  • stencil=

H. Vetinari said:

Some more alternatives considered, all inferior to template IMO

Larry said:

you pass in the existing instance you want the new instance to be similar to

Karl said:

I like the initializer-argument approach as well.

So the suggestions I will make are from the conversational descriptions
of the argument:

similar
similarTo
like
congruent

initial
initialize
initializeFrom

inherit

Not that template is bad, but it is overused for other things. So
is inherit, although

Am I right that this is intended for internal usage of the Path subclass. For users of the subclass, all use cases I can think of are served by:

  • tarpath.joinpath("/", *segments)
  • tarpath.joinpath(*segments)
  • TarPath(*segments, tarfile=other_tarpath.tarfile)

I think we should encourage users using the 3 options above, rather than calling the new method/argument. That should be only the interface between the superclass and subclass. And we should focus on what the person writing the subclass needs.
If that sounds reasonable, the idea in the original post looks pretty good:

class TarPath(pathlib.c):
    def __init__(self, *p, tarfile):
        ...
    def newpath(self, *segments):
        # called by PurePath.parent and such
        # this is the piece that "knows" what to propagate to the constructor
        return type(self)(*segments, tarfile=self.tarfile)

And while newpath (or even __newpath__!) is as bland as __new__, like __new__ it’s an internal detail of the class. It should be bland!

1 Like

I prefer the private-ish API in Petr’s last post, but if it was going to be an initialiser argument I’d put up based_on= or copy_from= as alternate names.

FWIW, hTemplateFile is used in the Win32 API for when you want to create a file that has the same attributes as an existing file. So template doesn’t bother me at all (not that I’ve ever used that parameter before!)

2 Likes

I’m really torn. I’m beginning to appreciate the downsides to adding an initialiser argument, such as:

  1. pathlib.PurePath and Path have gained an initialiser argument which does nothing, except in subclasses.
  2. The prospective TarPath class has a more complicated signature, where users must provide either of the template or tarfile arguments (individually optional). Whereas in the original proposal the initialiser has a single required tarfile argument.

The with_segments() or __newpath__() name suggestions are quite compelling I think. Compared to newpath() or makepath(), it’s more obvious that they’re not simple constructors and could not be made classmethods.

5 Likes

FYI, hTemplateFile has always been documented incorrectly. CreateFileW() copies extended file attributes (i.e. name/value items) from the template file, but not normal file attributes (e.g. not readonly, hidden, system, etc). Extended file attributes are pretty much unused outside of the kernel and device drivers. The Windows API provides no function to get or set them, so hTemplateFile isn’t generally useful in practice.

A more common and practical example is the lpTemplateDirectory parameter of CreateDirectoryExW(), from which the system copies file attributes, extended file attributes, the case-sensitive attribute, alternate data streams, and symlink or mountpoint (junction) reparse data. If Python had a copylink() function, on Windows it could use CopyFileExW() w/ COPY_FILE_COPY_SYMLINK for file symlinks and CreateDirectoryExW() for directory symlinks and mountpoints.

1 Like

That one is actually great, unburdened by the implications of prototype (a kind of object orientation, seen in lua, javascript or python named tuples) or template (a string concept). But this does not matter much for an internal name!

3 Likes

I still lean towards adding a new parameter to the constructor. Apart from all the other points mentioned above, I like the simplicity of having just one method to worry about. Adding a new method means we have to think a little harder with every Path construction about whether we should be constructing it the “obvious” way (via a direct call to Path()), or via the new makepath/newpath/__newpath__ method. I feel like the solution of adding a new parameter to the constructor will help keep pathlib more maintainable. This may be an “internal” concern, but end users are ultimately also well served by keeping pathlib easy to understand and maintainable, as it improves the likelihood of pathlib bugs being fixed promptly.

I’m not sure I agree that the added parameter would be mostly for CPython-internal use cases. If I understand the proposal correctly, the idea is partly to make it easier for third-party projects to subclass and extend pathlib.Path. That means that we have to document the parameter, which makes it very much public.

Having said that, if we want to emphasise that the parameter should generally be ignored by end users constructing their Paths, one option could be to prefix the parameter name with an underscore — _template or _blueprint.

I also really like “blueprint” as a name, actually! I probably prefer it to “template”.

A synthesis of the two methods could be to expose something like a __context__ property which is a dict passed to the constructor.

class SessionPath:
    @property
    def __context__(self):
        return dict(session_id=self.session_id)

# share context
newpath = type(path)(segments, **path.__context__)
1 Like

Well, you need to think either way. With an argument, do you call the constructor with blueprint or with a class-specific data (like tarfile=...)?

The API should be easy for the dev who writes classes like TarPath. And then, the decision is easy: use the subclass-specific API, e.g. TarPath(..., tarfile=...). The only time you need __newpath__ or blueprint is when implementing the constructor(s).
The API should be also easy for users of TarPath. There, the decision is even easier: forget about __newpath__ or blueprint, you don’t need them.

But if you’re writing pathlib itself, or generic code that needs to work with several different flavours of Path subclasses, then you need to call __newpath__ or pass blueprint. I argue that this should be very rare outside pathlib, and that if you’re in that deep, it’s OK if you need to think a bit harder.

No, it wouldn’t. Both CPython and the subclass (like TarPath) would use it. But the users of TarPath should not.
Like __new__ or __init__.

2 Likes

To be honest I don’t see much difference between “remember to call __newpath__()” and “remember to supply the template argument”. If anything, a different method seems more likely to stand out in code review than an additional argument!

I reckon that working with several Path-ish objects will become more common, and that we may eventually add a pathlib.AbstractPath class as a shared base for pathlib.Path, tarfile.TarPath, etc. Users could write functions that accept pathlib.AbstractPath, and we could even do that in standard library modules like shutil. Perhaps:

# Upload to S3 from a `.tar` archive
source = TarPath('src', tarfile=...)
target = S3Path('mybucket', '1.2.3', 'src', credentials=...)
shutil.copytree(source, target)

… but I still agree with the thrust of your post. Even if a future user function accepts AbstractPath objects, it’s unlikely to need a method that derives a new path object with all segments replaced. The method is rarely relevant to users unless they’re defining a subclass.

For me, the clincher is the relative simplicity of the initialiser signatures:

TarPath(*pathsegments, blueprint=None, tarfile=None)  # new argument
TarPath(*pathsegments, tarfile) # new method

This is the bit that users will see, and the latter is much easier to explain IMHO.

Edit: here’s a fuller side-by-side:

# with new argument
class TarPath(pathlib.AbstractPath):
    def __init__(self, *pathsegments, blueprint=None, tarfile=None):
        super().__init__(*pathsegments, blueprint=blueprint)
        if blueprint:
            self.tarfile = blueprint.tarfile
        elif tarfile:
            self.tarfile = tarfile
        else:
            raise TypeError("Must provide tarfile or blueprint.")

# with new method
class TarPath(pathlib.AbstractPath):
    def __init__(self, *pathsegments, tarfile):
        super().__init__(*pathsegments)
        self.tarfile = tarfile

    def __newpath__(self, *pathsegments):
        return type(self)(*pathsegments, tarfile=self.tarfile)
2 Likes

Of course. But my thinking is that if you want to construct a Path object in a new method, the natural place to look for docs on constructing Path objects is, well, the docs for the constructor. So you go to the API reference for the constructor and immediately see that there’s a blueprint argument. Whereas if it’s a completely different method, you might not find the docs for it if you’re looking for information on constructing Paths. (Though I suppose we could cross-reference the new method from the docs for the Path constructor.)

1 Like

To me, the key thing here is that (if I’m understanding correctly) the end user will never use the blueprint argument to the TarPath constructor. That says to me that TarPath shouldn’t have that argument, as the constructor is the user-visible method of creating objects, and documentation, IDEs, etc will all show the signature with an argument that’s irrelevant to the user.

So for me, that makes a separate method the clear winner (which is the conclusion I think you’d come to anyway, so I guess this is just me saying “I agree” :wink:)

4 Likes

I’d say “almost never” rather than “never”. If I had to write a function that accepted an instance of an unknown subclass of PurePath with unknown segments, and return an object with the same type and context but ‘README.txt’ as its segments, I’d need something like:

# with new argument
def fn(path):
    return type(path)('README.txt', blueprint=path)

# with new method
def fn(path):
    return path.__newpath__('README.txt')

… but this should be very rare. A user will normally call path.joinpath(), path.parent, etc, to manipulate the path’s existing segments, rather than throwing them away.

(Otherwise I agree with everything you wrote)

Another straw poll to see if we’re close to a decision.

These are implemented in two draft PRs, if you want to see the code:

Thanks!

  • New initialiser argument, e.g. blueprint
  • New method, e.g. __newpath__()

0 voters

(Note that the discussion on the PR adding a method made the name evolve from __newpath__ to _newpath_ to newpath or _newpath to the current name with_segments; read details there)

Thank you everyone for helping with this.

Assuming we go for a new method (which seems to be the direction of travel), here are my thoughts on possible names:

  • __newpath__():
    • +ve: Conveys to users that they shouldn’t usually need to call this method, but that they might want to override it in a subclass.
    • -ve: Would need to be called via type(self).__newpath__(self, ...) as dunder methods are always looked up on the type
    • -ve: Inappropriate: dunder methods are used to support protocols across a wide range of types, whereas this method is only relevant in pathlib.PurePath
    • Tentatively ruled out.
  • _newpath_():
    • +ve: Again, conveys to users that something special is going on, and that this isn’t just a way to create a path.
    • -ve: Inappropriate: sunder methods are used only when a user-defined attribute might conflict, e.g. in enum and ctypes.
    • Tentatively ruled out.
  • newpath() / makepath():
    • +ve: aligns closely with joinpath(), which has a very similar purpose and identical signature.
    • -ve: likely to be mistaken for a classmethod
    • -ve: likely to be discovered and misused by users googling “pathlib new path”.
    • -ve: reading path.newpath(...) aloud, it isn’t clear what role the path object plays.
    • Tentatively ruled out.
  • with_segments():
    • +ve: aligns closely with with_name(), with_suffix(), etc, which have similar purposes.
    • +ve: unlikely to be mistaken for a classmethod, as the “with” conjunction indicates that the path object is used, and not its type.
    • +ve: unlikely to be discovered and misused by users googling “pathlib new path”

Thoughts? Does anyone feel that with_segments() would be a mistake?

3 Likes

I’m still not a massive fan of with_segments :confused:

To me, the with_foo naming convention says “this is basically a copy of the same object, but with a few things tweaked/changed slightly”. But if you’re changing the segments of the path – that feels like the most fundamental aspect of the path object. It’s less “basically a copy of the same object”, more “an entirely new object that happens to share some incidental state from the previous object”.

3 Likes

I still like the idea of __newpath__. Some responses to the negatives you list above:

I know @merwok disagrees here, but I’m still not convinced that there is a hard-and-fast rule that dunder methods should always be looked up on the type. I can find many examples in the stdlib where this is not the practice. Are all of these examples really bugs?

I find this more persuasive, but there is prior art in the stdlib of “dunder protocols” that are specific to certain modules. __subclasshook__, for example, only makes sense to be defined on classes that subclass abc.ABC or use abc.ABCMeta as their metaclass.

2 Likes