Bikeshedding opportunity: help name a pathlib method

Leaning into that, what about with_path?

I like that a lot more!

Would we continue to name the argument *pathsegments? If so, I think I like this.

Sounds reasonable to me :slight_smile:

You could shorten it to just segments if you wanted, but this argument name carries far more weight as documentation than as anything functional, so we should go with whatever is clearest.

I’ll rephrase what has already been said, but yes: this is completely intentional.

I cannot see a use case where this would be the correct operation. In every use case I can think of, either you know the kind of path you’re dealing with, or you want joinpath (possibly with an absolute path).

I’d love to be corrected (which is why I’m rehashing the argument). But if I’m not, I would like to actively discourage users from calling with_segments or blueprint (whatever the name ends up being) unless they’re extending Path (or implementing a specialized, unforeseen use case after having read all the docs).

Calling with_segments or blueprint can be a trap: you think you’re being a good citizen and writing useful generic code, but it’s not the operation you actually want.
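
For comparison, the generic operation recommended here is joinpath() (or `/`). With the standard pure path classes it already preserves the path's flavour, and an absolute argument replaces the existing path instead of being appended. A quick illustration (the example paths are arbitrary):

```python
from pathlib import PurePosixPath, PureWindowsPath

posix = PurePosixPath("/home/user/project")
win = PureWindowsPath("C:/Users/user/project")

# joinpath() keeps the flavour (type) of the path it is called on.
rel_posix = posix.joinpath("docs", "index.rst")
rel_win = win.joinpath("docs", "index.rst")

# An absolute argument discards the original path entirely.
abs_posix = posix.joinpath("/etc/hosts")

# On Windows, an argument with a different drive also discards the original.
abs_win = win.joinpath("D:/data")

print(rel_posix)  # /home/user/project/docs/index.rst
print(rel_win)    # C:\Users\user\project\docs\index.rst
print(abs_posix)  # /etc/hosts
print(abs_win)    # D:\data
```

No type-specific constructor is needed: the result is always the same kind of path as the receiver.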

Do we have an attribute that reliably returns the “root” of the path?

I’m wondering whether it makes sense to treat the drive (or SMB share) on Windows as one of these transferable attributes, rather than strictly part of the path. By extension, thinking about URLs: which of username, password, host and port are part of the “path”, and which are “transferable attributes”?

(I know about .anchor, but haven’t thought through if it has all the right semantics for this. It may?)

In that case, p.new_root_attribute_or_maybe_anchor / new_path does the right thing.

Importantly, it probably does the right thing when passed an incomplete new path, in that it will likely copy more of the “attributes”.

Using a hypothetical UriPath (but I believe this would work today using Windows paths):[1]

>>> p = UriPath("https://example.com/dir/file.txt")
>>> p.anchor / "otherfile.txt"
https://example.com/otherfile.txt
>>> p.anchor / "//example.org/another.txt"
https://example.org/another.txt
>>> p.anchor / "//example.com"
https://example.com/
>>> p.anchor / "http://example.org/another.txt"
http://example.org/another.txt
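
The claim that this already works for Windows paths can be checked with PureWindowsPath, treating a UNC server and share as the nearest analogue of a host. One wrinkle: .anchor is a plain string today, so this sketch wraps it back into the path type before joining. The `\\server\share` and `example.org` names are made up:

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("//server/share/dir/file.txt")
print(p.drive, p.root, p.anchor)  # \\server\share \ \\server\share\

# .anchor is a str, so turn it back into a path of the same type before joining.
base = type(p)(p.anchor)

same_share = base / "otherfile.txt"                  # stays on the same share
other_host = base / "//example.org/pub/another.txt"  # a new UNC drive replaces everything
rooted = base / "/another.txt"                       # rooted but no drive: keeps the share

for q in (same_share, other_host, rooted):
    print(q.as_posix())
# //server/share/otherfile.txt
# //example.org/pub/another.txt
# //server/share/another.txt
```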

[Later] Maybe an attribute called .template that is itself a Path instance of the same type with no segments? So p.template / new_path gives you an instance of the same type with transferred attributes and new segments.


  1. Hopefully the hypothetical isn’t too distracting. Working out proper semantics for joining URI segments would take a ton of work, just as it did for handling paths on Windows. But I figured the latter would be less familiar to most readers, even though it’s already implemented.

I do understand this point, and I agree — this is why I quite liked the idea of giving it a dunder or sunder name (e.g. __newpath__ or _newpath_ ) if it does have to be a separate method. (But @merwok objects to both of those ideas.) And calling the parameter name _blueprint rather than blueprint might be better, if it ends up being a new argument to the default constructor.

It sounds like I just weigh this concern slightly less heavily than you. And that I’m maybe a little more concerned about the risk of user (and maintainer) confusion from having two separate public-API methods for constructing Path objects that have subtly different behaviour.

AFAIK, there are different kinds of “root”, and different users need different ones, especially once you get into exotic path variants.
IMO, joinpath should be reasonably well-specified in enough cases. For URLs, it should do what you’d do when you find a (possibly relative) link on a Web page. For TarPath, it should tell you where a (sym)link found in the archive points.
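
Both behaviours already have stdlib counterparts that such a joinpath would roughly need to reproduce. In this sketch, urllib.parse.urljoin() resolves a link the way a browser does. For a symlink stored in a tar archive (member_name and linkname play the roles of TarInfo.name and TarInfo.linkname), the hypothetical helper resolves the target relative to the link's own directory. This is only an approximation: normpath() collapses `..` lexically and ignores any intermediate symlinks. The member names are made up:

```python
import posixpath
from urllib.parse import urljoin

page = "https://example.com/dir/file.txt"
print(urljoin(page, "otherfile.txt"))              # https://example.com/dir/otherfile.txt
print(urljoin(page, "/otherfile.txt"))             # https://example.com/otherfile.txt
print(urljoin(page, "//example.org/another.txt"))  # https://example.org/another.txt


def tar_link_target(member_name, linkname):
    """Hypothetical helper: where a symlink stored as member_name points.

    Purely lexical: '..' is collapsed without consulting other members.
    """
    if linkname.startswith("/"):
        return posixpath.normpath(linkname)
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(member_name), linkname)
    )


print(tar_link_target("pkg/lib/current", "../share/data"))  # pkg/share/data
```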

But generic “root” is much more vague. It seems to me that the behaviour you want is genuinely specific to your flavour of paths.

In a generic Web app, you probably want a separate “prefix” – root of the application – which can even include path segments. Inside a blog app hosted at https://myhost.example/blog, the path /article/42 would mean https://myhost.example/blog/article/42.

Right, which is why it’s a property of the path object, and not an independent algorithm.

>>> p = WebAppPath(root="https://myhost.example/blog") / "/article/42"
>>> p
https://myhost.example/blog/article/42
>>> p.anchor / "page/3"
https://myhost.example/blog/page/3

It may just require a better definition of what anchor means in general, rather than specifically for POSIX/Windows filesystems.
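
Here is a toy version of the hypothetical WebAppPath above. It shows that making anchor a per-object property is enough to get that behaviour. Everything here (the class, its root keyword, anchor returning a path rather than a string) is an assumption for illustration, not pathlib API:

```python
import posixpath


class WebAppPath:
    """Hypothetical path inside a web app mounted at `root` (a URL prefix)."""

    def __init__(self, *segments, root):
        self.root = root.rstrip("/")
        # App-relative path: an absolute segment restarts at the app root,
        # not at the host root.
        self.path = posixpath.join("/", *segments)

    def __truediv__(self, other):
        return WebAppPath(self.path, other, root=self.root)

    @property
    def anchor(self):
        # The app root itself, carrying the same prefix.
        return WebAppPath(root=self.root)

    def __str__(self):
        return self.root + self.path

    def __repr__(self):
        return f"{type(self).__name__}({str(self)!r})"


p = WebAppPath(root="https://myhost.example/blog") / "/article/42"
print(p)                    # https://myhost.example/blog/article/42
print(p.anchor / "page/3")  # https://myhost.example/blog/page/3
```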

It’s pretty difficult to reconcile HTTP URLs and path objects, despite their superficial similarity. Many web apps don’t implement clean URLs, so operations like p / "page/3" don’t work. Fragments and query parameters further complicate matters.

It might be better to consider a protocol like WebDAV or FTP. In these cases I think URIs could be supported via dedicated methods. So:

class FTPPath(pathlib.AbstractPath):  # hypothetical base class; not in the stdlib
    def __init__(self, *pathsegments, ftpobj):
        super().__init__(*pathsegments)
        self.ftpobj = ftpobj

    def with_path(self, *pathsegments):
        return type(self)(*pathsegments, ftpobj=self.ftpobj)

    @classmethod
    def from_uri(cls, uri):
        ... # parse 'ftp://' URI, create ftplib.FTP object, etc.

    def as_uri(self):
        ...

IMO we should not include the scheme and hostname in drive, root or anchor.
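
Since pathlib.AbstractPath cannot actually be imported, here is a self-contained sketch of the same shape. It keeps the scheme and host out of the path entirely and carries them in a separate connection object. FTPConnectionInfo is a hypothetical stand-in for a real ftplib.FTP object (nothing connects to a network), and the class wraps PurePosixPath instead of subclassing pathlib:

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class FTPConnectionInfo:
    """Hypothetical stand-in for an ftplib.FTP object."""
    host: str
    port: int = 21


class FTPPath:
    def __init__(self, *pathsegments, ftpobj):
        self._path = PurePosixPath(*pathsegments)
        self.ftpobj = ftpobj

    def with_path(self, *pathsegments):
        # Same connection, new segments.
        return type(self)(*pathsegments, ftpobj=self.ftpobj)

    def __truediv__(self, other):
        return self.with_path(self._path, other)

    @property
    def anchor(self):
        # Only the path's own root; scheme and host live in ftpobj.
        return self._path.anchor

    @classmethod
    def from_uri(cls, uri):
        parts = urlsplit(uri)
        if parts.scheme != "ftp":
            raise ValueError(f"not an ftp:// URI: {uri!r}")
        conn = FTPConnectionInfo(parts.hostname, parts.port or 21)
        return cls(parts.path or "/", ftpobj=conn)

    def as_uri(self):
        netloc = self.ftpobj.host
        if self.ftpobj.port != 21:
            netloc += f":{self.ftpobj.port}"
        return urlunsplit(("ftp", netloc, str(self._path), "", ""))


p = FTPPath.from_uri("ftp://ftp.example.org:2121/pub/file.txt")
q = p.with_path("/pub/other.txt")
print(p.anchor)    # /
print(q.as_uri())  # ftp://ftp.example.org:2121/pub/other.txt
```

Because the host and port never enter the segments, anchor stays a plain `/`, and the URI is rebuilt only at the boundary, in from_uri() and as_uri().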

“Clean URLs” as a concept is really a symptom rather than the problem here. I’d word this as: Fragments and query parameters make URLs awkwardly different from paths, and many web apps use them extensively.

Really, there’s only one part of a URL that is path-like, and that’s, well, the path. Given a base (“current”, if you like) URL of https://user:pass@some.site.example:12345/path/to/resource?spam#/app/subpage and a relative URL of, say, ../ham?foo, the work of constructing a new URL is barely connected to the pathlib tools.
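
Running the example URL through urllib.parse makes the point concrete. Only the path component behaves like a path, and resolving the relative reference follows URL rules (RFC 3986), not pathlib's. Note how PurePosixPath neither collapses `..` nor knows that `?foo` is a query:

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit

base = "https://user:pass@some.site.example:12345/path/to/resource?spam#/app/subpage"
parts = urlsplit(base)
print(parts.username, parts.password, parts.hostname, parts.port)
# user pass some.site.example 12345
print(parts.path, parts.query, parts.fragment)
# /path/to/resource spam /app/subpage

# URL resolution: dot segments removed, query replaced, fragment dropped.
print(urljoin(base, "../ham?foo"))
# https://user:pass@some.site.example:12345/path/ham?foo

# Treating the path component as a pathlib path gives something else entirely.
naive = PurePosixPath(parts.path).parent / "../ham?foo"
print(naive)
# /path/to/../ham?foo
```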

I do object to dunder names, as these are generally reserved for implementing operators and should be looked up on the class (the instances of instance lookups you found are mostly bugs IMO). For sunder names (a single underscore on each side), my thought is that they’re not needed and one leading underscore would be enough, but as a signal of «this is special, read the docs», a sunder name would be fine if you chose it!
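
The class-versus-instance point can be seen directly. Implicit invocations of special methods (here via len()) look the method up on the type and skip the instance dictionary, so a dunder attached to an instance is silently ignored by the protocol:

```python
class Sized:
    def __len__(self):
        return 1


s = Sized()
s.__len__ = lambda: 42  # instance attribute; len() will not see it

print(len(s))       # 1  (looked up on type(s))
print(s.__len__())  # 42 (explicit attribute access finds the instance attribute)
```

A hook spelled `__newpath__` but called as an ordinary method on instances would follow different lookup rules from this, which is presumably part of why a dunder name is an awkward fit.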

I still feel there’s a distinction between a path object and its segments that’s worth maintaining, otherwise we end up using “path” to mean too many different things. Each path may have a name, stem and suffix, and hence we can replace those parts using with_name(), with_stem() and with_suffix(). Each path is constructed from any number of path-like segments, and so using with_segments() to replace the path’s segments makes sense to me. But using with_path() to replace the path’s path makes my head hurt a bit.
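
For reference, the existing with_*() methods each replace one named part of a path, while the constructor accepts any number of segments. This is the analogy being drawn here:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/srv", "reports", "summary.txt")  # built from three segments

print(p.with_name("index.html"))  # /srv/reports/index.html
print(p.with_stem("draft"))       # /srv/reports/draft.txt  (with_stem needs Python 3.9+)
print(p.with_suffix(".csv"))      # /srv/reports/summary.csv
print(p.parts)                    # ('/', 'srv', 'reports', 'summary.txt')
```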

The constructor is usually not part of an interface’s requirements; implementations are free to choose any signature. It would be very inconvenient to fix __init__’s signature, and it would be difficult to do so in a backward-compatible way.

Using “path” (the name of the class) in the method name seems redundant. Maybe derive() or some synonym?

I’m still leaning towards with_segments(). Would anyone like to stop me? :supervillain:

  • with_segments() is fine
  • Stop, you maniac!

I wanna say “with_segments() is fine, you maniac!” :slight_smile:

If replace weren’t already a synonym for rename, I’d try to stop you, but since we can’t have the obvious name, with_segments is fine.

On Windows, replace() is actually different from rename(). Prior to Python 3.3, os.rename() called WinAPI MoveFileW(), which is equivalent to calling MoveFileExW() with the flag MOVEFILE_COPY_ALLOWED. This deviates from POSIX rename() in two significant ways.

  • It allows the system to copy a file to a different file system and then delete the source file, such that a new file is created that has a different file ID.
  • It does not support replacing an existing file or empty directory.

In Python 3.3, os.replace() was added, which calls MoveFileExW() with the flag MOVEFILE_REPLACE_EXISTING. This is closer to POSIX rename(); however, it still does not support replacing an existing empty directory[1], as required by POSIX. Note that the flag MOVEFILE_COPY_ALLOWED is not used by either os.replace() or os.rename() in 3.3+. This change in the behavior of os.rename() was never documented.

I don’t understand why pathlib.Path.rename() wasn’t modified to call os.replace() when pathlib was added to the standard library in Python 3.4. Since it was effectively a new module in the standard library, there was no concern for supporting older scripts on Windows that assume rename() fails if the destination exists.
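
The practical difference is easy to observe through pathlib itself, since Path.replace() maps to os.replace() and Path.rename() maps to os.rename(). Here is a sketch in a temporary directory (the file names are arbitrary):

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    dst = Path(tmp, "existing.txt")
    dst.write_text("old")

    # Path.replace(): overwrites an existing file on both Windows and POSIX.
    src = Path(tmp, "new.txt")
    src.write_text("new")
    src.replace(dst)
    replaced_text = dst.read_text()

    # Path.rename(): overwrites on POSIX, raises FileExistsError on Windows.
    other = Path(tmp, "other.txt")
    other.write_text("other")
    try:
        other.rename(dst)
        rename_outcome = "overwritten"
    except FileExistsError:
        rename_outcome = "refused"

print(replaced_text)   # new
print(rename_outcome)  # "overwritten" on POSIX, "refused" on Windows
```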


  1. Starting with Windows 10, a POSIX rename is possible via NTAPI NtSetInformationFile(): FileRenameInformationEx with the flags FILE_RENAME_REPLACE_IF_EXISTS | FILE_RENAME_POSIX_SEMANTICS – assuming the file system supports FileRenameInformationEx, which at least NTFS does. This request allows replacing an empty directory and also allows replacing an open file if it’s open with FILE_SHARE_DELETE sharing. For some reason, WinAPI MoveFileExW() has yet to be updated to support the new POSIX rename capability.

I find all of prototype, template and blueprint to be particularly cryptic. We’re talking about concrete filesystem paths, not some obscure abstraction inside a Java object-oriented framework.

The based_on suggestion is quite good and explicit IMHO.