Leaning into that, what about `with_path`?
I like that a lot more!
Would we continue to name the argument `*pathsegments`? If so, I think I like this.
Sounds reasonable to me
You could shorten it to just `segments` if you wanted, but this argument name has far more weight as documentation than anything functional, so we should go with whatever is clearest.
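To make the idea concrete, here is a minimal, self-contained sketch of the pattern under discussion: every operation that derives a new path goes through one overridable method, so a subclass can carry extra state along. `SimplePath`, `SessionPath` and `session_id` are hypothetical names, and this is a toy rather than pathlib's implementation.

```python
import posixpath


class SimplePath:
    """Toy POSIX-style path; every derived path is built via with_segments()."""

    def __init__(self, *pathsegments):
        self._str = posixpath.join(*pathsegments) if pathsegments else "."

    def with_segments(self, *pathsegments):
        # The single hook: subclasses override this to transfer extra state.
        return type(self)(*pathsegments)

    def __truediv__(self, other):
        return self.with_segments(self._str, other)

    @property
    def parent(self):
        return self.with_segments(posixpath.dirname(self._str) or ".")

    def __str__(self):
        return self._str


class SessionPath(SimplePath):
    """Hypothetical subclass with one piece of extra state."""

    def __init__(self, *pathsegments, session_id):
        super().__init__(*pathsegments)
        self.session_id = session_id

    def with_segments(self, *pathsegments):
        # Carry the session over to every derived path.
        return type(self)(*pathsegments, session_id=self.session_id)


p = SessionPath("/srv", "data", session_id=42)
child = p / "report.txt"
print(child, child.session_id)                # /srv/data/report.txt 42
print(child.parent, child.parent.session_id)  # /srv/data 42
```

Because `/` and `.parent` both funnel through `with_segments`, the subclass only has to override that one method to keep its state.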
Of course. But my thinking is that if you want to construct a `Path` object in a new method, the natural place to look for docs on constructing `Path` objects is, well, the docs for the constructor. So you go to the API reference for the constructor and immediately see that there’s a `blueprint` argument. Whereas if it’s a completely different method, you might not find the docs for it if you’re looking for information on constructing `Path`s. (Though I suppose we could cross-reference the new method from the docs for the `Path` constructor.)
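For comparison, the constructor-argument alternative might look roughly like this. `BlueprintPath`, `SessionPath`, the `blueprint` keyword and the `_copy_state_from` helper are all hypothetical; the sketch only shows where the state transfer would live if it went through the ordinary constructor instead of a separate method.

```python
import posixpath


class BlueprintPath:
    """Toy path whose constructor can copy extra state from an existing path."""

    def __init__(self, *pathsegments, blueprint=None):
        self._str = posixpath.join(*pathsegments) if pathsegments else "."
        if blueprint is not None:
            self._copy_state_from(blueprint)

    def _copy_state_from(self, other):
        # The base class has no extra state; subclasses extend this.
        pass

    def __truediv__(self, other):
        # Derived paths pass themselves along as the blueprint.
        return type(self)(self._str, other, blueprint=self)

    def __str__(self):
        return self._str


class SessionPath(BlueprintPath):
    session_id = None

    def __init__(self, *pathsegments, session_id=None, blueprint=None):
        super().__init__(*pathsegments, blueprint=blueprint)
        if session_id is not None:
            self.session_id = session_id

    def _copy_state_from(self, other):
        self.session_id = other.session_id


base = SessionPath("/srv", session_id=7)
# An unrelated path built through the ordinary, documented constructor:
log_path = SessionPath("/tmp", "x.log", blueprint=base)
print(log_path, log_path.session_id)  # /tmp/x.log 7
print((base / "data").session_id)     # 7
```

The trade-off is the one described above: discoverability in the constructor docs, at the cost of every subclass constructor having to accept and forward the extra keyword.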
I’ll rephrase what has been said, but: Yes. This is completely intentional.
I cannot see a use case where this would be the correct operation. In every use case I can think of, either you know the kind of path you’re dealing with, or you want `joinpath` (possibly with an absolute path).
I’d love to be corrected (which is why I’m re-hashing the argument). But if I’m not, I would like to actively discourage users from using `with_segments` or `blueprint` (whatever the name is) if they’re not extending `Path` (or implementing a specialized unforeseen use case, after having read all the docs).
Calling `with_segments` or `blueprint` can be a trap: you’re thinking you’re being a good citizen and writing useful generic code, but it’s not the operation you actually want to use.
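As an illustration of the “possibly with an absolute path” case, using the real `pathlib` pure path classes (no filesystem access): joining an absolute segment already discards everything before it, and on Windows a rooted segment without a drive keeps the existing drive. The example paths are arbitrary.

```python
from pathlib import PurePosixPath, PureWindowsPath

base = PurePosixPath("/home/user/project")

# A relative segment is appended.
print(base.joinpath("src", "main.py"))  # /home/user/project/src/main.py

# An absolute segment discards everything before it.
print(base.joinpath("/etc/hosts"))      # /etc/hosts

# On Windows, a rooted segment without a drive keeps the original drive...
print(PureWindowsPath("c:/Windows").joinpath("/Program Files"))  # c:\Program Files

# ...while a segment with its own drive replaces the drive too.
print(PureWindowsPath("c:/Windows").joinpath("d:/bar"))          # d:\bar
```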
or you want `joinpath` (possibly with an absolute path).
Do we have an attribute that reliably returns the “root” of the path?
I’m wondering whether it makes sense to treat the drive (or SMB share) on Windows as one of these transferable attributes, rather than strictly part of the path. By extension, if you think about URLs, which of username, password, host and port are “path” and which are “transferable attributes”?
(I know about `.anchor`, but haven’t thought through whether it has all the right semantics for this. It may?)
In that case, `p.new_root_attribute_or_maybe_anchor / new_path` does the right thing.
Importantly, it probably does the right thing when passed an incomplete new path, in that it will likely copy more of the “attributes”.
Using a hypothetical `UriPath` (but I believe this would work today using Windows paths):[1]
```pycon
>>> p = UriPath("https://example.com/dir/file.txt")
>>> p.anchor / "otherfile.txt"
https://example.com/otherfile.txt
>>> p.anchor / "//example.org/another.txt"
https://example.org/another.txt
>>> p.anchor / "//example.com"
https://example.com/
>>> p.anchor / "http://example.org/another.txt"
http://example.org/another.txt
```
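The claim that this “would work today using Windows paths” can be checked with UNC paths, where the drive is the `\\server\share` part. One wrinkle: in current `pathlib`, `.anchor` is a plain string, so it has to be wrapped in the path type before `/` can be used. The server and share names below are made up.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("//example.com/share/dir/file.txt")
print(p.drive)   # \\example.com\share
print(p.anchor)  # \\example.com\share\

# .anchor is a str, so rebuild a path object from it first.
anchor = PureWindowsPath(p.anchor)

# Relative segment: server and share are carried over.
print(anchor / "otherfile.txt")                    # \\example.com\share\otherfile.txt

# Rooted segment without a drive: the drive is still carried over.
print(anchor / "/another.txt")                     # \\example.com\share\another.txt

# Segment with its own UNC drive: everything is replaced.
print(anchor / "//example.org/other/another.txt")  # \\example.org\other\another.txt
```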
[Later] Maybe an attribute called `.template` that is itself a `Path` instance of the same type with no segments? So `p.template / new_path` gives you an instance of the same type with transferred attributes and new segments.
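A rough sketch of the `.template` idea on a toy class: the property returns an instance of the same type that keeps the transferable attributes but has no segments, so `p.template / new_path` yields a same-typed path. `TemplatePath`, `host` and `template` are hypothetical names; nothing like this exists in `pathlib`.

```python
import posixpath


class TemplatePath:
    """Toy path with one transferable attribute (host) plus path segments."""

    def __init__(self, *pathsegments, host):
        self.host = host
        self._segments = pathsegments

    @property
    def template(self):
        # Same type, same transferable attributes, no segments.
        return type(self)(host=self.host)

    def __truediv__(self, other):
        return type(self)(*self._segments, other, host=self.host)

    def __str__(self):
        path = posixpath.join("/", *self._segments)
        return f"{self.host}:{path}"


p = TemplatePath("dir", "file.txt", host="example.com")
q = p.template / "other/file.txt"
print(p)  # example.com:/dir/file.txt
print(q)  # example.com:/other/file.txt
```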
[1] Hopefully the hypothetical isn’t too distracting. There’s a ton of work to figure out proper semantics for joining URI segments, just like we’ve done for handling paths on Windows. But I figured the latter would be less familiar for most readers, even though it’s already implemented.
I’d love to be corrected (which is why I’m re-hashing the argument). But if I’m not, I would like to actively discourage users from using `with_segments` or `blueprint` (whatever the name is) if they’re not extending `Path` (or implementing a specialized unforeseen use case, after having read all the docs).
Calling `with_segments` or `blueprint` can be a trap: you’re thinking you’re being a good citizen and writing useful generic code, but it’s not the operation you actually want to use.
I do understand this point, and I agree; this is why I quite liked the idea of giving it a dunder or sunder name (e.g. `__newpath__` or `_newpath_`) if it does have to be a separate method. (But @merwok objects to both of those ideas.) And naming the parameter `_blueprint` rather than `blueprint` might be better, if it ends up being a new argument to the default constructor.
It sounds like I just weigh this concern slightly less heavily than you, and that I’m maybe a little more concerned about the risk of user (and maintainer) confusion from having two separate public-API methods for constructing `Path` objects that have subtly different behaviour.
Do we have an attribute that reliably returns the “root” of the path?
AFAIK, there are different kinds of “root”, and everyone needs a different one. Especially when you get into exotic path variants.
IMO, `joinpath` should be reasonably well-specified in enough cases. For URLs, it should do what you’d do when you find a (possibly relative) link on a Web page. For `TarPath`, it should tell you where a (sym)link found in the archive would point to.
But generic “root” is much more vague. It seems to me that the behaviour you want is genuinely specific to your flavour of paths.
In a generic Web app, you probably want a separate “prefix” – root of the application – which can even include path segments. Inside a blog app hosted at https://myhost.example/blog, the path `/article/42` would mean https://myhost.example/blog/article/42.
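The standard library’s `urllib.parse.urljoin` shows why an application prefix is a separate concept: it resolves links the way a browser would, so a leading `/` goes to the host root and drops `/blog`. The `app_url` helper is a hypothetical illustration of joining against an application prefix instead; note that the prefix needs a trailing slash for `urljoin` to treat `blog` as a directory.

```python
from urllib.parse import urljoin

prefix = "https://myhost.example/blog/"

# Browser-style resolution: "/article/42" is relative to the host, not the app.
print(urljoin(prefix, "/article/42"))  # https://myhost.example/article/42
print(urljoin(prefix, "article/42"))   # https://myhost.example/blog/article/42


def app_url(prefix, app_path):
    """Hypothetical helper: treat app_path as relative to the application root."""
    return urljoin(prefix, app_path.lstrip("/"))


print(app_url(prefix, "/article/42"))  # https://myhost.example/blog/article/42
```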
But generic “root” is much more vague. It seems to me that the behaviour you want is genuinely specific to your flavour of paths.
Right, which is why it’s a property of the path object, and not an independent algorithm.
```pycon
>>> p = WebAppPath(root="https://myhost.example/blog") / "/article/42"
>>> p
https://myhost.example/blog/article/42
>>> p.anchor / "page/3"
https://myhost.example/blog/page/3
```
It just may require a better definition of what `anchor` means in general, rather than specifically for POSIX/Windows filesystems.
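A sketch of how the `WebAppPath` example might behave, with the application root stored on the object and exposed through `anchor`. Everything here (`WebAppPath`, its `root` keyword, and this meaning of `anchor`) is hypothetical, it normalises `..` with POSIX path rules rather than full URL resolution, and it deliberately ignores queries and fragments.

```python
import posixpath


class WebAppPath:
    """Toy URL path in which a leading "/" means "relative to the app root"."""

    def __init__(self, *segments, root):
        self.root = root.rstrip("/")
        # Keep the app-relative part as a normalised absolute POSIX path.
        self._path = posixpath.normpath(posixpath.join("/", *segments))

    def __truediv__(self, other):
        return type(self)(self._path, other, root=self.root)

    @property
    def anchor(self):
        # The app root itself, as a path of the same type.
        return type(self)(root=self.root)

    def __str__(self):
        return self.root + ("" if self._path == "/" else self._path)


p = WebAppPath(root="https://myhost.example/blog") / "/article/42"
print(p)                    # https://myhost.example/blog/article/42
print(p.anchor / "page/3")  # https://myhost.example/blog/page/3
```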
It’s pretty difficult to reconcile HTTP URLs and path objects, despite their superficial similarity. Many web apps don’t implement clean URLs, and so operations like `p / "page/3"` don’t work. Fragments and query parameters further complicate matters.
It might be better to consider a protocol like WebDAV or FTP. In these cases I think URIs could be supported via dedicated methods. So:
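```python
import pathlib


# AbstractPath is the proposed base class under discussion; it is not in pathlib today.
class FTPPath(pathlib.AbstractPath):
    def __init__(self, *pathsegments, ftpobj):
        super().__init__(*pathsegments)
        # Connection state shared by every path derived from this one
        self.ftpobj = ftpobj

    def with_path(self, *pathsegments):
        # New path of the same type, carrying over the FTP connection
        return type(self)(*pathsegments, ftpobj=self.ftpobj)

    @classmethod
    def from_uri(cls, uri):
        ...  # parse 'ftp://' URI, create ftplib.FTP object, etc.

    def as_uri(self):
        ...
```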
IMO we should not include the scheme and hostname in `drive`, `root` or `anchor`.
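One way to keep the scheme and host out of `drive`, `root` and `anchor` is to split an `ftp://` URI into connection details (which would become the object’s extra state, like `ftpobj` above) and a plain POSIX path. A minimal sketch, assuming a hypothetical `split_ftp_uri` helper that `from_uri` could build on; it does not open a connection, and the URI is made up.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def split_ftp_uri(uri):
    """Hypothetical helper: return (connection details, path) for an ftp:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "ftp":
        raise ValueError(f"not an ftp URI: {uri!r}")
    conn = {
        "host": parts.hostname,
        "port": parts.port or 21,  # default FTP control port
        "user": parts.username,
    }
    path = PurePosixPath(unquote(parts.path) or "/")
    return conn, path


conn, path = split_ftp_uri("ftp://anon@ftp.example.org/pub/file%20name.txt")
print(conn)              # {'host': 'ftp.example.org', 'port': 21, 'user': 'anon'}
print(path)              # /pub/file name.txt
print(repr(path.drive))  # '' (no scheme or host in the drive)
print(path.anchor)       # /
```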
It’s pretty difficult to reconcile HTTP URLs and path objects, despite their superficial similarity. Many web apps don’t implement clean URLs, and so operations like `p / "page/3"` don’t work. Fragments and query parameters further complicate matters.
“Clean URLs” as a concept is really a symptom rather than the problem here. I’d word this as: Fragments and query parameters make URLs awkwardly different from paths, and many web apps use them extensively.
Really, there’s only one part of a URL that is path-like, and that’s, well, the path. Given a base (“current”, if you like) URL of https://user:pass@some.site.example:12345/path/to/resource?spam#/app/subpage and a relative URL of, say, `../ham?foo`, the work of constructing a new URL is barely connected to the Pathlib tools.
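The standard library makes this concrete with the same URLs: `urlsplit` separates the single path-like component from everything else, and `urljoin` resolves the relative reference with URL rules (dropping the last path segment, then applying `..`) rather than anything path-object-shaped.

```python
from urllib.parse import urljoin, urlsplit

base = "https://user:pass@some.site.example:12345/path/to/resource?spam#/app/subpage"

parts = urlsplit(base)
print(parts.path)      # /path/to/resource  (the only path-like part)
print(parts.query)     # spam
print(parts.fragment)  # /app/subpage
print(parts.hostname, parts.port, parts.username)  # some.site.example 12345 user

# Userinfo, host and port are kept; "../" climbs from /path/to/;
# the base query and fragment are dropped and the new query is used.
print(urljoin(base, "../ham?foo"))
# https://user:pass@some.site.example:12345/path/ham?foo
```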
I quite liked the idea of giving it a dunder or sunder name (e.g. `__newpath__` or `_newpath_`) if it does have to be a separate method. (But @merwok objects to both of those ideas.)
I do object to dunder, as these are generally reserved to implement operators, and should be looked up on the class (the instances you found of instance lookups are mostly bugs IMO). For double single underscore, my thought is that they’re not needed, one leading underscore would be enough – but as a signal of «this is special, read the docs», double single would be fine if you chose it!
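The class-lookup point can be seen directly: implicit special-method calls such as `len()` look the dunder up on the type and skip the instance dictionary, so a dunder assigned on an instance is silently ignored by the operator. The `Box` class is made up for the demonstration.

```python
class Box:
    def __len__(self):
        return 1


b = Box()
b.__len__ = lambda: 99  # stored as an instance attribute

print(len(b))       # 1: len() looks up __len__ on type(b)
print(b.__len__())  # 99: explicit attribute access finds the instance attribute
```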
To me, the `with_foo` naming convention says “this is basically a copy of the same object, but with a few things tweaked/changed slightly”. But if you’re changing the segments of the path – that feels like the most fundamental aspect of the path object. It’s less “basically a copy of the same object”, more “an entirely new object that happens to share some incidental state from the previous object”.
I still feel there’s a distinction between a path object and its segments that’s worth maintaining; otherwise we end up using “path” to mean too many different things. Each path may have a name, stem and suffix, and hence we can replace those parts using `with_name()`, `with_stem()` and `with_suffix()`. Each path is constructed from any number of path-like segments, and so using `with_segments()` to replace the path’s segments makes sense to me. But using `with_path()` to replace the path’s path makes my head hurt a bit.
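For reference, the existing `with_*` methods each replace one named part of a path and return a new path of the same type (`with_stem()` needs Python 3.9 or later). The example path is arbitrary.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/data/archive.tar.gz")

print(p.name, p.stem, p.suffix)  # archive.tar.gz archive.tar .gz
print(p.with_name("notes.txt"))  # /data/notes.txt
print(p.with_stem("backup"))     # /data/backup.gz
print(p.with_suffix(".bz2"))     # /data/archive.tar.bz2
```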
Explicit is better than implicit, so how about we make it explicit? Add a `context` or `state` parameter to the base class constructors, then pull that `context` or `state` off the instance and pass it in explicitly.
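A sketch of that explicit variant, assuming a hypothetical `context` keyword on the constructor and a matching `context` attribute on instances: the caller pulls the context off an existing path and passes it in, rather than a method copying it implicitly. `ContextPath` and `context` are illustrative names only.

```python
import posixpath


class ContextPath:
    """Toy path whose extra state lives in one explicit `context` object."""

    def __init__(self, *pathsegments, context=None):
        self._str = posixpath.join(*pathsegments) if pathsegments else "."
        self.context = context

    def __truediv__(self, other):
        # Derived paths pass the context along explicitly.
        return type(self)(self._str, other, context=self.context)

    def __str__(self):
        return self._str


src = ContextPath("/srv/data", context={"session": 42})

# Generic code that needs an unrelated path of the same kind:
dst = type(src)("/tmp/out.txt", context=src.context)
print(dst, dst.context)  # /tmp/out.txt {'session': 42}
```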
A constructor is usually not part of the interface requirements; implementations are free to choose any signature. It would be very inconvenient to fix `__init__`’s signature, and it’s difficult to do so in a backward-compatible way.
What should the method be named? @AlexWaygood suggests `newpath()`. I quite like `makepath()`.
Using `path` (the name of the class) seems redundant. Maybe `derive()` or some synonym of it?
I’m still leaning towards `with_segments()`. Would anyone like to stop me?
- `with_segments()` is fine
- Stop you maniac!
I wanna say “with_segments() is fine, you maniac!”
If `replace` wasn’t already a synonym for `rename`, then I’d try to make you stop, but since we can’t have the obvious name, `with_segments` is fine.
On Windows, `replace()` is actually different from `rename()`. Prior to Python 3.3, `os.rename()` called WinAPI `MoveFileW()`, which is equivalent to calling `MoveFileExW()` with the flag `MOVEFILE_COPY_ALLOWED`. This deviates from POSIX `rename()` in two significant ways:
- It allows the system to copy a file to a different file system and then delete the source file, such that a new file is created that has a different file ID.
- It does not support replacing an existing file or empty directory.
In Python 3.3, `os.replace()` was added, which calls `MoveFileExW()` with the flag `MOVEFILE_REPLACE_EXISTING`. This is closer to POSIX `rename()`; however, it still does not support replacing an existing empty directory[1], as required by POSIX. Note that the flag `MOVEFILE_COPY_ALLOWED` is not used by either `os.replace()` or `os.rename()` in 3.3+. This changed the behavior of `os.rename()`, and the change was never documented.
I don’t understand why `pathlib.Path.rename()` wasn’t modified to call `os.replace()` when `pathlib` was added to the standard library in Python 3.4. Since it was effectively a new module in the standard library, there was no concern about supporting older scripts on Windows that assume `rename()` fails if the destination exists.
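The practical difference in a small self-contained check: `Path.replace()` (like `os.replace()`) overwrites an existing destination file on every platform, whereas `os.rename()` raises `FileExistsError` on Windows in the same situation. The file names are arbitrary and everything happens in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "new.txt")
    dst = Path(tmp, "old.txt")
    src.write_text("new contents")
    dst.write_text("old contents")

    # Overwrites the existing dst on POSIX and Windows alike.
    # os.rename(src, dst) would raise FileExistsError here on Windows.
    src.replace(dst)

    result_text = dst.read_text()
    src_still_exists = src.exists()

print(result_text)       # new contents
print(src_still_exists)  # False
```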
[1] Starting with Windows 10, a POSIX rename is possible via the NTAPI `NtSetInformationFile()`: `FileRenameInformationEx` with the flags `FILE_RENAME_REPLACE_IF_EXISTS | FILE_RENAME_POSIX_SEMANTICS`, assuming the file system supports `FileRenameInformationEx`, which at least NTFS does. This request allows replacing an empty directory and also allows replacing an open file if it’s open with `FILE_SHARE_DELETE` sharing. For some reason, WinAPI `MoveFileExW()` has yet to be updated to support the new POSIX rename capability.
I find all of `prototype`, `template` and `blueprint` to be particularly cryptic. We’re talking about concrete filesystem paths, not some obscure abstraction inside a Java object-oriented framework.
The `based_on` suggestion is quite good and explicit IMHO.