I think there are three important functions here that users ought to know how to use together (pathlib might be a special case, will come to that later), and the one that needs the most clarification is normpath.
(For context, I’ve been thinking about this issue a lot recently as we work through a number of related bugs, and I presume Barney is in the same place.)
The only breakdown of responsibilities that seems to make overall sense (disregarding back-compat) is:
abspath knows how to retrieve the current working directory
realpath knows how to resolve each segment to find the actual final path (with filesystem access)
normpath knows how to collapse segments to produce a probable path that is easy to read
Right now, abspath does an implicit normpath, which is where the problem actually arises, since a norm-ed path isn’t necessarily the real path.
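To make the norm-ed versus real distinction concrete, here is a minimal sketch. It assumes a POSIX-like system where os.symlink works without extra privileges, and the names target, inner, link and file are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base = os.path.realpath(tmp)  # resolve any symlink in the temp dir itself
    # Hypothetical layout: base/target/inner is a real directory,
    # and base/link is a symlink pointing at it.
    os.makedirs(os.path.join(base, "target", "inner"))
    os.symlink(os.path.join(base, "target", "inner"), os.path.join(base, "link"))

    path = os.path.join(base, "link", "..", "file")
    normed = os.path.normpath(path)  # pure string manipulation
    real = os.path.realpath(path)    # follows the symlink before applying ".."

    print(os.path.relpath(normed, base))  # file
    print(os.path.relpath(real, base))    # target/file
```

The two results name different locations, because ".." after a symlink means the parent of the link's target, not the parent of the link.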
If we didn’t have that, abspath("../file") might return C:\Users\me\../file instead of C:\Users\file. Meanwhile, join(os.getcwd(), "../file") returns the former, and normpath(_) returns the latter.
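The two results can be reproduced on any platform with ntpath, which applies Windows path rules as plain string operations. The working directory here is an assumed value standing in for os.getcwd(), not something read from the system.

```python
import ntpath  # Windows path rules, usable on any platform

cwd = r"C:\Users\me"  # assumed working directory, for illustration only

joined = ntpath.join(cwd, "../file")
normed = ntpath.normpath(joined)

print(joined)  # C:\Users\me\../file
print(normed)  # C:\Users\file
```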
So I think the fundamental question is whether abspath is about path calculation or path display. normpath is clearly about displaying paths, and so implicitly is abspath (right now), but we could change that by simply removing the normalisation.
When we bring in compatibility, however, it is probably less impactful to leave it as it is and clearly document that abspath may produce incorrect results for the sake of readability, and join(os.getcwd(), ...) is recommended for correctness in the presence of symlinks or other name aliasing.
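As a rough sketch of what that documented recommendation could look like in user code (absolute_without_norm is a hypothetical name, not an existing os.path function):

```python
import os

def absolute_without_norm(path):
    """Hypothetical helper: anchor path at the current directory, keep every segment."""
    # os.path.join discards the first argument when path is already absolute.
    return os.path.join(os.getcwd(), path)

kept = absolute_without_norm("../file")
print(kept)                    # ".." is still present
print(os.path.normpath(kept))  # ".." collapsed as text
print(os.path.realpath(kept))  # ".." resolved against the filesystem
```

The caller then chooses explicitly between the readable form and the resolved one, instead of getting the readable form implicitly.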
pathlib is a bit more interesting. You could argue that Path.__str__ implies “for display” and so it should be normalised, while Path.__fspath__ implies “for use” and so it shouldn’t. Compatibility-wise, I’m pretty sure Path already collapses some segments on creation though (at least "." and repeated separators), and so things could get quite messy if we change that (e.g. iteration over Path.parents will return the same directory multiple times). I’m not sure how best to handle it.
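A quick check of what pathlib does on creation today, using PurePosixPath so that the result does not depend on the platform or touch the filesystem. The example paths are arbitrary.

```python
from pathlib import PurePosixPath

# "." segments and repeated separators are dropped on creation...
print(PurePosixPath("a/./b//c"))  # a/b/c
# ...but ".." is kept as written.
print(PurePosixPath("a/../b"))    # a/../b

parents = [str(p) for p in PurePosixPath("/a/b/../c").parents]
print(parents)  # ['/a/b/..', '/a/b', '/a', '/']
```

Note that when no symlinks are involved, /a/b/.. and /a in that list are the same directory, so the repeated-directory effect already shows up for ".." segments.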
My final thought is that I’ve never seen anyone do this on purpose, and I suspect that any user asking for join("root/A", "../B") actually wants to remove “A” - the equivalent of join(dirname("root/A"), "B") - rather than to navigate to “A” and then go up one level. I’d love to understand better whether this is true, but path manipulation generally seems to be understood in terms of modifying the path and only once that’s done do we try to find what it refers to.
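The two spellings compare as follows when treated purely as strings. This sketch uses posixpath so the output is the same on every platform, and "root/A" is only a sample path.

```python
import posixpath

appended = posixpath.join("root/A", "../B")
print(appended)                      # root/A/../B
print(posixpath.normpath(appended))  # root/B

replaced = posixpath.join(posixpath.dirname("root/A"), "B")
print(replaced)                      # root/B
```

The normalised and dirname-based results agree as text. Whether root/A/../B names the same location on disk depends on whether "A" is a symlink.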
So are our path functions about manipulating path strings or are they about navigating the filesystem? And more importantly, what do our users currently think they do?