Suggestion for pathlib: differentiate explicit and implicit local paths (pathlib.StrictPath?)

This point of view doesn’t make sense to me. The file argument of POSIX execlp() is either a relative or absolute pathname. The difference compared to a pathname in an open/create context is only in how a relative pathname gets resolved in some cases. If the file argument of execlp() has a slash in it, then the system tries to resolve it normally, relative to the working directory. If it doesn’t have a slash in it, then instead of trying the working directory, the system tries the sequence of directories in the PATH environment variable (which may include “.” to also try the working directory). This is just an extension of how relative paths get resolved in the open/create context.

Relative vs. absolute is not interesting.

It is only the check for a / that decides which algorithm is used.
No /: search for the filename in PATH.
Contains a /: open the path as given.
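The rule described above can be sketched in Python. This is a simplified model, not the C library's code: `posix_exec_lookup` is a hypothetical helper, and its `path` parameter stands in for the PATH environment variable.

```python
import os


def posix_exec_lookup(file, path=None):
    """Simplified model of how execlp()/execvp() pick a program on POSIX.

    `path` stands in for the PATH environment variable (os.pathsep-separated);
    when it is None, the real PATH is used via os.get_exec_path().
    """
    if "/" in file:
        # Contains a slash: used as an ordinary pathname (relative to the cwd if relative)
        if os.path.isfile(file) and os.access(file, os.X_OK):
            return file
        return None
    dirs = os.get_exec_path() if path is None else path.split(os.pathsep)
    for directory in dirs:
        # An empty PATH entry traditionally means the current directory
        candidate = os.path.join(directory or ".", file)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Since `pathlib.PurePosixPath("./sl")` is stored as `sl`, converting the argument to a path moves it from the first branch to the second.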

Barry, I said as much in my last post, and in previous posts. But I’m also keeping the situation on Windows in mind. In the case of Windows, the system only special-cases searching for a relative pathname that begins with a “.” or “..” component, in which case the pathname is resolved against the working directory. Otherwise, it doesn’t matter whether the pathname contains a slash or backslash, i.e. the system will try to resolve “spam\eggs” against each directory in the given search path.
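For contrast, a rough model of the Windows rule described above, using `ntpath` so it runs on any OS. `windows_search_candidates` is a hypothetical helper that only returns the candidate paths it would try; it ignores PATHEXT and extension handling, drive-relative names such as `C:spam`, and the extra directories a real Windows search includes.

```python
import ntpath


def windows_search_candidates(name, search_dirs, cwd):
    """Simplified model of the Windows search rule (not the real implementation)."""
    if ntpath.isabs(name):
        # Fully qualified names are used as-is
        return [name]
    first = name.replace("/", "\\").split("\\", 1)[0]
    if first in (".", ".."):
        # Explicit "." or ".." component: resolved against the working directory only
        return [ntpath.join(cwd, name)]
    # Anything else, even spam\eggs, is tried against every search directory
    return [ntpath.join(d, name) for d in search_dirs]
```

Here too, `PureWindowsPath(r".\echo")` becomes `echo`, which moves it from the working-directory branch into the search branch.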

Fair enough, but that “special effort” doesn’t exist in the standard library (or its cookbook) as of now.
I’d also argue that combining Path with argparse or similar tools is a very common pattern (example below), which helps a lot. Should we consider this a special case too? (I don’t think so)

import argparse
import pathlib
import subprocess

parser = argparse.ArgumentParser()
parser.add_argument("executable", type=pathlib.Path)
args = parser.parse_args(["sl"])
assert isinstance(args.executable, pathlib.Path)  # this is neat!
assert parser.parse_args(["sl"]) == parser.parse_args(["./sl"])  # but this isn't

# pathlib turns "./echo" into "echo", so the OS searches for "echo" instead of
# running ./echo (on POSIX this picks up e.g. /bin/echo from PATH)
subprocess.run([parser.parse_args(["./echo"]).executable, "blahblah"])

This risk is also a very valid point. But with type hint support growing, maybe it shouldn’t be that big of a problem? What do you think of the following example?
I know that type hints are always optional, but I’d still like to know your opinion on this.

# StrictPath is the proposed (hypothetical) base class; Path would subclass it
assert issubclass(Path, StrictPath)

def something(p: Path):
    print(f"doing something with {p}")

local_p = StrictPath("./a")
# mypy and IDEs will report an argument type mismatch
something(local_p)

What I don’t understand is why there’s a need to create a Path instance, when the original string works with subprocess.run already. What exactly is the use case for manipulating a path and then caring about whether it has ./ prefixed?

  1. subprocess.run accepts Path. To me, that itself is a good reason.
  2. Path has some nice functions, like .stem, .suffix, etc.
    With os.path we’d need os.path.splitext(os.path.split(string)[1])[0], which is rather cumbersome - the very reason pathlib was merged into the stdlib. (btw, os.path preserves . since it’s just str)
  3. Maybe I just want to log before executing, but not manipulate?
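Points 2 and 3 side by side, as a small sketch (the file name is made up; `PurePosixPath` and `posixpath` are used so the output is the same on every OS):

```python
import posixpath
from pathlib import PurePosixPath

raw = "./cowsay.py"
p = PurePosixPath(raw)

# The convenience: stem/suffix without nesting os.path calls
assert p.stem == posixpath.splitext(posixpath.basename(raw))[0] == "cowsay"
assert p.suffix == posixpath.splitext(raw)[1] == ".py"

# The catch when logging: the string form has been normalised
print(f"about to run {raw}")  # about to run ./cowsay.py
print(f"about to run {p}")    # about to run cowsay.py
```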

The API for running a subprocess is operating-system dependent (thanks Eryk for the Windows logic).

To use these APIs requires that you are aware of how they work.

I do not see why Path needs to know about subprocess creation, which is what you are asking to be implemented.
That is a design domain error as I see it.

That subprocess.run can take a Path does not imply that Path needs to know about process creation semantics.

My opinion is that anything which relies on a typecheck failing is still insufficiently safe.

Also, the necessary “special effort” does exist. Pass the string value to something like shutil.which, or just check for an initial ./ as a string, before converting to a path.
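One way to apply that advice, sketched under the assumption that the command arrives as a string (e.g. from argparse without `type=pathlib.Path`). `executable_path` is a hypothetical helper; it relies on `shutil.which()` checking a name that has a directory part in place rather than searching PATH, and converts to a `Path` only after the result has been made absolute.

```python
import os
import shutil
from pathlib import Path


def executable_path(arg):
    """Resolve a user-supplied command string *before* turning it into a Path.

    shutil.which() checks "./tool" in place (it has a directory part) but
    searches PATH for a bare "tool", so the leading "./" still matters here.
    """
    found = shutil.which(arg)
    if found is None:
        raise FileNotFoundError(f"no executable found for {arg!r}")
    # Make it absolute, so later pathlib normalisation can't drop anything significant
    return Path(os.path.abspath(found))
```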

Thank you all for your input.

I’m fine with @pf_moore’s answer. Is it OK to open a PR to update the documentation?

i.e. add a line to the pathlib and subprocess pages saying to use Path only for filesystem paths, and to check with shutil.which if needed.

Related discussion: Pathlib: preserve trailing slash

An exchange I found super informative:

Pathlib’s handling of . segments, trailing slashes, etc, isn’t some accident or optional feature - it’s at the very core of what pathlib is and does.

And so from my perch in Pedantry Corner, I might argue that Python doesn’t really have an object-oriented version of os.path, and existing code using os.path can’t be safely ported to pathlib without some care being taken to avoid data loss. Honestly I don’t feel good about this, and I’m looking at whether @departure3560’s earlier suggestion that Path subclasses a new StrictPath class has legs. It fits rather well with my plans for the pathlib ABCs, which should result in a very straightforward/obvious implementation. I’ll share a prototype when I have one.
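A small illustration of the data loss mentioned above, using `posixpath` and `PurePosixPath` so the results don’t depend on the host OS (the input string is arbitrary):

```python
import posixpath
from pathlib import PurePosixPath

raw = "./spam//eggs/./"

# os.path functions work on the string exactly as given
print(repr(posixpath.basename(raw)))  # '' (nothing after the trailing slash)
print(posixpath.dirname(raw))         # ./spam//eggs/.

# pathlib parses first: "." segments and doubled/trailing slashes are dropped
p = PurePosixPath(raw)
print(p)       # spam/eggs
print(p.name)  # eggs

# The same difference shows up with a plain trailing slash
print(repr(posixpath.basename("foo/")), PurePosixPath("foo/").name)  # '' foo
```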

My first instinct was that we don’t want multiple nearly-the-same-but-subtly-different classes. But thinking further, it struck me that I don’t actually know precisely how (a) “an object-oriented version of os.path” and (b) “a library to handle filesystem paths” would differ. You mention handling of . segments, trailing slashes, etc., but can you actually describe how (a) and (b) would handle . segments? Or how they would handle trailing slashes? To put it another way, what exactly are the intended differences between the two classes? If you can’t answer that clearly in a single sentence, I’m inclined to stick with my original thought that we don’t want multiple nearly-the-same-but-subtly-different classes. If you can, maybe it’s worth exploring the idea further.

The reason I ask is that if we have both classes, I think it’s important to be able to have an understanding of what to expect from (for example) trailing slash handling, without having to dig into the documentation, so that people can quickly decide which is the appropriate class for their usage. We don’t have this problem with pathlib at the moment, as it’s the only class-based “filesystem path” abstraction, so you use it (and accept its choices) or don’t.

On a related note, I have no intuition about which of (a) or (b) above is “stricter” than the other, so I’m a strong -1 on calling either of them StrictPath. The name should reflect the core intention of the class, and “strictness” isn’t the motivating idea behind either of the two concepts we’re talking about here (as far as I can see…). If you can’t think of good, expressive names for the 2 classes, that once again suggests to me that the concepts aren’t distinct enough to be useful.

PurePath and Path normalise paths as described in the docs (“Spurious slashes and single dots are collapsed […]”), whereas the new class would not perform this normalisation; trailing slashes, empty segments and dot segments would be handled exactly as they are in os.path, i.e. not ignored or normalised away. Perhaps ConservativePath would be a better name?

(edit: so os.fspath(ConservativePath(some_string)) == some_string, always)
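To make the idea concrete, here is a minimal sketch of what such a class could look like. It is hypothetical (not Barney’s prototype), handles POSIX separators only, and simply keeps the original string while delegating lexical operations to `posixpath`, so the `os.fspath(ConservativePath(some_string)) == some_string` property holds by construction.

```python
import os
import posixpath


class ConservativePath:
    """Hypothetical sketch (POSIX separators only): a path object that keeps its
    string verbatim and delegates lexical operations to os.path functions."""

    __slots__ = ("_raw",)

    def __init__(self, raw):
        self._raw = os.fspath(raw)  # str or a str-based os.PathLike

    def __fspath__(self):
        return self._raw

    def __str__(self):
        return self._raw

    def __repr__(self):
        return f"{type(self).__name__}({self._raw!r})"

    def __eq__(self, other):
        # Equality on the exact string: "./a" and "a" are different objects here
        if isinstance(other, ConservativePath):
            return self._raw == other._raw
        return NotImplemented

    def __hash__(self):
        return hash(self._raw)

    def __truediv__(self, other):
        return ConservativePath(posixpath.join(self._raw, os.fspath(other)))

    @property
    def name(self):
        return posixpath.basename(self._raw)

    @property
    def parent(self):
        return ConservativePath(posixpath.dirname(self._raw))
```

The trade-off is visible in `name`: `ConservativePath("foo/").name` is `""`, exactly like `os.path.basename("foo/")`, whereas pathlib would report `foo`.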

Thanks for your feedback though, it’s very helpful. To be clear I don’t know that this is the right way forward, and you’ve given me plenty of things to weigh up. It’s going to be a while before I can attempt an implementation and see if it feels OK in practice, and sufficiently useful and distinct from Path.

OO version of os.path: classes to query and manipulate files on the filesystem
library to handle filesystem paths: classes to work with paths themselves (strings with special rules), not the files

… but that distinction collapses as soon as you have symlink resolution, stat, etc.

os.path is a mix of pure lexical manipulation (splitdrive(), join(), dirname(), etc.) and system stuff (realpath(), expanduser(), ismount(), etc.). At least that’s the way I look at it. pathlib is the same, but it splits these into PurePath and Path, and adds a few Path methods that call functions from os.
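The split described above, in a short sketch (file names are arbitrary): pure classes do string-level work and exist for every flavour, while the concrete `Path` calls into `os`.

```python
import ntpath
import os
from pathlib import Path, PureWindowsPath

# Pure, lexical operations: no filesystem access, and any flavour works on any OS
raw = r"C:\Users\spam\eggs.txt"
w = PureWindowsPath(raw)
assert w.drive == ntpath.splitdrive(raw)[0] == "C:"
assert w.parent == PureWindowsPath(ntpath.dirname(raw))

# System operations live on the concrete Path class and call into os
here = Path(".")
assert here.resolve() == Path(os.path.realpath("."))  # cf. os.path.realpath()
assert here.stat().st_ino == os.stat(".").st_ino      # Path.stat() wraps os.stat()
```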

That feels like you’re describing the consequences (i.e., the behaviour that’s the result of the differing class purposes) but not the underlying concept.

For me, a class needs to abstract the ideas that constitute a path - the lexical representation should be basically irrelevant. So a path object might have a drive, a sequence of directory components, and a file (leaf) name. There’s no place in that abstraction for the idea of “spurious slashes” because there’s no concept of a “slash” in the first place[1].
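A minimal sketch of that abstraction. `AbstractPath` is a hypothetical dataclass that borrows pathlib’s parsing to fill in a drive, directory components and a leaf name; like the description above, it doesn’t record whether the path is absolute. The `to_vms()` rendering only illustrates the footnote’s syntax.

```python
from dataclasses import dataclass
from pathlib import PurePath, PurePosixPath, PureWindowsPath


@dataclass(frozen=True)
class AbstractPath:
    """Hypothetical path model: drive, directory components, leaf name.
    There is no notion of a separator, so "spurious slashes" can't exist."""

    drive: str
    dirs: tuple
    name: str

    @classmethod
    def from_pure(cls, p: PurePath) -> "AbstractPath":
        parts = p.parts[1:] if p.anchor else p.parts  # skip the drive/root anchor
        *dirs, name = parts or ("",)
        return cls(p.drive, tuple(dirs), name)

    def to_vms(self) -> str:
        # Render in VMS-like volume:[dir.dir]file.name syntax (illustrative only)
        return f"{self.drive.rstrip(':')}:[{'.'.join(self.dirs)}]{self.name}"


# "a//b/./c" and "a/b/c" become the same abstract object
print(AbstractPath.from_pure(PurePosixPath("a//b/./c")))
print(AbstractPath.from_pure(PureWindowsPath(r"D:\data\logs\run.txt")).to_vms())
```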

That’s where I have a problem with the idea of “an object oriented version of os.path”, because I’m not clear what the “objects” involved would be - "foo/" and "foo" are (sometimes?) different objects, but what precisely are they? The first is (maybe) a directory object, and the latter a “path” (no constraint on whether it’s a directory or a file) object. The object structure gets complicated, because the logic in os.path is complicated - full of heuristics and special rules that are crucial and important in the real world, but don’t map to a clean object model.

The above is fairly idealistic, but that’s essentially my point - when there are two models to choose from, they have to have clear and intuitive summaries, so that people can easily choose between them. Whereas with just pathlib as the “object oriented” model, its quirks are acceptable, because there’s no other alternative that needs to be clearly distinguished from it. And os.path isn’t “an alternative” in this context because the higher level question “do I want a class-based or function-based API?” makes that choice for you.

I guess what I’m saying is that I’m happy with pathlib as a class-based abstraction of filesystem paths, in spite of its quirks, because it’s practically useful, easy to understand at a high level, and right there in the stdlib when I need it. If we had two class-based filesystem APIs in the stdlib, that would fall apart because deciding which one to use would focus my attention on the quirks and edge cases, and distract me from getting on with my application code. It’s a very clear (IMO) example of why “there should be one obvious way” is a good principle to work from.


  1. VMS pathnames were something like volume:[dir.dir.dir]file.name. It’s easy to map that to an abstract path object like I describe, but things like “spurious slashes” and “dot segments” have no sensible analogs. ↩︎

Thanks Paul.

Strings are the common unit of exchange for paths, and so the string representation of a path object is fundamental. A path library that didn’t support conversion to/from strings would be pretty useless, right? And so for me, the model of that string (in pathlib, .drive / .root / ._tail) is important only insofar as it efficiently supports operations like .name and .parent.

Occasionally pathlib is unsuitable because of its quirks, and so one’s preference for class-based or function-based APIs doesn’t matter. Eryk shared a good example (executable search paths). Maybe that’s just something we should live with 🙂

Parsing strings to path objects, and serialising path objects to strings are fundamental, yes. But that’s true of most types - serialising floats is just as crucial, but we don’t suggest that 12.000 and 12.000000 should be different values (we do for Decimal, but that’s precisely because the abstract form of a Decimal includes a precision and trailing zeroes let us encode that in the string representation).
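The float/Decimal comparison, checked in Python. Note that the two Decimal values still compare equal; what Decimal keeps is the exponent, and with it the string form:

```python
from decimal import Decimal

# float: the abstract value has no precision, so trailing zeros vanish on parsing
assert float("12.000") == float("12.000000")
print(repr(float("12.000000")))  # 12.0

# Decimal: the exponent is part of the value, so the string form survives a round trip
a, b = Decimal("12.000"), Decimal("12.000000")
print(a, b)                                          # 12.000 12.000000
print(a.as_tuple().exponent, b.as_tuple().exponent)  # -3 -6
print(a == b)  # True: numerically equal, yet distinguishable
```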

Yes. Pathlib’s not perfect, because the real world isn’t perfect. Although the replies to Eryk’s example (one of which was mine) apply here - the strings ./cowsay and cowsay aren’t simply paths, precisely because a leading ./ is significant.

I’m just nitpicking, though - practicality beats purity and I agree that this is simply something we should live with.

On 6/01/24 9:33 am, Paul Moore via Discussions on Python.org wrote:
> My first instinct was that we don’t want multiple
> nearly-the-same-but-subtly-different classes.

I think your instinct is correct, regardless of what the actual differences are. It sounds like a recipe for endless annoyances to me. What if some of the things you want to do need Path Object 1 and others need Path Object 2? Or a library you’re using requires one kind and you’re using another? Do you end up forever having to convert back and forth between them?

I’m pretty sure we Don’t Want To Go There.

The reason for exposing the separate “exactly as in os.path” behaviour is precisely so that people can migrate old code that uses os.path and expect exact compatibility, right?

How about LegacyPath?

Alternatively: does the class really need to preserve information about the trailing slashes etc. that were present in the original path string? If not, is this really a separate class, or just a separate creation pattern?

Thank you very much; that conversation was very informative (and also quite similar to this one).

There should be one obvious way.
But jokes aside, I’m torn between wanting an object-oriented os.path to co-exist with the current pathlib, and leaving things as they are for overall simplicity and less confusion.

I guess my reason (and maybe Barney’s) for making this post was the feeling that pathlib was a drop-in replacement with an OO interface for os.path - which it isn’t.

At the very least, I’ll try to expand on the phrasing in the documentation, to be a bit more obvious what assumptions pathlib makes that os.path doesn’t.

doc link

See also: For low-level path manipulation on strings, you can also use the os.path module.