Pathlib: preserve trailing slash

Thank you!

FWIW, pathlib considers PosixPath('/etc'), PosixPath('//etc'), and PosixPath('/../etc') to be distinct paths, so there’s already a bit of room for error there. Still, no excuse for me to further complicate things :smiley:
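For anyone who wants to check the three spellings mentioned above, here is a minimal sketch. It uses `PurePosixPath` rather than `PosixPath` so that it also runs on Windows; that substitution is my choice, not something from the original post.

```python
from pathlib import PurePosixPath

# The three spellings from the post. pathlib does no ".." resolution and
# keeps a root of exactly two slashes distinct from a single slash.
paths = [
    PurePosixPath("/etc"),
    PurePosixPath("//etc"),
    PurePosixPath("/../etc"),
]

for p in paths:
    print(repr(p), p.parts)

# All three compare (and hash) as different paths.
distinct_count = len(set(paths))
print(distinct_count)
```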

Would there be an argument for:

  • not changing Path itself at all (so it always strips trailing slashes as it does currently, and doesn’t allow this to be changed by some optional parameter, etc.), but instead
  • changing functions that take Path instances (whether free functions in os, or methods on Path instances that access the filesystem) to provide a require_dir keyword-only parameter (default False) that causes the function to behave as if the provided Path instance had a trailing slash?

The exact name of the parameter could vary depending on the function (to correctly reflect the semantics of a trailing slash in that particular function), it would only be added for functions that actually had different semantics for this case, and some functions might take two such parameters (i.e. if the function takes two Path instances it might need a separate require_dir parameter for each).
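To make the proposal a bit more concrete, here is a rough sketch of what such a keyword-only parameter could look like. Everything here is hypothetical: `stat_path` is an illustrative wrapper and not an existing or proposed function, and it emulates the trailing-slash check by inspecting the stat result instead of passing a slash to the OS.

```python
import errno
import os
import stat


def stat_path(path, *, require_dir=False):
    """Hypothetical wrapper around os.stat() with a require_dir flag.

    With require_dir=True it fails for anything that is not a directory,
    roughly what a trailing slash would ask the OS to enforce.
    """
    result = os.stat(path)
    if require_dir and not stat.S_ISDIR(result.st_mode):
        # Mirror the error a POSIX system reports for "file/".
        raise NotADirectoryError(
            errno.ENOTDIR, os.strerror(errno.ENOTDIR), os.fspath(path)
        )
    return result
```

A caller would then write `stat_path(some_path, require_dir=True)`, which keeps the requirement visible at the call site even when `some_path` is computed elsewhere.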

There’s already a well-established, intuitive convention for asserting that a path is a directory at access: include a trailing slash. This is supported by OSs and by some bits of Python itself, e.g. in glob.glob(). Adding a novel way to specify the same thing would be rather confusing, surely?
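The glob.glob() behaviour mentioned above can be seen with a small sketch. The file and directory names are made up for the example, and it builds the pattern with `os.path.join(..., "")` so the trailing separator is correct on the current platform.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # One directory and one regular file (names are arbitrary).
    os.mkdir(os.path.join(tmp, "subdir"))
    with open(os.path.join(tmp, "plain.txt"), "w"):
        pass

    # Without a trailing separator the pattern matches both entries.
    everything = sorted(glob.glob(os.path.join(tmp, "*")))

    # With a trailing separator only directories match, and the
    # results keep the trailing separator.
    dirs_only = glob.glob(os.path.join(tmp, "*", ""))

print(everything)
print(dirs_only)
```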


I find an explicit optional argument much less confusing than an implicit convention based on the presence of a single character at the end of the path (a character that is otherwise ignored). It’s also much less error-prone, because the single character might well be omitted and the code would still appear to work “correctly”.


The trailing slash IS explicit though. It is a character that says “this should be interpreted as a directory, not a file”. There is an actual character there. It’s the same as how, in Python source code, the literal 12345. is very different from the literal 12345 due to the presence of the explicit marker character (in this case, a decimal point).


For me it’s not just explicit but also highly intuitive. Years before I learned any programming language, I’d learned that:

  1. Files are organised using directories (I’m a bit too young to remember non-hierarchical filesystems)
  2. To address a file, I need to specify the intermediate directories, separated by slashes, like C:\windows\notepad.exe

From this, I think many people (not just computer programmers) would realise that, for any path, everything before the final slash must be some kind of directory or container for other files/containers.

(It’s all a bit academic though; we can’t change pathlib trailing slash elision, and I don’t want a proliferation of new method arguments either, I don’t think)


The only case where it is explicit is when the destination path is a literal. Which is not a very interesting case because, often, the destination path will then be a well-known directory.

In the more interesting case of a computed path, the trailing slash is not explicit in the source code. Compare:

os.rename(src, dst_dir, require_dir=True)

and:

os.rename(src, dst_dir)  # relying on previously computed `dst_dir` to end with a slash

Of course you could write:

assert dst_dir.endswith('/'), \
    "expecting a trailing slash so that os.rename errors out if not a directory"
os.rename(src, dst_dir)

but that’s, well, quite a bit more annoying to type, and I doubt anyone does this?
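As a side note on the computed-path case, one way to guarantee the trailing separator on a string path (instead of asserting it) is the `os.path.join(path, "")` idiom. The helper name `ensure_trailing_sep` is made up for this sketch; it only shows how the slash could be added, not what os.rename then does with it, which depends on the platform.

```python
import os


def ensure_trailing_sep(path):
    """Return path as a string that ends with a path separator.

    Joining with an empty final component appends a separator only
    when one is not already present, so the helper is idempotent.
    """
    return os.path.join(os.fspath(path), "")


dst_dir = ensure_trailing_sep("build")
print(dst_dir)
print(ensure_trailing_sep(dst_dir) == dst_dir)
```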


Well, at some point you have to admit that “intuitive” is wildly subjective and relying on personal intuitions does not make for very easy-to-learn APIs.

Consider that there are people like me for whom this “intuition” is entirely alien.

Intuition is about building upon things one has already learned, and I stated the two facts upon which my intuition is built. My post was mostly trying to push back on your characterisation of trailing slashes as “arcane knowledge” or the province solely of “POSIX aficionados” - that’s simply not true.


It’s combining two completely independent bits of information together in the same argument, though. By your explanation, “foo/” is a path with the name “foo”, which should be interpreted as a directory, whereas “foo” is a path with the name “foo”, with no such constraint. Given that,

  1. Is equality of paths intended to model “is this the same path” or “is this the same path with the same interpretation”? The former suggests Path("foo/") == Path("foo"), whereas the latter suggests Path("foo/") != Path("foo").
  2. Does str() of a path give the path name? If so, str(Path("foo/")) == "foo". If not, how would you describe what str() means when applied to an arbitrary path object?
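For reference, this is how both questions come out with pathlib as it behaves today (trailing slashes stripped). A minimal sketch, using `PurePosixPath` so it does not depend on the platform:

```python
from pathlib import PurePosixPath

with_slash = PurePosixPath("foo/")
without_slash = PurePosixPath("foo")

# Current behaviour: the trailing slash is dropped on construction,
# so the two objects compare equal and str() gives just the name.
same = with_slash == without_slash
text = str(with_slash)

print(same, repr(text))
```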

It’s very like the “separator vs terminator” question for semicolons - does the slash separate the names of the path components, or does it terminate a directory name? I see it as the former.

Also, if we have a trailing slash as “interpret as a directory”, why is there no way to express the converse - “interpret as a file that is not a directory”?

For me, trailing slash interpretation feels too close to a “do what I mean” behaviour. And given that enforcing it would be incompatible with current behaviour, I don’t think it’s a clear enough win to justify the breakage that would cause.

I don’t know about Antoine, but I will concede this as long as you are willing to concede that there was a bit of sleight of hand[1] in going from elements separated by slashes, to a terminating slash having a specific meaning. Separators and terminators are notorious for being different in subtle and sometimes confusing ways.

Trailing slashes are still very much something that people will find intuitive or confusing for personal and difficult-to-express reasons, though.


  1. which I’m not suggesting was deliberate misdirection! ↩︎


I’m in complete agreement with you here - specifically the change of how equality would work, and the str/fspath representation - would break too much. Folks understand directory separators differently anyway as is clear from this thread. However, that should not be taken as license to create some new convention for expressing the same idea, like adding ensure_dir arguments to relevant functions, at least in my opinion. The new convention would need to be substantially better to be warranted, and I suspect that’s an impossible bar. I don’t know. I might duck out at this point :slight_smile:


I would say that the trailing slash is inherently part of the path, so they would be unequal, and the str() would include the trailing slash. It’s not “two things in one”, it’s one thing that gives multiple pieces of information.

You’re right, there’s no easy way to say “this should be interpreted as a file”. But I’m not sure when I would even use that. What contexts would that be relevant to?

If it were purely a separator, Path("/a/b/c") should be equivalent to Path("a/b/c"), which clearly isn’t the case. A leading slash is important. So is a trailing slash.
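The leading-slash half of that claim is easy to confirm with current pathlib. A small sketch, again with `PurePosixPath` so it runs the same everywhere:

```python
from pathlib import PurePosixPath

absolute = PurePosixPath("/a/b/c")
relative = PurePosixPath("a/b/c")

# The leading slash becomes the root, so the paths differ.
print(absolute == relative)
print(absolute.parts, absolute.is_absolute())
print(relative.parts, relative.is_absolute())
```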

Paths are for people to use, and all people are different. I get that. My idea of intuitive is based on my specific experiences, yours is based on yours, and we WILL end up finding some things easy and some things hard, based on that. We have an opportunity here to create something for Python that can be used by everyone. If we go with the trailing slash convention, everyone who has used that convention elsewhere will immediately understand it, and those who haven’t will have to learn it. If we go with a new convention, created especially for pathlib, EVERYONE will have to learn it. I suppose, in a kindergartener’s way, that would be “fair”; but I don’t see it as a benefit.


Also, random note. The very next thread in my reading list is this one: “stable diffusion keeps pointing to the wrong version of python”. It seems that a trailing backslash caused Windows to interpret something as a directory instead of a file, so this doesn’t seem to be Unix-specific or anything.


Was that in the app code of stable diffusion, or in how the OS syscall interprets the path?

In the case of the xcopy command you have to use /I or /-I to force whether the destination is or is not a directory.


No idea, I don’t have a Windows to test with. But this was a batch file, so I would guess it’s an OS syscall.

The obvious (Monty) Python thing to do here is offend everyone on multiple levels by forgetting the trailing slash entirely and instead having directories end with a Johannine Comma.
