Suggestion for pathlib: differentiate explicit and implicit local paths (pathlib.StrictPath?)

departure3560 · August 8, 2023, 9:08am

The Issue

Currently, pathlib doesn’t differentiate between Path("") and Path("."), which I think are very different.
In a similar way, Path("a") and Path("./a") are the same data once instantiated.

This is because PurePath does not keep a leading . as one of its _parts (_tail in 3.12), so that information is lost.
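A quick check of the current behaviour, using the public PurePosixPath API rather than the private _parts/_tail attributes:

```python
from pathlib import PurePosixPath

# The empty path and "." become the same value once parsed...
assert PurePosixPath("") == PurePosixPath(".")
assert str(PurePosixPath()) == "."

# ...and a leading "./" is normalised away, so "./a" and "a" compare equal.
assert PurePosixPath("./a") == PurePosixPath("a")
assert str(PurePosixPath("./a")) == "a"

# The public .parts tuple keeps no trace of the "." component either.
assert PurePosixPath("./a/b").parts == ("a", "b")
```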

Real world examples

POSIX execlp & execvp → shutil.which & subprocess.Popen

Very good example by @eryksun:
Under POSIX.1-2017, execlp() and execvp() check whether the argument file contains a slash character:

The argument file is used to construct a pathname … If the file argument contains a <slash> character, the file argument shall be used as the pathname for this file. Otherwise, …

Eryk Sun:

API calls that search PATH (or some other search path), such as os.execlp(), and Python’s shutil.which() and subprocess.Popen. For example:
>>> shutil.which('./ls') is None
True
>>> shutil.which(pathlib.Path('./ls'))
'/usr/bin/ls'

Windows win32api.SearchPath

Eryk Sun:

On Windows, the search path is used even if a name contains a slash or backslash. For example, the default search path begins with the application directory (i.e. the directory of the process executable), and for a typical system Python installation, the application directory contains a “Scripts” directory that contains “pip.exe”. Let’s create “Scripts/pip.exe” in the current directory, and see what WinAPI SearchPathW() finds in a Python process:
>>> os.getcwd()
'C:\\Temp'
>>> open('Scripts/pip.exe', 'w').close()
>>> win32api.SearchPath(None, 'Scripts/pip.exe')
('C:\\Program Files\\Python311\\Scripts\\pip.exe', 35)
As you can see, it found “Scripts/pip.exe” in the application directory since that comes before the current directory in the default search path. Now let’s prefix the name with “./”:
>>> win32api.SearchPath(None, './Scripts/pip.exe')
('C:\\Temp\\Scripts\\pip.exe', 16)
In this case, we’ve told SearchPathW() to skip searching and just check in the current working directory. Note that WinAPI CreateProcessW() calls SearchPathW() with a custom search path, and LoadLibraryExW() calls a similar internal function. So this behavior is quite common on Windows, and probably very surprising to a POSIX developer.
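Both pathlib and the Windows implementation of os.path (ntpath, importable on any OS) drop the leading “./”, so the normalised name no longer tells SearchPathW() to skip the search. A small illustration:

```python
import ntpath
from pathlib import PureWindowsPath

name = "./Scripts/pip.exe"

# pathlib removes the leading "." component...
print(str(PureWindowsPath(name)))  # Scripts\pip.exe

# ...and so does ntpath.normpath (os.path.normpath on Windows).
print(ntpath.normpath(name))       # Scripts\pip.exe
```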

The Suggestion

Since there’s another thread talking about making pathlib extensible, maybe we could:

Let BasePath (suggested in the other thread) properly handle a leading .
To avoid breaking compatibility, keep the current behavior for Path
Create a new Path class (StrictPath) that strictly handles implicit local paths

# some ideas for StrictPath

# empty paths should not be treated as the cwd
assert bool(StrictPath("")) is False

# those should not be the same
assert StrictPath("") != StrictPath(".")
assert StrictPath("a") != StrictPath("./a")

# maybe handle like this internally?
assert StrictPath("./a/b")._parts == [".", "a", "b"]  # ._tail for python3.12+
# with this, we can differentiate "./executable" and "executable"

# how should we handle resolving an implicit local path?
assert str(StrictPath("./a").resolve()) == "/path/to/local/a"
assert str(StrictPath("a").resolve()) == "a"  # or raise an exception?

# maybe we should make `Path` a subclass of `StrictPath`?
assert issubclass(StrictPath, BasePath)
assert issubclass(Path, StrictPath)
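A minimal sketch of part of the StrictPath idea, assuming it is acceptable to wrap a PurePosixPath rather than subclass it (subclassing pathlib classes works differently before and after 3.12). The class name ExplicitPath, the parse() constructor and the explicit_local flag are all hypothetical, and rejecting the empty string with ValueError is only one possible answer to the “empty paths” question:

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ExplicitPath:
    """Hypothetical StrictPath-like wrapper (not a pathlib API).

    Stores the normalised PurePosixPath plus a flag recording whether the
    original string started with an explicit "./" (or was exactly ".").
    """

    path: PurePosixPath
    explicit_local: bool

    @classmethod
    def parse(cls, raw: str) -> ExplicitPath:
        # Refuse empty input instead of silently turning it into ".".
        if raw == "":
            raise ValueError("empty path")
        return cls(PurePosixPath(raw), raw == "." or raw.startswith("./"))

    def __str__(self) -> str:
        text = str(self.path)
        # Re-attach the "./" that PurePosixPath normalised away.
        if self.explicit_local and text != ".":
            return "./" + text
        return text
```

With this, str(ExplicitPath.parse("./a")) is "./a" while str(ExplicitPath.parse("a")) is "a", and the two compare unequal because the flag takes part in the dataclass equality.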

pf_moore · August 8, 2023, 9:38am

I’m curious. What’s the use case?

departure3560 · August 8, 2023, 10:07am

My use case is to be more explicit and to prevent implicit conversions that lead to unwanted path resolution.
I was quite confused at first when Path() returned Path('.').

Currently, the following code will run just fine without any notice:

from pathlib import Path
from typing import Optional

def print_full_path(p: Optional[str] = None):
    path = Path(p) if p is not None else Path()
    print(str(path.resolve()))

print_full_path()
# prints /path/to/current/directory
# shouldn't we raise an exception, or at least not resolve?

While this is an example of bad coding practice, I hope you get my point that it’s fairly easy to make this kind of mistake.

And after all - explicit is better than implicit, isn’t it?

Edit: a better example

In Unix and Linux shells, using a leading ./ to explicitly mean “the current directory” is a very common practice (e.g. ./executable_script will execute that script, but executable_script will not).
With the current implementation, we cannot differentiate between those two.

pf_moore · August 8, 2023, 10:12am

That seems perfectly correct to me - a function called print_full_path should always resolve. So I don’t see this as a mistake at all.

Also, you’re not proposing to change how this works. You’re proposing a new StrictPath class, and there’s no clear reason to think that whoever wrote print_full_path is any more likely to think about using StrictPath than they are to think about adding if p is None: print("The CWD") to the code. Both “fix” the function, but the explicit test:

Is more explicit about the designer’s intent.
Doesn’t need a change to Python.
Can be adjusted to the user’s needs; it’s not restricted to a single built-in behaviour.
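The explicit test could look like the following sketch. “The CWD” is the placeholder message from the post; splitting out full_path_message() (a hypothetical name) only makes the behaviour easy to check:

```python
from pathlib import Path
from typing import Optional


def full_path_message(p: Optional[str] = None) -> str:
    # Explicit test: the function decides what "no path given" means,
    # instead of relying on Path() silently meaning Path(".").
    if p is None:
        return "The CWD"
    return str(Path(p).resolve())


def print_full_path(p: Optional[str] = None) -> None:
    print(full_path_message(p))
```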

departure3560 · August 8, 2023, 10:39am

Thanks for the reply.

Maybe it’s just me, but I feel that Path() should not be treated as Path(".").
At least, not in the standard library.

Ok, another example:
In Unix and Linux systems, using a leading ./ to explicitly mean “the current directory” is a very common practice (e.g. ./script will execute, but script will not).
It’s impossible to differentiate those two right now.

That’s up to the user, but at least there will be an option.

I personally feel that it’s disturbing enough for me to want to change the Python standard library. I understand that changing code in CPython is a big thing, so I wanted to discuss first.

kknechtel · August 8, 2023, 10:42am

How, and why? What practical problem do you hope to solve by treating them differently?

What do you mean by “implicit local paths”? It seems like you consider that the relative path a is somehow a less explicit way to say “the file or folder a in the current directory” than ./a is. But in this case, why would the same logic not equally apply to a/b? Should that become ./a/b instead? And then, if we have to put ./ at the front of every relative path, what use is it?

departure3560 · August 8, 2023, 10:46am

Thanks for the reply.
I think the first example I gave was really bad, I’ve updated that post.

eryksun · August 8, 2023, 11:33am

On POSIX, a path that contains a slash is always relative to the current working directory, so it’s redundant to use “./a/b” (but it’s not redundant on Windows). However, “./a” is not redundant. If a filename argument gets evaluated in a search context (e.g. the file argument of execlp() or execvp()), then a purely relative path that contains no slash is resolved against each directory in the PATH environment variable until an accessible file is found. In this case, if a filename must be resolved relative to the current working directory (whatever it happens to be at the time the API function is called), then the name has to be prefixed by “./”. Currently pathlib is incapable of storing a path with a leading “.” component. You’d have to store such a filename specially in order to know that it needs to be prefixed by “./”.

On Windows, the search path is used even if a name contains a slash or backslash. For example, the default search path begins with the application directory (i.e. the directory of the process executable), and for a typical system Python installation, the application directory contains a “Scripts” directory that contains “pip.exe”. Let’s create “Scripts/pip.exe” in the current directory, and see what WinAPI SearchPathW() finds in a Python process:

>>> os.getcwd()
'C:\\Temp'
>>> open('Scripts/pip.exe', 'w').close()
>>> win32api.SearchPath(None, 'Scripts/pip.exe')
('C:\\Program Files\\Python311\\Scripts\\pip.exe', 35)

As you can see, it found “Scripts/pip.exe” in the application directory since that comes before the current directory in the default search path. Now let’s prefix the name with “./”:

>>> win32api.SearchPath(None, './Scripts/pip.exe')
('C:\\Temp\\Scripts\\pip.exe', 16)

In this case, we’ve told SearchPathW() to skip searching and just check in the current working directory. Note that WinAPI CreateProcessW() calls SearchPathW() with a custom search path, and LoadLibraryExW() calls a similar internal function. So this behavior is quite common on Windows, and probably very surprising to a POSIX developer.

encukou · August 8, 2023, 11:52am

My interpretation is that the presence of a slash determines whether it’s treated as a path or just a command name. I don’t think of things like echo, python or cd as paths!

eryksun · August 8, 2023, 12:12pm

The file argument of execlp() or execvp() must be a file. It isn’t a shell command such as cd. If the name contains a slash, then it gets resolved relative to the current working directory. If it has no slash, then it gets resolved against the directories in PATH, sequentially until an accessible file is found.
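A rough sketch of that lookup rule (POSIX only), assuming “accessible” can be approximated by os.access(..., os.X_OK) on a regular file; the real execvp() simply attempts execve() on each candidate. find_executable() is a hypothetical name, and an empty PATH entry is treated as the current directory, which POSIX describes as a legacy feature:

```python
import os
from typing import Optional, Sequence


def find_executable(file: str, search_dirs: Sequence[str]) -> Optional[str]:
    """Approximate the execvp()-style lookup described above."""

    def accessible(candidate: str) -> bool:
        return os.path.isfile(candidate) and os.access(candidate, os.X_OK)

    # A name containing a slash is used as-is, relative to the cwd.
    if "/" in file:
        return file if accessible(file) else None

    # Otherwise try each search directory in order.
    for directory in search_dirs:
        candidate = os.path.join(directory or ".", file)
        if accessible(candidate):
            return candidate
    return None
```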

The decision to always normalize away a leading “.” component in os.path.normpath() and pathlib.Path can cause problems, unnecessarily. On POSIX, a leading “.” component can be safely omitted if the path has more than one component. On Windows, a leading “.” component should never be omitted automatically.

Another case on Windows is the need to access a named stream in a single-letter filename in the current directory, such as a file named “c” that contains a stream named “spam”. Opening “c:spam” will be resolved against the working directory on drive “C:”. One has to use “./c:spam” in order to avoid the ambiguity. This case has already been fixed for pathlib.Path in Python 3.12, but it’s still broken by design for os.path.normpath().
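ntpath (the Windows implementation of os.path, importable on any OS) shows why the “./” prefix carries meaning here:

```python
import ntpath

# Without a prefix, "c:" is parsed as a drive, so "c:spam" means
# "spam" relative to the working directory on drive C:.
print(ntpath.splitdrive("c:spam"))    # ('c:', 'spam')

# With "./" in front there is no drive; the name refers to file "c"
# (stream "spam") in the current directory.
print(ntpath.splitdrive("./c:spam"))  # ('', './c:spam')
```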

barneygale · August 8, 2023, 12:13pm

It’s true that foo and ./foo mean different things to a shell, but pathlib doesn’t have any shell-specific behaviour at the moment.

The key method here is PurePath._format_parsed_parts(). It could be made to:

Ensure all paths include at least one separator (e.g. foo becomes ./foo), or
Ensure all relative paths start with . (e.g. foo/bar becomes ./foo/bar)
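The two options can be sketched on the public string form of a path. format_relative() and its mode values are hypothetical names (this is not the private PurePath._format_parsed_parts() method itself), and the sketch only handles POSIX-style paths:

```python
from pathlib import PurePosixPath


def format_relative(path: PurePosixPath, mode: str) -> str:
    """Hypothetical formatter for relative paths.

    mode="sep": ensure at least one separator ("foo" -> "./foo").
    mode="dot": ensure every relative path starts with "./"
                ("foo/bar" -> "./foo/bar").
    """
    text = str(path)
    if path.is_absolute() or text == ".":
        return text  # nothing to add
    if mode == "sep":
        return text if "/" in text else "./" + text
    if mode == "dot":
        return "./" + text
    raise ValueError(f"unknown mode: {mode!r}")
```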

This could perhaps be controlled via an argument to the PurePath initialiser - I’m already looking at adding some sort of keep_trailing_sep keyword-only argument to address GH-65238.

Or we could make the method public and allow users to define a ShellPath subclass? idk.

kknechtel · August 8, 2023, 12:18pm

This makes it seem like it would be better not to treat “absolute path” and “relative path” as a dichotomy, but instead include a third option for names/sub-paths that will be searched in some other list of paths. “dislocated path”, perhaps?

(But then we are stuck with the original question: what does .resolve do in this case? Does it need the option to give it a path-list to search?)

eryksun · August 8, 2023, 12:22pm

Note that I steered my discussion away from the shell. I’m talking about API calls that search PATH (or some other search path), such as os.execlp(), and Python’s shutil.which() and subprocess.Popen. For example:

>>> shutil.which('./ls') is None
True
>>> shutil.which(pathlib.Path('./ls'))
'/usr/bin/ls'
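The same effect can be reproduced with a throwaway script, assuming a POSIX system (shutil.which() applies different rules on Windows). The script name mytool and the empty search directory are made up for the demo; passing path= to shutil.which() keeps the result independent of the real PATH:

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def compare_lookups() -> Tuple[Optional[str], Optional[str]]:
    """Look up the same local script with and without a leading './'."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            script = Path("mytool")
            script.write_text("#!/bin/sh\necho mytool\n")
            script.chmod(0o755)
            empty = Path("empty-bin")
            empty.mkdir()
            search = os.path.abspath(empty)  # never contains mytool
            # "./mytool" has a directory part, so which() checks it directly.
            explicit = shutil.which("./mytool", path=search)
            # Path("./mytool") is normalised to "mytool", so which() searches.
            via_pathlib = shutil.which(Path("./mytool"), path=search)
        finally:
            os.chdir(old_cwd)
    return explicit, via_pathlib
```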

barry-scott · August 8, 2023, 9:53pm

The shell takes the command (executable_script) and looks it up on the PATH if it does not contain a /.

It will not have tried any path normalising before doing that check.

If your use case cares about this distinction, you must also check the string before passing it to Path().

departure3560 · August 9, 2023, 12:38am

@eryksun Thank you very much for the detailed explanation! I could never have explained better than you!

Looking at that issue, I think the fundamental problem isn’t whether the trailing / is kept, but rather that information is stripped once a path is passed to pathlib.
Also, I noticed that you commented in that GH issue that .. and // are kept distinct for a reason, and I fully agree. I just want to extend that to a leading . as well.

(Random thought: rather than treating the underlying path data as separator-split strings, e.g. "//a/./b/../c/" → ['', '', 'a', '.', 'b', '..', 'c', ''], keeping the original string plus the indexes of the separators might be better.)
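A minimal sketch of that idea, assuming a single-character separator. RawPath is a hypothetical name: it keeps the raw string untouched and records where the separators are, so every component (including “.”, “..” and empty ones) can be recovered on demand. For comparison, str(PurePosixPath("//a/./b/../c/")) is "//a/b/../c": pathlib keeps the leading // and the .., but drops the . and the trailing slash.

```python
from typing import List, Tuple


class RawPath:
    """Hypothetical: the original string plus the indexes of its separators."""

    def __init__(self, raw: str, sep: str = "/") -> None:
        self.raw = raw
        self.sep_indexes: Tuple[int, ...] = tuple(
            i for i, ch in enumerate(raw) if ch == sep
        )

    def components(self) -> List[str]:
        # Slice the text between consecutive separators; this matches
        # raw.split(sep) but never alters the stored string.
        bounds = [-1, *self.sep_indexes, len(self.raw)]
        return [self.raw[a + 1 : b] for a, b in zip(bounds, bounds[1:])]

    def __str__(self) -> str:
        return self.raw
```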

Sure, one could (and should) write one’s own class or string validator to check the input string in cross-platform path-handling code, but isn’t that what pathlib is for?

barry-scott · August 9, 2023, 5:37am

What I am saying is that the semantics of this shell-like behaviour have nothing to do with path resolving or normalising.

Path() is the wrong code to use.

If you want code that changes behaviour based on what the user types then you MUST not process that input with anything that destroys important information. Path() destroys, by design, important information in this use case. It is the wrong way to code this, do not do it.

As a result it is not necessary to change Path().

departure3560 · August 9, 2023, 5:55am

@eryksun gave a very good explanation of why this isn’t limited to shells.

Please read his two posts above.

eryksun · August 9, 2023, 7:39am

Do most developers think that the use of a pathname in a search context, as opposed to an open/create context, deserves to be supported by pathlib.Path? If so, then a leading “.” should be retained if the instance was explicitly created with it. For example:

>>> f = open('cowsay', 'w')
>>> f.write('#!/bin/sh\necho moo')
18
>>> f.close()
>>> os.chmod('cowsay', 0o500)

Execute the above “cowsay” script that was created in the working directory:

>>> p = subprocess.run(['./cowsay', 'whatever'])
moo

Execute the system “cowsay” script because pathlib.Path removes the leading “.” component:

>>> p = subprocess.run([pathlib.Path('./cowsay'), 'whatever'])
 __________
< whatever >
 ----------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

pf_moore · August 9, 2023, 8:20am

Personally I think that all non-specialised code should treat ./a and a the same. That is, I think that code which wants to treat them differently should have to make a special effort.

I’m not sure a path subclass is sufficiently special - it feels like it would be too easy to pass it into code that wasn’t expecting it.

encukou · August 9, 2023, 8:23am

Well, I wouldn’t call the first argument to execlp a path at all. The os docs already avoid the term for the *p variants (which do the $PATH lookup).
This argument can, depending on whether there is a slash, be either a path or just a simple name (only a filename for exec*, but possibly an alias or function in a shell).

Correspondingly, I’d argue that subprocess.run should skip $PATH lookup when it gets a pathlib.Path as first arg. But I guess it’s too late to change that. And it might not be the right thing to do on Windows.
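Until anything like that changes, a caller can opt into the suggested semantics with a small helper. This is a POSIX-oriented sketch: program_arg() is a hypothetical name, not part of subprocess, and making the path absolute is just one way to keep it from being treated as a bare command name:

```python
import os
from pathlib import PurePath
from typing import Union


def program_arg(cmd: Union[str, PurePath]) -> str:
    """Hypothetical helper for building the program argument.

    A str is passed through unchanged, so a bare name such as "cowsay" is
    still looked up on PATH. A pathlib object is treated as a path, never
    as a bare command name: it is made absolute against the current
    working directory, which sidesteps the PATH search.
    """
    if isinstance(cmd, PurePath):
        return os.path.abspath(cmd)
    return cmd
```

With this, subprocess.run([program_arg(pathlib.Path("./cowsay")), "whatever"]) would run the local script from the earlier example, while a plain "cowsay" string still finds the system one.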