Have a `.realpath` classmethod in pathlib.Path

facundo · July 9, 2020, 7:38pm

The behaviour of doing

p = Path.realpath("somepath")

would be the same as

p = Path("somepath").expanduser().resolve()

Benefits:

it mimics the Linux utility realpath
short and expressive
suitable for use with argparse (type=Path.realpath when creating an option), so the parsed argument is always an absolute, resolved, and expanded path
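argparse accepts any callable as `type`, so the effect described in the last bullet can be sketched today with a plain helper function. A minimal sketch; the helper name `expanded_resolved_path` and the `--config` option are hypothetical:

```python
import argparse
from pathlib import Path


def expanded_resolved_path(value: str) -> Path:
    """Hypothetical stand-in for the proposed Path.realpath: expand ~, then resolve."""
    return Path(value).expanduser().resolve()


parser = argparse.ArgumentParser()
# argparse calls the `type` callable on the raw string given for the option.
parser.add_argument("--config", type=expanded_resolved_path)

args = parser.parse_args(["--config", "~/settings/../app.toml"])
print(args.config)  # absolute, with ~ expanded and ".." collapsed
```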

Thanks!

pitrou · July 10, 2020, 8:18am

I find it interesting in principle, but I’m not fond of the name realpath (if only because it conflicts with os.path.realpath, which has slightly different semantics).

How about something such as Path.resolved or Path.expanded? (though both are also slightly misleading)

facundo · July 10, 2020, 11:09am

Thanks @pitrou!

os.path.realpath works like Path.resolve, so yes, it’s different: it does not expand the ~.
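A quick POSIX check of that difference; the file name `notes.txt` is arbitrary and does not need to exist:

```python
import os
from pathlib import Path

# Both calls treat "~" as an ordinary relative directory name under the cwd.
print(os.path.realpath("~/notes.txt"))  # roughly <cwd>/~/notes.txt
print(Path("~/notes.txt").resolve())    # roughly <cwd>/~/notes.txt

# Only an explicit expanduser() turns "~" into the home directory.
print(Path("~/notes.txt").expanduser().resolve())  # roughly <home>/notes.txt
```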

However, we can argue that the command-line utility realpath DOES expand the ~, so even if Path.realpath differed slightly from os.path.realpath, it would be a little like fixing an old bug (which we cannot really do in os.path.realpath without compatibility issues).

I don’t like Path.resolved or Path.expanded because they would look like mere shortcuts for Path(...).resolve() or Path(...).expanduser().

Thanks again!

terrdavis · July 10, 2020, 4:42pm

What about:

resolveuser
exsolve
respand
expanduserandresolve

Or just call it realpath and add notes to the documentation in pathlib and os.path

Also, you’re proposing a novel way of calling Path methods, where self is implicitly treated as PathLike. I just checked and this doesn’t work for other Path methods. Has this been discussed as a way to broaden the API? If not I can start another discussion.

merwok · July 10, 2020, 7:12pm

The realpath utility doesn’t expand the ~! The shell does it.
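This can be checked from Python on POSIX. The sketch below uses `echo` instead of `realpath` so that it doesn't depend on GNU coreutils being installed; the point is only who performs tilde expansion:

```python
import subprocess

# Without a shell, "~" reaches the program as a literal argument.
no_shell = subprocess.run(["echo", "~"], capture_output=True, text=True).stdout.strip()
print(no_shell)  # ~

# With a shell, the shell performs tilde expansion before echo runs.
with_shell = subprocess.run("echo ~", shell=True, capture_output=True, text=True).stdout.strip()
print(with_shell)  # typically the home directory
```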

facundo · July 10, 2020, 8:09pm

Oh, you’re right! So maybe realpath is not the best name for what we want.

facundo · July 10, 2020, 8:14pm

I’m not proposing any novel way of calling Path methods. I just want another classmethod, like the ones Path already has (Path.cwd() and Path.home()). A reference implementation could be:

    @classmethod
    def resolvuser(cls, path):
        return cls(path).expanduser().resolve()
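To try the idea without patching pathlib, the classmethod can live on a subclass. A minimal sketch; `MyPath` and the spelling `resolveuser` (taken from the name suggestions above) are hypothetical. It subclasses the concrete class `type(pathlib.Path())` because subclassing `pathlib.Path` directly fails on Pythons older than 3.12:

```python
import pathlib


class MyPath(type(pathlib.Path())):  # PosixPath or WindowsPath
    @classmethod
    def resolveuser(cls, path):
        """Hypothetical classmethod: build a path with ~ expanded and .. resolved."""
        return cls(path).expanduser().resolve()


p = MyPath.resolveuser("~/projects/../notes.txt")
print(type(p).__name__, p)  # the subclass type is preserved by expanduser()/resolve()
```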

facundo · July 14, 2020, 1:20am

So, as a kind of summary:

this new class method looks like a good idea
it’s a shortcut to directly create a Path from a string, already expanded and resolved
it’s easy to implement
we’re not finding a good name for it

Am I right? If you concur, I’ll turn this into a more formal proposal.

Thanks!

ofek · July 14, 2020, 3:03am

Tangentially related… are we ever going to document this method? https://github.com/python/cpython/blob/b4cd77de05e5bbaa6a4be90f710b787e0790c36f/Lib/pathlib.py#L1151-L1167

uranusjr · July 14, 2020, 4:30am

I believe not; the devs (I forget who) want users to reach for resolve() instead.

ofek · July 14, 2020, 1:54pm

That’s a bummer because resolve() is not the same as os.path.realpath:

>>> p1 = Path('foo.txt').resolve()
>>> p1
WindowsPath('foo.txt')
>>> p2 = Path('foo.txt').absolute()
>>> p2
WindowsPath('C:/Users/ofek/Desktop/foo.txt')
>>> os.chdir('..')
>>> p1.write_text('wrong location')
>>> p2.write_text('correct location')

I hit that so frequently that I always use a subclass:

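from __future__ import annotations  # allows "-> Path" before the class exists
import os
import pathlib

class Path(pathlib.WindowsPath if os.name == 'nt' else pathlib.PosixPath):
    def resolve(self, strict=True) -> Path:
        try:
            # strict=True raises FileNotFoundError for a missing path
            return super().resolve(strict)
        except FileNotFoundError:
            # fall back to an absolute, but unresolved, path
            return Path(os.path.abspath(self))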

uranusjr · July 14, 2020, 2:12pm

This is due to a bug 🤷 https://bugs.python.org/issue38671

Edit: And that led me right back to the discussion I was thinking about when I said you should use resolve(): Pathlib absolute() vs. resolve()

pf_moore · July 14, 2020, 2:12pm

My understanding is that there are edge cases (maybe only in Windows?) where .absolute() doesn’t do the right thing. I’m not 100% sure there’s even a reasonable definition of what “the right thing” is (junctions, UNC paths, things like that were involved IIRC).

Rather than expose an API that would give wrong answers, the decision was taken not to expose it.

Sorry, that’s as much as I can recall. You’d need to go searching if you want the full background.

Edit: @uranusjr linked to the relevant bug/discussion. Which leads me back to what I thought, which is that “get the absolute version of this path” is not actually sufficiently well-defined in some edge cases. Whether “do the obvious thing where we can, and don’t worry too much about those edge cases” is the right answer, I’m less sure of. Windows Store Python seems to trigger a lot of odd edge cases with funny file types, so I’d want to check with someone who understands that use case before just assuming it won’t matter…

pitrou · July 15, 2020, 11:12pm

Path.absolute() will do the wrong thing, for example, if your path is /foo/../bar and /foo is really a symlink to /xyzzy/quux. The actual path would be /xyzzy/bar, but Path.absolute() will return /bar, which may point to a different file!
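The hazard described here comes from collapsing ".." as text before symlinks are followed. A POSIX-only sketch that recreates the example inside a temporary directory; `os.path.normpath` is used purely as an example of textual ".." collapsing, and the file contents are made up:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()  # resolve tmp itself in case it sits behind a symlink
    (root / "xyzzy" / "quux").mkdir(parents=True)
    (root / "xyzzy" / "bar").write_text("the real target")
    (root / "bar").write_text("a different file")
    (root / "foo").symlink_to(root / "xyzzy" / "quux")  # foo -> xyzzy/quux

    p = root / "foo" / ".." / "bar"      # pathlib keeps ".." as a component
    textual = Path(os.path.normpath(p))  # drops "foo/.." as a string
    followed = p.resolve()               # follows the symlink before applying ".."

    print(textual.relative_to(root), "->", textual.read_text())    # bar -> a different file
    print(followed.relative_to(root), "->", followed.read_text())  # xyzzy/bar -> the real target
```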

facundo · July 19, 2020, 2:24am

We should probably add a note to the method’s docstring. If you provide the wording (you’re much more into the details of this than I am), I can do that.

pf_moore · July 19, 2020, 9:40am

Sorry, I don’t actually know all the details here.

@pitrou mentioned

See Issue 38671: pathlib.Path.resolve(strict=False) returns relative path on Windows if the entry does not exist - Python tracker and the associated PR for further discussion on that.

eryksun · July 28, 2020, 6:39pm

Path.absolute() is designed to not resolve ".." components, so it does the right thing in this case:

>>> pathlib.Path('/foo/../bar').absolute()
PosixPath('/foo/../bar')

In Windows, it usually doesn’t matter whether ".." components are resolved by Path.absolute, since the Windows API basically calls GetFullPathNameW on a file path before opening it. This normalizes the path as a string, according to DOS path rules, which includes naively resolving ".." components.
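The contrast between keeping ".." and collapsing it as a string can be seen portably. In this sketch `ntpath.normpath` is only a stand-in for DOS-style string normalization; it is not `GetFullPathNameW`, though both collapse ".." without consulting the filesystem:

```python
import ntpath
from pathlib import PureWindowsPath

p = r"C:\foo\..\bar"

# pathlib keeps ".." as a path component...
print(PureWindowsPath(p).parts)  # ('C:\\', 'foo', '..', 'bar')

# ...whereas string-level normalization drops "foo\.." naively.
print(ntpath.normpath(p))  # C:\bar
```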

The only case where preserving a ".." component matters is in a relative symlink target. As in Unix, it gets resolved against the already parsed path. Thus if "foo" is a symlink, then a relative symlink that targets r"foo\..\bar" is completely different from one that targets just "bar".

Note that, in contrast to a symlink, the system doesn’t replace a mountpoint (aka junction) in the parsed path with its target while parsing a path. This is how Unix mountpoints behave, so it should come as no surprise. It’s especially important in UNC paths. An SMB server stops on a symlink and returns a symlink reparse error response to the client (i.e. the SMB redirector on the client side). So if the server is parsing a path such as r"\\?\UNC\server\share\mountpoint\symlink\spam", it will stop on "symlink" and return to the client the parsed path as r"\\?\UNC\server\share\mountpoint\symlink"; the unparsed path as r"\spam"; and the reparse data buffer of the symlink. It’s up to the client to actually evaluate the symlink (which may be denied by the system’s R2L or R2R policies for remote symlinks). Notice that parsed path doesn’t resolve "mountpoint" to its target path, such as "\\??\\Volume{GUID}\\symlink" or r"\??\E:\some\path\symlink". The mountpoint target on the server would be useless to the client.

sinoroc · July 28, 2020, 8:09pm

I also found myself in a situation where it wasn’t entirely clear what resolve() is supposed to do.

Would something like the following be meaningful and feasible?

Path.resolve(
    strict: bool = False,
    make_absolute: bool = True,
    resolve_symlinks: bool = True,
    expand_user: bool = False,
    expand_environment_variables: bool = False,
    # some more, maybe platform specific things
    # (Windows mount points, etc.)
)

Give the user the choice.
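As a rough illustration of what such flags could map onto, here is a sketch as a free function composed from existing calls rather than a change to `Path.resolve`. The name `resolve_with_options` is hypothetical, `strict` only matters when symlinks are resolved, and `resolve()` always yields an absolute path, so it implies `make_absolute`:

```python
import os
from pathlib import Path


def resolve_with_options(
    path,
    *,
    strict: bool = False,
    make_absolute: bool = True,
    resolve_symlinks: bool = True,
    expand_user: bool = False,
    expand_environment_variables: bool = False,
) -> Path:
    """Hypothetical helper mirroring the proposed flags using existing APIs."""
    text = os.fspath(path)
    if expand_environment_variables:
        text = os.path.expandvars(text)  # $VAR / ${VAR} (also %VAR% on Windows)
    p = Path(text)
    if expand_user:
        p = p.expanduser()
    if resolve_symlinks:
        p = p.resolve(strict=strict)  # always absolute
    elif make_absolute:
        p = p.absolute()  # prefixes the cwd; keeps ".." components
    return p


os.environ["APP_DIR"] = "data"  # example variable for the demo
print(resolve_with_options("$APP_DIR/report.txt",
                           expand_environment_variables=True,
                           resolve_symlinks=False,
                           make_absolute=False))  # data/report.txt on POSIX
```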

uranusjr · July 28, 2020, 9:54pm

The implementation doesn’t get to choose what to follow and what not to. There is a platform API we can call; the only choice is whether to use it or not.

eryksun · July 29, 2020, 3:50pm

make_absolute exists already as the absolute method. In Windows, this method could be changed to normalize the path to resolve ".." components. In Unix, it has to retain them.

I’m not keen on expand_user in Windows. It’s never really the correct approach. Application configuration belongs either in the registry or in a subdirectory of ProgramData, LocalAppData, or AppData. Also, known folders such as “Documents” might be relocated from the default locations in the profile directory, either individually by the user or by group policy. I’d prefer to support known folders via SHGetKnownFolderPath, e.g. expanding “{Documents}\report.doc” to use the queried path for FOLDERID_Documents.
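The `{Documents}` idea needs a Windows-only lookup through `SHGetKnownFolderPath`, which isn't shown here. A portable sketch of only the placeholder expansion, with the lookup passed in as a callable; `expand_known_folder` and the sample folder mapping are hypothetical:

```python
import re
from pathlib import PureWindowsPath
from typing import Callable

# A leading "{Name}" placeholder, as in "{Documents}\report.doc".
_PLACEHOLDER = re.compile(r"\{(?P<name>[A-Za-z]+)\}")


def expand_known_folder(path: str, lookup: Callable[[str], str]) -> PureWindowsPath:
    """Replace a leading {Name} with lookup(Name); other paths pass through unchanged.

    On Windows, `lookup` would query SHGetKnownFolderPath (e.g. FOLDERID_Documents);
    here it is any callable mapping a folder name to a path string.
    """
    m = _PLACEHOLDER.match(path)
    if m is None:
        return PureWindowsPath(path)
    rest = path[m.end():].lstrip("\\/")
    return PureWindowsPath(lookup(m.group("name")), rest)


# Stand-in lookup table; a real one would ask the shell API.
folders = {"Documents": r"D:\Users\me\Documents"}
print(expand_known_folder(r"{Documents}\report.doc", folders.__getitem__))
# D:\Users\me\Documents\report.doc
```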

Regarding mountpoints, the way the underlying WinAPI GetFinalPathNameByHandleW call works is to get the filesystem path on the final device and optionally prefix it with the device path. The filesystem path can be either as opened, retaining any short names that were used, or normalized, with short names replaced by normal component names (e.g. “PROGRA~1” → “Program Files”). The optional device path can be returned as a DOS volume name (e.g. “\\?\E:”), GUID volume name (e.g. “\\?\Volume{GUID}”), or the native NT device path (e.g. “\Device\HarddiskVolume2”).

(Currently, nt._getfinalpathname requests the DOS device path concatenated with the normalized filesystem path. It does not support getting the filesystem path as opened; the filesystem path without the device path; the GUID volume name; or the native NT device path.)

Getting the DOS or GUID name starts with the native device path. For a local device, it queries the system mountpoint manager to get the canonical DOS and GUID device names and filesystem mountpoints. If the call requests a DOS name, then a DOS drive name is preferred, if one exists, and it will otherwise return the canonical folder mountpoint on some other drive (e.g. “\\?\C:\Mount\SpamDrive”).

For a redirected (UNC) path, it maps the native device path “\Device\Mup” (the Multiple UNC Provider) to the DOS device name “\\?\UNC”. A typical redirected mountpoint is “\\?\UNC\server\share”, where “server” is typically a remote system, but not necessarily.

Substitute drives and mapped drives (e.g. “Z:” → “\\server\share\spam\eggs”) are resolved out of the final path, since the first step in opening a drive path is to resolve the native device path, such as “\Device\Mup”. Reverse mapping the final path back to a mapped or substitute drive is not implemented. On the plus side, it makes sense to resolve such drives away, because normally they only exist for the user’s logon session. On the other hand, the persistent network connection associated with the drive is no longer used, including possibly the credentials to access the server if they’re not saved in the user’s credential vault (that should be rare).

Folder mountpoints in a redirected path are not resolved to a canonical device path. For example, compare the following two cases. Say “C:\Mount\SpamDrive” is an alternate mountpoint for a volume that has a canonical DOS drive name “E:”. In this case, the final resolved path of “C:\Mount\SpamDrive” is “\\?\E:\”. But try it with “\\localhost\C$\Mount\SpamDrive”, which targets the same final volume device, and the result will instead be “\\?\UNC\localhost\C$\Mount\SpamDrive”. This behavior is intentional. A client is not expected to be able to use the target of a folder mountpoint such as “SpamDrive”. It’s a device path that’s local to the server.

OTOH, if “C:\Mount\SpamDrive” is a symlink to “\\?\E:\” instead of a mountpoint, and this symlink is accessed on a remote server as “\\server\C$\Mount\SpamDrive”, then the server will stop on the symlink and send the parsed path, remaining path, and symlink data back to the client. It’s up to the client-side redirector whether it’s going to evaluate the symlink target path as the local path “\\?\E:\”. The default remote-to-local (R2L) policy is to fail a request like this with ERROR_SYMLINK_CLASS_DISABLED (1463).