Deprecation of pathlib.PurePath.is_reserved()

OrangeDog · May 13, 2024, 11:03am

I note that this is being deprecated in 3.13 for removal in 3.15, in favour of os.path.isreserved().

However, this removes a major benefit of the former, which is that you don’t have to be running on Windows in order to use it.

If you are e.g. receiving files on a Linux machine that you know may be accessed by Windows at some point (e.g. they’re being uploaded to a server and then downloaded by arbitrary clients), it would be useful to ensure that they have portable names.

I think it would be a mistake to make this change. It should remain available on all platforms.

MegaIng · May 13, 2024, 11:07am

Unless I am missing something, can’t you just import ntpath directly, even on POSIX systems? That allows you to be explicit about checking for Windows compatibility.

steve.dower · May 13, 2024, 11:16am

You can indeed import ntpath directly on any platform, and it mostly behaves the same.
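A minimal sketch of that cross-platform spelling, assuming Python 3.13+ for `ntpath.isreserved()` and falling back to `PureWindowsPath.is_reserved()` on older versions. The helper name `reserved_on_windows` is made up for illustration, and the result is only an approximation of what Windows actually does.

```python
import ntpath
from pathlib import PureWindowsPath


def reserved_on_windows(name):
    """Approximate check for names Windows may treat as reserved.

    Usable on any platform, because ntpath is importable everywhere.
    """
    check = getattr(ntpath, "isreserved", None)  # added in Python 3.13
    if check is not None:
        return check(name)
    # Older Pythons only offer the pathlib spelling.
    return PureWindowsPath(name).is_reserved()


for candidate in ("report.txt", "nul", "CON"):
    print(candidate, reserved_on_windows(candidate))
```

The `getattr` lookup keeps the sketch runnable on interpreters that predate the new function without triggering the deprecated method on newer ones.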

Our main concern with isreserved and its predecessor is that they are incorrect. And due to Windows regularly being updated to be more compatible with Linux, they only become less correct over time.

Functions that produce incorrect results but look authoritative are worse than not having them at all (especially when someone figures out an “exploit” involving them and we need to run security releases in order to fix something that’s typically not a real issue… see the email module for examples…).

In this case, if you really want the approximation, it’s available explicitly and clearly spelled out to be an approximation. The only realistic alternative would be to raise a warning every time the old function was used, which is worse than deprecating and replacing it with something that’s technically less useful.

The trade-offs for adding and removing functions are not easy to make! We take it pretty seriously though.

OrangeDog · May 13, 2024, 11:32am

Perhaps the deprecation should point you at ntpath.isreserved directly? Just following the links I hadn’t realised you can do that.

steve.dower · May 13, 2024, 11:57am

I don’t recall the exact deprecation text, but it could certainly be updated to show that. Issues/PRs go to the python/cpython repository on GitHub.

ronaldoussoren · May 13, 2024, 12:18pm

I don’t have a use-case for the API myself, but why deprecate the pathlib interface for this and keep the os.path interface? I’d prefer to get to a point where I can ignore the existence of os.path for most code.

steve.dower · May 13, 2024, 12:24pm

You can. Most code should never use this function.

The pathlib one is an attractive nuisance, that can only really be used to make code less reliable. The ntpath one is specialised and comes with enough caveats (including that we may change the behaviour based on Windows releases rather than Python releases).

You should still ask for forgiveness when creating a file fails - this API could never be used to avoid that. At best, after the file has failed, you might be able to use this API on the path to suggest to the user that it’s because of the name, but the pathlib one would’ve been wrong on any up to date version of Windows, so you’re just misguiding your own users.
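One way to read that advice, as a rough sketch: attempt the write first, and only consult the reserved-name check afterwards to enrich the error message. The function names `maybe_reserved` and `write_text_with_hint` are hypothetical, and the hint is deliberately worded as a guess because the check is an approximation.

```python
import ntpath
import os
import tempfile
from pathlib import PureWindowsPath


def maybe_reserved(path):
    """Approximate reserved-name check (ntpath.isreserved needs Python 3.13+)."""
    check = getattr(ntpath, "isreserved", None)
    if check is not None:
        return check(path)
    return PureWindowsPath(path).is_reserved()


def write_text_with_hint(path, text):
    """Write text to path; return None on success or an error message on failure."""
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    except OSError as exc:
        message = f"could not write {path!r}: {exc}"
        # Only guess at the cause after the failure, and only on Windows.
        if os.name == "nt" and maybe_reserved(path):
            message += " (the name may be reserved on Windows)"
        return message
    return None


with tempfile.TemporaryDirectory() as tmp:
    # The parent directory does not exist, so creation fails on every platform.
    print(write_text_with_hint(os.path.join(tmp, "missing", "con"), "data"))
```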

Really, both APIs belong on a “do not use” list, but that would just result in people copying the logic and never updating it again, so we at least keep enough control that code in ten years time will be more accurate than if they copied the old logic and stuck with it.

ronaldoussoren · May 13, 2024, 12:31pm

That’s the bit I don’t understand. The pathlib method delegates to os.path.isreserved and hence is as reliable as that function.

Note that this is an API I have never used and likely will never use; all programming on Windows I’ve done was on controlled systems where I didn’t have to check for reserved names in the first place.

OrangeDog · May 13, 2024, 12:32pm

You still seem to be assuming that the only use-cases are if the code is running on Windows, but my original issue was that that is not true.

Regardless, as long as it is still available, and still available on non-Windows platforms, I am happy.

steve.dower · May 13, 2024, 1:21pm

It does now, but before it didn’t, and the API definition (i.e. the docs) didn’t allow for us to update it. So it had the ~Win7 version of “reserved” even though that’s out of date. Unfortunately, deprecation is the only option we have for correcting a poorly defined API (I don’t necessarily agree with removal, but others feel very strongly about removing anything that’s been deprecated and I don’t have enough energy to argue that as well as arguing the things that actually matter).

Apologies if it seems that way, but I’m not. The code ultimately is only relevant for Windows, and the previous API was available everywhere but didn’t do anything unless you did some obscure opting in. Now the opting in is more obvious.

FWIW, I assume the only use cases are to make poor predictions, and you can do that from any platform. Cross-platform is hard, but there are easier ways to get it right than to use look-before-you-leap style functions.

pitrou · May 13, 2024, 5:00pm

Hmm, but how does it work to “ask for forgiveness” when creating a file fails? If it fails, it fails. Creating another file with another name usually doesn’t solve the problem. Also, historically this becomes more embarrassing if trying to append to NUL or LPT (not sure those still work, but hey).

More generally, there are situations where it’s desirable to explicitly detect and avoid reserved names. For example, you might be creating a ZIP file that you want to be able to unpack on other systems (a Python wheel, for example?). Or you might be writing something like a GUI for git and would like to display a warning to the user if they are trying to check in a file that could not be checked out on Windows.

I understand that the Windows rules for what is reserved change across time. But there is still value in providing a conservative function that returns False if the input string may be a reserved file name on any supported version of Windows (if that’s at all possible, of course).

steve.dower · May 13, 2024, 6:00pm

Which is why the function is still there, but you have to work a little bit harder to use it, and by using it you basically acknowledge that it will still be wrong some of the time. The deprecated one did not include this in its definition, and as I mentioned earlier, unfortunately deprecation is our only mechanism for this kind of API change.

eryksun · May 14, 2024, 1:58am

ntpath.isreserved() errs on the side of caution, but the rules have been relaxed a bit in Windows 11:

- DOS device names with a “.” or “:” extension are no longer reserved. For example: “con.txt” or “nul:txt”
- DOS device names are no longer reserved in the final component of a relative path or a DOS drive path if either has two or more components^[1] – except for “nul”. For example: “.\con”^[2] and “C:\Temp\con” are not reserved, but “.\nul” and “C:\Temp\nul” are reserved and refer to “\\.\nul”

To me, the exception for “nul” is a frustrating inconsistency. The following rules still apply in Windows 11:

- Several characters are reserved in filesystem names, including the 5 wildcard characters (*?<>"), the separators (“/”, “\”, and “:”), “|”, and ASCII control characters 0-31.
  - “:” is the separator for file stream names of the form “filename:streamname[:streamtype]”. For example, if the filesystem supports streams, “spam:eggs” refers to a file named “spam” with a data stream named “eggs”.
- Dots and spaces are stripped from the end of all paths^[1]. For example: “spam. . .” → “spam”
- DOS device names are reserved in unqualified relative paths. For example: “con” or “nul”
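The rules listed above can be sketched as a conservative, purely string-based check on a single path component. This is a hand-rolled illustration, not the stdlib implementation: the name `problems_with_name` is hypothetical, and the device-name list is the commonly documented set, assumed here and possibly incomplete. It follows the older, stricter rules, so it also flags names like “con.txt” that Windows 11 now accepts.

```python
# Characters reserved in filesystem names: wildcards, separators, pipe, controls.
RESERVED_CHARS = frozenset('*?<>"/\\:|') | {chr(i) for i in range(32)}

# Assumed list of DOS device names; may not be exhaustive.
DEVICE_NAMES = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{i}" for i in range(1, 10)]
    + [f"LPT{i}" for i in range(1, 10)]
)


def problems_with_name(name):
    """Return reasons a single path component may be unsafe on Windows."""
    problems = []
    if RESERVED_CHARS.intersection(name):
        problems.append("reserved character")
    if name not in (".", "..") and name[-1:] in (".", " "):
        problems.append("trailing dot or space")
    # Conservative: ignore any extension and trailing spaces before comparing.
    stem = name.partition(".")[0].rstrip(" ")
    if stem.upper() in DEVICE_NAMES:
        problems.append("DOS device name")
    return problems


for candidate in ("report.txt", "con", "con.txt", "spam:eggs", "spam. . ."):
    print(repr(candidate), problems_with_name(candidate))
```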

I would have preferred an option in the application manifest to disable legacy DOS path rules – i.e. disable reserved device names and disable stripping trailing spaces and dots. In this case, only explicit device paths would be supported in the process, such as “\\.\nul”. Those are the two biggest gotchas for developers, since the system ends up operating on a different path from what was passed to it. The next biggest gotcha is file streams, but that’s a fundamental feature of NT filesystems, not a legacy DOS behavior.

[1] DOS device names are not reserved in UNC paths, nor, obviously, in device paths prefixed by “\\.\” or “\\?\”. Dots and spaces are never stripped from the end of “\\?\” literal device paths.
[2] os.path.normpath() and pathlib.Path() both remove a leading “.” component. In this case, one can join the problematic name with the full path of the working directory, e.g. Path('.').absolute() / 'con'. A leading “.” component also matters to disambiguate streams in single-letter filenames versus drive-relative paths, such as “.\C:spam” vs “C:spam”. A leading “.” or “..” component also restricts a file search to be relative to the working directory, such as for CreateProcessW(NULL, L".\\bin\\spam.exe", ...), LoadLibraryExW(L".\\lib\\spam.dll", ...), or SearchPathW(NULL, L".\\include\\spam.h", ...). If the current directory isn’t explicitly referenced with a leading “.” or “..” component, Windows searches all directories in the given or default search path, which actually might not include the working directory. In contrast to POSIX, this applies even to a qualified path such as “bin\spam.exe”, “lib\spam.dll”, or “include\spam.h”.

steve.dower · May 14, 2024, 1:33pm

I’m torn, because I kind of agree, but ultimately file paths are for communicating between processes more than within a process.^[1] If other apps can’t read your paths, or are going to be exploited by those paths, you’re better off not creating them at all.

nul is an annoying inconsistency - I assume there are some important compatibility reasons behind it - but the rest of the issues are relative paths. There was an effort about 12 years ago to kind-of-deprecate relative paths^[2] to avoid unsynchronised global process state but also ambiguity between device names and relative paths. Unfortunately, that wasn’t so popular, so it’s largely been undone.

But still, your own app will get the most reliability by resolving paths itself at boundaries (i.e. make CLI filename/path arguments absolute straight away and only deal with qualified paths throughout the app). That doesn’t require any change to the OS, just to the dev’s own coding practices.
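As a rough illustration of resolving at the boundary, the sketch below converts a CLI path argument to an absolute path at parse time, so the rest of the program never sees a relative path. The argument name `output` and the helper `absolute_path` are made up for the example.

```python
import argparse
from pathlib import Path


def absolute_path(text):
    """argparse type: anchor a path argument to the working directory at parse time."""
    return Path(text).absolute()


parser = argparse.ArgumentParser()
parser.add_argument("output", type=absolute_path)

# A fixed argument list stands in for the real command line here.
args = parser.parse_args(["out/report.txt"])
print(args.output)
```

`Path.absolute()` does not touch the filesystem or follow symlinks; whether that or a full `resolve()` is appropriate depends on the application.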

[1] Which is why temp paths are usually randomly generated while input/output paths are user-provided.
[2] By forcing the application to resolve them, so that the OS would only accept fully qualified paths.

eryksun · May 14, 2024, 7:34pm

Using unqualified relative paths to access items in the working directory (just the directory, not the subtree) is the most secure practice assuming the API is implemented correctly, but that isn’t always the case. At the very least, use a final path from GetFinalPathNameByHandleW(), in which every directory in the path is guaranteed to be in the same filesystem^[1], and keep the handle for the directory open while working in it. With a handle for the directory open, the filesystem is contractually obligated to guarantee that no parent directory can be renamed or replaced (e.g. by a symlink or junction).

File operations on a relative path in the working directory should set a handle for the working directory in the NTAPI RootDirectory field of the OBJECT_ATTRIBUTES record. CreateFileW() and CreateDirectoryW() get this right for the working directory. I think their lpSecurityAttributes parameter should have been extended with a RootDirectory field to generalize this to operate on any open directory^[2], akin to POSIX openat().

MoveFileExW() gets this wrong. If lpExistingFileName is a relative path, it gets resolved to a full path and opened. To get the full path, it just uses the path of the working directory that’s stored in the PEB, which is not a final path. Thus symlinks and junctions in the path can be renamed, replaced, or removed, and even the drive/share component could resolve differently (e.g. a user mapped drive could be remapped or removed). Also, for the rename operation itself, it doesn’t open the target directory and use the RootDirectory field of the FILE_RENAME_INFORMATION record. That’s not a big deal for a single rename operation. But if many files are being moved to a new directory, for guaranteed consistency MoveFileExW() should support passing a handle for the target directory. In general, both lpExistingFileName and lpNewFileName should support optional RootDirectory handles, akin to POSIX renameat(). As is, the onus is on the developer to open the source and target directories, keep them open while working, and resolve final paths for use with MoveFileExW(). Or a developer can just bypass the Windows API to directly use NtOpenFile() and NtSetInformationFile().
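For readers unfamiliar with the POSIX calls mentioned as the model here, this is roughly what openat() and renameat() look like from Python, via the `dir_fd`, `src_dir_fd` and `dst_dir_fd` parameters. It is a sketch of the POSIX side only and says nothing about how the Windows API would expose the same idea; the function name `move_between_open_dirs` is hypothetical, and the demo is skipped where the platform lacks `dir_fd` support.

```python
import os
import tempfile

# dir_fd support is platform-dependent; Python reports it via os.supports_dir_fd.
DIR_FD_SUPPORTED = {os.open, os.rename} <= os.supports_dir_fd


def move_between_open_dirs(filename, text):
    """Create a file in one open directory and move it into another, by handle."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "src")
        dst_path = os.path.join(tmp, "dst")
        os.mkdir(src_path)
        os.mkdir(dst_path)
        src_fd = os.open(src_path, os.O_RDONLY)
        dst_fd = os.open(dst_path, os.O_RDONLY)
        try:
            # openat(): the name is resolved relative to src_fd, not the cwd.
            fd = os.open(
                filename, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600, dir_fd=src_fd
            )
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.write(text)
            # renameat(): both names are relative to already-open directories.
            os.rename(filename, filename, src_dir_fd=src_fd, dst_dir_fd=dst_fd)
            return os.listdir(dst_path)
        finally:
            os.close(src_fd)
            os.close(dst_fd)


if DIR_FD_SUPPORTED:
    print(move_between_open_dirs("spam.txt", "eggs"))
```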

[1] That’s guaranteed only for local paths. Junctions (mount points) in remote paths cannot be resolved to a final path generally because they are resolved on the server side and can target any directory in a filesystem that is local to the server. For example, there is no way for a client to get a final path for a directory on the server’s local volume “\Device\HarddiskVolume42”. Remote symlinks, on the other hand, are always resolved by the client. The server sends the opened path and target path of the symlink back to the client to reparse. If the client determines that the target path is allowed according to its R2L and R2R policies, it resolves and opens it. The resolved target could be a path that’s relative to the share, an absolute path on the server or another server (R2R), or an absolute path on the client (R2L).
[2] It still could be without breaking compatibility since the SECURITY_ATTRIBUTES record has the field nLength to indicate the size of the record.

OrangeDog · May 14, 2024, 8:56pm

Again, none of that is useful for the simple case of needing to make files that other software can handle.

If my code never touches Windows I cannot check any of those things “correctly”. I also cannot make e.g. a zipfile with UNC paths in it.

All you can do is a string-based check against all the known filenames that the Win32 API cannot handle. That’s what this function used to do for us.

eryksun · May 14, 2024, 10:58pm

I don’t remember why posixpath.isreserved() wasn’t implemented in 3.13, e.g. as a function defined in genericpath that always returns False. As a counterexample, genericpath.isjunction() was implemented to always return False. Currently only Windows uses anything like junctions (i.e. a type of symbolic link with intentional limits on the target path and special reparsing rules, which implements volume mounts and bind mounts). If posixpath.isreserved() were defined, would PurePath.is_reserved() no longer be deprecated?

One can’t simply depend on handling failure for reserved paths (i.e. “ask for forgiveness when creating a file fails”). Scripts that need safe filenames must use a conservative check. Consider a naive archive expansion.

- For a file named “spam:eggs”, what actually gets created in an NTFS filesystem is an “eggs” data stream in an empty file named “spam”, or even weirder, an “eggs” data stream in an existing directory named “spam”.
- For a file named “nul”, whether or not the app uses an absolute path, it may end up opening “\\.\nul” and writing the data into the ether instead of actually creating a file named “nul”. Even in Windows 11, sidestepping this behavior for “nul” requires an explicit device path such as “\\.\C:\path\to\dest\nul” or “\\?\C:\path\to\dest\nul”.
- For a file named “spam…”, an existing file named “spam” will be overwritten.

If filenames have to be cross-platform, the namespace rules have to conservatively conform to the lowest common denominator for all platforms.
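The overwrite case can be caught before extraction by grouping archive members under the name Windows would end up using. The sketch below only models the trailing dot/space stripping on the final component, which is a simplification of the real normalization; `windows_effective_name` and `find_collisions` are hypothetical helpers, and the archive is built in memory purely for demonstration.

```python
import io
import zipfile
from collections import defaultdict


def windows_effective_name(member):
    """Approximate the name Windows acts on: trailing dots/spaces dropped."""
    head, sep, tail = member.rpartition("/")
    if tail not in (".", ".."):
        tail = tail.rstrip(". ")
    return head + sep + tail


def find_collisions(archive):
    """Return effective names that more than one archive member maps to."""
    groups = defaultdict(list)
    for member in archive.namelist():
        groups[windows_effective_name(member)].append(member)
    return {name: members for name, members in groups.items() if len(members) > 1}


# Build a small in-memory archive containing the problematic pair.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs/spam", "original")
    archive.writestr("docs/spam...", "would replace the original on Windows")
    archive.writestr("docs/eggs", "unaffected")

with zipfile.ZipFile(buffer) as archive:
    collisions = find_collisions(archive)
print(collisions)
```

A fuller check would also cover stream separators and device names, which need the kind of per-name test discussed earlier in the thread.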

csm10495 · May 15, 2024, 2:04am

I would say that the pathlib version should exist if there is an os.path version.

Even on Linux checking if a file is just a dot would be good.

steve.dower · May 15, 2024, 10:44am

isreserved didn’t have a meaningful implementation for Linux, while isjunction did, despite both of them always returning False. As Charles suggests, isreserved could have a legitimate implementation, and if someone writes, contributes and maintains it, then the function can be available everywhere.

This check needs to be done by the app that’s about to open the path. We have to assume that malicious actors will bypass checks when creating malicious files, because that’s the definition.

An app that will open invalid or insecure paths just because untrusted inputs told it to is inherently insecure.

Which is easiest to see and understand when your code that is checking it explicitly checks on all the platforms it supports. A single isreserved for all file systems of all time on all platforms isn’t correct or helpful here.

The pathlib version doesn’t help with cross-platform checking, which is being argued to be the interesting use case. You might as well call ntpath.isreserved(pathlib_object) as PureWindowsPath(pathlib_object).is_reserved(), and reduce your chances of getting an unrelated error.

Nineteendo · May 15, 2024, 6:10pm

Invalid paths:

- Embedded null
- Too long file name: "a" * 256
- Too long path: "a/" * 512

Am I forgetting something?
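A small sketch of those three checks as string tests. The limits are assumptions chosen to match the examples above (255 bytes per component, 1024 bytes per path including the terminating NUL); real limits vary by OS and filesystem, and the function name `invalid_reasons` is made up.

```python
import os

# Assumed limits; actual values depend on the platform and filesystem.
NAME_MAX = 255   # bytes per path component
PATH_MAX = 1024  # bytes per path, counting the terminating NUL


def invalid_reasons(path, name_max=NAME_MAX, path_max=PATH_MAX):
    """Return the reasons a POSIX-style path string would be rejected."""
    reasons = []
    if "\0" in path:
        reasons.append("embedded null")
    raw = os.fsencode(path)  # limits are measured in bytes, not characters
    if any(len(part) > name_max for part in raw.split(b"/")):
        reasons.append("file name too long")
    if len(raw) >= path_max:
        reasons.append("path too long")
    return reasons


for candidate in ("ok.txt", "a\0b", "a" * 256, "a/" * 512):
    print(len(candidate), invalid_reasons(candidate))
```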