I noticed that pathlib allows you to create path objects containing invalid characters for that os. For example:
In [1]: from pathlib import PureWindowsPath
In [2]: PureWindowsPath('<')
Out[2]: PureWindowsPath('<')
In [3]: PureWindowsPath('file.txt ')
Out[3]: PureWindowsPath('file.txt ')
Both of these paths should be invalid (according to this SO answer).
As this path can never be used, why is it allowed to be instantiated? (I would have demonstrated with a concrete WindowsPath but I’m on MacOS.)
I would instead have expected that:
PureWindowsPath and WindowsPath raised a ValueError at construction time if any component of the path contained a character that is always forbidden in paths on Windows systems.
PurePosixPath and PosixPath similarly raised a ValueError at construction time if any component of the path contained a character that is always forbidden in paths on posix systems.
The benefit of this would be that you could use the various Path constructors to validate that a user-supplied path string is not malformed. It’s also better to raise at construction time than later on, for example to avoid calling .exists() on something that you could have known earlier could never possibly exist.
Validating Windows paths is infeasible because Python versions aren’t tied to specific Windows versions. If a path becomes invalid in a future Windows version, we would still face the same issue of being unable to validate paths.
If anything, it would be on a new file system. Any existing files would be on existing file systems. Here’s Microsoft to say it, not me:
“”"
Any other character that the target file system does not allow.
“”"
It’s annoying to have differences in what file names are supported, since it makes archiving and unarchiving somewhat frustrating. But we already have that across platforms (for example, I can create a .ZIP file that has a file called What is this?.txt and there’s no issues, but Windows won’t allow that to be extracted under that name) and this would be no different.
I think this has come up before, but I’m failing to dig out the GitHub discussion. It’s not a bad idea
I guess my main objection is that forbidding reserved characters in pathlib would have a performance impact, and I suspect that most users wouldn’t be willing to take the hit. From a more philosophical angle, most path-related functionality in the standard library takes an EAFP approach, e.g. we catch and handle encoding errors rather than checking for unencodable paths ahead of time.
It’s worth noting that Python 3.13 adds a Windows-only os.path.isreserved() function.