Make pathlib extensible

Most of the functions in the Windows file API first normalize a path into native NT form before making a system call such as NtCreateFile() or NtOpenFile(). Among other things, path normalization replaces forward slashes with backslashes. There are exceptions.
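The separator part of that normalization can be illustrated from Python on any platform, because the ntpath module and PureWindowsPath only manipulate strings. This is an analogy, not the Win32 code path: neither one calls the Windows API, and normpath() also collapses "." and ".." components.

```python
import ntpath
from pathlib import PureWindowsPath

# ntpath is importable on every platform, so this also runs on Linux/macOS.
mixed = "C:/Windows/System32"

# normpath() swaps "/" for the native "\" separator.
print(ntpath.normpath(mixed))       # C:\Windows\System32

# PureWindowsPath accepts either separator and renders backslashes.
print(str(PureWindowsPath(mixed)))  # C:\Windows\System32
```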

Of course, normalization is intentionally skipped for “\\?\” device paths. For example, r"\\?\C:\Windows/System32" is an invalid path because NTFS reserves forward slash as an invalid name character. Like all code in the native NT API and system services, the NTFS filesystem handles only backslash as a path separator.

>>> os.stat(r'\\?\C:\Windows/System32')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '\\\\?\\C:\\Windows/System32'
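A small guard can catch this mistake before it reaches the system call. check_verbatim_path() is a hypothetical helper, sketched under the assumption that you want to reject (not silently repair) forward slashes in “\\?\” paths; ordinary paths pass through untouched because Windows normalizes them itself.

```python
# The literal four characters \\?\ that mark a verbatim (unnormalized) path.
VERBATIM_PREFIX = "\\\\?\\"


def check_verbatim_path(path: str) -> str:
    """Hypothetical helper: reject "/" in verbatim-prefixed paths.

    Windows skips normalization for these paths, so a forward slash would
    reach NTFS as an (invalid) name character rather than a separator.
    """
    if path.startswith(VERBATIM_PREFIX) and "/" in path:
        raise ValueError(f"forward slash in verbatim path: {path!r}")
    return path


print(check_verbatim_path(r"\\?\C:\Windows\System32"))
try:
    check_verbatim_path(r"\\?\C:\Windows/System32")
except ValueError as exc:
    print(exc)
```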

None of the Path* API functions, such as PathCchSkipRoot(), handle forward slash as a path separator.

When creating a relative symbolic link, CreateSymbolicLinkW() (i.e. os.symlink()) does not replace forward slashes with backslashes in the target path. This creates a broken symlink since paths in the kernel only use backslash as a path separator.

>>> os.mkdir('spam')
>>> open('spam\\eggs', 'w').close()
>>> os.symlink('spam/eggs', 'eggslink')
>>> os.stat('eggslink')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'eggslink'
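One workaround is to spell the target with backslashes before calling os.symlink(). native_link_target() and symlink_native() are hypothetical helpers in this sketch. It relies on PureWindowsPath, which converts forward slashes and drops "." components but keeps ".." components, so a relative target keeps its meaning even if an intermediate component is itself a link.

```python
import os
from pathlib import PureWindowsPath


def native_link_target(target: str) -> str:
    """Hypothetical helper: return the target using backslashes only.

    PureWindowsPath converts "/" to "\\" and drops "." components,
    but leaves ".." components alone.
    """
    return str(PureWindowsPath(target))


def symlink_native(target: str, link: str) -> None:
    """Hypothetical wrapper around os.symlink() for Windows callers."""
    os.symlink(native_link_target(target), link)


print(native_link_target("spam/eggs"))
print(native_link_target("../spam/./eggs"))
```

For example, symlink_native('spam/eggs', 'eggslink') would store the target as spam\eggs. Creating symlinks on Windows may still require the symlink privilege or Developer Mode.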

There are several other functions in the Windows API that take file paths and fail to normalize forward slashes to backslashes, such as NeedCurrentDirectoryForExePathW().

It’s really a laundry list of exceptions to the rule. Better to just use the native path separator than to worry about what does and does not support forward slashes.

Also, when paths are parsed as command-line arguments, applications may fail to handle paths that use forward slashes. Notably, the CMD shell has this problem. For example:

>>> os.system('dir C:/Windows')
Parameter format not correct - "Windows".
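When a path has to go through CMD, building the command line from a PureWindowsPath sidesteps the problem, since CMD's built-in commands appear to read "/Windows" as a switch. cmd_dir_command() is a hypothetical helper; the double quotes are an assumption to cope with spaces in paths, not a complete escaping scheme for CMD's special characters.

```python
from pathlib import PureWindowsPath


def cmd_dir_command(path: str) -> str:
    """Hypothetical helper: build a CMD `dir` command with native separators."""
    return f'dir "{PureWindowsPath(path)}"'


print(cmd_dir_command("C:/Windows"))
```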

What I’m saying is that normalization is not required.

The key thing is os.fspath() prevents you from accidentally calling str() on a non-path-like object like None.
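To make the difference concrete: str() accepts any object, while os.fspath() accepts only str, bytes, or objects implementing os.PathLike, and raises TypeError for anything else.

```python
import os
from pathlib import PurePosixPath

value = None  # e.g. an optional argument the caller forgot to set

# str() silently turns None into the bogus path "None".
print(str(value))

# os.fspath() only accepts str, bytes and os.PathLike objects.
print(os.fspath(PurePosixPath("spam/eggs")))
try:
    os.fspath(value)
except TypeError as exc:
    print("rejected:", exc)
```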

The idea is that the string representation is an encoding of a path, just as an integer can encode a Unicode code point, and thus not something you need to think about directly if you’re using pathlib.
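Seen that way, several strings can encode the same path. Windows-flavoured pure paths compare case-insensitively and treat both separators alike, so spelling differences drop out of comparisons, while str() produces one particular encoding.

```python
from pathlib import PureWindowsPath

# Three spellings (encodings) of the same Windows path.
spellings = ["C:/Windows/System32", "C:\\Windows\\System32", "c:\\WINDOWS\\system32"]
paths = [PureWindowsPath(s) for s in spellings]

# Comparison ignores case and separator choice, so all three are equal.
assert all(p == paths[0] for p in paths)

# str() uses backslash separators but keeps the original case.
for p in paths:
    print(str(p))
```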


✨ March 2023 progress report ✨

Thank you to @AlexWaygood, @hauntsaninja and @steve.dower for reviewing and merging performance improvements to path construction. There’s one remaining PR to land on that issue, after which it can be resolved. I’ve logged an issue for optimizing PurePath.__fspath__() by returning an unnormalized path, and another for implementing os.path.splitroot() in C. I’m also looking at an issue with glob() performance.
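For reference, os.path.splitroot() (new in Python 3.12) splits a path into drive, root and tail without normalizing separators. The ntpath and posixpath modules expose both flavours on any platform; the version guard only keeps the snippet runnable on older interpreters.

```python
import ntpath
import posixpath
import sys

if sys.version_info >= (3, 12):
    # Windows flavour: the "/" root is returned as-is, not converted to "\".
    print(ntpath.splitroot("C:/Users/Sam"))              # ('C:', '/', 'Users/Sam')
    print(ntpath.splitroot("//Server/Share/Users/Sam"))  # ('//Server/Share', '/', 'Users/Sam')
    # POSIX flavour: there is never a drive.
    print(posixpath.splitroot("/home/sam"))              # ('', '/', 'home/sam')
```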

Adding AbstractPath is a multi-year yak-shaving exercise, and with some of those performance improvements now in place, I can approach the yak with shears in hand:

That PR unifies and simplifies path construction, and opens the door to adding AbstractPath in short order. It’s something of a milestone for this project! I’m beginning to believe we could land AbstractPath in time for Python 3.13 🙂
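For readers wondering what an extensible path class might look like, here is a toy sketch under loudly stated assumptions. AbstractPath, DictPath, walk_files() and the FILES mapping are hypothetical names for illustration only, not the API in the PR or in CPython: subclasses supply a few storage primitives (is_dir(), iterdir(), read_text()), and the base class derives higher-level behaviour such as a recursive walk from them.

```python
import abc
from pathlib import PurePosixPath


class AbstractPath(abc.ABC):
    """Hypothetical base: subclasses implement primitives, base derives the rest."""

    def __init__(self, path="/"):
        self.pure = PurePosixPath(path)

    def __truediv__(self, name):
        # Joining preserves the concrete subclass.
        return type(self)(self.pure / name)

    def __repr__(self):
        return f"{type(self).__name__}({str(self.pure)!r})"

    @abc.abstractmethod
    def is_dir(self) -> bool: ...

    @abc.abstractmethod
    def iterdir(self): ...

    @abc.abstractmethod
    def read_text(self) -> str: ...

    def walk_files(self):
        """Derived method: yield every non-directory below this path."""
        for child in self.iterdir():
            if child.is_dir():
                yield from child.walk_files()
            else:
                yield child


class DictPath(AbstractPath):
    """Toy in-memory filesystem: FILES maps absolute file paths to contents."""

    FILES = {
        "/docs/readme.txt": "hello",
        "/docs/api/index.txt": "api",
        "/setup.cfg": "[metadata]",
    }

    def _prefix(self):
        return str(self.pure).rstrip("/") + "/"

    def is_dir(self):
        # A directory exists if any file lives underneath it.
        return any(key.startswith(self._prefix()) for key in self.FILES)

    def iterdir(self):
        prefix = self._prefix()
        names = {key[len(prefix):].split("/", 1)[0]
                 for key in self.FILES if key.startswith(prefix)}
        for name in sorted(names):
            yield self / name

    def read_text(self):
        return self.FILES[str(self.pure)]


for path in DictPath("/").walk_files():
    print(path, path.read_text())
```

Only the primitives touch storage, which is the point of the exercise: in principle a zip archive or a remote store could back the same interface.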

Thanks as ever for reading, ta ra!


Oh, and lest we forget, perhaps the most important update of all: congratulations to @barneygale on his nomination as a core developer (and pathlib maintainer), on the basis of his exceptionally diligent, thoughtful and tireless work on pathlib and beyond!


And now it’s official!


Congratulations, @barneygale. You really deserve it, and it has been a pleasure working with you on pathlib so far!
