I’d like to tour the standard library’s existing support for file URIs and then make a proposal.
urllib
The urllib.request module has longstanding support for parsing and generating file URIs with pathname2url() and url2pathname(). The implementation depends on the current platform. To always use Windows semantics, one can import the same functions from the undocumented nturl2path module. On POSIX, one can call urllib.parse.quote() and unquote(). Bugs and discussion:
- #85168 - on POSIX, uses UTF-8 rather than local filesystem encoding
- #90812 - on Windows, incorrectly produces/expects file URIs beginning file://// (four slashes), which is incompatible with pathlib’s implementation.
- These functions expect you to add and remove the file:// prefix yourself. The Windows bug mentioned above misleads some folks into thinking they need to add/remove file: (no slashes).
- The Windows variant isn’t documented.
- The operations have much more to do with OS paths than URLs, so urllib is arguably the wrong place for them.
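To make the POSIX route concrete, here is a minimal sketch built only on urllib.parse.quote() and unquote(). The helper names posix_path_to_uri() and uri_to_posix_path() are hypothetical, made up for illustration. Note that quote() encodes as UTF-8 by default, which is the behaviour #85168 is about, and that the file:// prefix is added and stripped by hand.

```python
from urllib.parse import quote, unquote

FILE_PREFIX = "file://"


def posix_path_to_uri(path: str) -> str:
    """Hypothetical helper: absolute POSIX path -> file URI."""
    if not path.startswith("/"):
        raise ValueError(f"not an absolute POSIX path: {path!r}")
    # quote() percent-encodes as UTF-8 by default and leaves "/" unescaped.
    return FILE_PREFIX + quote(path)


def uri_to_posix_path(uri: str) -> str:
    """Hypothetical helper: local file URI -> POSIX path."""
    # Only the empty-authority form (file:///...) is handled in this sketch.
    if not uri.startswith(FILE_PREFIX + "/"):
        raise ValueError(f"not a local file URI: {uri!r}")
    return unquote(uri[len(FILE_PREFIX):])


uri = posix_path_to_uri("/home/user/my file.txt")
print(uri)                     # file:///home/user/my%20file.txt
print(uri_to_posix_path(uri))  # /home/user/my file.txt
```

This deliberately ignores non-empty authorities (file://host/...) and Windows paths; it only shows how much the caller has to do by hand today.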
pathlib
The pathlib.PurePath class provides an as_uri() method. Again, the implementation depends on the current platform. The Windows and POSIX variants can be found in PureWindowsPath and PurePosixPath. Bugs and discussion:
- #91504 - there’s no way to convert a URI to a path
- pathlib is 90% a high-level wrapper around os, ntpath and posixpath. The as_uri() method is one of a small handful of exceptions where pathlib implements low-level path manipulation logic itself. IMO pathlib is arguably the wrong place for its implementation.
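For reference, a short sketch of what as_uri() produces for each flavour. Pure paths can be instantiated on any platform, so both variants can be tried side by side. The example paths are made up, and the warnings filter is a precaution on my part: I believe recent Python versions may emit a DeprecationWarning when as_uri() is called on a pure path.

```python
import warnings
from pathlib import PurePosixPath, PureWindowsPath

with warnings.catch_warnings():
    # Precaution: silence a possible DeprecationWarning on newer Pythons.
    warnings.simplefilter("ignore", DeprecationWarning)

    # POSIX flavour: absolute path, percent-encoded.
    posix_uri = PurePosixPath("/home/user/my file.txt").as_uri()

    # Windows flavour: the drive letter follows three slashes.
    windows_uri = PureWindowsPath(r"C:\Users\me\my file.txt").as_uri()

print(posix_uri)    # file:///home/user/my%20file.txt
print(windows_uri)  # file:///C:/Users/me/my%20file.txt
```

The Windows result uses three slashes, which is the form that the four-slash output described in #90812 is incompatible with.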
os.path (proposal!)
I propose we add two new functions to os.path that parse and generate file:// URIs. I haven’t found good names for them yet, so here are their working names:
- os.path.fileuri() - returns a file URI from the given path.
- os.path.fileuriparse() - returns a path from the given file URI.
Their implementations would live in ntpath and posixpath, like most other os.path functionality.
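As a rough illustration of what the ntpath side might look like, here is a sketch under my own assumptions. The names nt_fileuri() and nt_fileuriparse() are hypothetical stand-ins for the proposed functions, only drive-letter and UNC paths are handled, and the output is chosen to match what pathlib produces today. It is not a proposed implementation.

```python
import ntpath
from urllib.parse import quote, unquote


def nt_fileuri(path: str) -> str:
    """Hypothetical sketch: absolute Windows path -> file URI."""
    drive, tail = ntpath.splitdrive(path)
    tail = tail.replace("\\", "/")
    if not tail.startswith("/"):
        raise ValueError(f"not an absolute Windows path: {path!r}")
    if len(drive) == 2 and drive[1] == ":":
        # C:\dir\file -> file:///C:/dir/file
        return "file:///" + drive + quote(tail)
    if drive.startswith("\\\\"):
        # \\server\share\file -> file://server/share/file
        return "file:" + quote(drive.replace("\\", "/") + tail)
    raise ValueError(f"path has no drive or UNC share: {path!r}")


def nt_fileuriparse(uri: str) -> str:
    """Hypothetical sketch: file URI -> Windows path."""
    if not uri.startswith("file://"):
        raise ValueError(f"not a file URI: {uri!r}")
    rest = uri[len("file:"):]
    if rest.startswith("///"):
        rest = rest[3:]  # drive form: drop the empty authority
    # Otherwise keep the leading "//" so it becomes a UNC prefix.
    return unquote(rest).replace("/", "\\")


print(nt_fileuri(r"C:\Users\me\my file.txt"))
print(nt_fileuriparse("file://server/share/dir/f.txt"))
```

Because ntpath is importable everywhere, a sketch like this runs on any platform, which is one practical argument for keeping the logic next to the other flavour-specific path code.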
We can then adjust the previously mentioned modules:
- pathlib.PurePath.as_uri() - remove implementation, call through to fileuri()
- pathlib.PurePath.from_uri() - add this new classmethod, call through to fileuriparse()
- urllib.request - replace usages of url2pathname() with fileuriparse()
- urllib.request - deprecate pathname2url() and url2pathname()
- nturl2path - deprecate pathname2url() and url2pathname() (and the entire module?).
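The pathlib half of that list could look roughly like the sketch below. Everything here is an assumption for illustration: fileuri() and fileuriparse() are POSIX-only stand-ins for the proposed os.path functions, and DelegatingPosixPath is a hypothetical subclass used to show the delegation without touching pathlib itself.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, unquote


def fileuri(path: str) -> str:
    """Stand-in for the proposed os.path.fileuri() (POSIX only)."""
    return "file://" + quote(path)


def fileuriparse(uri: str) -> str:
    """Stand-in for the proposed os.path.fileuriparse() (POSIX only)."""
    if not uri.startswith("file:///"):
        raise ValueError(f"not a local file URI: {uri!r}")
    return unquote(uri[len("file://"):])


class DelegatingPosixPath(PurePosixPath):
    """Hypothetical path class whose URI methods only delegate."""

    def as_uri(self) -> str:
        if not self.is_absolute():
            raise ValueError("relative path can't be expressed as a file URI")
        return fileuri(str(self))

    @classmethod
    def from_uri(cls, uri: str) -> "DelegatingPosixPath":
        return cls(fileuriparse(uri))


p = DelegatingPosixPath.from_uri("file:///etc/my%20hosts")
print(p)           # /etc/my hosts
print(p.as_uri())  # file:///etc/my%20hosts
```

The point of the sketch is that the path class holds no encoding logic of its own; both methods are one-line calls into the lower-level functions.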
I believe this would have the following benefits:
- Improve the experience for users who want to parse and generate file:// URIs, who usually end up on this SO post with 40k views or one of several others.
- Reduce the scope for bugs and incompatibilities in urllib and pathlib by unifying their underlying file URI implementations.
- Slightly simplify the urllib codebase, including letting us deprecate the nturl2path module.
- Slightly simplify the pathlib codebase by more consistently delegating low-level tasks to posixpath and ntpath.
Thanks for reading. What do you think?