I’d like to tour the standard library’s existing support for file URIs and then make a proposal.
The urllib.request module has longstanding support for parsing and generating file URIs with pathname2url() and url2pathname(). The implementation depends on the current platform. To always use Windows semantics, one can import the same functions from the undocumented nturl2path module. On POSIX, one can call quote() and unquote() from urllib.parse. Bugs and discussion:
- #85168 - on POSIX, uses UTF-8 rather than the local filesystem encoding.
- #90812 - on Windows, incorrectly produces/expects file URIs beginning file://// (four slashes), which is incompatible with pathlib’s implementation.
- These functions expect you to add and remove the file:// prefix yourself. The Windows bug mentioned above misleads some folks into thinking they only need to add/remove file:.
- The Windows variant isn’t documented.
- The operations have much more to do with OS paths than URLs, so urllib is arguably the wrong place for them.
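To make the bookkeeping in the list above concrete, here is a POSIX-flavoured round trip done by hand. It's a sketch that calls urllib.parse.quote() and unquote() directly (which the POSIX pathname2url() and url2pathname() have historically wrapped), so the output doesn't depend on the current platform; the example paths are made up.

```python
from urllib.parse import quote, unquote, urlsplit

path = "/home/user/My Documents/report.txt"

# Generating: quote() only percent-encodes the path; the caller has to
# prepend the "file://" scheme and (empty) authority by hand.
uri = "file://" + quote(path)
print(uri)  # file:///home/user/My%20Documents/report.txt

# Parsing: the caller has to split off the scheme and authority before
# unquoting what remains.
parts = urlsplit(uri)
assert parts.scheme == "file" and parts.netloc == ""
round_tripped = unquote(parts.path)
print(round_tripped)  # /home/user/My Documents/report.txt

# quote() encodes str arguments as UTF-8 by default, whatever the local
# filesystem encoding happens to be (the subject of #85168).
non_ascii = quote("/tmp/café")
print(non_ascii)  # /tmp/caf%C3%A9
```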
The pathlib.PurePath class provides an as_uri() method. Again, the implementation depends on the current platform. The Windows and POSIX variants can be found in PureWindowsPath and PurePosixPath. Bugs and discussion:
- #91504 - there’s no way to convert a URI to a path
- pathlib is 90% a high-level wrapper around os.path, and the as_uri() method is one of a small handful of exceptions where pathlib implements low-level path manipulation logic itself. IMO pathlib is arguably the wrong place for its implementation.
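For comparison, here is what as_uri() produces for each pure-path flavour. The paths are invented for illustration, and the warnings filter is only there because recent Python versions deprecate calling as_uri() on pure (rather than concrete) paths.

```python
import warnings
from pathlib import PurePosixPath, PureWindowsPath

with warnings.catch_warnings():
    # Newer Pythons emit a DeprecationWarning for PurePath.as_uri();
    # ignore it here since we only want to inspect the output.
    warnings.simplefilter("ignore", DeprecationWarning)

    # Each flavour carries its own URI-generation logic.
    posix_uri = PurePosixPath("/etc/hosts").as_uri()
    drive_uri = PureWindowsPath("c:/Windows").as_uri()
    unc_uri = PureWindowsPath("//server/share/file.txt").as_uri()

    # Relative paths are rejected with ValueError.
    try:
        PurePosixPath("relative/path").as_uri()
        relative_rejected = False
    except ValueError:
        relative_rejected = True

print(posix_uri)  # file:///etc/hosts
print(drive_uri)  # file:///c:/Windows
print(unc_uri)    # file://server/share/file.txt
```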
I propose we add two new functions to
os.path that parse and generate
file:// URIs. I haven’t found good names for them yet, so here are their working names:
- os.path.fileuri() - returns a file URI from the given path.
- os.path.fileuriparse() - returns a path from the given file URI.
Their implementations would live in ntpath and posixpath, like most other os.path functions.
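A minimal sketch of what the POSIX half might look like follows. fileuri() and fileuriparse() here are plain module-level functions using the working names, not real os.path members. The sketch assumes local-only URIs (empty authority or "localhost"), ignores query strings and fragments, and leaves the ntpath variant, with its drive letters and UNC hosts, unimplemented. Encoding through os.fsencode()/os.fsdecode() is one possible way to address the encoding issue in #85168.

```python
import os
import posixpath
from urllib.parse import quote, unquote_to_bytes, urlsplit


def fileuri(path):
    """Hypothetical stand-in for the proposed os.path.fileuri() (POSIX only)."""
    # os.fsencode() accepts str, bytes or path-like objects and encodes str
    # with the local filesystem encoding instead of assuming UTF-8.
    path = os.fsencode(path)
    if not posixpath.isabs(path):
        raise ValueError("relative path can't be expressed as a file URI")
    # The "file://" prefix is added here rather than left to the caller.
    return "file://" + quote(path)


def fileuriparse(uri):
    """Hypothetical stand-in for the proposed os.path.fileuriparse() (POSIX only)."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    # An empty authority and "localhost" both mean the local machine;
    # this sketch rejects any other host.
    if parts.netloc not in ("", "localhost"):
        raise ValueError(f"file URI has a non-local host: {uri!r}")
    if not parts.path.startswith("/"):
        raise ValueError(f"file URI has no absolute path: {uri!r}")
    # Undo percent-encoding to bytes, then decode with the filesystem encoding.
    return os.fsdecode(unquote_to_bytes(parts.path))


uri = fileuri("/home/user/My Documents/report.txt")
print(uri)                                         # file:///home/user/My%20Documents/report.txt
print(fileuriparse(uri))                           # /home/user/My Documents/report.txt
print(fileuriparse("file://localhost/etc/hosts"))  # /etc/hosts
```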
We can then adjust the previously mentioned modules:
- pathlib.PurePath.as_uri() - remove implementation, call through to os.path.fileuri().
- pathlib.PurePath.from_uri() - add this new classmethod, call through to os.path.fileuriparse().
- urllib.request - replace usages of url2pathname() (and the entire module?).
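The pathlib side of the change could then be as thin as the following sketch. URIPurePosixPath is a hypothetical subclass used only to show the delegation, and _fileuri()/_fileuriparse() are compact, hypothetical stand-ins for the proposed os.path functions, included so the block runs on its own.

```python
import os
import posixpath
from pathlib import PurePosixPath
from urllib.parse import quote, unquote_to_bytes, urlsplit


# Compact stand-ins for the proposed os.path.fileuri() and
# os.path.fileuriparse() (hypothetical, POSIX only, local URIs only).
def _fileuri(path):
    path = os.fsencode(path)
    if not posixpath.isabs(path):
        raise ValueError("relative path can't be expressed as a file URI")
    return "file://" + quote(path)


def _fileuriparse(uri):
    parts = urlsplit(uri)
    if parts.scheme != "file" or parts.netloc not in ("", "localhost"):
        raise ValueError(f"unsupported file URI: {uri!r}")
    return os.fsdecode(unquote_to_bytes(parts.path))


class URIPurePosixPath(PurePosixPath):
    """Hypothetical subclass whose URI methods hold no logic of their own."""

    def as_uri(self):
        # Delegate generation to the os.path-level function.
        return _fileuri(self)

    @classmethod
    def from_uri(cls, uri):
        # The proposed classmethod: parse with the os.path-level function,
        # then wrap the resulting string in a path object.
        return cls(_fileuriparse(uri))


p = URIPurePosixPath.from_uri("file:///srv/data/My%20File.txt")
print(p)           # /srv/data/My File.txt
print(p.as_uri())  # file:///srv/data/My%20File.txt
```

Keeping both methods as one-line calls means pathlib and any other caller share a single parsing and generating implementation, which is the point of the proposal.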
I believe this would have the following benefits:
- Improve the experience for users who want to parse and generate file:// URIs, who usually end up on this SO post with 40k views or one of several others.
- Reduce the scope for bugs and incompatibilities in urllib and pathlib by unifying their underlying file URI implementations.
- Slightly simplify the urllib codebase, including letting us deprecate the nturl2path module.
- Slightly simplify the pathlib codebase by more consistently delegating low-level tasks to os.path.
Thanks for reading. What do you think?