Make pathlib extensible

barneygale · January 5, 2025, 6:44pm

Here’s a table showing the proposed methods of the pathlib.abc classes (from the previous post):

| | JoinablePath | ReadablePath | WritablePath |
|---|---|---|---|
| Abstract methods | `__str__()` `with_segments()` `parser` | `__open_rb__()` `iterdir()` `readlink()` `info` | `__open_wb__()` `mkdir()` `symlink_to()` |
| Non-abstract methods | `__truediv__()` `__rtruediv__()` `joinpath()` `anchor` `parts` `parent` `parents` `name` `suffix` `suffixes` `stem` `with_name()` `with_stem()` `with_suffix()` `match()` `full_match()` | `read_bytes()` `read_text()` `exists()` `is_dir()` `is_file()` `is_symlink()` `glob()` `rglob()` `walk()` `copy()` `copy_into()` | `write_bytes()` `write_text()` |

Feedback welcome

barneygale · January 11, 2025, 8:46pm

Here’s a way to replace the double-duty ReadablePath.open() method without resorting to dunder methods: add a stream argument to ReadablePath.read_bytes() and WritablePath.write_bytes(). When stream=True, these methods would return a readable or writable file object in binary mode. In the ABCs they’d be abstract. There’s analogous behaviour in libraries like requests, and it might appeal to users who feel daunted by the values accepted by open(mode=...).
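To make the `stream` idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `InMemoryPath` and `_StoreOnClose` are hypothetical names, the paths are backed by a plain dict, and the exact signatures (for example the default for `data`) are guesses rather than anything proposed above.

```python
import io


class _StoreOnClose(io.BytesIO):
    """Writable buffer that saves its contents into a mapping when closed."""

    def __init__(self, files, name):
        super().__init__()
        self._files = files
        self._name = name

    def close(self):
        if not self.closed:
            self._files[self._name] = self.getvalue()
        super().close()


class InMemoryPath:
    """Hypothetical path-like object backed by a dict of name -> bytes."""

    def __init__(self, files, name):
        self._files = files
        self._name = name

    def read_bytes(self, stream=False):
        data = self._files[self._name]
        if stream:
            # Return a readable binary file object instead of the bytes.
            return io.BytesIO(data)
        return data

    def write_bytes(self, data=b"", stream=False):
        if stream:
            # Return a writable binary file object; data is stored on close.
            return _StoreOnClose(self._files, self._name)
        self._files[self._name] = bytes(data)
        return len(data)


files = {"a.txt": b"hello"}
with InMemoryPath(files, "a.txt").read_bytes(stream=True) as f:
    first_two = f.read(2)
with InMemoryPath(files, "b.txt").write_bytes(stream=True) as f:
    f.write(b"world")
```

With `stream=False` the methods behave like today's `read_bytes()` and `write_bytes()`; with `stream=True` the caller owns the returned file object and is responsible for closing it.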

poehlmann · January 12, 2025, 9:19pm

What about, instead of adding the openable protocol or the stream mode, providing a decorator in pathlib that takes a function of type Callable[[JoinablePath], IO[bytes]] and returns a valid ReadablePath.open(...) or WritablePath.open(...) method?

That would basically provide the same support for users implementing new pathlib subclasses, without the new __open_rb__ and __open_wb__ dunders.

barneygale · January 12, 2025, 10:35pm

Please could you sketch some code? I don’t quite follow.

poehlmann · January 14, 2025, 11:21pm

I initially thought it would be convenient to implement via a single decorator with an argument to choose between read and write, but sketching it out, I realised that the user-facing part should look more like the property builtin.
In the end it would just be your openable protocol squeezed into a different usage pattern for no reason other than avoiding new dunders, which likely is not a good reason anyway.

My suggestion would have looked like something along these lines:

# open_signature_helper would return a callable with signature
# def __call__(self, mode="r", buffering=-1, encoding=None, errors=None, newline=None): ... 
# and internally would call __open_rb__ or __open_wb__ according to mode, 
# wrap IO[bytes] correctly and implement correct behavior for the other options

class MyPathRO(ReadablePath):
    ...

    @open_signature_helper
    def open(self):
        # == __open_rb__
        return ...  # type: IO[bytes]  readable


class MyPathRW(WritablePath):
    ...

    @open_signature_helper
    def open(self):
        # == __open_rb__
        return ...  # type: IO[bytes]  readable

    @open.writer
    def _(self):
        # == __open_wb__
        return ...  # type: IO[bytes]  writable
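For readers who want to see how such a helper could behave, here is one possible sketch. It is not part of pathlib: `open_signature_helper` is the hypothetical name from the post above, `MemPath` and `_StoreOnClose` are made up for the demo, only the modes `r`, `rb`, `w` and `wb` are handled, and `buffering` is accepted but ignored. Because the post binds the writer to `_` rather than to `open`, `writer()` here updates the helper in place instead of returning a new object as `property.setter` does.

```python
import io


class open_signature_helper:
    """Hypothetical descriptor that builds open() from raw binary openers."""

    def __init__(self, reader):
        self._reader = reader
        self._writer = None

    def writer(self, func):
        # Mutate in place so the class attribute named 'open' sees the writer.
        self._writer = func
        return self

    def __get__(self, instance, owner=None):
        if instance is None:
            return self

        def opener(mode="r", buffering=-1, encoding=None, errors=None, newline=None):
            kind = mode.replace("b", "").replace("t", "")
            if kind == "r" and self._reader is not None:
                raw = self._reader(instance)
            elif kind == "w" and self._writer is not None:
                raw = self._writer(instance)
            else:
                raise ValueError(f"unsupported mode: {mode!r}")
            if "b" in mode:
                return raw
            return io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline=newline)

        return opener


class _StoreOnClose(io.BytesIO):
    """Writable buffer that saves its contents into a mapping when closed."""

    def __init__(self, store, name):
        super().__init__()
        self._store, self._name = store, name

    def close(self):
        if not self.closed:
            self._store[self._name] = self.getvalue()
        super().close()


class MemPath:
    """Demo class standing in for a ReadablePath/WritablePath subclass."""

    def __init__(self, store, name):
        self.store, self.name = store, name

    @open_signature_helper
    def open(self):
        return io.BytesIO(self.store[self.name])

    @open.writer
    def _(self):
        return _StoreOnClose(self.store, self.name)


store = {"a": b"hi"}
with MemPath(store, "a").open(encoding="utf-8") as f:
    text = f.read()
with MemPath(store, "b").open("w", encoding="utf-8") as f:
    f.write("yo")
```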

barneygale · January 19, 2025, 3:32am

It’s good to have someone else think through the problem all the same, thanks!

Having played with some options, I’m going to use the __open_rb__() / __open_wb__() approach, at least for now. It can be done without impacting pathlib.Path (which is nice) and it could always be revised later if someone hates it. Patch here:

github.com/python/cpython

GH-128520: Make `pathlib._abc.WritablePath` a sibling of `ReadablePath`

python:main ← barneygale:gh-128520-fix-inheritance

opened 02:06AM - 19 Jan 25 UTC

barneygale

+178 -115

In the private pathlib ABCs, support write-only virtual filesystems by making `WritablePath` inherit directly from `JoinablePath`, rather than subclassing `ReadablePath`. There are two complications:

- `ReadablePath.open()` applies to both reading and writing
- `ReadablePath.copy` is secretly an object that supports the *read* side of copying, whereas `WritablePath.copy` is a different kind of object supporting the *write* side

We untangle these as follows:

- A new `pathlib._abc.magic_open()` function replaces the `open()` method, which is dropped from the ABCs but remains in `pathlib.Path`. The function works like `io.open()`, but additionally accepts objects with `__open_rb__()` or `__open_wb__()` methods as appropriate for the mode. These new dunders are made abstract methods of `ReadablePath` and `WritablePath` respectively. If the pathlib ABCs are made public, we could consider blessing an "openable" protocol and supporting it in `io.open()`, removing the need for `pathlib._abc.magic_open()`.
- `ReadablePath.copy` becomes a true method, whereas `WritablePath.copy` is deleted. A new `ReadablePath._copy_reader` property provides a `CopyReader` object, and similarly `WritablePath._copy_writer` is a `CopyWriter` object. Once GH-125413 is resolved, we'll be able to move the `CopyReader` functionality into `ReadablePath.info` and eliminate `ReadablePath._copy_reader`.

* Issue: gh-128520
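The dispatch described in that PR summary can be sketched roughly as follows. This is not the CPython implementation of `magic_open()`: `open_via_dunders` and `Blob` are hypothetical names, the sketch skips the `io.open()` fallback and the `buffering` argument, and it only understands the `r` and `w` modes (with optional `b` or `t`).

```python
import io


def open_via_dunders(path, mode="r", encoding=None, errors=None, newline=None):
    """Open `path` through __open_rb__() or __open_wb__(), whichever fits `mode`."""
    text = "b" not in mode
    kind = "".join(c for c in mode if c not in "bt")
    if kind not in ("r", "w"):
        raise ValueError(f"unsupported mode: {mode!r}")
    # Look the dunder up on the type, as Python does for special methods.
    opener = getattr(type(path), f"__open_{kind}b__", None)
    if opener is None:
        raise TypeError(f"{type(path).__name__} can't be opened with mode {mode!r}")
    raw = opener(path)
    if text:
        # Layer text decoding/encoding over the binary stream.
        return io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline=newline)
    return raw


class Blob:
    """Read-only object: it defines __open_rb__() but not __open_wb__()."""

    def __init__(self, data):
        self._data = data

    def __open_rb__(self):
        return io.BytesIO(self._data)


with open_via_dunders(Blob(b"caf\xc3\xa9"), "r", encoding="utf-8") as f:
    text = f.read()
```

An object that only implements `__open_rb__()` can be read in binary or text mode, while a write attempt fails with `TypeError`, which is the behaviour a write-only or read-only virtual filesystem needs.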

barneygale · January 22, 2025, 10:45pm

pathlib._abc surgery is now complete, so we have this class hierarchy:

The filled and dashed arrows currently represent the same thing: a standard super/subclass relationship. For performance reasons I’m hoping to ABCMeta.register() the pathlib classes as “virtual” subclasses of the pathlib._abc classes, and so the dashed lines represent these planned registrations. I expect it will be a few weeks until I can make that change.
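For anyone unfamiliar with virtual subclasses, here is a small standalone illustration of `ABCMeta.register()`. The class names are invented for the example and are not the pathlib ones.

```python
from abc import ABC, abstractmethod


class Readable(ABC):
    """Stand-in for an ABC such as ReadablePath."""

    @abstractmethod
    def read_bytes(self):
        ...


class FastPath:
    """Concrete class that does not inherit from the ABC."""

    def read_bytes(self):
        return b"data"


# Register FastPath as a "virtual" subclass of Readable.
Readable.register(FastPath)

is_subclass = issubclass(FastPath, Readable)
is_instance = isinstance(FastPath(), Readable)
in_mro = Readable in FastPath.__mro__
```

After registration, `issubclass()` and `isinstance()` checks succeed, but `Readable` does not appear in `FastPath.__mro__`: the registered class inherits no methods from the ABC and is not checked for missing abstract methods.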

carlcsaposs-canonical · February 11, 2025, 3:41pm

Hi @barneygale!

Thank you for your contributions to pathlib!

If you have time (I completely understand if you don’t), we have a question about pathlib-abc and creating a pathlib.PurePosixPath-like class that is not os.PathLike. It's written up as issue #29 in barneygale/pathlib-abc on GitHub: "Question: recommended approach for `PurePosixPath`-like class that is not `os.PathLike`".

I wasn’t sure if a GitHub issue or this thread would be a better place to ask—please let me know if I should move the contents of that GitHub issue to this thread

Thank you!

barneygale · February 16, 2025, 12:13am

Hi Carl! Thanks for getting in touch - I’ve left a reply on the issue.

barneygale · February 16, 2025, 12:14am

If anyone’s interested, I’ve begun writing a (pre-)PEP for making pathlib extensible

Lucas_Malor · March 2, 2025, 6:12pm

A question: why are there separate JoinablePath and ReadablePath classes? Are there cases in which paths can be joined but not read?

encukou · March 3, 2025, 9:08am

That’s the same distinction as between the existing PurePath and Path, see the docs.
If a function takes a JoinablePath, I know that the file named by that path doesn’t need to actually exist.
The Readable/Writable split goes further: if something takes a ReadablePath, I’ll expect it to work with an immutable backup directory or a shared container layer.

Lucas_Malor · March 3, 2025, 7:26pm

Well, but there is also a PurePath in pathlib._abc:

What’s the difference between JoinablePath and PurePath?

barneygale · March 3, 2025, 7:33pm

Several differences:

- JoinablePath is an ABC that you can’t instantiate directly. To use it, you need to subclass it and provide at least parser, with_segments() and __str__() attributes (at time of writing)
- JoinablePath provides only part of the PurePath API (e.g. it doesn’t include __fspath__(), __eq__() or as_posix())
- JoinablePath doesn’t implement path normalisation (though subclasses might)
- JoinablePath.parser may be any implementation of PathParser, but PurePath.parser may only be posixpath or ntpath.
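A quick way to see the PurePath side of these differences is to poke at the features listed above using only the public `pathlib` API. JoinablePath itself is not used here, since it is private at the time of writing.

```python
import os
from pathlib import PurePosixPath

p = PurePosixPath("a//b/./c")

normalised = str(p)                  # PurePath collapses '//' and '/./'
as_fs = os.fspath(p)                 # PurePath implements __fspath__()
equal = p == PurePosixPath("a/b/c")  # PurePath implements __eq__()
posix = p.as_posix()                 # PurePath provides as_posix()
```

According to the list above, a bare JoinablePath subclass gets none of these four behaviours unless it implements them itself.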

barneygale · March 4, 2025, 8:23pm

I figured it might be interesting to revisit the questions in my original post from ~5 years ago and show how we answered them:

In Python 3.11 we removed the “accessor” classes altogether, as they were a vestige of early pathlib development that had no present purpose. Instead we’ve made various methods of _ReadablePath and _WritablePath abstract, like iterdir() and mkdir().

In Python 3.12 we added pathlib.PurePath.with_segments(), which is called whenever a new path object is created from an existing one (e.g. path.parent, path.iterdir()). User subclasses of _JoinablePath should implement this abstract method, which allows them to pass instance data to the new path’s initialiser.

In Python 3.14 we added pathlib.types.PathInfo as a high-level protocol for path metadata. User subclasses of _ReadablePath should expose an info attribute that implements the protocol.

In paths returned from pathlib.Path.iterdir(), the info attribute wraps an os.DirEntry object that’s initialised with information about the path gleaned from scanning its parent. But we haven’t changed os.DirEntry at all, nor do we directly expose os.DirEntry objects in pathlib.

In Python 3.14 we added pathlib.Path.copy() to support copying between paths.

In the pathlib ABCs, this method looks like _ReadablePath.copy(self, target: _WritablePath): it supports copying between arbitrary readable and writable path objects, including preserving entire directory structures. Furthermore, each target path is given an opportunity to copy metadata from its source (specifically its info object). Therefore the copy() method can be used to upload and download, to archive and extract, etc., depending on its operands.

Still not conclusively answered! A PEP will sort this out

Dobatymo · March 19, 2025, 6:32am

I am currently implementing a library following the interface of pathlib. I noticed that cwd() is a classmethod. This works for local file system paths, where this information is a process global, but is not great for other path implementations which need access to instance information like a connection object.

barneygale · March 22, 2025, 5:32pm

FWIW, I’ve omitted the alternative constructors (cwd(), home(), from_uri()) from the ABCs for this reason

barneygale · March 22, 2025, 5:44pm

If anyone is looking to contribute to the pathlib ABCs, there’s an open ticket here about adding complete type annotations in pathlib.types:

github.com/python/cpython

Type hints for `pathlib.types`

opened 06:11PM - 03 Mar 25 UTC

barneygale

type-feature stdlib topic-typing topic-pathlib

# Feature or enhancement

The `pathlib.types` module is new in 3.14, and contains a single public class: [`pathlib.types.PathInfo`](https://docs.python.org/3.14/library/pathlib.html#pathlib.types.PathInfo). This module also contains a few private classes: `_PathParser`, `_JoinablePath`, `_ReadablePath` and `_WritablePath`.

As the `pathlib.types` module is **not** imported by `pathlib`, I think we're free to add proper type annotations to the entire module, including the private classes. I think this will help clarify the interface. I'd like these hints to be compatible with the oldest version of Python still receiving security updates (3.9 at time of writing) because I'm hoping to provide a PyPI package from this module.

I’m no typing expert so I’d appreciate any help. It might also be a good opportunity to review the API and highlight rough edges - very happy to hear feedback! Thanks.

barneygale · March 25, 2025, 7:51pm

March 2025 update

In November’s update I wrote about my plan to split up PathBase and prune its interface. That work is now pretty much complete! I’ve published pathlib-abc 0.4 with the revised interface. Docs here:

I’m now working on a PR for zipp that adds a dependency on pathlib-abc:

github.com/jaraco/zipp

Add dependency on `pathlib-abc`

main ← barneygale:pathlib-abc

opened 05:24PM - 24 Mar 25 UTC

barneygale

+121 -376

Make `zipp.Path` subclass `pathlib_abc.ReadablePath`. This allows us to remove implementations of `read_text()`, `read_bytes()`, `glob()`, `joinpath()` and `__truediv__()`, and to simplify implementations of a few more methods.

Maintain a tree of `PathInfo` objects representing the hierarchy of zip file members. We traverse the tree whenever we need to resolve a path to a `ZipInfo` object. This effectively hides the `.zip`-specific quirk that directories are recorded with a trailing slash in their filenames.

Adjust `__str__()` so that it returns only the zip member path (`self.at`), which is currently a requirement of `ReadablePath`. This is probably the most significant change.

Add the following methods/attributes, which are required by `ReadablePath`:

- `info`
- `parser`
- `__open_rb__()`
- `readlink()`
- `with_segments()`

Disable the following `ReadablePath` methods/attributes that we don't (yet) test in zipp:

- `anchor`
- `parts`
- `parents`
- `__rtruediv__()`
- `with_name()`
- `with_stem()`
- `with_suffix()`
- `full_match()`
- `walk()`
- `copy()`
- `copy_into()`
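The member-tree idea can be sketched in a few lines. This is a simplified illustration and not the code from the zipp PR: it uses plain nested dicts instead of `PathInfo` objects, and `build_tree` and `lookup` are hypothetical helper names.

```python
import io
import zipfile


def build_tree(names):
    """Return nested dicts: directories map to dicts, files map to None."""
    root = {}
    for name in names:
        parts = [part for part in name.split("/") if part]
        if not parts:
            continue
        node = root
        for part in parts[:-1]:
            # Create implied parent directories that have no entry of their own.
            node = node.setdefault(part, {})
        if name.endswith("/"):
            node.setdefault(parts[-1], {})  # explicit directory entry
        else:
            node[parts[-1]] = None          # regular file
    return root


def lookup(tree, path):
    """Resolve 'a/b' or 'a/b/' to the same node, ignoring any trailing slash."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


# Build a small archive in memory: one explicit directory, one implied one.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("pkg/", "")
    zf.writestr("pkg/mod.py", "x = 1")
    zf.writestr("docs/readme.txt", "hi")
with zipfile.ZipFile(buffer) as zf:
    tree = build_tree(zf.namelist())
```

Callers of `lookup()` never need to know whether a directory was stored as `pkg/` or only implied by one of its members, which is the quirk the tree is meant to hide.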

This would unify the globbing implementations in pathlib and zipp, solve all weirdness around trailing slash requirements, and open the door to enabling methods like zipp.Path.walk().

If I can convince @jaraco to merge the PR and backport to CPython, then the next task is to add public support for copying between pathlib.Path and zipfile.Path (both directions). I think we’d make zipfile.Path subclass WritablePath and enable its copy() and _copy_from() methods. We’d probably formalize the private PathInfo metadata methods so we can preserve metadata when copying, e.g. POSIX permissions, modification time.

Big thanks to everyone who has helped over the last few months, including Paul Moore, Petr Viktorin, Steve Dower, Alyssa Coghlan, Bénédikt Tran and Andreas Poehlmann.

If anyone has any feedback on the plan, or ABCs themselves, do feel free to share. Cheers!

barneygale · August 5, 2025, 2:25am

August 2025 mini update

- I’ve replaced the JoinablePath.__str__() abstract method with JoinablePath.__vfspath__() (merged PR)
- I’m revising the __open* methods (open PR, thread)

Both of these are important for using the pathlib ABCs in zipp.Path / zipfile.Path (open PR).

If/when zipp is ready to adopt, I’ll bump the pathlib-abc version number to 1.0.0.

For the time being I’ve disabled the copy() and copy_into() methods in my zipp branch. Before we enable them, I reckon we’ll need our own version of PEP 706 - Filter for tarfile.extractall. IMO this doesn’t block the zipp PR, but it does block a PEP. If anyone would like to give this some thought or prototyping, please feel free to share your progress in this thread.
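As a starting point for that discussion, here is a sketch of one check such a filter might perform: rejecting member names that would land outside the destination directory. `safe_member_path` is a hypothetical helper, it assumes `dest_root` is an absolute POSIX-style path, and it works purely lexically. It does not cover symlinks, permissions or the other concerns PEP 706 addresses.

```python
import posixpath


def safe_member_path(dest_root, member_name):
    """Return the normalised target path, or raise ValueError if it escapes dest_root."""
    if posixpath.isabs(member_name):
        raise ValueError(f"absolute member name: {member_name!r}")
    root = posixpath.normpath(dest_root)
    target = posixpath.normpath(posixpath.join(root, member_name))
    # After normalisation, a target inside the root shares the root as common path.
    if posixpath.commonpath([root, target]) != root:
        raise ValueError(f"member escapes destination: {member_name!r}")
    return target


ok = safe_member_path("/srv/out", "docs/readme.txt")
```

A copy routine could call a check like this on every source member before writing anything, and skip or abort on `ValueError`.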

Thank you to Jason R Coombs for his feedback on the zipp PR, and to everyone else who has helped with pathlib things lately. Bye for now.