Make pathlib extensible

Here’s a table showing the proposed methods of the pathlib.abc classes (from the previous post):

Abstract methods

  JoinablePath: __str__(), with_segments(), parser
  ReadablePath: __open_rb__(), iterdir(), readlink(), info
  WritablePath: __open_wb__(), mkdir(), symlink_to()

Non-abstract methods

  JoinablePath: __truediv__(), __rtruediv__(), joinpath(), anchor, parts, parent, parents, name, suffix, suffixes, stem, with_name(), with_stem(), with_suffix(), match(), full_match()
  ReadablePath: read_bytes(), read_text(), exists(), is_dir(), is_file(), is_symlink(), glob(), rglob(), walk(), copy(), copy_into()
  WritablePath: write_bytes(), write_text()

Feedback welcome

Here’s a way to replace the double-duty ReadablePath.open() method without resorting to dunder methods: add a stream argument to ReadablePath.read_bytes() and WritablePath.write_bytes(). When stream=True, these methods would return a readable or writable file object in binary mode. In the ABCs they’d be abstract. There’s analogous behaviour in libraries like requests, and it might appeal to users who feel daunted by the values accepted by open(mode=...).
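To make the stream idea concrete, here is a minimal sketch. ReadableSketch and MemoryPath are hypothetical stand-ins written for this illustration, not the real pathlib ABCs, and the exact signature is an assumption.

```python
import io
from abc import ABC, abstractmethod


class ReadableSketch(ABC):
    """Toy stand-in for ReadablePath (hypothetical, not the real ABC)."""

    @abstractmethod
    def read_bytes(self, stream=False):
        """Return the file's bytes, or a binary file object if stream=True."""

    def read_text(self, encoding="utf-8"):
        # Built on the streaming form, so subclasses only implement read_bytes().
        with io.TextIOWrapper(self.read_bytes(stream=True), encoding=encoding) as f:
            return f.read()


class MemoryPath(ReadableSketch):
    """In-memory 'file' used only to exercise the sketch."""

    def __init__(self, data):
        self._data = data

    def read_bytes(self, stream=False):
        buffer = io.BytesIO(self._data)
        return buffer if stream else buffer.getvalue()


path = MemoryPath("café".encode("utf-8"))
whole = path.read_bytes()                   # all bytes at once
first = path.read_bytes(stream=True).read(3)  # file object, read incrementally
text = path.read_text()
```

One consequence of this shape is that the return type of read_bytes() depends on an argument, which type checkers can only express with overloads.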

What about instead of adding the openable protocol, or the stream mode, provide a decorator in pathlib, that takes a function of type Callable[[JoinablePath], IO[bytes]] and returns a valid ReadablePath.open(...) or WritablePath.open(...) method.

Which basically provides the same support for users implementing new pathlib subclasses without the new __open_rb__, __open_wb__ dunders.

Please could you sketch some code? I don’t quite follow.

I initially thought it would be convenient to implement via a single decorator with an argument to choose between read and write, but sketching it out, I realised that the user-facing part should look more like the property builtin.
In the end it would just be your openable protocol squeezed into a different usage pattern for no reason other than avoiding new dunders, which likely is not a good reason anyway.

My suggestion would have looked like something along these lines:

# open_signature_helper would return a callable with signature
# def __call__(self, mode="r", buffering=-1, encoding=None, errors=None, newline=None): ... 
# and internally would call __open_rb__ or __open_wb__ according to mode, 
# wrap IO[bytes] correctly and implement correct behavior for the other options

class MyPathRO(ReadablePath):
    ...

    @open_signature_helper
    def open(self):
        # == __open_rb__
        return ...  # type: IO[bytes]  readable


class MyPathRW(ReadablePath, WritablePath):
    ...

    @open_signature_helper
    def open(self):
        # == __open_rb__
        return ...  # type: IO[bytes]  readable

    @open.writer
    def _(self):
        # == __open_wb__
        return ...  # type: IO[bytes]  writable
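For anyone curious how such a helper could work, here is one possible sketch of open_signature_helper as a descriptor. It is an assumption-laden illustration, not a proposed pathlib API: it only handles "r"/"w" modes with optional "b" or "t", it accepts buffering but ignores it, and MemoryPath is a hypothetical class used to exercise it.

```python
import io


class open_signature_helper:
    """Hypothetical descriptor that builds open(mode=...) from byte-stream openers."""

    def __init__(self, reader):
        self._reader = reader
        self._writer = None

    def writer(self, func):
        # Registers in place, so the decorated function may be named "_".
        self._writer = func
        return func

    def __get__(self, instance, owner=None):
        if instance is None:
            return self

        def open_method(mode="r", buffering=-1, encoding=None, errors=None, newline=None):
            kind = mode.replace("b", "").replace("t", "")
            opener = {"r": self._reader, "w": self._writer}.get(kind)
            if opener is None:
                raise io.UnsupportedOperation(f"mode {mode!r} is not supported")
            stream = opener(instance)  # buffering is ignored in this sketch
            if "b" in mode:
                return stream
            return io.TextIOWrapper(stream, encoding=encoding, errors=errors, newline=newline)

        return open_method


class MemoryPath:
    """In-memory 'file' used only to exercise the helper."""

    def __init__(self, data=b""):
        self.data = data

    @open_signature_helper
    def open(self):
        return io.BytesIO(self.data)

    @open.writer
    def _(self):
        owner = self

        class _Sink(io.BytesIO):
            def close(self):
                if not self.closed:
                    owner.data = self.getvalue()  # persist the bytes on close
                super().close()

        return _Sink()


p = MemoryPath(b"hello")
with p.open("rb") as f:
    original = f.read()
with p.open("w", encoding="utf-8") as f:
    f.write("café")
```

Registering the writer in place (rather than returning a new helper, as property.setter does) is what lets the second function be called `_` in the sketch above.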

It’s good to have someone else think through the problem all the same :slight_smile: thanks

Having played with some options, I’m going to use the __open_rb__() / __open_wb__() approach, at least for now. It can be done without impacting pathlib.Path (which is nice) and it could always be revised later if someone hates it. Patch here:

pathlib._abc surgery is now complete, so we have this class hierarchy:

The filled and dashed arrows currently represent the same thing: a standard super/subclass relationship. For performance reasons I’m hoping to ABCMeta.register() the pathlib classes as “virtual” subclasses of the pathlib._abc classes, and so the dashed lines represent these planned registrations. I expect it will be a few weeks until I can make that change.
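For readers unfamiliar with virtual subclasses, here is a small sketch of what ABCMeta.register() does. JoinableSketch and FastPath are made-up names for illustration; they are not the pathlib classes.

```python
from abc import ABC, abstractmethod


class JoinableSketch(ABC):
    """Toy ABC standing in for a pathlib._abc class."""

    @abstractmethod
    def with_segments(self, *pathsegments):
        ...


class FastPath:
    """Concrete class that does NOT inherit from the ABC."""

    def __init__(self, *pathsegments):
        self.segments = pathsegments

    def with_segments(self, *pathsegments):
        return FastPath(*pathsegments)


# Register FastPath as a "virtual" subclass of the ABC.
JoinableSketch.register(FastPath)

is_sub = issubclass(FastPath, JoinableSketch)
is_inst = isinstance(FastPath("a", "b"), JoinableSketch)
in_mro = JoinableSketch in FastPath.__mro__
```

The registered class passes isinstance() and issubclass() checks, but the ABC is not in its MRO, so it inherits nothing from the ABC and attribute lookups never pass through it.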

Hi @barneygale!

Thank you for your contributions to pathlib!

If you have time (I completely understand if you don’t), we have a question about pathlib-abc and creating a pathlib.PurePosixPath-like class that is not os.PathLike: "Question: recommended approach for `PurePosixPath`-like class that is not `os.PathLike`" (Issue #29, barneygale/pathlib-abc on GitHub)

I wasn’t sure if a GitHub issue or this thread would be a better place to ask; please let me know if I should move the contents of that GitHub issue to this thread.

Thank you!

Hi Carl! Thanks for getting in touch - I’ve left a reply on the issue.

If anyone’s interested, I’ve begun writing a (pre-)PEP for making pathlib extensible :slight_smile:

A question: why are there separate JoinablePath and ReadablePath classes? Are there cases in which paths can be joined but not read?

That’s the same distinction as between the existing PurePath and Path, see the docs.
If a function takes a JoinablePath, I know that the file named by that path doesn’t need to actually exist.
The Readable/Writable split goes further: if something takes a ReadablePath, I’ll expect it to work with an immutable backup directory or a shared container layer.

Well, but there’s also a PurePath in pathlib._abc:

What’s the difference between JoinablePath and PurePath?

Several differences:

  • JoinablePath is an ABC that you can’t instantiate directly. To use it, you need to subclass it and provide at least parser, with_segments() and __str__() attributes (at time of writing).
  • JoinablePath provides only part of the PurePath API (e.g. it doesn’t include __fspath__(), __eq__() or as_posix()).
  • JoinablePath doesn’t implement path normalisation (though subclasses might).
  • JoinablePath.parser may be any implementation of PathParser, but PurePath.parser may only be posixpath or ntpath.
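
As a quick illustration of the PurePath side of that list, the snippet below uses only the standard library to show the behaviours that PurePath adds and that, per the points above, a JoinablePath subclass would not get for free.

```python
import os
from pathlib import PurePosixPath

p = PurePosixPath("a//b/./c")

normalised = str(p)                  # repeated slashes and "." segments are collapsed
as_fs = os.fspath(p)                 # works because PurePath defines __fspath__()
equal = p == PurePosixPath("a/b/c")  # works because PurePath defines __eq__()
posix = p.as_posix()
```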

I figured it might be interesting to revisit the questions in my original post from ~5 years ago and show how we answered them:

In Python 3.11 we removed the “accessor” classes altogether, as they were a vestige of early pathlib development that had no present purpose. Instead we’ve made various methods of _ReadablePath and _WritablePath abstract, like iterdir() and mkdir().

In Python 3.12 we added pathlib.PurePath.with_segments(), which is called whenever a new path object is created from an existing one (e.g. path.parent, path.iterdir()). User subclasses of _JoinablePath should implement this abstract method, which allows them to pass instance data to the new path’s initialiser.

In Python 3.14 we added pathlib.types.PathInfo as a high-level protocol for path metadata. User subclasses of _ReadablePath should expose an info attribute that implements the protocol.
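
As a rough sketch of what implementing the protocol might involve: to my understanding of the Python 3.14 docs, PathInfo consists of exists(), is_dir(), is_file() and is_symlink(), so treat the method set and signatures below as an assumption. MemoryPathInfo is a hypothetical class for an in-memory tree with no symlinks.

```python
class MemoryPathInfo:
    """Hypothetical info object for an in-memory tree without symlinks."""

    def __init__(self, kind=None):
        # kind is "file", "dir", or None when the path does not exist.
        self._kind = kind

    def exists(self, *, follow_symlinks=True):
        return self._kind is not None

    def is_dir(self, *, follow_symlinks=True):
        return self._kind == "dir"

    def is_file(self, *, follow_symlinks=True):
        return self._kind == "file"

    def is_symlink(self):
        return False  # this toy backend has no symlinks


file_info = MemoryPathInfo("file")
missing_info = MemoryPathInfo()
```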

In paths returned from pathlib.Path.iterdir(), the info attribute wraps an os.DirEntry object that’s initialised with information about the path gleaned from scanning its parent. But we haven’t changed os.DirEntry at all, nor do we directly expose os.DirEntry objects in pathlib.

In Python 3.14 we added pathlib.Path.copy() to support copying between paths.

In the pathlib ABCs, this method looks like _ReadablePath.copy(self, target: _WritablePath). It supports copying between arbitrary readable and writable path objects, including preserving entire directory structures. Furthermore, each target path is given an opportunity to copy metadata from its source (specifically its info object). Therefore the copy() method can be used to upload and download, to archive and extract, etc., depending on its operands.
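
To give a feel for the single-file case, here is a simplified sketch of copying between two unrelated objects through the __open_rb__() / __open_wb__() hooks mentioned earlier in the thread. MemoryFile and copy_file() are hypothetical; the real copy() also handles directories and metadata, which this omits.

```python
import io
import shutil


class MemoryFile:
    """Hypothetical in-memory file exposing the two open hooks."""

    def __init__(self, data=b""):
        self.data = data

    def __open_rb__(self):
        return io.BytesIO(self.data)

    def __open_wb__(self):
        owner = self

        class _Sink(io.BytesIO):
            def close(self):
                if not self.closed:
                    owner.data = self.getvalue()  # persist the bytes on close
                super().close()

        return _Sink()


def copy_file(source, target):
    """Stream bytes from any readable object to any writable one."""
    with source.__open_rb__() as src, target.__open_wb__() as dst:
        shutil.copyfileobj(src, dst)
    return target


remote = MemoryFile(b"payload")
local = copy_file(remote, MemoryFile())
```

Because copy_file() only relies on the two hooks, the source and target can be backed by entirely different storage.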

Still not conclusively answered! A PEP will sort this out :slight_smile:

I am currently implementing a library following the interface of pathlib. I noticed that cwd() is a classmethod. This works for local file system paths, where this information is a process global, but is not great for other path implementations which need access to instance information like a connection object.

FWIW, I’ve omitted the alternative constructors (cwd(), home(), from_uri()) from the ABCs for this reason :slight_smile:

If anyone is looking to contribute to the pathlib ABCs, there’s an open ticket here about adding complete type annotations in pathlib.types:

I’m no typing expert so I’d appreciate any help. It might also be a good opportunity to review the API and highlight rough edges - very happy to hear feedback! Thanks.

:sparkles: March 2025 update :sparkles:

In November’s update I wrote about my plan to split up PathBase and prune its interface. That work is now pretty much complete! I’ve published pathlib-abc 0.4 with the revised interface. Docs here:

I’m now working on a PR for zipp that adds a dependency on pathlib-abc:

This would unify the globbing implementations in pathlib and zipp, solve all weirdness around trailing slash requirements, and open the door to enabling methods like zipp.Path.walk().

If I can convince @jaraco to merge the PR and backport to CPython, then the next task is to add public support for copying between pathlib.Path and zipfile.Path (both directions). I think we’d make zipfile.Path subclass WritablePath and enable its copy() and _copy_from() methods. We’d probably formalize the private PathInfo metadata methods so we can preserve metadata when copying, e.g. POSIX permissions, modification time.

Big thanks to everyone who has helped over the last few months, including Paul Moore, Petr Viktorin, Steve Dower, Alyssa Coghlan, Bénédikt Tran and Andreas Poehlmann.

If anyone has any feedback on the plan, or the ABCs themselves, do feel free to share. Cheers!
