Make pathlib extensible

One thought - how does all of this affect os.PathLike and friends? The __fspath__ method is defined as returning str or bytes, and it should return “the file system path representation of the object”. A Path subclass that represents (say) a member of a zip archive can’t really do this, particularly as the user expectation is that os.fspath(path) will return something that could reasonably be passed to an external utility (such as the user’s editor).

Maybe I missed some discussion on this - can _PathBase subclasses opt out of being PathLike? I wonder if they should be “not PathLike” by default, and have to explicitly opt into being convertible to a filesystem path.

Sorry if this is something you’ve already thought about - I’m not sure where I could look to check.

It’s a very good thought, and was briefly covered earlier (easy to miss in this long thread):

pathlib._PathBase won’t be os.PathLike by default for all the reasons you mentioned. It would be a catastrophe if open(ZipPath('README.rst', zipfile=blah)) silently opened a file called README.rst on the local filesystem.
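
A minimal sketch of the opt-in idea, assuming two hypothetical classes (`ToyPath` and `LocalToyPath`, neither is part of pathlib). It only illustrates how `os.PathLike` membership follows from defining `__fspath__`; it is not how `_PathBase` itself is written.

```python
import os


class ToyPath:
    """Hypothetical virtual path (e.g. an archive member); no __fspath__ on purpose."""

    def __init__(self, name):
        self.name = name


class LocalToyPath(ToyPath):
    """Hypothetical subclass that explicitly opts in to being os.PathLike."""

    def __fspath__(self):
        return self.name


def try_fspath(path):
    """Return os.fspath(path), or None when the object is not path-like."""
    try:
        return os.fspath(path)
    except TypeError:
        return None
```

With this shape, `os.fspath()` rejects a `ToyPath` with a `TypeError` instead of quietly treating its name as a local file, while `LocalToyPath` converts as usual.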

Cool, thanks. I’d imagined ZipPath returning something like /.../archive.zip/README.rst (the way zip files appear in the import system) so using it would be wrong, rather than disastrous, but what you’ve done is better :slightly_smiling_face:

:sparkles: November 2023 progress report :sparkles:

I’ve merged GH-106703, which adds a new glob.translate() function, and speeds up globbing in pathlib. Thank you @AA-Turner, @jaraco and @encukou for your help with this.
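
For readers who haven't tried it, here is a small sketch of calling `glob.translate()`. The helper name `glob_to_regex` is made up for this example, and the function only exists on Python 3.13 and later, so the sketch checks for it first.

```python
import glob
import re


def glob_to_regex(pattern):
    """Compile a glob pattern into a regex using glob.translate() (Python 3.13+)."""
    if not hasattr(glob, "translate"):
        raise NotImplementedError("glob.translate() needs Python 3.13 or later")
    # seps="/" pins the separator so the result is the same on every platform.
    regex = glob.translate(pattern, recursive=True, include_hidden=True, seps="/")
    return re.compile(regex)
```

On a new enough interpreter, `glob_to_regex("**/*.txt").match("foo/bar/baz.txt")` should return a match object, and the same regex should not match `foo/bar/baz.py`.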

I’ve been considering backporting and publishing the latest pathlib as a standalone PyPI package (compatible with 3.8+), so that other PyPI packages like dohq-artifactory can make use of the new _PathBase ABC and validate its design. Previously I was planning to backport all of pathlib, but it’s rather more work than I was expecting, so I’m now looking at backporting _PurePathBase and _PathBase only.

I’ve played around with possible values for _PurePathBase.pathmod and how we support customization of low-level path syntax. The right answer still isn’t clear to me, but I think setting it to posixpath is a good first guess. I’ll make a PR for that once GH-110670 lands, and then move on to the PyPI package.

Thanks to everyone following along and being encouraging, it means a lot

What’s the reason for this? Did you want the backported classes to be subclasses of the standard library classes (or isinstance(pathlib_backport.Path(), pathlib.Path) == True)?

If you drop a link to the repo, I’d be happy to look at the code for answers.

Pathlib relies on new features elsewhere in the standard library - for example glob.translate(), os.path.splitroot() and os.path.realpath(strict=True). It’s the realpath() implementations in ntpath and posixpath that scared me off - they might not be easy to backport.
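
As a rough way to see which of these features a given interpreter already has, something like the following could be used. The function name `backport_prerequisites` is an assumption for this sketch; it only detects the three features named above and says nothing about how hard each would be to backport.

```python
import glob
import inspect
import os.path


def backport_prerequisites():
    """Report which of the features mentioned above exist on this interpreter."""
    realpath_params = inspect.signature(os.path.realpath).parameters
    return {
        "glob.translate": hasattr(glob, "translate"),
        "os.path.splitroot": hasattr(os.path, "splitroot"),
        "os.path.realpath(strict=...)": "strict" in realpath_params,
    }
```

Printing the returned dict on each supported Python version gives a quick picture of what a full backport would have to supply itself.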

I wouldn’t mind if someone else does a complete backport, but I reckon my PyPI package will be just the ABCs, at least initially.

:sparkles: December 2023 progress report :sparkles:

I’ve added a private PurePathBase class (GH-110670) and moved the base classes into a private pathlib._abc module (GH-112881, GH-112904). Thank you @AlexWaygood and @tjreedy for helping solve a performance regression (GH-112907). The class hierarchy now looks like this:

I’ve copy-pasted the ABC code into a new repository:

I intend to publish this as a PyPI package once some of the more embarrassing bugs are fixed. The package will serve as a proving-ground for the ABCs, with a view towards adding them to the (public) standard library once they’ve matured. I’ll make a proper announcement once the first version of the PyPI package is published.

Happy holidays all :slight_smile:

:sparkles: January 2024 progress report :sparkles:

I’ve published the pathlib ABCs as a PyPI package:

I’ve also written some basic docs, including an example TarPath implementation:

https://pathlib-abc.readthedocs.io/en/latest/index.html

The pathlib ABCs are present in the Python 3.13+ standard library, in the private pathlib._abc module. They won’t be made a part of the public standard library unless/until they mature as a PyPI package (and even then it’s not guaranteed!)

This will be my last monthly update in this thread. Folks interested in following the development of the ABCs may wish to watch the pathlib_abc project on github - I’m using the issue tracker there as a todo list. Contributions are most welcome.

It’s been a pleasure to lead this work through its first phase, but I couldn’t have done it without the mentoring, support, insights and reviews from many others. I’m not sure how to thank all these folks - I’ll put together some sort of CREDITS file perhaps, and include it in the pathlib-abc 1.0 release announcement when it comes (some time this year?).

That’s all! Happy new year everyone :tada:

Thanks a lot Barney! Your work on pathlib has been excellent and it was fun watching the progress. You’ve definitely helped out the Python community a lot here.

Yes, thanks a lot @barneygale ! This will make it much easier to write custom implementations of the pathlib interface.

I will gladly use it in my OmniArchive package, where I have extracted the necessary parts of the pathlib interface for the time being. (The package is similar to your TarPath example, but also implements the same interface for zip files and aims to provide transparent access to any kind of archive.)

An irregular update:

I haven’t undertaken much pathlib work lately, because by the beginning of the year I’d already made quite a lot of changes in 3.13, and didn’t want to overload users. The What’s New in Python 3.13 doc has all the public-facing stuff, and plenty has changed under the hood, too. Folks should find that pathlib is faster in 3.13 beta 1 than in earlier versions for a variety of common operations and scenarios. I’d love to see some independent confirmation of this (and I’d be only slightly less happy to see it refuted, if that gives me new optimisation targets).

With 3.13 beta freeze a few days away, I’m now thinking about what we could do for 3.14. My current plans are as follows:

Bootstrapping the ecosystem: I plan to release a few new PyPI packages that provide concrete implementations of various filesystems atop pathlib-abc. Not only will these be useful in and of themselves, they’ll also serve as templates that users can copy, adapt for other virtual filesystem backends, and perhaps release.

Flat filesystems: some virtual filesystems (tar and zip files, s3, git…) provide fast access to a list of all files (perhaps under a prefix), rather than direct children of a directory; in some cases, directories can only be inferred from file paths, and not directly recorded. I’m hoping to add a pathlib_abc.FlatPathBase class with efficient algorithms for these sorts of filesystems, and to steal all of @jaraco’s ideas from zipfile.Path! :wink:
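
To make the "directories can only be inferred" point concrete, here is a small sketch that derives the direct children of a directory from a flat listing. The function `infer_children` is hypothetical, and it is a naive linear scan rather than the efficient algorithm a FlatPathBase class would want.

```python
from pathlib import PurePosixPath


def infer_children(flat_names, directory=""):
    """Return the names directly under `directory`, inferring unlisted directories."""
    base_parts = PurePosixPath(directory).parts
    depth = len(base_parts)
    children = set()
    for name in flat_names:
        parts = PurePosixPath(name).parts
        # Keep entries that live below `directory`; their next component is a child.
        if len(parts) > depth and parts[:depth] == base_parts:
            children.add(parts[depth])
    return sorted(children)
```

Given the listing `["README.rst", "src/pkg/__init__.py", "src/pkg/core.py"]`, asking for the children of `"src"` yields `["pkg"]`, even though no entry for `src/pkg` itself was ever recorded.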

File transfers: for Python 3.14, I’m planning to add copy(), copytree(), rmtree() and move() methods to PathBase and Path, fulfilling GH-73991. The implementations in PathBase will be generic, and so it will be possible to write something like:

source = TarPath('images', archive=tarfile.open(...))
target = FTPPath('public_html', 'images', ftp=ftplib.FTP(...))
source.copytree(target)
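
The reason a generic implementation can work across backends is that it only needs a handful of methods that every backend provides. Below is a toy sketch of that idea, assuming a hypothetical in-memory backend (`MemoryPath`) and a free function `copytree`; the real PathBase methods and their signatures may well differ.

```python
class MemoryPath:
    """Hypothetical toy backend: `store` maps a path to bytes (file) or None (directory)."""

    def __init__(self, path, store):
        self.path = path.strip("/")
        self.store = store

    def __truediv__(self, name):
        joined = f"{self.path}/{name}" if self.path else name
        return MemoryPath(joined, self.store)

    @property
    def name(self):
        return self.path.rpartition("/")[2]

    def is_dir(self):
        return self.path in self.store and self.store[self.path] is None

    def iterdir(self):
        prefix = f"{self.path}/" if self.path else ""
        for key in sorted(self.store):
            rest = key[len(prefix):]
            if key != self.path and key.startswith(prefix) and "/" not in rest:
                yield MemoryPath(key, self.store)

    def mkdir(self):
        self.store[self.path] = None

    def read_bytes(self):
        return self.store[self.path]

    def write_bytes(self, data):
        self.store[self.path] = bytes(data)


def copytree(source, target):
    """Recursively copy using only methods that both backends are assumed to expose."""
    target.mkdir()
    for child in source.iterdir():
        destination = target / child.name
        if child.is_dir():
            copytree(child, destination)
        else:
            destination.write_bytes(child.read_bytes())
```

Because `copytree` never touches `store` directly, the source and target could be two entirely different backends, as long as each implements the same small set of methods.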

Backporting local paths: once the copy() etc methods are in, users of pathlib-abc in earlier versions of Python will want to use them, and so I’ll need to backport 3.14’s pathlib.PurePath, Path, etc, for earlier versions. Maybe another new PyPI package.

Collaborating with package maintainers: with all the above in place, I’ll try to convince maintainers of pathlib-y packages of the merits of using pathlib-abc, and make PRs if they agree.

That’s all! My apologies to everyone who hates thread necromancy! :sweat_smile:

Barney, please consider moving the implementation out of “__init__.py” in 3.13+. I prefer to use it just for imports that define the API, like how asyncio is implemented, except with more explicit module names if the name is otherwise too generic, such as “_pathlib_abc.py” instead of “_abc.py”. I’ll suffer in silence if you disagree, but I had to at least ask.

Hah, I was just considering doing that yesterday. PR coming up!

I’ve had a lot of experience with backports lately (importlib_*, zipp (zipfile.Path), backports.tarfile, configparser, singledispatch, backports.functools_lru_cache), all using slightly different approaches and tradeoffs. Maybe I’ll get around to documenting those, but in lieu of some useful documentation, feel free to reach out for advice if you’re looking to backport.

tl;dr, I’d probably recommend using backports.* but keeping the CPython implementation as the canonical implementation.

Sorry to be so late to the party here, but just throwing in my 2 cents: the easiest way to make pathlib.Path extensible is to make it inherit from WindowsPath or PosixPath at definition time.

Currently which of WindowsPath and PosixPath to use is decided at instantiation time, which is wholly unnecessary because os.name is never going to change for the life of the program.

So the new class hierarchy should look like this:

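import os
from pathlib import PurePath, PurePosixPath, PureWindowsPath

class PathBase(PurePath):
    ...  # I/O methods

class WindowsPath(PathBase, PureWindowsPath):
    ...

class PosixPath(PathBase, PurePosixPath):
    ...

# The concrete base is chosen once, when the class is defined.
class Path(WindowsPath if os.name == 'nt' else PosixPath):
    pass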

This way, a user class inheriting from Path will never be missing a _flavour attribute.
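
The same effect can be approximated today without changing pathlib, by picking the concrete class once at definition time and subclassing that. This is only a sketch: `ProjectPath` and `is_python_source` are made-up names, and it builds on the existing pathlib.WindowsPath and pathlib.PosixPath rather than the proposed hierarchy.

```python
import os
import pathlib

# Decide the concrete base once, at class-definition time.
ConcretePath = pathlib.WindowsPath if os.name == "nt" else pathlib.PosixPath


class ProjectPath(ConcretePath):
    """Hypothetical user subclass with one extra helper method."""

    def is_python_source(self):
        return self.suffix == ".py"
```

Joining with `/` on a `ProjectPath` returns another `ProjectPath`, so the extra method stays available on derived paths.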

I have a working demo in Is there a pathlib equivalent of os.scandir()? if anyone’s interested.

That is broken, see: hatch/src/hatch/utils/fs.py at hatch-v1.10.0 · pypa/hatch · GitHub

Come to think of it, I would say there really shouldn’t be multiple sources of truth for that. Aside from which they aren’t even the same string:

>>> sys.platform
'linux'
>>> os.name
'posix'

(Or 'win32'/'nt', respectively, for Windows)

(Actually, for Linux systems, aren’t those two values exactly the wrong way around, if we grant that they should be different? I’ve never heard of installing a “posix distro” from “posix live media”; and if anything, the point of POSIX is to standardize interfaces, which sounds to me like the sort of thing a “platform” might do.)

Good point, though my example was a simple reuse of the current code:

Oh yes I understand thanks, I just wanted to avoid people copying that and having to dig deep to understand why type hinting did not work, as I had to do :slightly_smiling_face:
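
For anyone who does want to copy the pattern, a variant that type checkers can follow is sketched below. It relies on the fact that checkers such as mypy understand `sys.platform` comparisons; `ScratchPath` and `with_backup_suffix` are hypothetical names used only for illustration.

```python
import pathlib
import sys

# A plain if/else on sys.platform lets the type checker resolve the base class.
if sys.platform == "win32":
    _ConcretePath = pathlib.WindowsPath
else:
    _ConcretePath = pathlib.PosixPath


class ScratchPath(_ConcretePath):
    """Hypothetical subclass used only for illustration."""

    def with_backup_suffix(self):
        return self.with_name(self.name + ".bak")
```

At runtime this behaves the same as the `os.name` version on Windows, Linux and macOS; the difference is only in what the type checker can see.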

Right, though come to think of it, really it’s mypy that should be fixed to recognize os.name as well.
