Make pathlib extensible

Hey folks. A question for you all: how should users supply state (like a backing socket, fileobj, etc) to their Path types?

Taking a potential TarPath class as an example (see gh-89812), here are some ideas:

import tarfile

mytar = tarfile.open('sample.tar.gz')

readme = mytar.TarPath('README.txt')                   # Idea 1
readme = tarfile.TarPath[mytar]('README.txt')          # Idea 2
readme = tarfile.TarPath('README.txt', backend=mytar)  # Idea 3
# xxx your idea here? :)

Ideas 1 and 2 generate a new TarPath type for each instance of TarFile; this type has the TarFile instance stored as a class attribute. Idea 2 is probably a patent abuse of __class_getitem__. The advantages of these ideas are:

  • The type’s interface, including its constructor, is exactly compatible with Path.
  • It doesn’t require much internal work in pathlib.
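To make Ideas 1 and 2 a bit more concrete, here is a minimal sketch of the "one type per archive" approach. It is only an illustration: `make_tar_path_type` and the `archive` class attribute are hypothetical names, and the class derives from `PurePosixPath` purely to keep the example short, so it is not a proposed implementation.

```python
import io
import tarfile
from pathlib import PurePosixPath


def make_tar_path_type(archive):
    """Return a new path class bound to one TarFile (hypothetical helper)."""
    # The archive lives on the class, so the constructor signature is unchanged.
    return type("TarPath", (PurePosixPath,), {"archive": archive})


# Build a tiny in-memory archive so the example needs no files on disk.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    data = b"hello"
    info = tarfile.TarInfo("docs/README.txt")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))
buf.seek(0)

mytar = tarfile.open(fileobj=buf)
TarPath = make_tar_path_type(mytar)

readme = TarPath("docs", "README.txt")
sibling = readme.parent / "other.txt"

# Derived paths are built from type(self), so they see the same archive
# without pathlib having to pass anything along.
content = readme.archive.extractfile(str(readme)).read()
```

Because `parent` and `/` create instances of the same dynamically generated class, the archive comes along for free, which is why these ideas need so little internal work.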

Idea 3 doesn’t generate a new TarPath type for every TarFile instance, but it does require significant work on pathlib’s internals to pass the backend along to new TarPath objects (e.g. those produced by iterdir()). This work might remove private constructors like _make_child_relpath() that assume the input is already normalized, which would have a performance impact. On the positive side, it could open up customization of how pathlib normalizes paths, which has been requested a few times over the years. E.g. folks might want to retain the leading ./ or trailing / in a path like ./foo/bar/baz/, as these can be meaningful to shells.
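For comparison, here is a rough sketch of what Idea 3 implies. `BackedPath` is a toy class wrapping `PurePosixPath`, not pathlib API, and the backend is assumed to be a plain dict mapping a directory to its child names. The point is only that every method returning a new path has to forward `backend` itself.

```python
from pathlib import PurePosixPath


class BackedPath:
    """Toy stand-in for a path type that carries a backend (not pathlib API)."""

    def __init__(self, *segments, backend=None):
        self._pure = PurePosixPath(*segments)
        self.backend = backend

    def _derive(self, pure):
        # Every derived path must forward the backend explicitly, and it
        # goes back through the public constructor (re-parsing the path).
        return type(self)(pure, backend=self.backend)

    def __truediv__(self, other):
        return self._derive(self._pure / other)

    @property
    def parent(self):
        return self._derive(self._pure.parent)

    def iterdir(self):
        # Assumption: the backend is a dict of directory -> child names.
        for name in self.backend.get(str(self._pure), ()):
            yield self / name

    def __str__(self):
        return str(self._pure)


listing = {"docs": ["README.txt", "guide.txt"]}
docs = BackedPath("docs", backend=listing)
children = list(docs.iterdir())
```

Note that `_derive()` goes through the normal constructor each time. That is the kind of cost the paragraph above refers to: a fast path that skips normalization is harder to keep once extra state has to travel with each new object.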

Any feedback? Other ideas/thoughts? Cheers.
