Support for subclassing pathlib.Path was added in 3.12, so this should just work:
    class MyPath(pathlib.Path):
        pass
The cost of the check in Path.__new__() is a shame, but unavoidable without breaking APIs as far as I can tell.
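A minimal sketch of what that looks like in practice. MyPath and its shout() method are hypothetical names used only for illustration, and the version guard is there because instantiating a direct subclass of Path fails on releases before 3.12.

```python
import pathlib
import sys


class MyPath(pathlib.Path):
    """Hypothetical subclass; the extra method is only for illustration."""

    def shout(self) -> str:
        return str(self).upper()


def demo():
    # Instantiating a direct subclass of Path needs Python 3.12 or newer.
    if sys.version_info < (3, 12):
        return None
    p = MyPath("docs") / "readme.txt"
    # Derived paths keep the subclass type, while plain Path() is still
    # swapped for PosixPath or WindowsPath by the check in Path.__new__().
    return type(p), type(pathlib.Path("docs")), p.shout()


print(demo())
```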
On POSIX-compliant systems, POSIX is the name of the API that’s mostly used to implement the os module, and os.name is “posix”. On Windows systems, Win32 is the name of the API that’s mostly used to implement the os module, but os.name is the name of the base system/kernel, “nt”. On POSIX systems, sys.platform is the name of the base system/kernel, such as “linux” and “darwin”, while on Windows it’s the name of the API, “win32”.
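To see the two identifiers side by side on whatever system you are running, a small sketch (platform_names is a hypothetical helper name):

```python
import os
import sys


def platform_names():
    """Return the two identifiers discussed above for the running interpreter."""
    return {"os.name": os.name, "sys.platform": sys.platform}


names = platform_names()
# Typical values: "nt" and "win32" on Windows, "posix" and "linux" on Linux,
# "posix" and "darwin" on macOS.
print(names)
```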
Ah, I totally missed that Python 3.12 already made pathlib.Path subclassable when I tested my suggested code in Python 3.10.
I understand why, for backwards compatibility, you need to keep the relative class hierarchy as is, along with the if cls is Path: cls = WindowsPath if os.name == 'nt' else PosixPath check in Path.__new__, and I also like that the new implementation drops the whole _Flavour class by simply reusing os.path.
Thanks for all the good work.
In many places, functions support filenames, path-like objects and file handles like this:
    if hasattr(filename, "read") or hasattr(filename, "write"):
        fp = filename
        closefp = False
    elif isinstance(filename, (str, bytes, os.PathLike)):
        fp = open(filename, mode)
        closefp = True
But this code rules out truly generalized path objects. So how should we improve it?
Is it OK to expect that any object with an open method can be used?
    if hasattr(filename, "read") or hasattr(filename, "write"):
        fp = filename
        closefp = False
    elif hasattr(filename, "open"):
        fp = filename.open(mode)
        closefp = True
    elif isinstance(filename, (str, bytes, os.PathLike)):
        fp = open(filename, mode)
        closefp = True
Or is it better to check for an instance of type PathBase?
    if hasattr(filename, "read") or hasattr(filename, "write"):
        fp = filename
        closefp = False
    elif isinstance(filename, pathlib._abc.PathBase):
        fp = filename.open(mode)
        closefp = True
    elif isinstance(filename, (str, bytes, os.PathLike)):
        fp = open(filename, mode)
        closefp = True
I think that, to allow broader support for path implementations, there should be one way to handle this case.
Path objects work just fine with open(), they don’t need a separate branch.
But I don’t love this pattern in general. I would rather write one function that takes a file-like object, and another that takes paths and opens them for use by calling the first function.
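A minimal sketch of that two-function split, using line counting as a stand-in task. The function names are hypothetical; the point is only that the path-taking function opens the file and delegates to the file-object one.

```python
import io
import pathlib
import tempfile


def count_lines_in_file(fp):
    """Core logic: works on any iterable text file-like object."""
    return sum(1 for _ in fp)


def count_lines(path):
    """Path front end: accepts str, bytes, or any os.PathLike object."""
    with open(path, "r", encoding="utf-8") as fp:
        return count_lines_in_file(fp)


# File-like objects go to the first function directly.
print(count_lines_in_file(io.StringIO("a\nb\n")))

# Paths (including pathlib.Path) go through the second one.
with tempfile.TemporaryDirectory() as tmp:
    sample = pathlib.Path(tmp) / "sample.txt"
    sample.write_text("1\n2\n3\n", encoding="utf-8")
    print(count_lines(sample))
```

Neither function needs hasattr or isinstance checks, and the caller decides who owns and closes the file.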
Sparkly update!
I’ve resolved GH-73991 by adding new Path.copy(), copy_into(), move() and move_into() methods. Publicly these can only be used for local filesystem copies/moves, but secretly these methods are implemented in PathBase and allow any other instance of PathBase as the destination path. With appropriate implementations of PathBase, it should be possible to move a file from local storage to a .zip file, thence to a .tar file, and finally back to local storage, with three calls to move().
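For a local file, a hedged sketch of how code might use the new method while still running on older interpreters. copy_file is a hypothetical helper; it assumes Path.copy() is only present on newer Python versions (3.14+) and falls back to shutil.copyfile otherwise. It deliberately ignores whatever Path.copy() returns.

```python
import pathlib
import shutil
import tempfile


def copy_file(source: pathlib.Path, target: pathlib.Path) -> pathlib.Path:
    """Copy a single local file, preferring Path.copy() where it exists."""
    if hasattr(source, "copy"):
        # Assumption: this attribute is the Path.copy() method described above.
        source.copy(target)
    else:
        # Fallback for older versions: copies file content only.
        shutil.copyfile(source, target)
    return pathlib.Path(target)


with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "a.txt"
    src.write_text("hello", encoding="utf-8")
    dst = copy_file(src, pathlib.Path(tmp) / "b.txt")
    print(dst.read_text(encoding="utf-8"))
```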
There are underscore-prefixed methods for preserving metadata when copying/moving between different types of PathBase, but they need much refinement. As mentioned in my last update, I also need to develop a FlatPathBase class to better support filesystems that aren’t directory-oriented. To make progress on both of these things, I’m planning to write my own private PathBase-derived version of zipfile.Path that passes all/almost all its tests. I’ll extract the generic bits (e.g. generation of implied directories) into FlatPathBase, and ensure we can round-trip metadata in copy() and move(). If I can get it working, it should allow me to finalize the pathlib._abc APIs.
I think that’s it for now. Cheers!
I haven’t closely followed the new developments but I noticed a really useful method is now deprecated: pathlib — Object-oriented filesystem paths — Python 3.14.0a0 documentation
What was the reason for this? I can’t find anything using GitHub’s search.
Ooo an update!
I’ve written a hacky implementation of zipp.Path atop PathBase and found the experience very informative! I’m looking forward to sharing the results with @jaraco. One thing that’s become obvious: the PathBase interface is too large, at ~50 methods and attributes.
I’m thinking of eliminating around 20 methods from the PathBase interface, such as is_fifo(), hardlink_to() and group(). Implementations may wish to add those methods, but I don’t think they should be guaranteed by the PathBase interface.
I’m also considering splitting PathBase into read-only and read-write classes. The read-only class would define ~20 methods (including three abstract methods: open(), iterdir() and stat()), and the read-write class would add ~10 methods (including three abstract: mkdir(), symlink_to() and _delete()). These could be made true ABCs rather than quasi-ABCs-that-raise-UnsupportedErrors-by-default, which might be nice.
I don’t know what to name these classes, maybe ReadablePathBase and RWPathBase? Meh…
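To make the "true ABCs" idea concrete, here is a rough sketch of the split using abc.ABC. It only borrows the class and abstract method names from the post; the signatures, the read_text() helper and the StringPath toy implementation are assumptions for illustration, not the actual pathlib._abc code.

```python
import io
from abc import ABC, abstractmethod
from types import SimpleNamespace


class ReadablePathBase(ABC):
    """Hypothetical read-only ABC with the three abstract methods named above."""

    @abstractmethod
    def open(self, mode="r"): ...

    @abstractmethod
    def iterdir(self): ...

    @abstractmethod
    def stat(self): ...

    def read_text(self):
        # Derived method: subclasses get it for free once open() exists.
        with self.open("r") as f:
            return f.read()


class RWPathBase(ReadablePathBase):
    """Hypothetical read-write ABC adding three abstract write methods."""

    @abstractmethod
    def mkdir(self): ...

    @abstractmethod
    def symlink_to(self, target): ...

    @abstractmethod
    def _delete(self): ...


class StringPath(ReadablePathBase):
    """Toy read-only implementation: a single in-memory text file."""

    def __init__(self, text):
        self._text = text

    def open(self, mode="r"):
        if mode != "r":
            raise ValueError("StringPath only supports mode 'r'")
        return io.StringIO(self._text)

    def iterdir(self):
        return iter(())  # a file has no children

    def stat(self):
        return SimpleNamespace(st_size=len(self._text))


print(StringPath("hello").read_text())
```

With real ABCs, forgetting one of the abstract methods fails at instantiation time with TypeError, rather than at call time with an "unsupported" error.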
Here’s my working, if it’s useful: PathBase pruning - Google Sheets
It’s an exciting stage of the project for me: the pathlib ABCs are feature-complete, and the remaining technical work amounts to pruning and neatening.
Thanks for reading!
I looked at your table, and one thing makes me uneasy about this split: open is a method that can be used to modify or write, in addition to reading. It doesn’t look like it belongs to a “read” class. The same is true to a lesser extent for copy and copy_into, because they modify the target path. I understand that in practice you need open to be able to get the content of the file, so I don’t see another way to do it while keeping the same separation.
How about replacing ReadablePathBase with something like InfoPathBase, which would only keep the methods that gather information about the path, like metadata, kind of file, directory scanning… RWPathBase could become an ActionPathBase that acts upon the path, with the rest of the set. (Sorry, I’m not great at naming things!)
I adapted your table to show what I mean: Another PathBase pruning - Google Sheets
I didn’t touch the methods marked for deletion, though some of them look like they could be useful, like unlink?
I’m not sure if this works in the static type system, but I think you can express open in the base class as
    class BasePath:
        def open(self, mode: Literal["r"] = "r", ...):
            ...
While you can express open in the child classes as
    class RwPath(BasePath):
        def open(self, mode: Literal["r", "w", "rw"] = "r", ...):
            ...
But I’m not sure if an inherited method can extend the allowed literals in the static type system.
In any case, I don’t see much of an issue with the open method.
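On the typing question: parameter types are contravariant in overrides, so a subclass method that accepts a wider set of literals is a compatible override under the usual substitutability rule that static type checkers apply. A small runnable sketch (the classes are toys and return strings rather than file objects; only "r" and "w" are used to keep the example simple):

```python
from typing import Literal, get_args, get_type_hints


class BasePath:
    def open(self, mode: Literal["r"] = "r") -> str:
        return f"opened for {mode}"


class RwPath(BasePath):
    # Widening the accepted literals is a compatible override:
    # every call that is valid on BasePath.open() is still valid here.
    def open(self, mode: Literal["r", "w"] = "r") -> str:
        return f"opened for {mode}"


# Inspect the annotations at runtime to show the subset relationship.
base_modes = set(get_args(get_type_hints(BasePath.open)["mode"]))
rw_modes = set(get_args(get_type_hints(RwPath.open)["mode"]))
print(base_modes <= rw_modes)
```

Narrowing in the subclass (accepting fewer modes than the base) is the direction a type checker would reject.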
I appreciate the feedback. Just on this point:
At the moment, copy() and copy_into() allow you to pass another kind of PathBase object as the target path, and only the target path needs to be writable, so:
    # This would work even if the zipfile.Path is read-only
    source = zipfile.Path('src', ...)
    target = pathlib.Path('blah')
    source.copy_into(target)
Just a nitpick that mode has a lot more options (e.g. rt and rb)
… Can you tell I don’t work with binary files too often?
I agree that this one is debatable. I was thinking in terms of “is there a risk of losing data somehow?”, and in my mind a ReadablePathBase as opposed to a RWPathBase should be safe in this regard. Maybe I took it too much as a ReadOnlyPathBase, but it’s actually not what you proposed.
move seems like a reasonable default implementation for any kind of path, i.e. subclasses would get it “for free” if they implement the underlying operations (but they could also optimize it for copies within a “filesystem”). It would be sad if it went away.
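A rough sketch of what "for free" could mean: a base class whose move() is written purely in terms of primitives the subclass supplies. MovablePathBase and DictPath are hypothetical names, and copy-then-delete is just one plausible default, not necessarily how pathlib implements it.

```python
from abc import ABC, abstractmethod


class MovablePathBase(ABC):
    """Hypothetical base class: move() is built from three primitives."""

    @abstractmethod
    def read_bytes(self): ...

    @abstractmethod
    def write_bytes(self, data): ...

    @abstractmethod
    def _delete(self): ...

    def move(self, target):
        # Generic fallback: copy the content, then delete the source.
        # A subclass could override this with a cheap rename when both
        # paths live on the same "filesystem".
        target.write_bytes(self.read_bytes())
        self._delete()
        return target


class DictPath(MovablePathBase):
    """Toy path backed by a dict that acts as the filesystem."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    def read_bytes(self):
        return self.store[self.name]

    def write_bytes(self, data):
        self.store[self.name] = data

    def _delete(self):
        del self.store[self.name]


# Move a "file" between two separate stores using only the default move().
disk, archive = {"a.txt": b"data"}, {}
DictPath(disk, "a.txt").move(DictPath(archive, "a.txt"))
print(disk, archive)
```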
Similar for methods like rename or replace. Implementing them in the base class could be a good opportunity to ensure various Paths can work well together for users, without unnecessary effort from authors of the subclasses.
Most of the is_* methods marked for removal are simple wrappers around stat(). I don’t think it makes much sense to remove these default implementations if stat() is still required.
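As an illustration of how thin those wrappers are, here is a sketch of a mixin that derives a few is_*() methods from whatever stat() a subclass provides. StatMixin and LocalPath are hypothetical names, and treating any OSError as "no" is a simplification of what pathlib actually does.

```python
import os
import stat
import tempfile


class StatMixin:
    """Hypothetical mixin: is_*() helpers derived from a subclass's stat()."""

    def _mode(self):
        try:
            return self.stat().st_mode
        except OSError:
            return None  # simplification: a missing path is "not that kind"

    def is_dir(self):
        mode = self._mode()
        return mode is not None and stat.S_ISDIR(mode)

    def is_file(self):
        mode = self._mode()
        return mode is not None and stat.S_ISREG(mode)

    def is_fifo(self):
        mode = self._mode()
        return mode is not None and stat.S_ISFIFO(mode)


class LocalPath(StatMixin):
    """Toy subclass that only has to supply stat()."""

    def __init__(self, path):
        self.path = path

    def stat(self):
        return os.stat(self.path)


with tempfile.TemporaryDirectory() as tmp:
    print(LocalPath(tmp).is_dir(), LocalPath(tmp).is_fifo())
```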
Which brings to mind what to do about stat(). It seems that stat_result itself could use a protocol (or generic superclass) – perhaps one with only attributes, not the old tuple-like interface?
Or perhaps another class with stat-related methods (including is_*, chmod, touch)?
Besides read/write, there’s another split that affects what kind of operations are available/allowed. It might be reasonable to encode this in the type system:
- directories (glob, walk, iterdir; mkdir; move_into destination)
- files (open, read_*; write_*)
- links (readlink, lstat)
(For example, importlib.abc.Traversable is essentially ReadablePathBase without links, stat, and some ease-of-life utilities that it could get “for free” by subclassing the base, like glob.)
On the other hand, such a class hierarchy could easily become too complex to be usable.
Sorry Petr, I think you might have seen a version of the spreadsheet where I was experimenting with evicting PathBase.move(). I’ve now moved it back into the RWPathBase section.
I reckon we should keep PathBase.move() but evict rename() and replace(), because move() supports most use cases for the latter two methods, adds support for directories, and allows an arbitrary PathBase as a destination.
Even for “rare” file types? (specifically: block/character devices, FIFOs, unix sockets, arguably Windows junctions.) I’m not ruling out adding these back later in response to user demand, but I’m somewhat doubtful that folks will need them soon. They’re not representable in quite a few kinds of virtual filesystem.
Good idea, thank you, I will do this soon
Hmm, there’s a judgement call here. But there’s some value in keeping a simpler function that can’t copy a whole directory tree across a network. That can be an expensive operation (literally, with things like cloud storage).
They are representable in stat() results, though. As long as you keep mode, things like stat.S_ISBLK need to work on it (i.e. return False in most virtual filesystems). And if that’s part of the API, you might as well add the wrappers to the base class.
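For instance, a virtual filesystem that only knows about files and directories can still hand back a mode that the stat module's predicates understand. A minimal sketch, where virtual_stat is a hypothetical helper returning an attribute-only object rather than a real os.stat_result:

```python
import stat
from types import SimpleNamespace


def virtual_stat(is_dir, size=0):
    """Hypothetical helper building a minimal attribute-only stat result."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return SimpleNamespace(st_mode=kind | 0o644, st_size=size)


member = virtual_stat(is_dir=False, size=12)
print(stat.S_ISREG(member.st_mode))  # True
print(stat.S_ISBLK(member.st_mode))  # False: no block devices here
```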
(I don’t think the raw number of methods is a problem – the metric to minimize is things that users need to override or think about.)
Happy new year all!
Here’s a revised diagram showing the classes I’m now aiming for:
I’m not planning to add DeletablePath yet. I’ve moved its prospective methods (_delete(), move() and move_into()) from PathBase to Path. We might come back to it later.
WritablePath inherits directly from JoinablePath rather than ReadablePath, which seems much neater to me. There’s a fly in the ointment: PathBase.open(), which seems to apply to both readable and writable paths. I think this could be solved if we replace the PathBase.open() method with a pathlib._abc.open() function. This would work like built-in open(), but additionally accept objects with __open_rb__() or __open_wb__() methods, depending on the mode used. When opening in text mode, it would try __open_r__() or __open_w__() first, and fall back to wrapping the binary dunder methods in io.TextIOWrapper.
I still need to work out exactly how we convert a mode string to a dunder method name, and what modes should be allowed. When I come to write the PEP for making the ABCs public, it will include adjusting built-in open() to understand the new “openable” protocol.
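To make the proposed protocol easier to picture, here is a speculative sketch of such a function. Everything beyond the dunder names quoted above is an assumption: the name magic_open, the restriction to the modes "r", "w", "rb" and "wb", the direct mode-to-name mapping, and the UTF-8 default are placeholders, since the real mapping is still undecided.

```python
import builtins
import io
import os


def magic_open(obj, mode="r", encoding="utf-8"):
    """Hypothetical stand-in for the proposed pathlib._abc.open()."""
    if isinstance(obj, (str, bytes, os.PathLike)):
        # Ordinary paths go straight to the built-in open().
        if "b" in mode:
            return builtins.open(obj, mode)
        return builtins.open(obj, mode, encoding=encoding)
    # ASSUMPTION: only these four modes, mapped to dunder names directly.
    if mode not in ("r", "w", "rb", "wb"):
        raise ValueError(f"unsupported mode: {mode!r}")
    cls = type(obj)
    opener = getattr(cls, f"__open_{mode}__", None)
    if opener is not None:
        return opener(obj)
    if "b" not in mode:
        # Text mode falls back to wrapping the binary dunder method.
        binary_opener = getattr(cls, f"__open_{mode}b__", None)
        if binary_opener is not None:
            return io.TextIOWrapper(binary_opener(obj), encoding=encoding)
    raise TypeError(f"{cls.__name__} cannot be opened with mode {mode!r}")


class BytesBlob:
    """Toy readable object that only implements the binary read dunder."""

    def __init__(self, data):
        self.data = data

    def __open_rb__(self):
        return io.BytesIO(self.data)


blob = BytesBlob(b"hello\n")
print(magic_open(blob, "rb").read())
print(magic_open(blob, "r").read())
```

Because BytesBlob defines no write dunder, a read-only object is simply one that lacks __open_wb__(), which is what lets WritablePath and ReadablePath stay independent.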
That’s my current thinking anyway, happy to hear feedback. Cheers!