I submit for your consideration what I believe is the first working version (with documentation) of an extensible, subclassable PurePath/Path to close this 6-year-old bug(1) that has spawned both a StackOverflow thread(2) and a page on CodeReview(3). I present the fix here both as a PR and an idea because as I understand it, this is the proper forum to discuss making additions (which my fix requires) to the standard library.
I believe that to make sense of what I did to fix it, it is important to understand at a deep level what the real cause of this bug report is. To do so you have to go down a little bit of a rabbit hole of logic. I'm hoping you will indulge me and join me as I walk you through why the bug exists and has persisted for so long. For the points that I am about to make I will examine Path, and its derivatives PosixPath and WindowsPath, but the points I make apply equally to PurePath and its derivatives.
The reason that Path is not naively subclassable (by design) is that it is a factory class which, when instantiated, returns one of two entirely different classes. As such, despite what the dependency diagram at the top of the Pathlib documentation(4) indicates, Path inverts the normal dependency relationship with its derivative subclasses. To produce them, it must somehow know that they exist. Right now they are just hard-coded references in the __new__ method. This means, however, that if we naively subclass Path with something like:
class MyPath(Path):
    pass
It will fail because we didn't also create new derivative classes to correspond to each of the Posix and Windows system flavours and tell our new MyPath about them so that it can generate them.
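To make the inverted dependency concrete, here is a minimal toy sketch of the same pattern. It is not pathlib's actual implementation: ToyPath, ToyPosixPath, ToyWindowsPath, and the sep attribute are hypothetical names invented for illustration. The base class hard-codes its concrete subclasses in __new__, so a naive subclass is never routed to a flavoured class and ends up without the flavour-specific attribute it needs.

```python
import os


class ToyPath:
    """Toy factory (hypothetical): __new__ hard-codes its concrete subclasses."""

    def __new__(cls, *segments):
        if cls is ToyPath:
            # The base class must know its own derivatives by name.
            cls = ToyWindowsPath if os.name == "nt" else ToyPosixPath
        self = object.__new__(cls)
        # Only the flavoured classes define `sep`, so any other class fails here.
        self._text = cls.sep.join(segments)
        return self

    def __str__(self):
        return self._text


class ToyPosixPath(ToyPath):
    sep = "/"


class ToyWindowsPath(ToyPath):
    sep = "\\"


class MyPath(ToyPath):
    pass


print(type(ToyPath("a", "b")).__name__)  # one of the two flavoured classes

try:
    MyPath("a", "b")
except AttributeError as exc:
    print("naive subclass failed:", exc)
```

The failure only goes away once MyPath also ships flavoured derivatives and the factory is taught about them, which is the extra work described above.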
To my knowledge, there are four solutions to this problem, and I believe only one of them should be preferred. Before I show you that, I'm hoping you will go with me and take a moment to consider why Path is a factory that returns different classes in the first place.
From what I gather from the documentation and the PEP, the reason for this is to facilitate writing platform-agnostic code.
From PEP 428:
It is expected that, in most uses, using the Path class is adequate, which is why it has the shortest name of all.
From the Pathlib documentation:
If you've never used this module before or just aren't sure which class is right for your task, Path is most likely what you need.
Using Path one could for instance write:
path = Path(get_path_from_somewhere())
do_stuff(path)
This will run on both Posix and Windows as long as do_stuff is polymorphic with respect to both PosixPath and WindowsPath. This might all seem trivial, but if you'll go with me a moment, let's think about how that polymorphism would be achieved in actual practice. One could do the preferred:
def do_stuff(path):
    path.really_do_the_stuff()
And then define the platform-specific logic on really_do_the_stuff on our customized derivatives of PosixPath and WindowsPath. Except that won't work because Path isn't subclassable.
So alternatively, we could do this:
def do_stuff(path):
    if isinstance(path, PosixPath):
        really_do_unix_stuff()
    else:
        really_do_windows_stuff()
Ok, but then, this could instead be written as:
def do_stuff(path):
    import os
    if os.name == "posix":
        really_do_unix_stuff()
    else:
        really_do_windows_stuff()
However, if we don't need to reference the class names with isinstance to facilitate polymorphism, we are left wondering what having a separately named PosixPath and WindowsPath actually does for us.
This hints that this whole factory design of Path/PurePath maybe isn't achieving what one might think that it would. But it must be necessary, right? It can't just be that it is only there complicating things and thereby preventing subclassing, right? Pathlib is brilliant with its object-oriented goodness and convenience methods (infinite praise goes out to Antoine Pitrou for this), but if the factory were necessary for all of that I wouldn't be writing this. However, it turns out we don't need the factory. Path works fine without it.
I know this because I have constructed a complete replacement of PurePath and Path which omit the factory design but instead attach the flavour to the class at the time of instantiation. These newly designed classes still pass all of the test cases that are run against PurePath and Path on both Windows and Linux*. Because they omit the factory design, they don't have inverted dependencies and therefore are naively subclassable and extensible in any way you see fit. You can use them to achieve platform-agnostic code either using os.name as above, or by attaching platform-specific code to subclasses and calling it that way.
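As a rough illustration of the idea of resolving the flavour at instantiation rather than through a factory, consider the toy sketch below. It is not the code from the PR: ToySimplePath, ToySimplePosixPath, ToySimpleWindowsPath, and the _flavour attribute as used here are hypothetical stand-ins, and the standard posixpath and ntpath modules play the role of the flavours. The point is only that the base class never names a subclass, so naive subclassing works.

```python
import ntpath
import os
import posixpath


class ToySimplePath:
    """Toy stand-in (hypothetical), not the SimplePath from the PR."""

    _flavour = None  # None means "use the flavour of the host platform"

    def __init__(self, *segments):
        flavour = type(self)._flavour
        if flavour is None:
            # Resolved per instance; no subclass is ever named here.
            flavour = ntpath if os.name == "nt" else posixpath
        self._flavour = flavour
        self._text = flavour.join(*segments) if segments else "."

    def __str__(self):
        return self._text

    def __truediv__(self, other):
        # type(self) keeps user subclasses intact when deriving new paths.
        return type(self)(self._text, str(other))


class ToySimplePosixPath(ToySimplePath):
    _flavour = posixpath


class ToySimpleWindowsPath(ToySimplePath):
    _flavour = ntpath


class MyPath(ToySimplePath):
    def shout(self):
        return str(self).upper()


p = MyPath("a", "b") / "c"
print(type(p).__name__, p.shout())
print(ToySimpleWindowsPath("a", "b"))
```

Because the flavoured classes only override a class attribute, a user subclass can pin a flavour the same way or leave it to the host.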
So now we've come to what the real crux of the problem is. It's not an issue hidden in the implementation, it's an issue hidden in the design. Of course, this design is public, and also codified in a PEP. As such, the only real way out is to introduce an alternative set of classes to pathlib. To this end I give you:
SimplePath (subclassable alternative to PurePath)
SimplePosixPath (subclassable alternative to PurePosixPath)
SimpleWindowsPath (subclassable alternative to PureWindowsPath)
FilePath (subclassable alternative to Path/PosixPath/WindowsPath)
PathIOMixin (**explained further below)
On Windows, SimplePath behaves as if it were PureWindowsPath. On Posix, it behaves as PurePosixPath. Similarly, FilePath behaves like WindowsPath on Windows and PosixPath on Posix. These four classes combined (less PathIOMixin) could, if one were so inclined, act as a complete replacement for the existing six Path/PurePath classes.
I've attached all the code discussed above as part of this PR to close this issue. If you are still in doubt, I'm hoping you'll take the time to look at my code. Despite being 11 commits, it's essentially just two refactors. The first splits the two responsibilities PurePath has, separating the base class methods into _PurePathBase. The second does the same with Path, moving the base class methods into PathIOMixin. The other commits are just minor tweaks and documentation to account for all of that.
The advantage this design has is that it allows subclassable PurePath/Path-like objects that people can work with while abiding by the existing standards framework and simultaneously not breaking anyone's existing code. If at some point one decided to deprecate via official methods the less functional versions of these classes and fix all of the surrounding documentation, then that is something that could be pursued.
In the beginning, I mentioned that I am aware of four solutions to this problem. I've left the other three until now because I wanted to make sure that you really understood the problem and how it could be avoided. The difference between the solution I gave above and the following is that all of the following bolster the existing problem in design. Every one of them is built on top of the existing problem, pouring concrete around the assumption that the factory is necessary and fixing it in place for years and years to come. I prefer to leave the option open to remove that at some future date if it is not offering any functional benefit. All that said, in good faith, here are the other options:
A)
# Make Public in __all__
class SubclassablePurePath1(type(PurePath())):
    pass
class SubclassablePath1(type(Path())):
    pass
or alternatively:
import os
# Make Public in __all__
class SubclassablePurePath2(
    PurePosixPath if os.name == "posix" else PureWindowsPath
):
    pass
class SubclassablePath2(
    PosixPath if os.name == "posix" else WindowsPath
):
    pass
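For a sense of how option A looks from the user's side, here is a small sketch. NotesPath and is_note are hypothetical names made up for this example. The subclass inherits from whichever concrete class Path() produces on the host, so it is bound to the host flavour at the moment the class statement runs.

```python
from pathlib import Path


class NotesPath(type(Path())):
    """Hypothetical subclass; its base is PosixPath or WindowsPath, per host."""

    def is_note(self):
        return self.suffix == ".txt"


# Joining keeps the subclass, so custom methods stay available.
p = NotesPath("notes") / "todo.txt"
print(type(p).__name__, p.is_note())
```

This works without touching pathlib, but the factory stays in place underneath, which is the concern raised above about building on top of the existing design.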
B)
Have Path be aware of its subclasses via registration with, for example, __init_subclass__. Then, upon instantiation, in the __new__ method, check whether a subclass of the appropriate flavour exists. If not, create it with type, and then instantiate an instance of that with the pathsegment arguments. Then there are all sorts of little caveats to worry about. First, what if multiple subclasses with the right flavour already exist? How do you decide which class to use? Also, when instantiating the naively subclassed
class MyCoolPath(PurePath): pass
How do we choose the names of the derived classes if they don't exist? On Windows would we create a WindowsMyCoolPath? Or should it be MyCoolWindowsPath? (It turns out it would have actually been more straightforward if Pathlib used WindowsPurePath instead of PureWindowsPath, but I'm not sure how that plays with the English grammar rules for combining adjectives.) Also, this raises the question: should we create an overridable function that allows users to customize how the names of their derived classes are chosen?
The answer is no. Just no. I started writing this, and I deleted it because I realized this merely kicks the can down the road and ignores the real problem. People are going to run into all sorts of derived class naming issues at the very least.
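The caveats of option B are easier to see in a toy sketch. Everything below is hypothetical and unrelated to pathlib's real internals: RegisteringPath, HOST_FLAVOUR, the _candidates registry, and the generated class names are invented for illustration. The sketch registers flavoured subclasses via __init_subclass__, creates a missing one with type, and shows where both the naming question and the ambiguity question appear.

```python
import os

HOST_FLAVOUR = "windows" if os.name == "nt" else "posix"


class RegisteringPath:
    """Toy base class (hypothetical) that tracks its flavoured subclasses."""

    flavour = None    # set only on flavoured subclasses
    _candidates = ()  # flavoured direct subclasses of this class

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._candidates = ()  # every class keeps its own registry
        if cls.flavour is not None:
            for base in cls.__bases__:
                if issubclass(base, RegisteringPath) and base.flavour is None:
                    base._candidates += (cls,)

    def __new__(cls, *segments):
        if cls.flavour is None:
            matches = [c for c in cls._candidates if c.flavour == HOST_FLAVOUR]
            if len(matches) > 1:
                raise TypeError(
                    f"ambiguous {HOST_FLAVOUR} subclasses of {cls.__name__}"
                )
            if matches:
                cls = matches[0]
            else:
                # The generated name is an arbitrary choice: the naming problem.
                name = HOST_FLAVOUR.capitalize() + cls.__name__
                cls = type(name, (cls,), {"flavour": HOST_FLAVOUR})
        self = object.__new__(cls)
        self.segments = segments
        return self


class MyCoolPath(RegisteringPath):
    pass


first = MyCoolPath("a")
print(type(first).__name__)  # a generated class, named by the rule above


class HandWrittenPath(MyCoolPath):
    flavour = HOST_FLAVOUR


try:
    MyCoolPath("b")  # now two candidates exist for the host flavour
except TypeError as exc:
    print(exc)
```

Raising on ambiguity is only one possible policy; any other rule (first registered, last registered, explicit priority) would be just as arbitrary, which is the point of the objection.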
C)
So on to the fourth and last solution. Recently, there has been a lot of reorganization in pathlib. Barney Gale has been putting in a lot of work, fixing inconsistencies in its organization as well as various bugs. I admire how he recognized that _accessor is a vestigial abstraction buried in pathlib and has put in a PR that removes it through a series of commits. He also has a desire to make Path subclassable, and an idea (as I understand it) that he was proposing was to do this not directly, but by adding a class called AbstractPath that inherits from PurePath and from which Path is derived. (His thread on that here.) He has several open PRs out there for Pathlib and has put in a lot of work rewriting the bulk of the functions in pathlib to facilitate his vision. I don't want to misrepresent his ideas, so Barney, if you are reading this, I'm hoping you can explain better than I can what your end solution is going to look like. Also, I apologize, I'm not trying to put you on the spot, but just want to make sure that I make space for an alternative solution and acknowledge all the work you have done towards solving this problem as well.
Incidentally, Barney, I hope you'll see that everything that you are working to achieve falls out naturally from the PR I submitted above.
For instance:
class LimitedIOMixin:
    """ Depends on SimplePath """
    def open(self):
        ...
    # Any other specific I/O methods you want to implement
class MyPlatformAgnosticLimitedZipPath(SimplePath, LimitedIOMixin):
    pass
So that's my argument. I submit my idea and code because I think it is the way to truly solve this problem. Hopefully, others see this as well and also see how everyone gets out of this solution what they were hoping pathlib would provide. If you have made it all the way through this, thank you for your time and consideration. I welcome any feedback at all you have for me.
*Full disclosure, there are a couple of tests that referenced class names and were inappropriate to run, but this statement is otherwise true.
** Why is PathIOMixin Necessary?
The answer is that to write custom IO operations for paths that are dissimilar in flavour to the machine that you are running the code on, you need to be able to combine the flavoured SimplePath with a mixin that provides the IO operations. You could write an entirely new IOMixin, but because of the work Barney is doing simplifying the organization of Path, if you just override 11 of the methods in PathIOMixin then all of the other 21 methods from Path will just automatically work (because they depend on them / will depend on them). For example:
# Called from Posix
class AzureWinIOMixin(PathIOMixin):
    def stat(self, *, follow_symlinks=True):
        ...
    # Also define owner, group, iterdir, readlink, cwd, home, touch, mkdir,
    # symlink_to, hardlink_to, rename, replace, unlink, rmdir, open, chmod
class AzureWinServerPath(SimpleWindowsPath, AzureWinIOMixin):
    pass
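The "override a few methods and the rest follow" idea can be sketched with standard-library pieces only. The code below is an assumption-laden toy, not PathIOMixin itself: ToyIOMixin, FakeRemoteIOMixin, FakeRemoteWinPath, and the FAKE_REMOTE table are hypothetical, and PureWindowsPath stands in for SimpleWindowsPath. The derived methods exists and is_dir are written once against stat, so a backend only has to supply stat.

```python
import stat as stat_module
from pathlib import PureWindowsPath
from types import SimpleNamespace


class ToyIOMixin:
    """Toy stand-in (hypothetical): derived methods build on stat()."""

    def stat(self):
        raise NotImplementedError

    def exists(self):
        try:
            self.stat()
        except FileNotFoundError:
            return False
        return True

    def is_dir(self):
        try:
            return stat_module.S_ISDIR(self.stat().st_mode)
        except FileNotFoundError:
            return False


# Hypothetical remote file system: path string -> mode bits.
FAKE_REMOTE = {
    "C:\\data": stat_module.S_IFDIR | 0o755,
    "C:\\data\\report.txt": stat_module.S_IFREG | 0o644,
}


class FakeRemoteIOMixin(ToyIOMixin):
    def stat(self):
        # The only backend-specific method; everything else is inherited.
        try:
            return SimpleNamespace(st_mode=FAKE_REMOTE[str(self)])
        except KeyError:
            raise FileNotFoundError(str(self)) from None


class FakeRemoteWinPath(PureWindowsPath, FakeRemoteIOMixin):
    pass


print(FakeRemoteWinPath("C:/data").is_dir())
print(FakeRemoteWinPath("C:/missing").exists())
```

Since the path flavour comes from the pure class and the I/O from the mixin, the same snippet behaves identically whether it runs on Posix or Windows.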
Referenced Links:
- (1) h-t-t-p-s://bugs.python.org/issue24132
- (2) h-t-t-p-s://stackoverflow.com/questions/29850801/subclass-pathlib-path-fails
- (3) h-t-t-p-s://codereview.stackexchange.com/questions/162426/subclassing-pathlib-path
- (4) h-t-t-p-s://docs.python.org/3/_images/pathlib-inheritance.png