This might be an enormous can of worms, but I’d like to suggest that certain high-level path operations in the (longstanding) shutil module might be more at home in the (relative newcomer) pathlib module, and that we could move them without breaking backwards compatibility, and unlock other benefits along the way.
Why?
shutil long predates pathlib. Its “shell utilities” remit is broad and overlaps with other modules, including pathlib. I wonder if Guido might be able to comment, but I get the impression it was the “module of last resort” for things that:
- Were written in Python (so couldn’t be added to os), and
- Didn’t need platform-specific implementations (so couldn’t be added to os.path), and
- Were too small to deserve their own module (unlike glob, shlex, etc.)
In PEP 428, Antoine suggested that pathlib might provide a good home for these functions:
More operations could be provided, for example some of the functionality of the shutil module.
These pathlib features have been requested perennially ever since.
By also introducing pathlib.AbstractPath (see this topic), we’d unlock the potential to apply some of these functions to different filesystem backends, such as S3 and its ilk. Users would be able to write path.move() without caring about the backing filesystem(s), which, like nature, is pretty neat!
What?
In my view, the functions in question are:
- copy*(), including copytree() but excluding copyfileobj()
- move()
- rmtree()
- chown()
How?
These functions could be added as methods of pathlib.Path, and in turn implemented using lower-level methods like Path.stat(), Path.open(), etc. in many cases. When pathlib.AbstractPath is introduced, users would be able to supply their own implementations of these lower-level methods.
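To make that idea concrete, here is a rough sketch of a copy built only on open(). Everything named here is hypothetical and for illustration: copy_file(), MemoryPath and round_trip() are not real or proposed pathlib APIs, and MemoryPath just stands in for whatever a user-defined backend (an S3 path, say) might look like.

```python
import io
import pathlib
import tempfile

CHUNK_SIZE = 64 * 1024


def copy_file(source, target, chunk_size=CHUNK_SIZE):
    """Copy contents using only each path object's open() method.

    Hypothetical helper: it never touches the OS directly, so any object
    with a compatible open() can be the source or the target.
    """
    with source.open("rb") as src, target.open("wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
    return target


class _MemoryWriter(io.BytesIO):
    """Write buffer that stores its contents in MemoryPath when closed."""

    def __init__(self, path):
        super().__init__()
        self._path = path

    def close(self):
        if not self.closed:
            MemoryPath._files[self._path.name] = self.getvalue()
        super().close()


class MemoryPath:
    """Hypothetical in-memory backend; a stand-in for a user-defined path."""

    _files = {}  # fake filesystem: name -> bytes

    def __init__(self, name):
        self.name = name

    def open(self, mode="rb"):
        if mode == "rb":
            return io.BytesIO(self._files[self.name])
        if mode == "wb":
            return _MemoryWriter(self)
        raise ValueError(f"unsupported mode: {mode!r}")


def round_trip(data):
    """Copy local -> memory -> local and return what comes back."""
    with tempfile.TemporaryDirectory() as tmp:
        local = pathlib.Path(tmp) / "original.bin"
        local.write_bytes(data)
        remote = copy_file(local, MemoryPath("original.bin"))
        back = copy_file(remote, pathlib.Path(tmp) / "copy.bin")
        return back.read_bytes()
```

The same copy_file() works in both directions because it only relies on open(); a real implementation would presumably also want to handle metadata, symlinks and fast paths for same-filesystem copies, none of which this sketch attempts.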
The Path.copy*() API may benefit from some revision, e.g. merging methods and adding arguments to control behaviour. Perhaps not.
The original implementations in shutil would call through to pathlib and probably undergo an extended deprecation period due to their high level of usage.
To make the implementation fully backwards compatible, we’d need to make the following (highly controversial!) changes to pathlib:
- Support for bytes paths. I’m pretty sure it’s a settled question that pathlib should not support bytes, but I’d like to unsettle it. The shutil functions support bytes; glob.glob() supports bytes; any POSIX application built with portability and correctness in mind should use bytes. Correctness and ease of use are not enemies in Python, and we shouldn’t make them enemies in pathlib. On a technical level this is totally doable, and indeed lately we’re moving more towards treating the underlying “raw” path as an opaque object, and leaning more on posixpath and ntpath for low-level stuff.
- Support for supplying directory/file descriptors. I believe Antoine intended to add support for this in pathlib but never finished it; remnants of this implementation survive in the pathlib codebase to this day!
- Support for disabling path normalization. This is to ensure that shutil.rmtree('...') etc. aren’t affected by subtle quirks in pathlib’s normalization logic, particularly on Windows.
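For anyone who hasn’t bumped into these gaps, a small sketch of how things stand today may help. The helper names below are made up for illustration; the behaviour they probe (pathlib rejecting bytes, shutil.rmtree() accepting bytes, and pathlib collapsing repeated slashes, "." segments and trailing slashes) is existing stdlib behaviour as far as I know.

```python
import os
import pathlib
import shutil
import tempfile


def pathlib_accepts_bytes():
    """Return True if pathlib will construct a path from bytes."""
    try:
        pathlib.PurePosixPath(b"spam")
    except TypeError:
        return False
    return True


def shutil_rmtree_accepts_bytes():
    """Create a directory, then remove it via a bytes path."""
    parent = tempfile.mkdtemp()
    target = os.path.join(os.fsencode(parent), b"victim")
    os.mkdir(target)
    shutil.rmtree(target)  # bytes path goes straight in
    removed = not os.path.exists(target)
    os.rmdir(parent)  # clean up the (str) parent directory
    return removed


def normalized(raw):
    """Show what pathlib does to a raw string path."""
    return str(pathlib.PurePosixPath(raw))
```

So a shutil function reimplemented on top of pathlib as it exists now would reject bytes arguments that work today, and would see 'a/b/c' where the caller wrote 'a//b/./c/'. That is the kind of difference the changes above are meant to rule out.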
When?
There’s a lot to do first. For me, this only becomes compelling once we’ve introduced AbstractPath. I’ll also make the case for supporting bytes separately to this proposal when the time comes.
Still, is this worth (eventually) working towards? Thoughts? Thanks!