February 2023 progress report
GH-101002 has landed, and so Python 3.12 has gained an os.path.splitroot() function, which can split a path into a tuple of (drive, root, tail). Pathlib uses this function to efficiently parse paths according to OS-specific rules. Thanks again to Alex Waygood and Eryk Sun for their invaluable input, and respect to Antoine Pitrou for identifying the importance of three-part division when he created pathlib.
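On Python 3.12+, os.path.splitroot() does this split directly (for example, posixpath.splitroot('/home/sam') gives ('', '/', 'home/sam')). To make the POSIX rules concrete, here is a minimal pure-Python sketch of them, plus a hypothetical parse_posix_path() helper showing roughly how the tail can then be broken into parts. Both function names are made up for illustration. This is not CPython's code, and it ignores Windows drives and UNC shares entirely.

```python
def posix_splitroot(path: str) -> tuple[str, str, str]:
    """Split a POSIX path into (drive, root, tail); the drive is always ''."""
    if path[:1] != "/":
        # Relative path, e.g. 'foo/bar': no root at all.
        return "", "", path
    if path[1:2] != "/" or path[2:3] == "/":
        # One leading slash, or three or more: a single '/' root.
        return "", "/", path[1:]
    # Exactly two leading slashes: POSIX leaves their meaning
    # implementation-defined, so the root is kept as '//'.
    return "", "//", path[2:]


def parse_posix_path(path: str) -> tuple[str, str, list[str]]:
    """Hypothetical helper: split the tail into parts, dropping '' and '.'."""
    drive, root, tail = posix_splitroot(path)
    parts = [part for part in tail.split("/") if part and part != "."]
    return drive, root, parts


print(posix_splitroot("//srv/share"))    # ('', '//', 'srv/share')
print(parse_posix_path("/usr//./lib/"))  # ('', '/', ['usr', 'lib'])
```

Splitting off the drive and root first means the remaining tail can be handled with one simple, OS-independent rule.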
My plan now looks like this:
1. Address GH-101362: Optimize pathlib path construction
2. Address GH-76846: pathlib.Path._from_parsed_parts() should call cls.__new__(cls), and GH-85281: subclasses of pathlib.PurePosixPath never call __init__() or __new__()
   - This will reduce performance of some pathlib operations, notably iterdir(), glob() and walk().
     - I’m hoping to make this performance loss as small as possible through the optimisations in step #1.
3. Address GH-100479: Support for sharing state between pathlib subclasses
4. Add pathlib.AbstractPath
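The gist of GH-76846 and GH-85281 can be shown with a toy class. In the hypothetical ToyPath below (not pathlib's real code), derived paths are built in one of two ways. Bypassing the constructor with object.__new__() is cheap but skips any subclass __init__(). Calling cls(...) runs subclass initialisation at the cost of re-parsing the segments. That extra work on every derived path is the kind of cost that adds up in iterdir(), glob() and walk().

```python
class ToyPath:
    """Hypothetical pathlib-like class; illustrates the pattern only."""

    def __init__(self, *segments: str) -> None:
        # Parsing happens here, on every normal construction.
        self._parts = [p for s in segments for p in s.split("/") if p and p != "."]

    @classmethod
    def _from_parts_fast(cls, parts: list[str]) -> "ToyPath":
        # Bypasses any __new__()/__init__() overrides: cheap, but a
        # subclass's initialisation never runs for the derived path.
        self = object.__new__(cls)
        self._parts = parts
        return self

    @classmethod
    def _from_parts_via_constructor(cls, parts: list[str]) -> "ToyPath":
        # Goes through the normal constructor, so subclass __new__()/__init__()
        # run, at the cost of re-parsing the segments.
        return cls(*parts)

    def child_fast(self, name: str) -> "ToyPath":
        return self._from_parts_fast(self._parts + [name])

    def child_via_constructor(self, name: str) -> "ToyPath":
        return self._from_parts_via_constructor(self._parts + [name])


class TaggedPath(ToyPath):
    """Subclass that relies on its own __init__() running."""

    def __init__(self, *segments: str) -> None:
        super().__init__(*segments)
        self.tag = "initialised"


base = TaggedPath("src")
print(hasattr(base.child_fast("app.py"), "tag"))  # False
print(base.child_via_constructor("app.py").tag)   # initialised
```

Both derived objects are TaggedPath instances with the same parts; only the constructor route leaves them fully initialised.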
I’m also looking at issues and feature requests related to glob(), which make up the largest category of pathlib issues on GitHub. There are three lines of work that I think will converge:
- Make glob() treat symlinks consistently (see GH-77609 for discussion)
- @Ovsyanka’s fast iterative implementation of walk() (PR: GH-100282)
- My fast regex-based implementation of match() (PR: GH-101398)
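As a rough illustration of a regex-based match(), the sketch below translates a whole glob pattern into a single regular expression, so each path is tested with one regex search rather than segment by segment. It is a hypothetical, POSIX-only, case-sensitive simplification and not the code in GH-101398. Only * and ? are supported (so ** behaves like *), and a relative pattern is anchored only at the end of the path, mirroring how PurePath.match() matches relative patterns from the right.

```python
import functools
import re


@functools.lru_cache(maxsize=256)
def compile_match_pattern(pattern: str) -> "re.Pattern[str]":
    """Translate a POSIX glob pattern into one compiled regex (hypothetical helper)."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append("[^/]*")  # any run of characters within one segment
        elif ch == "?":
            pieces.append("[^/]")   # exactly one character, never a separator
        else:
            pieces.append(re.escape(ch))
    # Absolute patterns must match the whole path; relative patterns only need
    # to match a suffix that starts at a segment boundary.
    anchor = "^" if pattern.startswith("/") else "(?:^|/)"
    return re.compile(anchor + "".join(pieces) + r"\Z")


def path_matches(path: str, pattern: str) -> bool:
    """Hypothetical match(): one regex search per path."""
    return compile_match_pattern(pattern).search(path) is not None
```

The lru_cache means a pattern is compiled once and then reused across many paths, which is where a single-regex approach pays off.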
With these in place, we can write a fast implementation of glob(), including a really chonky speedup for recursive globs. This should help relieve any lingering pain caused by the main plan (see step 2 above).
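To show how an iterative walk and a precompiled pattern could combine into a recursive glob, here is a sketch roughly equivalent to '**/<pattern>' for files. A hypothetical iter_walk() walks the tree with an explicit stack and os.scandir(), and each file name is tested against one regex built once with fnmatch.translate(). Both function names are made up. The sketch only yields files (not directories) and never descends into symlinked directories, so it is not pathlib's implementation.

```python
import fnmatch
import os
import re


def iter_walk(top: str):
    """Iterative, top-down directory walk using an explicit stack (no recursion)."""
    stack = [top]
    while stack:
        dirpath = stack.pop()
        dirnames, filenames = [], []
        with os.scandir(dirpath) as entries:
            for entry in entries:
                # follow_symlinks=False: symlinked directories are not descended into.
                if entry.is_dir(follow_symlinks=False):
                    dirnames.append(entry.name)
                else:
                    filenames.append(entry.name)
        dirnames.sort()
        filenames.sort()
        yield dirpath, dirnames, filenames
        # Pushed after yielding, so a caller may prune dirnames in place (as
        # with os.walk()); reversed so subdirectories are popped in sorted order.
        stack.extend(os.path.join(dirpath, d) for d in reversed(dirnames))


def recursive_glob(top: str, name_pattern: str) -> list[str]:
    """Files under top whose names match name_pattern, as paths relative to top."""
    regex = re.compile(fnmatch.translate(name_pattern))  # compiled once, reused
    return [
        os.path.relpath(os.path.join(dirpath, name), top)
        for dirpath, _dirnames, filenames in iter_walk(top)
        for name in filenames
        if regex.match(name)
    ]
```

Because the stack replaces recursion, deep trees cannot hit Python's recursion limit, and the pattern is compiled once rather than once per directory.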
Thanks for reading! Bye for now