February 2023 progress report
GH-101002 has landed, and so Python 3.12 has gained an os.path.splitroot() function, which can split a path into a tuple of (drive, root, tail). Pathlib uses this function to efficiently parse paths according to OS-specific rules. Thanks again to Alex Waygood and Eryk Sun for their invaluable input, and respect to Antoine Pitrou for identifying the importance of three-part division when he created pathlib.
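To make the three-part division concrete, here is a minimal sketch that recovers the same (drive, root, tail) shape from attributes pathlib has had for a long time, so it runs on older Python versions too. The helper name `three_part_split` is made up for illustration and is not part of pathlib; on Python 3.12 and later, `os.path.splitroot()` performs the equivalent split directly on strings.

```python
from pathlib import PurePosixPath, PureWindowsPath


def three_part_split(path):
    """Return (drive, root, tail_parts) for a pure path object.

    Hypothetical helper for illustration only.
    """
    # When a path has an anchor (drive + root), it occupies parts[0];
    # everything after it is the tail.
    tail = path.parts[1:] if path.anchor else path.parts
    return path.drive, path.root, tail


print(three_part_split(PureWindowsPath("C:/Users/guido/notes.txt")))
# ('C:', '\\', ('Users', 'guido', 'notes.txt'))
print(three_part_split(PurePosixPath("/usr/lib/python3")))
# ('', '/', ('usr', 'lib', 'python3'))
print(three_part_split(PurePosixPath("docs/index.rst")))
# ('', '', ('docs', 'index.rst'))
```

Keeping the drive and root separate is what lets a path like `C:foo` (drive, no root) be told apart from `C:/foo` (drive and root).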
My plan now looks like this:
1. Address GH-101362: Optimize pathlib path construction
2. Address GH-76846: `pathlib.Path._from_parsed_parts()` should call `cls.__new__(cls)` and GH-85281: subclasses of `pathlib.PurePosixPath` never call `__init__()` or `__new__()`
   - This will reduce performance of some pathlib operations, notably `iterdir()`, `glob()` and `walk()`.
   - I’m hoping to make this performance loss as small as possible through the optimisations in step #1.
3. Address GH-100479: Support for sharing state between pathlib subclasses
4. Add `pathlib.AbstractPath`
I’m also looking at issues and feature requests related to `glob()` – the largest category of pathlib issues on GitHub. There are three lines of work that I think will converge:
- Make `glob()` treat symlinks consistently – see GH-77609 for discussion
- @Ovsyanka’s fast iterative implementation of `walk()` – PR: GH-100282
- My fast regex-based implementation of `match()` – PR: GH-101398
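For a rough idea of what regex-based matching can look like, here is a minimal sketch. It is not the code from the PR: the helper names `compile_pattern` and `matches` are made up, it leans on the standard library's `fnmatch.translate()`, and it only handles simple patterns matched from the right, one segment at a time.

```python
import fnmatch
import re
from pathlib import PurePosixPath


def compile_pattern(pattern):
    """Compile a glob-style pattern into one regex per path segment.

    Hypothetical helper; each segment is translated independently,
    so wildcards never cross a path separator.
    """
    segments = PurePosixPath(pattern).parts
    if not segments:
        raise ValueError("empty pattern")
    return [re.compile(fnmatch.translate(seg)) for seg in segments]


def matches(path, pattern):
    """Return True if the trailing segments of `path` match `pattern`."""
    regexes = compile_pattern(pattern)
    parts = PurePosixPath(path).parts
    if len(regexes) > len(parts):
        return False
    # Compare only the last len(regexes) segments of the path.
    tail = parts[len(parts) - len(regexes):]
    return all(rx.match(part) for rx, part in zip(regexes, tail))


print(matches("src/pkg/mod.py", "*.py"))      # True
print(matches("src/pkg/mod.py", "pkg/*.py"))  # True
print(matches("src/pkg/mod.py", "src/*.py"))  # False
```

The appeal of this approach is that the pattern is compiled once and can then be reused across many paths.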
With these in place, we can write a fast implementation of glob(), including a really chonky speedup for recursive globs. This should help relieve any lingering pain caused by the main plan (see step 2 above).
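As a sketch of how those pieces might fit together (an assumption on my part about the general shape, not the planned implementation), a recursive glob can be built from an iterative directory walk plus a regex compiled once up front. The names `iter_files` and `recursive_glob` are hypothetical, and this version only matches on the file name.

```python
import fnmatch
import os
import re


def iter_files(top):
    """Yield file paths under `top` using an explicit stack, not recursion."""
    stack = [top]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Symlinks to directories are not descended into here,
                # which sidesteps symlink loops; they are yielded as-is.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    yield entry.path


def recursive_glob(top, name_pattern):
    """Yield paths below `top` whose final component matches `name_pattern`."""
    regex = re.compile(fnmatch.translate(name_pattern))  # compiled once
    for path in iter_files(top):
        if regex.match(os.path.basename(path)):
            yield path
```

For example, `list(recursive_glob(".", "*.py"))` would collect every `.py` file below the current directory, in no particular order.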
Thanks for reading! Bye for now