Pathlib is great, yet every time I have to parse a bunch of files, I have to fall back on os.walk and join paths by hand. That’s not a lot of code, but I feel that pathlib should offer higher-level abstractions for all of the path-related functionality in os. I propose adding a Path.walk method to address this.
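To illustrate the kind of boilerplate I mean, here is a small sketch (the helper name `python_files` is made up for this example) that collects `.py` files with os.walk. Because os.walk yields plain strings, every result has to be turned back into a Path by joining `root` and the file name manually.

```python
import os
from pathlib import Path


def python_files(top):
    """Return every .py file under `top` as a Path (hypothetical helper)."""
    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(".py"):
                # os.walk yields plain strings, so the Path has to be
                # rebuilt by joining root and name by hand.
                found.append(Path(root) / name)
    return found
```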
We have already had a short discussion in the related GitHub issue, where a few questions were raised:
- Where do we put the os.walk logic?
i) Do we extend the iterdir logic with the walk logic?
ii) Do we make a Path.walk method with the same API as os.walk?
iii) Do we make a method with a different name and API to provide simplified access to os.walk features?
- Do we re-implement Path.walk using only pathlib tools to optimize it?
Extending iterdir with os.walk logic would most likely leave iterdir with 4-6 arguments, and its “walk” usages would be basically equivalent to Path.glob("**/*"), losing some of the core os.walk features. For example, in os.walk (in top-down mode), users can prevent descending into certain directories by removing their names from the yielded directory list.
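For readers less familiar with that feature, here is a minimal sketch of pruning with plain os.walk. The function `walk_skipping` and its default `skip` set are hypothetical, chosen only for illustration.

```python
import os


def walk_skipping(top, skip=frozenset({".git", "__pycache__"})):
    """Yield file paths under `top`, never descending into dirs named in `skip`."""
    for root, dirs, files in os.walk(top):  # topdown=True by default
        # os.walk descends into whatever is left in this exact list object,
        # so it must be modified in place (slice assignment), not rebound.
        dirs[:] = [d for d in dirs if d not in skip]
        for name in files:
            yield os.path.join(root, name)
```

With Path.glob("**/*") the same filter can only be applied to the results after the fact, so everything under `.git` is still visited.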
barneygale has mentioned that the os.walk API is a bit too complex, so it would be great to simplify it for pathlib, yet I do not see any way of simplifying it without losing some of its features.
Here’s the current prototype of the implementation:
```python
def walk(self, topdown=True, onerror=None, followlinks=False):
    for root, dirs, files in self._accessor.walk(
        self, topdown=topdown, onerror=onerror, followlinks=followlinks
    ):
        root_path = self._from_parts([root])
        modified_dirs = [root_path._make_child_relpath(d) for d in dirs]
        yield (
            root_path,
            modified_dirs,
            [root_path._make_child_relpath(file) for file in files],
        )
        # In topdown mode, os.walk() allows the user to prevent descending
        # into certain dirs by removing them from the yielded list.
        # We use this little hack to reach the same behavior.
        # See os.walk for more details.
        dirs.clear()
        dirs.extend([d.name for d in modified_dirs])
```
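If it helps the discussion, the same idea can be sketched with only public APIs, as a free function wrapping os.walk rather than a Path method. The name `path_walk` is hypothetical; it replaces the private `_accessor`, `_from_parts` and `_make_child_relpath` calls with `Path(root)` and the `/` operator, and uses slice assignment instead of `clear()`/`extend()` to sync the pruned list back.

```python
import os
from pathlib import Path


def path_walk(top, topdown=True, onerror=None, followlinks=False):
    """Like os.walk, but yields Path objects (hypothetical stand-alone helper)."""
    for root, dirs, files in os.walk(
        top, topdown=topdown, onerror=onerror, followlinks=followlinks
    ):
        root_path = Path(root)
        dir_paths = [root_path / d for d in dirs]
        yield root_path, dir_paths, [root_path / f for f in files]
        if topdown:
            # Copy any in-place pruning the caller did on dir_paths back into
            # the list os.walk will descend into (same trick as the prototype).
            dirs[:] = [p.name for p in dir_paths]
```

Because os.walk only consults `dirs` before descending in top-down mode, the sync is skipped when `topdown=False`; pruning in bottom-up mode has no effect, just as with os.walk.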
It is about twice as slow as os.walk but about 2.7 times faster than Path.glob("**/*"), and it seems to support all of the os.walk features with the same API.
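For anyone who wants to reproduce this kind of comparison, here is a rough harness. It is a sketch, not the benchmark behind the numbers above; the helper names are hypothetical, and it assumes a tree without symlinks to sidestep any differences in how the two approaches treat them. Results are compared as sets because os.walk and glob yield paths in different orders.

```python
import os
import timeit
from pathlib import Path


def paths_via_walk(top):
    """Every file and directory below `top`, collected with os.walk."""
    found = set()
    for root, dirs, files in os.walk(top):
        root_path = Path(root)
        found.update(root_path / name for name in dirs + files)
    return found


def paths_via_glob(top):
    """Every file and directory below `top`, collected with Path.glob."""
    return set(Path(top).glob("**/*"))


def time_both(top, number=5):
    """Return total seconds for `number` runs of each approach."""
    return {
        "os.walk": timeit.timeit(lambda: paths_via_walk(top), number=number),
        "glob": timeit.timeit(lambda: paths_via_glob(top), number=number),
    }
```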