Destructuring paths with glob patterns

barneygale · May 30, 2023, 10:13pm

With some recent changes to pathlib.PurePath.match(), it’s become possible to write a variant method that returns the matching path segments, rather than True, on successful match. Consider:

>>> from pathlib import PurePath
>>> path = PurePath('/home/barney/cpython/Lib/test/test_pathlib.py')
>>> path.destructure('**/*')
('/home/barney/cpython/Lib/test', 'test_pathlib.py')
>>> path.destructure('**/*', keep_ends=True)
('/home/barney/cpython/Lib/test/', 'test_pathlib.py')
>>> path.destructure('/home/*/**/cpython/**/*.py', keep_ends=True)
('/', 'home/', 'barney/', '', 'cpython/', 'Lib/test/', 'test_pathlib.py')

I think this could be useful for pulling information out of paths, but I’m not sure. Any opinions on this potential feature?

Patch:

diff --git a/Lib/pathlib.py b/Lib/pathlib.py
index 62406473b6..d51bbc7808 100644
--- a/Lib/pathlib.py
+++ b/Lib/pathlib.py
@@ -145,7 +145,7 @@ def _compile_pattern_lines(pattern_lines, case_sensitive):
             # path separators, because the '.' characters in the pattern will
             # not match newlines.
             part = fnmatch.translate(part)[_FNMATCH_SLICE]
-        parts.append(part)
+        parts.append(f'({part})')
     # Match the end of the path, always.
     parts.append(r'\Z')
     flags = re.MULTILINE
@@ -785,6 +785,25 @@ def match(self, path_pattern, *, case_sensitive=None):
         else:
             raise ValueError("empty pattern")
 
+    def destructure(self, path_pattern, *, case_sensitive=None, keep_ends=False):
+        if not isinstance(path_pattern, PurePath):
+            path_pattern = self.with_segments(path_pattern)
+        if case_sensitive is None:
+            case_sensitive = _is_case_sensitive(self._flavour)
+        pattern = _compile_pattern_lines(path_pattern._lines, case_sensitive)
+        match = pattern.match(self._lines)
+        if not match:
+            return None
+        sep = self._flavour.sep
+        trans = _SWAP_SEP_AND_NEWLINE[sep]
+        groups = []
+        for group in match.groups():
+            group = group.translate(trans)
+            if not keep_ends:
+                group = group.rstrip(sep) or group
+            groups.append(group)
+        return tuple(groups)
+
 
 # Subclassing os.PathLike makes isinstance() checks slower,
 # which in turn makes Path construction slower. Register instead!
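To make the mechanism concrete outside CPython's internals, here is a minimal standalone sketch of the same idea. Each pattern segment is translated to a regex and wrapped in its own capturing group, and the groups are returned on a full match. `destructure()` here is a hypothetical helper function, not part of pathlib. It assumes POSIX-style string paths, supports only `*`, `?` and `**` (no `[...]` classes), is always case-sensitive, and uses `[^/]` classes directly instead of the newline-swapping trick in the patch.

```python
import re
from typing import Optional, Tuple


def _segment_regex(segment: str) -> str:
    """Translate one glob segment (e.g. 'home/' or '*.py') into a regex.

    Only '*' and '?' are supported; neither is allowed to match '/'.
    """
    out = []
    for ch in segment:
        if ch == '*':
            out.append('[^/]*')
        elif ch == '?':
            out.append('[^/]')
        else:
            out.append(re.escape(ch))
    return ''.join(out)


def destructure(path: str, pattern: str,
                keep_ends: bool = False) -> Optional[Tuple[str, ...]]:
    """Hypothetical helper (not a pathlib API): split *path* into the pieces
    matched by each segment of the glob *pattern*, or return None."""
    # Split after every '/', keeping it, e.g.
    # '/home/*/**/x.py' -> ['/', 'home/', '*/', '**/', 'x.py']
    segments = re.findall(r'[^/]*/|[^/]+', pattern)
    if not segments:
        raise ValueError("empty pattern")
    parts = []
    for i, seg in enumerate(segments):
        if seg == '**/':
            body = r'(?:[^/]*/)*'  # zero or more whole directories
        elif seg == '**':
            body = r'.*'           # anything at all, including '/'
        elif '**' in seg:
            raise ValueError("'**' can only be an entire path segment")
        else:
            body = _segment_regex(seg)
        # One named capturing group per pattern segment.
        parts.append(f'(?P<seg{i}>{body})')
    match = re.fullmatch(''.join(parts), path)
    if match is None:
        return None
    groups = []
    for i in range(len(segments)):
        group = match.group(f'seg{i}')
        if not keep_ends:
            # Drop trailing separators, but keep a bare '/' root intact.
            group = group.rstrip('/') or group
        groups.append(group)
    return tuple(groups)
```

Called on the path from the first post, this sketch returns the same tuples as the three examples there. A `case_sensitive` option could be added by passing `flags=re.IGNORECASE` to `re.fullmatch`.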

taleinat · June 1, 2023, 6:48pm

To me this seems very esoteric. I’ve been programming for over 20 years and have done lots of path handling, but have never needed something like this.

Therefore, to me this seems better addressed by a recipe or gist than by adding yet another method that has to be maintained, documented, etc.

pylang · June 1, 2023, 10:24pm

Overall, I welcome tools to help more easily handle paths in pathlib.

At first glance, I like the idea of "resolving"/destructuring the glob pattern into a path. However, it takes some getting used to, since I usually use the **/* pattern when working with multiple file paths. Here, the destructure() method seems to focus on a single path, so I'm uncertain about the use cases.

Is **/* the only pattern this method would be used with? For example, I can't think of a reason to use * on a single path.
How are directory paths destructured?

path = PurePath('/home/barney/cpython/Lib/test/')
path.destructure('**/*')
# ('/home/barney/cpython/Lib/test', '')
# or ...
# ('/home/barney/cpython/Lib', 'test')
# or should keep_ends=True be default ...
# ('/home/barney/cpython/Lib/', 'test/')
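On the directory question: pathlib already drops a trailing slash when it parses a path, so `PurePath('/home/barney/cpython/Lib/test/')` and `PurePath('/home/barney/cpython/Lib/test')` compare equal and have the same string form. The short check below uses `PurePosixPath` so it behaves the same on every OS. It also shows that, for the `**/*` pattern, the two pieces line up with the existing `.parent` and `.name` attributes; if I read the patch correctly, it would therefore return the second option above, `('/home/barney/cpython/Lib', 'test')`.

```python
from pathlib import PurePosixPath

# pathlib normalizes away a trailing slash when parsing.
with_slash = PurePosixPath('/home/barney/cpython/Lib/test/')
without_slash = PurePosixPath('/home/barney/cpython/Lib/test')
assert with_slash == without_slash
print(with_slash)  # /home/barney/cpython/Lib/test

# The '**/*' split corresponds to .parent and .name.
print(with_slash.parent, with_slash.name)  # /home/barney/cpython/Lib test
```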
