Add glob.translate(): convert path with shell wildcards to regular expression

barneygale · August 13, 2023, 11:33am

Quoth the fnmatch docs:

Note that the filename separator ('/' on Unix) is not special to this module. See module glob for pathname expansion (glob uses filter() to match pathname segments).

So fnmatch operates only on filenames. What if we want to translate, match or filter on paths? The table below shows the situation:

| | arg: filename | arg: path |
|---|---|---|
| translate | `fnmatch.translate()` | not supported! |
| match | `fnmatch.fnmatch()`, `fnmatch.fnmatchcase()` | `pathlib.PurePath.match()` |
| filter | `fnmatch.filter()` | not supported! |
| glob | N/A | `pathlib.Path.glob()`, `glob.glob()` |
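
To make the quoted docs note concrete, here is a small sketch showing that fnmatch lets `*` cross path separators. The example paths are made up for illustration; only `fnmatch.fnmatchcase()` and `fnmatch.translate()` are real stdlib calls.

```python
import fnmatch
import re

# A bare filename matches, as expected.
matches_filename = fnmatch.fnmatchcase("mod.py", "*.py")

# A nested path also matches, because "/" is not special to fnmatch:
# the "*" happily swallows "src/pkg/".
matches_nested = fnmatch.fnmatchcase("src/pkg/mod.py", "*.py")

# The same holds for the regex produced by fnmatch.translate().
filename_regex = re.compile(fnmatch.translate("*.py"))
regex_matches_nested = filename_regex.match("src/pkg/mod.py") is not None

print(matches_filename, matches_nested, regex_matches_nested)
```

All three checks succeed, which is fine for filenames but not what a shell user expects from `*.py` when it is applied to a path.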

GH-72904 requests that we fill in that top-right “not supported” cell.

I propose we add a glob.translate() function that converts a path with shell-style wildcards to a regular expression. I have an implementation available in GH-106703, which also adjusts pathlib to call the new function for a tidy speedup.

Thoughts? Qs from my side:

1. Does this seem sensible?
2. Should the function support a recursive argument, like glob()? Should we match its default (False)?
3. Should the function support an include_hidden argument, like glob()? Should we match its default (False)?
4. How worried should I be about exponential execution time? IIUC the fix for this in fnmatch.translate() won’t carry over to glob.translate(), because it relies on all the variable-width parts matching anything. (I may be totally wrong here.)
5. Am I alright to copy-paste parts of the fnmatch.translate() implementation, particularly [seq] handling, which is common to both versions? Or should I look at adding some sort of fnmatch.Translator class that can be subclassed in glob? Or something else?

Ta!

facelessuser · August 13, 2023, 7:45pm

I personally think glob.translate() seems sensible, but then again, I’ve written a dedicated library for this sort of thing, which has its own glob.translate(). I think it can be useful if you want to generate some hard path matches for a script, but don’t want it to have any dependencies.

I think being able to include and exclude hidden files is useful as well.

I probably can’t recommend how it should be implemented in the existing Python framework, though. Happy to see some of these ideas making it into the standard lib.

facelessuser · August 13, 2023, 8:15pm

I guess I was thinking about this from an external-library perspective. If this is in the standard lib, it can still be useful: you can generate the pattern once, compile it, and avoid regenerating and recompiling it later.

barneygale · September 30, 2023, 8:55pm

Update: I’ve added recursive and include_hidden arguments to my implementation in GH-106703, so it should match glob.glob() exactly. @jaraco has approved an earlier version, but it would be good to get some eyes on the latest version. Would anyone be up for reviewing? Thanks!

barneygale · December 1, 2023, 10:43pm

For posterity: glob.translate() has landed in 3.13. Thanks for your pointers, @facelessuser, and thanks also to the folks who helped review the PR.
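
For anyone finding this thread later, here is a minimal sketch of the translate-once, match-many usage discussed above, which also fills the "filter on paths" gap from the table. `compile_glob()`, `filter_paths()` and `SAMPLE_PATHS` are hypothetical names invented for this example, not part of the stdlib. I pass `seps="/"` so the behaviour does not depend on the host OS, and the demo is guarded because glob.translate() only exists on 3.13 and later.

```python
import glob
import re

def compile_glob(pattern, *, recursive=False, include_hidden=False):
    """Translate a glob pattern once and return a reusable compiled regex."""
    if not hasattr(glob, "translate"):
        raise NotImplementedError("glob.translate() needs Python 3.13 or newer")
    # seps="/" pins the separator instead of using the platform default.
    regex = glob.translate(pattern, recursive=recursive,
                           include_hidden=include_hidden, seps="/")
    return re.compile(regex)

def filter_paths(paths, pattern, **options):
    """Hypothetical path-aware counterpart to fnmatch.filter()."""
    matcher = compile_glob(pattern, **options)
    return [path for path in paths if matcher.match(path)]

# Made-up paths: one top-level file, two nested files, one hidden file.
SAMPLE_PATHS = [
    "notes.txt",
    "docs/guide.txt",
    "docs/.draft.txt",
    "docs/api/index.txt",
]

if hasattr(glob, "translate"):
    # "*" stays within a single path segment.
    print(filter_paths(SAMPLE_PATHS, "*.txt"))
    # "**" spans directories only when recursive=True; hidden names are skipped.
    print(filter_paths(SAMPLE_PATHS, "**/*.txt", recursive=True))
    # include_hidden=True lets wildcards match names starting with a dot.
    print(filter_paths(SAMPLE_PATHS, "**/*.txt", recursive=True,
                       include_hidden=True))
```

The defaults mirror glob.glob(), so `recursive` and `include_hidden` both have to be opted into.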