Extended globbing through path names

As far as I’m aware, there are two tools in the standard library that support globbing pathnames: glob and pathlib.

Neither of these seems to support bash-style extended globbing or regular expressions. Is there any appetite for adding this functionality?

The wcmatch library has an option to emulate Linux-style extended globbing and more, though I agree that this is used commonly enough to be included in the standard library.

I have been both a Python user (avid fan of pathlib) and a Linux user for the last 5 years and I have never felt I needed more extended globbing. It might just be because I’m unaware of a nice feature, so could someone explain how the ‘bash-style extended globbing’ works with an example?

Some features aren’t a big deal until you run into a situation where you have a strong need for them. I’m the author of the above-mentioned wcmatch lib. Personally, I’ve used extended glob patterns in a number of solutions and find them quite useful, especially when you’d like to glob a more complex set of files and folders and would prefer to limit the number of passes required.

You can certainly glob with multiple patterns and never touch extended glob patterns, and things like brace expansion (also included in wcmatch) allow you to generate multiple patterns for globbing quite easily, but extended glob patterns are performed in one pass, as a single pattern.

You can think of them as being like groups in a regex:

@(this|or|this|or|this)
*(zero|or|more|of|these)
?(zero|or|one|of|these)
!(not|any|of|these)
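To make the regex analogy concrete, here is a rough sketch using only the standard library's re module. The mapping and the `matches` helper are hypothetical and only illustrate the idea; they are not how wcmatch or bash implement these operators, and they ignore details such as path separators and hidden files.

```python
import re

# Rough regex analogues of each extended glob operator, applied to a
# whole file name. Illustrative only; real glob semantics are richer.
EXTGLOB_AS_REGEX = {
    "@(foo|bar)": r"(?:foo|bar)",         # exactly one of the alternatives
    "*(foo|bar)": r"(?:foo|bar)*",        # zero or more of the alternatives
    "?(foo|bar)": r"(?:foo|bar)?",        # zero or one of the alternatives
    "!(foo|bar)": r"(?!(?:foo|bar)$).*",  # anything except the alternatives
}


def matches(extglob, name):
    """Return True if `name` fully matches the regex analogue of `extglob`."""
    return re.fullmatch(EXTGLOB_AS_REGEX[extglob], name) is not None
```

For instance, `matches("*(foo|bar)", "foobarfoo")` succeeds because the group may repeat, while `matches("@(foo|bar)", "foobar")` does not because exactly one alternative is allowed.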

I’d argue that @(this|or|this) is generally better than {this,or,this} as the latter is implemented more as a pattern expansion (one pattern will become multiple patterns) while the former is essentially accomplished using groups within a single pattern.
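The "one pattern becomes multiple patterns" behaviour of brace expansion can be sketched as follows. The `expand_braces` helper is hypothetical and handles only simple, non-nested groups; it is not wcmatch's implementation.

```python
import re
from itertools import product


def expand_braces(pattern):
    """Expand non-nested {a,b} groups into a list of plain patterns."""
    # re.split with a capturing group keeps the brace contents at odd indices.
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = [
        part.split(",") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    # Every combination of alternatives becomes its own pattern.
    return ["".join(combo) for combo in product(*choices)]
```

Here `expand_braces("docs/*.{md,html}")` yields two separate patterns, `docs/*.md` and `docs/*.html`, each of which would need its own glob pass, whereas `docs/*.@(md|html)` stays a single pattern.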

With that said, I’ve seen more people interested in or asking about brace expansion than extended glob patterns. I think it may resonate more with people regardless of whether it is more efficient or not :person_shrugging:.

So if I understand you correctly, something like Path.bash_glob("some/path/*(*.py|*.js)") would find all the Python and JavaScript files in the some/path directory? (Excuse me if I misused the glob syntax, I don’t use it very often.)

Yeah, basically:

>>> from wcmatch import pathlib
>>> list(pathlib.Path('.').glob('docs/**/*.@(md|html)', flags=pathlib.GLOBSTAR | pathlib.EXTGLOB))
[PosixPath('docs/theme/announce.html'), PosixPath('docs/src/markdown/glob.md'), PosixPath('docs/src/markdown/pathlib.md'), PosixPath('docs/src/markdown/about/license.md'), PosixPath('docs/src/markdown/about/changelog.md'), PosixPath('docs/src/markdown/about/release.md'), PosixPath('docs/src/markdown/about/contributing.md'), PosixPath('docs/src/markdown/index.md'), PosixPath('docs/src/markdown/wcmatch.md'), PosixPath('docs/src/markdown/fnmatch.md')]
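For comparison, a similar result can be approximated with the standard library alone by walking the tree and filtering on the suffix. This is a minimal sketch; `find_by_suffix` is a hypothetical helper, and details such as result ordering and hidden-file handling may differ from what wcmatch returns.

```python
from pathlib import Path


def find_by_suffix(root, suffixes):
    """Recursively collect files under `root` whose suffix is in `suffixes`."""
    wanted = set(suffixes)
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix in wanted
    )
```

Calling `find_by_suffix("docs", {".md", ".html"})` walks the whole tree and filters afterwards, rather than expressing the choice in the pattern itself.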

That looks like a very nice feature, will remember to use that lib of yours in the future!

Appreciate the kind words; hopefully you find it useful :slightly_smiling_face:.
