Some modules in the stdlib also have a simple CLI: for example, the gzip module can compress and decompress files, the uuid module generates UUIDs, and the sqlite3 module provides an SQL REPL. A number of modules (tokenize, ast, symtable, dis) also let you inspect Python sources at different levels.
Since I use commands such as grep and sed every day, I often regret their limited regular expression syntax. I propose adding a CLI for the re module that would allow searching and replacing in files using Python regular expression syntax. For searching, the obvious candidate is a grep command. For replacing, I'm still thinking about the interface. It would be possible to implement a sed command, but I think that may be too complex for this.
The benefits:
Windows users will have access to powerful tools out of the box.
All users will be able to use more powerful regular expression syntax in these tools.
So I have a few questions for the community:
Is it a good idea at all? I have already implemented grep, with almost the full set of GNU grep options, all in just 250 lines.
How should it be invoked?
python3 -m re.grep: a submodule of re, with separate submodules for different commands. This was one of my internal reasons for reorganizing re into a package.
python3 -m re.tool grep: a submodule of re for the CLI, similar to json.tool. Different commands are subcommands of this module.
A separate script in the Tools/scripts directory. But the recent tendency seems to be to wipe out this directory.
Is there an existing grep-like CLI tool for replacing text, so we can borrow its interface instead of designing something new?
ripgrep (the rg command) is quite popular these days. There are also pcregrep, ack and ag, off the top of my head. And, of course, git grep.
Note that rg --engine=pcre2, git grep -P, ag and ack give you Perl-compatible regular expressions, which is roughly what Python supports, so I'm not sure this is worth doing. All of these support Windows.
Since the trend for this kind of tool seems to be towards short command names, can't we just use
python -m re PATTERN FILE ...
as the basic incantation? It could live in re.__main__, so it oughtn't weigh down programmatic imports of the re module.
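To make the "python -m re PATTERN FILE ..." invocation concrete, here is a minimal sketch of what a hypothetical re/__main__.py could contain. It is not the 250-line implementation mentioned in this thread; the -i, -v and -n options are borrowed from grep, and the function names and output format are assumptions for illustration.

```python
# Hypothetical re/__main__.py: a minimal grep-like sketch, not the proposed implementation.
import argparse
import re
import sys


def grep_lines(regex, lines, invert=False):
    """Yield (line_number, line) for lines that match (or don't, if invert)."""
    for lineno, line in enumerate(lines, 1):
        if (regex.search(line) is not None) != invert:
            yield lineno, line


def main(argv=None):
    parser = argparse.ArgumentParser(
        prog="python -m re",
        description="Search files using Python regular expression syntax.")
    parser.add_argument("pattern")
    parser.add_argument("files", nargs="*", default=["-"])
    # Option names borrowed from grep; the set is deliberately tiny.
    parser.add_argument("-i", "--ignore-case", action="store_true")
    parser.add_argument("-v", "--invert-match", action="store_true")
    parser.add_argument("-n", "--line-number", action="store_true")
    args = parser.parse_args(argv)

    regex = re.compile(args.pattern, re.IGNORECASE if args.ignore_case else 0)
    show_names = len(args.files) > 1
    found = False
    for name in args.files:
        if name == "-":  # "-" means standard input, as in grep
            lines = sys.stdin.read().splitlines()
        else:
            with open(name, encoding="utf-8", errors="replace") as f:
                lines = f.read().splitlines()
        for lineno, line in grep_lines(regex, lines, args.invert_match):
            found = True
            prefix = f"{name}:" if show_names else ""
            if args.line_number:
                prefix += f"{lineno}:"
            print(prefix + line)
    # grep convention: exit status 0 if any line was selected, 1 otherwise.
    return 0 if found else 1


if __name__ == "__main__":
    sys.exit(main())
```

The exit statuses follow grep's 0/1 convention; error reporting (exit status 2 in GNU grep) and directory recursion are left out of the sketch.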
I don't think we should put any effort into perf or functionality to match tools like ripgrep or my personal standby, good old ag (the Silver Searcher). It would be nice if it could search directories, though.
Maybe once we have no-GIL implemented we can add a -j flag.
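A -j flag would mostly mean searching several files concurrently. The sketch below uses concurrent.futures.ThreadPoolExecutor; the helper names search_file and parallel_grep are hypothetical. On a build with the GIL, the threads would mainly overlap file I/O; only on a free-threaded build could the regex matching itself run in parallel.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def search_file(path, regex):
    """Hypothetical helper: return (path, line_number, line) for every matching line."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, 1):
            if regex.search(line):
                hits.append((path, lineno, line.rstrip("\n")))
    return hits


def parallel_grep(pattern, paths, jobs=4):
    """Search `paths` with up to `jobs` worker threads; results keep input order."""
    regex = re.compile(pattern)  # one compiled pattern, shared by all workers
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() yields results in the order of `paths`, not completion order,
        # so the output matches what a sequential search would print.
        per_file = pool.map(search_file, paths, [regex] * len(paths))
        return [hit for hits in per_file for hit in hits]
```

Keeping results in input order is a deliberate choice here: it makes the output with -j identical to the output without it, at the cost of holding results until earlier files finish.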
To reiterate an earlier response: what are the benefits compared with rg and other tools that are used everywhere today? ripgrep is one of the first tools I install when I get a new dev system.
They are not used everywhere today, as not everyone is a professional developer; but even casual Python programmers can benefit from an easy-to-use and built-in grep tool.
Adding a slow version of grep to the stdlib seems like a lot of added maintenance, documentation, and support for a small convenience [1]
Beginners who don't know to install a better tool would be better served by a tutorial on how to write the appropriate Python script for their task.
[1] Honestly, this doesn't even seem that convenient to me?
Serhiy noted that this was implemented in 250 lines, which doesn't seem like a massive burden, and often you do just have standard-library Python on a computer, so I can see the benefits (cf. "batteries included").
Thank you, it is interesting. But they only support search (ripgrep also allows replacement in the output), not search & replace.
Yes, that was also my initial idea. But I want to support two different operations, "search" and "search & replace", and they need two different commands.
Better discoverability and accessibility for Python users.
More familiar regex syntax for Python users.
And the main reason: a CLI is a living test bench. Even for core devs, it may be easier to use the CLI for a simple test than to use the REPL or write a script. It helps keep both the module the CLI belongs to and argparse in good health.
250 lines might be how it starts, but the scope creep has already begun.
Even more than maintaining the code, I was thinking that "how do I use grep" is suddenly a relevant topic for Python Help and similar places. This requires documentation and continuous support to be a useful feature for beginners, and it's a redundant feature for non-beginners.
Is there an OS that comes with Python pre-installed but doesn't give you grep [1]? In any case, I thought the idea was to discourage using system-managed Python installations. I use a bunch of different environments in my work, and I really don't want to worry that the activated environment will change how my file-searching tool works.
I feel like the intersection of people who a) strongly prefer Python regular expressions to other tools and b) can't install a PyPI package is roughly no one. This feels like a perfect candidate for a package that people can install on their PATH. There's no need to add the documentation and support burden to the stdlib, and it would delay availability by many years (what's the version on that "OS-provided Python" you've got, anyway?).
Personally, I lean more toward the PEP 594 side, and I don't think Python needs any more miscellaneous batteries than it has.
Isn't a regular expression search command already widely available? grep (or egrep or grep -E, depending on when you started with a Unix CLI) is available in Unix-like environments, even on Windows (WSL), right?
I use sed, though only in its most elementary form (e.g., sed -e s/pattern/replacement/[g], sometimes sed -e '/pattern/d' or its complement, sed -n '/pattern/p'). I never understood all the complexity of the rest of it (hold spaces and such). I also use awk where it's simpler (input lines conveniently broken up into words, making {print $3, $5, $7, ...} trivial).
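For comparison, the elementary sed forms listed above map fairly directly onto the re module. The function names below are made up for illustration. One difference worth knowing: Python replacement strings refer to groups as \1 or \g<1> and to the whole match as \g<0>, rather than sed's &.

```python
import re


def sed_substitute(lines, pattern, replacement, global_=False):
    """Rough analogue of sed -e 's/pattern/replacement/[g]'."""
    count = 0 if global_ else 1  # count=0 means "replace every match" in re.sub
    return [re.sub(pattern, replacement, line, count=count) for line in lines]


def sed_delete(lines, pattern):
    """Rough analogue of sed -e '/pattern/d': drop matching lines."""
    return [line for line in lines if not re.search(pattern, line)]


def sed_print(lines, pattern):
    """Rough analogue of sed -n '/pattern/p': keep only matching lines."""
    return [line for line in lines if re.search(pattern, line)]


lines = ["foo bar", "baz", "foo foo"]
print(sed_substitute(lines, r"foo", "qux"))                # ['qux bar', 'baz', 'qux foo']
print(sed_substitute(lines, r"foo", "qux", global_=True))  # ['qux bar', 'baz', 'qux qux']
print(sed_delete(lines, r"^foo"))                          # ['baz']
print(sed_print(lines, r"^foo"))                           # ['foo bar', 'foo foo']
```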
In short, while providing a grep-like CLI for re might be an interesting programming exercise, I suspect that, other than for sussing out the preferred general approach to such interfaces for Python-based command line tools, CLI-ifying other modules/packages would fill more of a niche. For example, years ago I did a lot of work with CSV files. It made sense (to me) to use Unix pipelines for quick-n-dirty transmogrification of such data (moving averages, Sharpe ratios, simple plotting, etc.). I found it useful enough that I coaxed my employer at the time to let me take this little toolkit with me when I moved on to my next job. I don't claim that it's a world-beating CLI (it could hardly be called "properly designed"), but it worked for me, and I've enhanced it in one way or another over the years. Providing command line access to this sort of functionality (or at least helper packages to create command pipelines) might be more useful.
Not on Windows. Yes, tools like ripgrep exist, and there are various ways of getting Unix-like commands. But first of all, we can't assume these tools are available: in a locked-down corporate environment, it's not impossible that Python is the only tool allowed on servers, for example. And secondly, there's no way of writing instructions that will work for everyone: "run grep ..., or if you have ripgrep, rg ..., or if you have neither of those but you have git, find out where git is installed and run C:\Path\To\Git\bin\grep.exe ..., or…" Whereas being able to say "run python -m re ..."[1] works for everyone.
Maybe that's not important. But "everyone has tools like grep these days" just isn't true, in my experience. It's a lot closer than it used to be, and there's usually something that's no more than a download away, but having something that will always work is a big advantage.
For comparison, I've never actually used py -m zipfile or py -m tarfile, but knowing they are always available is an extremely valuable safety net when working in unknown environments.
(Actually, it would be even more useful if difflib had a CLI that handled the basic features of diff, because cross-platform implementations of that are still uncommon.)
[1] Yes, "how do I run Python" is still an issue: there's the launcher, or store Python, etc. But that's an issue that's in our control to solve, if we want to.
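On the difflib point: difflib already does the hard part, so a basic diff-like command is mostly argument handling around difflib.unified_diff. The sketch below is hypothetical; the function name, the program name and the -U option (modelled on diff -U) are assumptions, not an existing difflib CLI.

```python
import argparse
import difflib
import sys


def unified_diff_lines(a_lines, b_lines, a_name, b_name, context=3):
    """Return unified-diff lines, like `diff -u`, for two lists of lines."""
    return list(difflib.unified_diff(a_lines, b_lines,
                                     fromfile=a_name, tofile=b_name, n=context))


def main(argv=None):
    parser = argparse.ArgumentParser(prog="diff-sketch")
    parser.add_argument("old")
    parser.add_argument("new")
    parser.add_argument("-U", "--unified", type=int, default=3, metavar="N",
                        help="lines of context (default: 3)")
    args = parser.parse_args(argv)
    with open(args.old, encoding="utf-8") as f:
        old = f.readlines()  # keep line endings so the output is line-terminated
    with open(args.new, encoding="utf-8") as f:
        new = f.readlines()
    diff = unified_diff_lines(old, new, args.old, args.new, args.unified)
    sys.stdout.writelines(diff)
    # diff convention: 0 if the files are identical, 1 if they differ.
    return 1 if diff else 0


if __name__ == "__main__":
    sys.exit(main())
```

A file whose last line has no trailing newline would need extra care here (diff prints a "No newline at end of file" marker), which the sketch does not attempt.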
When assisting people on macOS, I've often run into the problem of "the tool exists, but THIS flag doesn't" (because it's not the GNU utility, but another one with the same name and somewhat similar features). Having something I could depend on would be good, as long as I can dictate a Python command to someone.
IDLE has a grep facility called "find in files" that uses os.walk to find directories, fnmatch.fnmatch to filter files, and str or re functions to search lines. I cannot remember any questions about using this feature, and I found essentially nothing when searching Stack Overflow's [python-idle] tag for "grep" or the exact phrase "find in files". If the grep options were sufficiently well documented, I would not expect a support burden.
Searching the installed 3.12 /Lib/*.py files for a string with no matches takes over 10 seconds on my machine. A repeat with the caches full takes under 2 seconds. (I am thinking of having IDLE report the search time in the future.) Would a C-coded grep really be much faster? Finding that no non-idlelib stdlib file does "import idlelib" but one third-party project I have loaded does (both expected) took barely longer.
Edit: I work on Windows, so this is my working grep.
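The find in files approach described above (os.walk to find directories, fnmatch.fnmatch to filter file names, re to search lines) fits in a few lines. This is an illustrative sketch, not IDLE's own code; the function name and signature are made up.

```python
import fnmatch
import os
import re


def find_in_files(root, file_glob, pattern, flags=0):
    """Yield (path, line_number, line) for each matching line under `root`."""
    regex = re.compile(pattern, flags)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories in order
        for filename in sorted(filenames):
            if not fnmatch.fnmatch(filename, file_glob):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, 1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it rather than abort the search
```

For example, find_in_files(some_lib_dir, "*.py", r"^\s*import idlelib") would reproduce the "import idlelib" search above; wrapping the loop in time.perf_counter() calls is an easy way to compare cold-cache and warm-cache timings.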
Oh fun times! My previous work machine was macOS and I found this out the hard way.
It turns out that grep on macOS is BSD grep, and a very old version of BSD grep. That would be manageable if it weren't also multiple orders of magnitude slower than GNU grep. I literally profiled it once. We had an old automation bash script that used grep a lot internally, and I ended up writing a Dockerfile for it[1] because running the script in a container was faster than running it natively with the macOS version of grep.
Anyway, if there's a point there, I guess it's that it can be nice to have options.
[1] Not solely because of grep, but it was definitely a factor…
It's sometimes available, but using the tools from WSL in a native shell isn't easy. They aren't on PATH by default, and they almost certainly handle filenames in a non-native manner.
An option like "python -m re --sub='replacement text' 'match pattern' files…" would also be substantially less cryptic than sed's "pattern separators with trailing command" approach.
Other ideas seemed more ambiguous to me:
allowing two args to imply replacement would conflict with the convention of grep accepting the files to search at the end of the command
same rationale for associating the replacement text directly with the CLI option rather than changing the meaning of the second positional command line argument
"--sub" (rather than "--replace") matches the name of the module function and the re pattern method
a submodule would have to be called "re.replace" to avoid conflicting with the "re.sub" function
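The --sub interface proposed above can be sketched with argparse: because the replacement text is attached to the option, the positional arguments keep meaning PATTERN FILE... in both modes. The parser and the process helper are hypothetical, and printing every line (changed or not) in replace mode is an assumption borrowed from sed's default behaviour.

```python
import argparse
import re


def build_parser():
    """Hypothetical parser for `python -m re [--sub=REPLACEMENT] PATTERN [FILE ...]`."""
    parser = argparse.ArgumentParser(prog="python -m re")
    parser.add_argument("pattern")
    parser.add_argument("files", nargs="*")
    # The replacement is tied to the option, so positionals always mean
    # PATTERN FILE..., as in grep, whether or not --sub is given.
    parser.add_argument("--sub", metavar="REPLACEMENT",
                        help="replace matches (re.sub syntax) instead of searching")
    return parser


def process(lines, pattern, sub=None):
    regex = re.compile(pattern)
    if sub is None:
        # Search mode: keep matching lines only, like grep.
        return [line for line in lines if regex.search(line)]
    # Replace mode: every line passes through, like sed 's/.../.../g'.
    return [regex.sub(sub, line) for line in lines]


args = build_parser().parse_args(["--sub=X", "a+", "in.txt", "other.txt"])
print(args.sub, args.pattern, args.files)  # X a+ ['in.txt', 'other.txt']
print(process(["baa", "c"], args.pattern, args.sub))  # ['bX', 'c']
```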
I really like this proposal. It would also be nice if it had some flags to control output format like grep. An optional JSON output mode would be especially interesting for downstream processing.
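One possible shape for such a JSON mode is JSON Lines: one object per match, which jq or a short Python script can consume line by line. The field names (file, line, start, end, match) are an assumption, loosely modelled on the Match object's start(), end() and group() methods.

```python
import json
import re


def json_matches(name, lines, pattern):
    """Yield one JSON document per match (JSON Lines); field names are hypothetical."""
    regex = re.compile(pattern)
    for lineno, line in enumerate(lines, 1):
        for m in regex.finditer(line):
            yield json.dumps({
                "file": name,
                "line": lineno,
                "start": m.start(),  # offsets within the line, as Match.start()/end()
                "end": m.end(),
                "match": m.group(),
            })


for record in json_matches("example.txt", ["ab ab", "cd"], r"ab"):
    print(record)
```

Emitting one record per match rather than per line keeps the offsets unambiguous when a line contains several matches.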