Getopt and optparse vs argparse

cameron · October 31, 2024, 8:40pm

Well I write a lot of CLIs also. I use getopt. And only getopt. I do
have a little library on top of it for ease of use. It is solid, robust,
simple, stable, and does everything I want.

I’ve always disliked argparse. I have not tried optparse.

I was very glad to see it not removed.

There’s no need for it to be third party because it’s stable: not
growing features or displaying bugs.
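
For readers who have not used getopt directly, here is a minimal sketch of the kind of thin helper described above. It is not Cameron's actual library: the function name `parse_cli` and the `-v` / `-o` options are made up for illustration. Only `getopt.getopt` and `getopt.GetoptError` come from the stdlib.

```python
import getopt


def parse_cli(argv):
    """Parse argv (without the program name) into (settings, positional_args).

    Hypothetical helper: the option names below are illustrative only.
    """
    # "vo:" means -v takes no value and -o takes one; the long names mirror that.
    opts, args = getopt.getopt(argv, "vo:", ["verbose", "output="])
    settings = {"verbose": False, "output": None}
    for flag, value in opts:
        if flag in ("-v", "--verbose"):
            settings["verbose"] = True
        elif flag in ("-o", "--output"):
            settings["output"] = value
    # getopt.getopt stops at the first non-option word; the rest comes back in args.
    return settings, args
```

An unknown option such as `-z` raises `getopt.GetoptError`, which the caller can catch to print a usage message.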

pf_moore · October 31, 2024, 8:57pm

On the other hand, trimming the stdlib down to the point where you break pip, which is what enables people to install things easily, somewhat defeats the object. If someone wants to do the work to move pip off optparse, and then propose removing optparse from the stdlib, I wouldn’t object, but dropping optparse and leaving pip to deal with the consequences is not, in my view, helpful to anyone.

I see a number of options here:

1. Leave optparse alone (or un-deprecate it). It’s not clear to me what the harm is in simply leaving it. If necessary, update the documentation to better describe its target use cases (I like Alyssa’s framing of the differences between the 3 modules).
2. Someone do the work to migrate pip off optparse onto something that doesn’t depend on optparse (which either means moving click off optparse as well, or persuading the pip developers to move to something other than click^[1]).
3. Someone do the work to make argparse a drop-in replacement for optparse. This is what @storchaka is planning on doing, so maybe we should just drop the subject until that work is done, and then re-evaluate things.

Moving optparse out of the stdlib into a 3rd party module isn’t much help. As @notatallshaw said, pip would have to vendor it, which means that all distros that devendor pip will expect that package to be maintained at least as well as it was in the stdlib. And it seems unlikely that will be the case, in practice. More generally, I don’t think it sends a good message if the Python core developers are willing to drop support for a key component of one of the fundamental tools in the packaging ecosystem.

I’m happy to see argparse improved. I wasn’t aware that it was as buggy as this thread seems to suggest, but if it is, then I’m all in favour of fixing those bugs and making it better. But can we please just do that and not disrupt the status of getopt and optparse? I’m willing to give up on the suggestion to un-deprecate them, if it helps keep things from getting worse, but I don’t see why we can’t just leave them alone.

[1] which, to be fair, is mostly just an idea, not a hard and fast plan

ofek · October 31, 2024, 8:59pm

@davidism is already working on that: see “rewrite the parser” (Issue #2205 in pallets/click on GitHub).

pf_moore · October 31, 2024, 9:06pm

Cool. If that happens, and someone steps up to move pip off optparse onto click, my specific objections to removing optparse from the stdlib will vanish. I still disagree with slimming down the stdlib on general principle, but that’s a different question.

brettcannon · October 31, 2024, 9:16pm

Pip actually already vendors everything, so that probably wouldn’t be a massive burden on pip itself (but see a reply to Paul later).

Possibly, but those sorts of timelines aren’t intimidating for a language that predates Linux.

And yet here we are talking about it and it taking up development time. I personally would argue no code is free of maintenance.

That’s fine as the code hasn’t been updated in 6 years (so while updates are slow, they exist but could possibly consider the module frozen for the sake of pulling out and encouraging migrating if something were to break).

dg-pb · October 31, 2024, 9:26pm

Or maybe just wait for argparse (or things in general to resolve in stdlib) if it is not of critical importance?

Adding 3rd party dependency for pip for argument parsing doesn’t sound like a very good idea.

ericvsmith · October 31, 2024, 9:37pm

As @storchaka well knows, saying “just fix the issues with argparse” is not possible while keeping the same feature set. Some of its features cause, for example, the “option arguments can’t start with a hyphen” problem. I’m all for a 10 year plan to improve argparse, if devs are up for the task, but I think we should recognize that some features will need to be broken. We should probably come up with a plan before we embark on this. Maybe breaking it isn’t acceptable.
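
To make the “option arguments can’t start with a hyphen” problem concrete, here is a small sketch comparing the two modules. The helper names, the `--pattern` option and the `REJECTED` sentinel are invented for this illustration, and the behaviour described is what I'd expect from the CPython releases current at the time of this thread, not a guarantee for future versions.

```python
import argparse
import contextlib
import io
import optparse

REJECTED = "<rejected>"  # illustrative sentinel, not part of either library


def argparse_pattern(argv):
    """Return the parsed --pattern value, or REJECTED if argparse refuses argv."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--pattern")
    try:
        # argparse reports usage errors on stderr and then raises SystemExit.
        with contextlib.redirect_stderr(io.StringIO()):
            return parser.parse_args(argv).pattern
    except SystemExit:
        return REJECTED


def optparse_pattern(argv):
    """Return the parsed --pattern value using optparse."""
    parser = optparse.OptionParser(prog="demo")
    parser.add_option("--pattern")
    options, _leftover = parser.parse_args(argv)
    return options.pattern
```

With `["--pattern", "-x"]`, argparse treats `-x` as another option and reports that `--pattern` expected one argument, while optparse simply takes the next word as the value. Spelling it `--pattern=-x` works in both.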

notatallshaw · October 31, 2024, 9:39pm

Pip actually already vendors everything, so that probably wouldn’t be a massive burden on pip itself (but see a reply to Paul later).

As someone who contributes to pip, including vendoring PRs, I would challenge these assertions. Firstly, pip doesn’t have to vendor “everything” because it strongly relies on the standard library. Secondly, vendoring is a burden on pip for a variety of reasons, but largely because it depends on release cadences lining up and on long term support from packages that also support all user platforms that CPython supports. A current example is urllib3 2.0+, which pip cannot vendor until pip drops Python 3.9 support, and it is still being debated whether to drop Python 3.8 support in the next release.

That’s fine as the code hasn’t been updated in 6 years (so while updates are slow, they exist but could possibly consider the module frozen for the sake of pulling out and encouraging migrating if something were to break).

Third party packages though are much more susceptible to breaking pip than standard library ones:

- Standard library breakages are more likely to be picked up during the alpha / beta / rc cycle.
- When something breaks in the standard library it only needs to be fixed for that version of Python, whereas a third party library needs to make sure its fix doesn’t break older versions.
- pip supports more versions of Python than a third party library might. For example, pip currently supports Python 3.8, but a third party library might choose not to test against Python 3.8 when fixing for a new version of Python.

cameron · October 31, 2024, 10:15pm

Cameron Simpson:

There’s no need for it to be third party because it’s stable: not
growing features or displaying bugs.

And yet here we are talking about it and it taking up development time.

But only because some people want to remove it. We’d be taking
zero dev time if people would just leave it alone. It’s 200 lines of
Python! And very stable.

I personally would argue no code is free of maintenance.

Have you looked at its blame listing?

The last change was 6 months ago for a stdlib wide minor doc tweak for
doctests. Almost everything else is Very Old.

I am not a core dev, but I am -1 (for very large values of -1) on
removing it.

pf_moore · October 31, 2024, 10:46pm

So you’re saying that we’d remove optparse from the stdlib, publish it as a 3rd party module on PyPI (owned by who?) and pip would then vendor it for normal use and distributions that devendor pip (and pip itself) would be supported by whoever it was who owned the optparse project on PyPI? There’s quite a big assumption there that someone wants to take on that responsibility. Are you volunteering?

Note that pip’s vendoring policy requires that vendored code be published as an official package on PyPI. I would not support copying the optparse codebase into pip. That would be a lot more work, as we’d have to alter it to conform to our project standards (add type hints, fix linter errors, etc., etc.). It would also require us to support the code, and that’s a commitment I’m not willing to take on (regardless of the fact that it’s not been updated in 6 years).

brettcannon · November 1, 2024, 12:07am

I’m saying code dump.

No, but no one is volunteering to really keep any of these modules going either, except perhaps argparse since it’s the most widely used, and that’s partially because Serhiy is a dev-in-residence. As I’ve said many times over the years, I think no code takes “zero dev time”. It still needs to be updated occasionally, people will want to update it if a new bit of syntax comes in, it’s still getting packaged and downloaded millions of times a day, etc. And that’s ignoring that if we said all 3 modules were equally supported then we would get asked which one to use and then people would talk about the confusion caused by multiple options.

To be clear, I don’t love any of the options we have before us. But we aren’t making any of them better by keeping the status quo either. So I do think we have to do something, and my vote is not keeping all 3 modules.

ncoghlan · November 1, 2024, 1:26am

getopt is small enough and stable enough that it doesn’t need maintenance. It’s in the same boat as other small utility modules like fileinput: we wouldn’t add it today, but since it isn’t causing us (the maintainers) or its users any hassles, we don’t (or we shouldn’t) have any motivation for putting other people through the hassle of removing it. PEP 389’s addition of argparse left getopt alone for a reason, and so did PEP 594’s removal of dead batteries.

optparse is similar: its feature set of just handling options (without supporting positional arguments) means that many of the opinionated decisions that can pose problems when using argparse simply don’t come up, as those decisions are intrinsically application level choices when using optparse.
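
As a rough illustration of that division of labour, here is a sketch in which optparse handles only the options and the application decides what the leftover words mean. The `parse_copy_args` name and the SRC/DEST convention are assumptions made up for this example.

```python
import optparse


def parse_copy_args(argv):
    """Return (verbose, src, dest) for a hypothetical copy-like command."""
    parser = optparse.OptionParser(usage="%prog [options] SRC DEST")
    parser.add_option("-v", "--verbose", action="store_true", default=False)
    options, args = parser.parse_args(argv)
    # optparse's job ends here: how many positional words are allowed, and
    # what they mean, is an application-level choice.
    if len(args) != 2:
        parser.error("expected exactly SRC and DEST")
    src, dest = args
    return options.verbose, src, dest
```

`parser.error` prints the usage line and exits, so the application keeps the same error style for its own positional checks.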

If @storchaka needs a reviewer for optparse fixes (if there are any), I’m happy to volunteer for that. It certainly sounds like a better path than disrupting users that are happy with the way argparse currently works by making it behave more like other argument processing libraries do.

nas · November 1, 2024, 3:11am

One idea is that this improved version of argparse could first live as a PyPI module. Once it has been polished and is sufficiently backwards compatible, it can replace the stdlib argparse.

The reason to not just leave it as a PyPI package is as follows. Command line argument processing is a common enough requirement that it should be an “included battery”. Especially for smaller scripts, requiring a PyPI package install is too much. Ideally we would have one built-in argument processing library that we would recommend. optparse and getopt could become legacy (still available, but not promoted or recommended).

If we want to do the evolution in stdlib, I suggest the best way is to create a new parser class, that has the new behavior. E.g. rather than:

parser = argparse.ArgumentParser(...)

do this to get the new behavior:

parser = argparse.ArgParser(...)

For scripts that don’t care if they get the new behavior and want to run with older Python versions:

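try:
    # ArgParser is the proposed name from above; it does not exist yet
    from argparse import ArgParser
except ImportError:
    # older Pythons: fall back to the existing class under the same name
    from argparse import ArgumentParser as ArgParser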

I don’t think removing any of the three arg parsing libraries is a good idea. Maintenance cost is non-zero for core devs but the cost in terms of breaking working Python scripts is huge. Maybe we could eventually deprecate optparse but that’s so far away it’s not even worth planning for.

Edit: I should add, the main benefit of starting with a PyPI package is that you can iterate on the design faster. If you can find some friendly testers, they can give feedback while you revise the design. No need to wait for the annual Python release cycle.

dg-pb · November 1, 2024, 3:37am

In this case, is it necessary to bother with PyPI package?

Can just implement ArgParse (I like shorter name) in the same module.
Which ideally runs on the same backend as ArgumentParser.

Then, can just test it on optparse test suite and deprecate optparse.

First step for ArgParse would be to emulate optparse perfectly, but other features could be included later given they do not break the former.

mcepl · November 1, 2024, 9:20am

Actually, this is exactly the reason why we should remove getopt from the standard library. Even for getopt aficionados like you it is obviously not enough on its own, so you have your own library. How different would your life be if you just included those 215 lines of Python code in your argument parsing library? Why do you need it in the standard library at all?

pf_moore · November 1, 2024, 10:47am

Why not statically link all libraries into every C program? Why have shared libraries at all?

I don’t see why the answer is any different for C libraries than for Python libraries.

oscarbenjamin · November 1, 2024, 2:54pm

It seems odd to discuss these particular modules in the context of slimming down the stdlib. Getting any significant benefit from slimming down the stdlib requires removing much more than these modules and there are significantly better options for reducing size or maintenance cost without breaking lots of things.

If the goal for stdlib reduction is to reduce maintenance effort or size then getopt and optparse do not seem like useful candidates. On either measure it would seem that argparse is “bigger” than getopt and optparse put together.

Removing all three of these modules does not go far in terms of reducing installed footprint. The disk usage here for the stdlib is 234MB. About 127MB seems to be the test directory which I guess could be omitted (maybe that is a pyenv specific problem). I never use IDLE but idlelib is 7MB. Apparently getopt.py is 8KB here, optparse is 60KB and argparse is 100KB. Altogether they are maybe 0.2% of disk usage for the stdlib if the .pyc files are included.
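
Anyone who wants to check figures like these on their own install could use something along the lines of the sketch below. The `source_sizes` helper is made up for this post, and it only measures the file each module was loaded from (normally the `.py` source), not the `.pyc` files, so the numbers will vary by Python version and platform.

```python
import argparse
import getopt
import optparse
from pathlib import Path


def source_sizes(modules=(getopt, optparse, argparse)):
    """Map module name -> size in bytes of the file it was loaded from."""
    return {mod.__name__: Path(mod.__file__).stat().st_size for mod in modules}


def report():
    """Print one line per module with its size in KB."""
    for name, size in source_sizes().items():
        print(f"{name}: {size / 1024:.0f} KB")
```

Calling `report()` prints the three sizes, which can then be compared against the total size of the stdlib directory.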

As Serhiy says it would have been better if argparse was never added to the stdlib because it is redundant when optparse is already there. The additional features of argparse don’t offer substantial improvement compared to optparse. Although argparse is widely used I suspect that most uses of it could just as easily use optparse and would have done if the docs did not push argparse as preferred for many years. Had argparse been a third party module it would have died by now in the face of the competition and optparse would still be happily used by those who don’t want a non-stdlib dependency.

Now that argparse is there, is it really worthwhile to spend many years trying to make it compatible with optparse when optparse is already there and bug free? Why not just recommend that people use optparse instead?

If there is some reason to choose to keep only one of these because of a slimmed down stdlib then that should be its own discussion where more significant reductions in the “size” of the stdlib are considered. In the broader context of “slimming the stdlib” I doubt that any of these modules would appear as either high priority or as low hanging fruit even if there is some redundancy between them.

pf_moore · November 1, 2024, 3:04pm

I would seriously argue that no module can be proposed for removal from the stdlib unless it’s possible to demonstrate that more time has been spent maintaining the module over the last (say) 3 years than would be spent discussing the removal. Otherwise the proposal is clearly a net loss in terms of developer productivity.

davidism · November 1, 2024, 4:08pm

I appreciate that people are including Click in here. As noted already, we don’t want to continue to vendor optparse, as it doesn’t match how we actually process args, and I’m working on removing it. That said, Click is not “better” or “more correct” than any of the other CLI processing libraries, it just picked different features/patterns to support. The fact that we don’t want to depend on optparse doesn’t mean optparse is not useful. I doubt one library is going to be perceived as correct by all users. That’s sort of the problem with maintaining a CLI library, different users have different expectations, and so they make requests and you have to figure out if each fits or not. The more configurable and dynamic the library becomes, the more potential there is for unexpected combinations.

All that said, I think the current policy of “getopt and optparse are complete, they’re not being removed but they’re not being changed either” is fine; perhaps change the wording to that rather than “deprecated”. Especially as otherwise the policy is “we’re maintaining three different CLI libraries at once”, which I don’t wish on anyone.

gpshead · November 1, 2024, 4:25pm

“Maintain” is a strong word. We don’t touch their code anymore and cannot fathom a reason to need to do so. They will never be deleted because there is no benefit to anyone (including ourselves as maintainers) in doing so. I like your “complete” term.