Documentation for CGI migration violates Principle of Least Surprise

gentlecolts · March 21, 2025, 3:33pm

I recently learned that the cgi module was removed in Python 3.13. My application downloads content (typically images) from given URLs, and it used cgi.parse_header to determine the file name when a Content-Disposition header was available. The docs now recommend using the “email” package instead. My application has nothing to do with email, so it is very surprising that I would need to include such a package at all, and I’m also worried this functionality may break in the future. Please update the docs to provide a solution that more closely resembles the original functionality.
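For reference, this is roughly what the email-based replacement looks like for the Content-Disposition case described above. It is a minimal sketch following the email.message.EmailMessage pattern the cgi deprecation notes point to; the header value and variable names are made-up example data, and it assumes a recent Python 3.

```python
from email.message import EmailMessage

# Example value as it might arrive in an HTTP response (made-up data).
header_value = 'attachment; filename="kitten.png"'

# Old code (Python <= 3.12):
#     _, params = cgi.parse_header(header_value)
#     filename = params.get("filename")

# Replacement using the email package, which parses the same MIME syntax.
msg = EmailMessage()
msg["Content-Disposition"] = header_value
disposition = msg["Content-Disposition"]

print(disposition.content_disposition)  # attachment
print(disposition.params["filename"])   # kitten.png
print(msg.get_filename())               # kitten.png
```

If only the file name is needed, msg.get_filename() alone covers it.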

AA-Turner · March 21, 2025, 3:49pm

@gentlecolts the documentation is maintained in the CPython repository (in the Doc/ folder). Please consider opening a pull request to make the improvements you seek.

You can find further information on contributing to the documentation in the Developer’s Guide.

Please also note, you can simply vendor the pre-removal cgi sources into your application – this may be an acceptable solution for you.

A

fungi · March 21, 2025, 3:51pm

SMTP and HTTP header parsing are very similar, due to both protocols sharing a common lineage. Using the remaining parser in the email module from stdlib is the lightest-weight solution, but if you’re looking for something less confusing you’ll need to either write it yourself or install a third-party package like legacy-cgi from PyPI instead.

If you want to update CPython’s documentation, the way to go about it will differ depending on where you found this guidance (in the developer docs themselves? in the text of PEP 594?). If you’re not sure where to report your concern and propose a fix, please first include an actual citation.

gentlecolts · March 21, 2025, 3:52pm

The guidance came from the docs.

gentlecolts · March 21, 2025, 4:17pm

I don’t know if I’m qualified to make a pull request, since I don’t know what “best practice” would be, but I’d be happy to open an issue somewhere more appropriate if this hasn’t already been brought up.

nedbat · March 21, 2025, 4:17pm

If you look at the cgi page in the 3.13 docs, it points you to a drop-in replacement you can use:

A fork of the module on PyPI can be used instead: legacy-cgi. This is a copy of the cgi module, no longer maintained or supported by the core Python team.

Rosuav · March 21, 2025, 5:17pm

The documentation is pretty clear on that point. The email package “implements the same MIME RFCs”. What you’re seeing is that HTTP borrowed some ideas from emails. The name “MIME” itself comes from there too[1]; any time you work with “MIME types”, you’re using something that was originally specced up for emails.

Why is it surprising to use the recommended package? Can you suggest alternative documentation wording that would help, given that the best way to parse MIME headers is the email package?

[1] Multipurpose Internet Mail Extensions

steve.dower · March 21, 2025, 6:07pm

It seems pretty obvious that it’s surprising because it’s named “email” and we’re using it on HTTP headers. Surprise exists because of the recommendation.

Frankly, I find it being under CGI just as surprising (I never figured out why I might want that module, and then it was gone, but I may well have used this functionality).

There’s no need to question the surprise. We’re clearly failing on the “one obvious way to do it” part here.

Perhaps part of making it less surprising is to explain why it’s in email as part of the recommendation?

Rosuav · March 21, 2025, 9:04pm

That’s true, but that question was a lead-in to the second half. What EXACTLY is surprising, and therefore, how can we word it differently?

To be precise, I’m trying to get the same information that would be put into a docs PR, without the (potentially daunting) expectations of best practices, opening issues, etc. What we need is a suggestion for better wording.

I agree. That’s the truly surprising part. But, we are where we are.

Maybe? I don’t think a deprecation note should go into a long history of what MIME means and why HTTP headers use it, though. Anyhow, I’m hoping for the OP to make a suggestion as to what would have been less surprising.

gentlecolts · March 21, 2025, 9:52pm

Networking sadly isn’t really my area of expertise. When I originally built this, I didn’t really understand the context of cgi itself, but I wanted to use the Content-Disposition header, if available, to determine the filename first, before falling back to the URL, because many websites serve content from URLs that don’t actually reflect a filename. Quick googling pointed me to a module in the standard library (cgi) with a “parse_header” function, and, again without knowing the context of cgi itself, that seemed perfectly reasonable to me and did what I needed. The code itself was readable, and I don’t think anyone would raise an eyebrow at code that says “get the Content-Disposition header, then call parse_header”.
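The lookup order described here (Content-Disposition first, then the URL) can be sketched with only the standard library. This is an illustrative sketch, not code from the thread: the function name choose_filename, the "download" default, and the step that strips directory components are all assumptions. It takes any email.message.Message-style headers object; the response.headers object from urllib.request.urlopen() is one.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def choose_filename(url: str, headers: Message, default: str = "download") -> str:
    """Hypothetical helper: prefer the Content-Disposition filename, else the URL path."""
    # get_filename() reads the 'filename' parameter of Content-Disposition
    # (falling back to the 'name' parameter of Content-Type); None if absent.
    name = headers.get_filename()
    if not name:
        # Last path segment of the URL, percent-decoded; the query string is ignored.
        name = unquote(posixpath.basename(urlsplit(url).path))
    # Assumption: drop any directory components a server might send.
    name = posixpath.basename(name.replace("\\", "/"))
    return name or default
```

With urllib, a call would look like choose_filename(url, resp.headers) inside a `with urllib.request.urlopen(url) as resp:` block, so no message object has to be built by hand.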

What’s now a little eyebrow-raising is that this code gets the Content-Disposition header, then creates an email message, sets the email message header, then gets the param off what was just set. It reads like a kludge, especially since, again, this program has nothing to do with email.
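One way to keep call sites reading like “get the header, then call parse_header” is to hide the message object inside a small helper. This is a sketch under stated assumptions: parse_header here is a hypothetical local function (not a stdlib API) that returns (main value, params dict) the way the removed cgi.parse_header did. It relies on email.message.Message.get_params(), so edge cases (for example RFC 2231-encoded parameters, which it decodes and cgi did not) are not guaranteed to match cgi exactly.

```python
from email.message import Message
from email.utils import collapse_rfc2231_value


def parse_header(line: str) -> tuple[str, dict[str, str]]:
    """Hypothetical stand-in for the removed cgi.parse_header.

    Returns (main value, {param name: value}); parameter names are lowercased.
    """
    msg = Message()
    # The header name only tells get_params() where to look; the value is
    # parsed with the same MIME parameter rules either way.
    msg["X-Header"] = line
    params = msg.get_params(header="x-header")
    main, _ = params[0]
    # get_params() already strips quotes; collapse_rfc2231_value() turns
    # RFC 2231 (charset, language, value) tuples into plain strings.
    return main, {name: collapse_rfc2231_value(value) for name, value in params[1:]}
```

Usage then mirrors the old code: `_, params = parse_header(resp.headers["Content-Disposition"])`.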

From what I’ve read of this thread, I’d personally be inclined to fall back to the legacy-cgi package if there’s no better solution, though I’m not a huge fan of adding an additional dependency, especially one with “legacy” in its name. Endorsement from the docs would give further confidence in this approach, especially since, for myself and any others running into this, it’d be a much more “drop-in” solution. Though, again, I’m not in a position to say that this is necessarily the “right” recommendation.

nedbat · March 21, 2025, 10:08pm

The docs have endorsed this as a solution for you, no? See the cgi module page (“Common Gateway Interface support”) in the Python 3.13.2 documentation.

ubernostrum · March 24, 2025, 1:07am

If I understand you correctly, you’re suggesting that the documentation explain something about how HTTP and email headers have the same format and so the parsing logic that works for one also works for the other? And that would serve to explain “why it’s in email”?

steve.dower · March 24, 2025, 2:29pm

That’s more or less what I had in mind. Probably no longer than that, either - certainly doesn’t need to be a “long history”.
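The shared-format point can be seen in the standard library itself: http.client parses response headers with the email parser and returns an http.client.HTTPMessage, which is a subclass of email.message.Message. A short demonstration, using a made-up header block instead of a real network response:

```python
import io
from email.message import Message
from http.client import HTTPMessage, parse_headers

# Raw header block as an HTTP server might send it (made-up example data).
raw = (
    b"Content-Type: image/png\r\n"
    b'Content-Disposition: attachment; filename="kitten.png"\r\n'
    b"\r\n"
)

# parse_headers() is what http.client uses for real responses.
headers = parse_headers(io.BytesIO(raw))
print(type(headers).__name__)        # HTTPMessage
print(isinstance(headers, Message))  # True
print(headers.get_content_type())    # image/png
print(headers.get_filename())        # kitten.png
```

In other words, HTTP responses from http.client and urllib already come back as email message objects.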

Wombat · April 3, 2025, 3:25am

Thank you for adding clarity to the conversation. Your insight is dead-on.