Using the email module as a replacement for cgi?

The deprecation documentation for the cgi module currently recommends two options for parsing POST request bodies:

  • 3rd-party library multipart
  • Standard-library email.message

I’m wondering whether the latter is really appropriate to recommend, given how non-obvious and non-trivial an implementation based on it seems to be.

I’d like to propose either removing the phrasing that suggests using email, to avoid throwing new programmers into a tarpit, or adding a HOWTO page targeting the average person googling “python cgi deprecated how to parse file form upload 2023”.


Porting this over from elsewhere:

It would at least be interesting to better understand why using email.message is so cumbersome. Is there a way to use an email policy to make the job easier?

I would strongly advise against using the email parser for multipart/form-data for several reasons:

  • email.message.Message holds everything in memory, including large file uploads.
  • The line-based email parser is very slow for certain inputs, so much so that this might be abused for denial-of-service attacks.
  • Email is a mess: parsers have to deal with broken clients, dozens of RFCs, and decades of technical debt. Modern multipart/form-data, on the other hand, is (relatively) well defined, and most modern browsers and HTTP clients emit clean and correct data. A specialized parser with stricter parsing rules is usually way faster, way less complex and, as a consequence, more secure.
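For context, here is a minimal sketch of what the email-based approach under discussion tends to look like. The helper name `parse_form_data` and its return shape are made up for illustration, and the code assumes the caller has already read the whole request body into `body`, which is exactly the in-memory problem from the first bullet. It is not a recommendation.

```python
from email.parser import BytesParser
from email.policy import HTTP


def parse_form_data(content_type: str, body: bytes) -> dict:
    """Hypothetical helper: parse a multipart/form-data body with the email package.

    Returns {field_name: (filename_or_None, payload_bytes)}.
    Everything (including uploaded files) ends up in memory.
    """
    # The email parser expects a header block, so the request's
    # Content-Type header is prepended to the raw body.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=HTTP).parsebytes(prefix + body)
    if not message.is_multipart():
        raise ValueError("expected a multipart body")

    fields = {}
    for part in message.iter_parts():
        # The field name and file name live in the Content-Disposition header.
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()
        fields[name] = (filename, part.get_payload(decode=True))
    return fields
```

Note that repeated field names overwrite each other in this sketch, and there is no size limit, no streaming, and no protection against the slow-parsing inputs mentioned above.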

I can’t argue about any of this, especially:

However[1]:

I do recall designing hooks for this problem, such that large files could be stored on disk rather than in memory. I don’t quite remember whether that’s still[2] actually possible.


  1. and my memory about the email library's technical details degrades daily ↩︎

  2. or ever ↩︎

The default behavior is still to buffer everything in memory, and most code bases I have seen that switched from cgi to email made this mistake. This is fine for emails, which usually have a reasonable size limit and are processed in batches, but dangerous for a web application that needs to deal with untrusted and potentially large inputs.
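If someone does go the email route anyway, one way to limit the damage is to cap the body size before anything is buffered or parsed. A rough sketch, assuming the application can get at the declared Content-Length and a readable input stream; `read_limited_body` and `MAX_BODY_BYTES` are hypothetical names, and the 1 MiB limit is an arbitrary example value:

```python
import io

MAX_BODY_BYTES = 1024 * 1024  # arbitrary example limit (1 MiB)


def read_limited_body(stream, content_length, max_bytes=MAX_BODY_BYTES):
    """Hypothetical helper: read a request body, refusing oversized input."""
    if content_length is None or content_length < 0:
        raise ValueError("missing or invalid Content-Length")
    if content_length > max_bytes:
        # Reject before reading anything, so nothing large is buffered.
        raise ValueError("request body too large")
    body = stream.read(content_length)
    if len(body) != content_length:
        raise ValueError("request body shorter than declared")
    return body


# Usage with an in-memory stream standing in for the real request input.
small = read_limited_body(io.BytesIO(b"abc"), 3)
```

This only bounds memory use. It does nothing about the parser's slow paths, and it rules out large uploads entirely, which is why a streaming parser is the better fit for file uploads.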