Alternative function for deprecated cgi

Pejamide · December 15, 2022, 8:31pm

To my surprise, I received the following warning on my Apache system:
DeprecationWarning: ‘cgi’ is deprecated and slated for removal in Python 3.13.

I need the function cgi.FieldStorage() to extract the parameters of each request sent to my web system, such as ‘?aaa=111&bbbb=222&ccc=333’.

What is the best alternative function for cgi?

smontanaro · December 15, 2022, 8:50pm

Check the cgi module docs. It gives you alternatives right at the top.

Pejamide · December 17, 2022, 7:53pm

The documentation says that the function ‘urllib.parse.parse_qsl’ can be used for the request method ‘GET’ and the module ‘email.message’ for ‘POST’. I’m not an expert in Python, though. My current function looks like this:

wrktbl = cgi.FieldStorage()
for wrkkey in wrktbl:
  if isinstance(wrktbl[wrkkey], list):
    wrkvlx = wrktbl[wrkkey][0].value
  else:
    wrkvlx = wrktbl[wrkkey].value
  tblrqs[wrkkey] = wrkvlx

I wonder whether anybody can convert that code for me. For instance:

import os
import urllib.parse  # provides parse_qsl
import email.message
if os.environ['REQUEST_METHOD'] == 'GET':
    .......
    # How to get input and to convert to the variable table 'tblrqs'
    .......
elif os.environ['REQUEST_METHOD'] == 'POST':
    .......
    # How to get input and to convert to the variable table 'tblrqs'
    .......

Pejamide · December 23, 2022, 7:47pm

As for the GET request parameters, I am able to extract them from os.environ[‘QUERY_STRING’].
As for the POST request parameters, unfortunately I don’t see in the ‘multipart’ documentation how, or from where, to extract them. Can anyone help me with this? If so, please give me a clear example.

Pejamide · February 1, 2023, 2:06pm

According to the suggested alternatives for parsing a POST query string, I need to use email or multipart, but I do not see any example of how to extract the received POST query string. Can anybody help with this problem?

See my following examples for GET as well as for POST:
The web page ‘Request.htm’, which submits the query string:

<h1>Form in GET method</h1>
<form action="http://localhost/PythonwebTest/getpost.py" method="get">
<label for="fname">
First name:
<input type="text" id="fnameget" name="fnameget">
</label>
<br>
<label for="lname">
Last name:
<input type="text" id="lnameget" name="lnameget">
</label>
<br />
<input type="submit" value="Submit">
</form>
<hr />
<h1>Form in POST method</h1>
<form action="http://localhost/PythonwebTest/getpost.py" method="post">
<label for="fname">
First name:
<input type="text" id="fnamepost" name="fnamepost">
</label>
<br>
<label for="lname">
Last name:
<input type="text" id="lnamepost" name="lnamepost">
</label>
<br>
<input type="submit" value="Submit">
</form>

The script ‘getpost.py’, which processes the received query string:

#!C:/Program Files/Python/python.exe
print('Content-Type: text/html; charset=utf-8\n')
import os
rqsmtd = os.environ['REQUEST_METHOD']
tblrqs = {}
if rqsmtd == 'GET':
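  # parse_qsl URL-decodes names and values and tolerates an empty query string
  from urllib.parse import parse_qsl
  for wrkkey, wrkvlx in parse_qsl(os.environ.get('QUERY_STRING', ''), keep_blank_values=True):
    tblrqs[wrkkey] = wrkvlx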
  print('GET request data: ' + str(tblrqs))
elif rqsmtd == 'POST':
  import cgi
  wrktbl = cgi.FieldStorage()
  for wrkkey in wrktbl:
    if isinstance(wrktbl[wrkkey], list):
      wrkvlx = wrktbl[wrkkey][0].value
    else:
      wrkvlx = wrktbl[wrkkey].value
    tblrqs[wrkkey] = wrkvlx
  print('POST request data: ' + str(tblrqs))
else:
  print('Request method \'' + rqsmtd + '\' not supported yet')

The GET part works fine. The POST part works as well, but only with the deprecated function ‘cgi.FieldStorage’. I do not know how it can be replaced by email or multipart, because I have not seen any way to do that there. Hence my request for help!

James_E · May 2, 2023, 9:11pm

There are problems on problems on problems here, unfortunately. I had the same confusion as you, and it looks like it is disputed by others as well (CPython #101932).

The short answer is: the email module isn’t a complete solution for parsing POST requests, and doing it yourself is harder than you expect; save yourself the headache and just grab a small 3rd-party library or a large 3rd-party framework.

As for the long answer…

- email.parser.BytesParser().parse(environ['wsgi.input']) doesn’t work because the email module expects properly formed e-mails that include a header block, but WSGI passes only the body.
- The least wasteful way I could come up with to do this was to create a BytesFeedParser, stick a header into it, terminate the header block, then stream the request body into it.
- We are Formally Given Permission not to worry about chunked transfer encoding, thankfully.
  - To go into detail: the major WSGI server implementations all take one of these 3 strategies: (a) reject such requests, per the spec; (b) cache the whole thing and give the WSGI app a synthetic CONTENT_LENGTH value when the time comes, also allowed per the spec; or (c) implement actual empty reads instead of just hanging when you try to consume past the end of the input, this last and nonstandard behavior being indicated by a wsgi.input_terminated flag. Each of these strategies is very easy for us to operate under.
- The outer package is a “multipart” message, with each message inside of it having the field value as the sub-message body, and the relevant details scattered variously around the Content-Type and Content-Disposition headers.
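
To make the header-injection step concrete in isolation, here is a minimal sketch that feeds a synthetic Content-Type header and then a canned body into a BytesFeedParser. The function name parse_multipart and the sample boundary and field values are made up for illustration, and the whole body is assumed to fit in memory:

```python
import email.parser
import email.policy

def parse_multipart(content_type: str, body: bytes) -> dict:
    """Parse a multipart/form-data body into {field name: raw bytes value}."""
    parser = email.parser.BytesFeedParser(policy=email.policy.HTTP)
    # Synthesize the header block the email parser expects. The boundary
    # parameter travels inside the Content-Type value, so pass it unchanged.
    parser.feed(b'Content-Type: ' + content_type.encode('latin-1') + b'\r\n\r\n')
    parser.feed(body)
    message = parser.close()
    fields = {}
    for part in message.iter_parts():
        name = part.get_param('name', header='content-disposition')
        fields[name] = part.get_payload(decode=True)
    return fields

# Hypothetical request data, shaped like what a browser sends for a small form.
SAMPLE_TYPE = 'multipart/form-data; boundary=XyZ'
SAMPLE_BODY = (
    b'--XyZ\r\n'
    b'Content-Disposition: form-data; name="fnamepost"\r\n'
    b'\r\n'
    b'Ada\r\n'
    b'--XyZ\r\n'
    b'Content-Disposition: form-data; name="lnamepost"\r\n'
    b'\r\n'
    b'Lovelace\r\n'
    b'--XyZ--\r\n'
)

print(parse_multipart(SAMPLE_TYPE, SAMPLE_BODY))
```

Values come back as bytes, and a repeated field name would overwrite the earlier value in this sketch.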

Combining these, and handling the 3 core use-cases of (i) GETed forms, (ii) POSTed forms, and (iii) multipart forms, which is required to send files, we get something like this:

import wsgiref.simple_server
import email.message, email.parser, email.policy
import urllib.parse
import re
from collections import namedtuple
try:
    from resource import getpagesize
except ImportError:
    import mmap
    def getpagesize():
        return mmap.PAGESIZE

FieldEntry = namedtuple('FieldEntry', ["name", "value", "filename", "MIMEtype"])
FieldEntry_T = 'tuple[str, Optional[bytes], Optional[str], Optional[str]]'
#  * Field names are decoded into strings for CONVENIENCE
#  * Values are left as they came in from the wire.
#  * No need for fe.isFile -- fe.filename is None for non-file inputs.
#    BEWARE: fe.filename = '' when no file is chosen. This is what the browser sends, weirdly.
#  * MIMEtype is 99% of the time GUESSED by the browser based on literally
#    nothing except a static lookup of the file extension in its local database,
#    so it's not usually useful. But it's not ALWAYS so useless; it can
#    be set by certain API clients in certain circumstances, so we're
#    not going to discard that information, when it is actually sent, in case
#    the application building off this function needs it.


def wsgi_parseForm(environ, /) -> 'Iterable[FieldEntry_T]':
    if environ['REQUEST_METHOD'] == 'GET':
        for k, v in _parse_qs(environ['QUERY_STRING']):
            yield FieldEntry(k, v, None, None)
        return
    m = email.message.Message()
    m.add_header('Content-Type', environ['CONTENT_TYPE'])
    match m.get_content_type():
        case 'application/x-www-form-urlencoded':
            for k, v in _parse_qs(bytes().join(_wsgi_body(environ)).decode('ascii')):
                yield FieldEntry(k, v, None, None)
            return
        case 'multipart/form-data':
            p = email.parser.BytesFeedParser(policy=email.policy.HTTP)
            p.feed(('Content-Type: %s\r\n' % environ['CONTENT_TYPE']).encode('utf-8'))
            # ^Don't try to abbreviate this line; it also injects the boundary parameter, which is needed to parse out the sub-messages!
            p.feed('\r\n'.encode('utf-8'))
            for chunk in _wsgi_body(environ):
                # TODO stream each element to the caller as they arrive
                # rather than loading them all into RAM at the same time
                p.feed(chunk)
            m = p.close(); del p
            assert m.is_multipart()
            for part in m.iter_parts():
                part.set_default_type(None)
                yield FieldEntry(
                    part.get_param('name', header='content-disposition'),
                    part.get_payload(decode=True),
                    part.get_filename(None),
                    part.get_content_type()
                )
        case t:
            raise ValueError('unexpected Content-Type: %s' % t)


def _wsgi_body(environ, /):
    # Workaround helper function for https://github.com/python/cpython/issues/66077
    wsgi_input = environ['wsgi.input']
    try:
        _read = wsgi_input.read1
    except AttributeError:
        def _read(n=getpagesize(), /):
            return wsgi_input.read(n)

    if environ.get('wsgi.input_terminated', False):
        # https://github.com/GrahamDumpleton/mod_wsgi/blob/4.9.4/docs/configuration-directives/WSGIChunkedRequest.rst
        try:
            yield from iter(wsgi_input)
            return
        except (TypeError, NotImplementedError):
            while chunk := _read():
                yield chunk
            return
        
    if 'HTTP_TRANSFER_ENCODING' in environ:
        # https://mail.python.org/pipermail/web-sig/2007-March/002630.html
        # https://wsgi.readthedocs.io/en/latest/proposals-2.0.html#unknown-length-wsgi-input
        # https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding#chunked_encoding
        raise NotImplementedError("Transfer-Encoding: %s" % environ['HTTP_TRANSFER_ENCODING'])

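    toread = int(environ.get('CONTENT_LENGTH', 0) or 0)  # Weirdly, this is set to the empty string for requests with no body, hence the or-clause
    readsofar = 0
    while readsofar < toread:
        # Never ask for more than the declared body length
        chunk = _read(min(getpagesize(), toread - readsofar))
        if not chunk:
            break  # Client closed the connection early; avoid looping forever
        readsofar += len(chunk)
        yield chunk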


_QSSPLIT = re.compile(r'(?:^|&)([^&]*)')
_QSPARAM = re.compile(r'^(.*?)(?:=(.*))?$', re.DOTALL)
def _parse_qs(qs: str) -> 'Iterable[tuple[str, Optional[bytes]]]':
    for p in (m.group(1) for m in _QSSPLIT.finditer(qs)):
        k, v = _QSPARAM.match(p).groups()
        k = urllib.parse.unquote_plus(k)
        v = urllib.parse.unquote_to_bytes(v.replace('+', ' ')) if v is not None else v
        yield k, v


def main_wsgi(environ, start_response):
    from pprint import pformat
    if (environ['REQUEST_METHOD'] == 'GET') and (not environ.get('QUERY_STRING')):
        start_response('200 OK', [('Content-Type', 'text/html')])
        yield '<form method="POST" enctype="multipart/form-data">'.encode()
        yield '<label for="username">username:</label><input name="username" value="user123" /><br />'.encode()
        yield '<label for="password">password:</label><input name="password" value="lol456" /><br />'.encode()
        yield '<label for="avatar">avatar:</label><input name="avatar" type="file"><br /><input type="submit" value="register" />'.encode()
    else:
        start_response('200 OK', [('Content-Type', 'text/plain')])
        yield pformat(list(wsgi_parseForm(environ))).encode()


if __name__ == '__main__':
    wsgiref.simple_server.make_server('', 8000, main_wsgi).serve_forever()

davidism · May 2, 2023, 9:30pm

There are a few multipart/form-data streaming parsers out there, for example siddhantgoel/streaming-form-data on GitHub, a streaming parser for multipart/form-data written in Cython (Werkzeug/Flask implements its own). Probably better to use that rather than email when working with HTML form data. Terminating input is a whole other class of problem though, one that is hard to solve “completely” for CGI, WSGI, ASGI, or any other server spec. The ?a=b&c=d syntax referenced in the original post is simple though, use urllib.parse.parse_qs.
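
As a small illustration of the query-string case, using the sample string from the original post (in a CGI script the input would come from os.environ.get('QUERY_STRING', '') instead; the variable names are only for this example):

```python
from urllib.parse import parse_qs, parse_qsl

query = 'aaa=111&bbbb=222&ccc=333'

# parse_qs groups repeated keys into lists of values
as_lists = parse_qs(query, keep_blank_values=True)

# Taking the first value per key mirrors the cgi.FieldStorage loop above
first_values = {key: values[0] for key, values in as_lists.items()}

# parse_qsl yields (key, value) pairs; dict() keeps the last value per key
as_dict = dict(parse_qsl(query, keep_blank_values=True))

print(as_lists)
print(first_values)
```

Both functions also undo percent-encoding and turn ‘+’ into a space, which a manual split on ‘&’ and ‘=’ does not.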

All that said, you’re probably better off using a WSGI or ASGI framework and server that’s solved this already rather than trying to write your own, if you’re not sure how to write your own.

davidism · May 2, 2023, 9:34pm

Oh wait, just noticed that you resurrected an old topic. You should open a new topic if you want to have a new discussion; what you posted is only tangentially related to the basic question asked here. I guess we both ended up saying roughly the same thing though.

James_E · May 2, 2023, 9:41pm

davidism wrote: “Terminating input is a whole other class of problem though, one that is hard to solve “completely” for CGI, WSGI, ASGI, or any other server spec. The ?a=b&c=d syntax referenced in the original post is simple though, use urllib.parse.parse_qs.”

The problem of input termination may be hard at a philosophical level (2 generals problem, protocol design, etc.), but it’s one that was “solved” for many years with just import cgi.

Saying to use parse_qs is one thing, but how one is actually to get the input to that function out of a POST body reliably (which, again, is something we were able to do with no issue before 3.13) is the question.
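
For the narrow case of an application/x-www-form-urlencoded body under plain CGI, the step in question amounts to reading exactly CONTENT_LENGTH bytes from standard input before calling parse_qsl. A rough sketch, assuming the server sets CONTENT_LENGTH (so no chunked bodies) and with read_urlencoded_post as a made-up name:

```python
import io
import os
import sys
from urllib.parse import parse_qsl

def read_urlencoded_post(environ=os.environ, stream=None) -> dict:
    """Return {name: first value} for an application/x-www-form-urlencoded POST."""
    if stream is None:
        stream = sys.stdin.buffer  # CGI delivers the request body on stdin
    # CONTENT_LENGTH may be absent or an empty string when there is no body
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = stream.read(length).decode('utf-8', errors='replace')
    fields = {}
    for key, value in parse_qsl(body, keep_blank_values=True):
        fields.setdefault(key, value)  # keep the first value, like the cgi loop
    return fields

# Demo with a fake environment and body instead of a real request
demo_body = b'fnamepost=Ada&lnamepost=Lovelace'
demo_env = {'CONTENT_LENGTH': str(len(demo_body))}
print(read_urlencoded_post(demo_env, io.BytesIO(demo_body)))
```

This does not cover multipart/form-data (file uploads) or requests without a usable CONTENT_LENGTH, which is where the difficulty discussed here begins.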

I spent a few hours trying to go from the CGI module documentation’s vague exhortation “maybe you can use the email module somehow” to the multi-kilobyte shim I shared earlier. It took a lot of experimentation, a lot of creative labor, and I’m still not sure I did it right. If my code’s correct, I’d be happy to contribute a HOWTO page with a cleaned-up version of it to direct other CGI deprecatees to; but if (as I suspect) it’s totally doing it wrong, then I don’t see how an average person is supposed to interpret that suggestion of using the email module; maybe it should be removed?

James_E · May 2, 2023, 9:44pm

I apologize for the necropost, but (as usual with necroposts) the question as posed by the original poster was still unanswered, still an outstanding problem applicable to many, and floating up in the google search results for the question.

davidism wrote: “what you posted is only tangentially related to the basic question asked here. I guess we both ended up saying roughly the same thing though.”

I provided a direct answer to the question as the original poster clarified it, without only saying “just use a 3rd-party library”… this was the immediately prior message, anyway, from the original poster:

ohlogic · May 31, 2023, 3:42pm

For Python 3 CGI programming, I have created mycgi.py, available at the website pythoncgi.net. It uses multipart to replace the deprecated stdlib cgi.py.

davidism · April 17, 2024, 9:08pm

Please use the Python Help category for questions.