Speed up urllib.parse

Hello. Are there any plans to speed up urllib.parse?

I mean, it’s used a lot in web apps, and I feel it has been one of CPython’s weak points on the web for decades.

There are some efforts out there to rewrite it in C, but I think a lot of people would be happy if it were done in CPython itself.

I sometimes prefer application/x-www-form-urlencoded over application/json, but I worry that urlencoded is slower than JSON in terms of encoding and decoding.
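
To make the comparison concrete, the kind of round trip I have in mind looks roughly like this (the payload is just an illustration):

from urllib.parse import urlencode, parse_qs
import json

payload = {'user': 'alice', 'note': 'hello world & more', 'n': '42'}

# application/x-www-form-urlencoded round trip (built on quote/unquote)
form_body = urlencode(payload)   # 'user=alice&note=hello+world+%26+more&n=42'
form_back = parse_qs(form_body)  # {'user': ['alice'], 'note': ['hello world & more'], 'n': ['42']}

# application/json round trip
json_body = json.dumps(payload)
json_back = json.loads(json_body)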


None that I’m aware of. Feel free to open an issue and PR with any proposals you have.


Is there a specific part of the functionality you feel is unacceptably slow?
Do you have some timing results and/or concrete reason to expect that significant improvements are possible?

The underlying quote/unquote is my main concern, e.g. _unquote_impl.

Percent-encoding basically has to iterate over every character, which is resource-intensive on long strings when done in pure Python.
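
As a very rough illustration of that per-character work (this is a naive sketch, not the actual _unquote_impl, and it ignores the multi-byte UTF-8 handling the real code does):

def naive_unquote(s: str) -> str:
    # Walk the string one character at a time; every '%XX' escape
    # costs a slice, an int() call and a chr() call in pure Python.
    out = []
    i = 0
    while i < len(s):
        if s[i] == '%' and i + 2 < len(s):
            try:
                out.append(chr(int(s[i + 1:i + 3], 16)))
                i += 3
                continue
            except ValueError:
                pass
        out.append(s[i])
        i += 1
    return ''.join(out)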

Some libraries, like websockets, have a speedups.c to deal with similar situations, e.g. applying a mask to each byte.

Have you measured? Is this actually a bottleneck?

If quote/unquote is used in web applications, it can certainly become a bottleneck. After all, a QUERY_STRING tends to be present in every request.
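
For example, even a minimal WSGI app ends up calling parse_qs (and therefore unquote) on every request that carries a query string; a sketch:

from urllib.parse import parse_qs

def app(environ, start_response):
    # Every request with a QUERY_STRING pays for unquote here.
    params = parse_qs(environ.get('QUERY_STRING', ''))
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [repr(params).encode()]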

Here are the performance differences between CPython vs PHP:

#!/usr/bin/env python
import timeit

from urllib.parse import quote, unquote


UNQUOTED_STRING = '=' * 10000
QUOTED_STRING = '%3D' * 10000


def measure_quote():
    return quote(UNQUOTED_STRING)


def measure_unquote():
    return unquote(QUOTED_STRING)


if __name__ == '__main__':
    # measure the time it takes to execute quote
    quote_time = timeit.timeit(measure_quote, number=10000)

    # measure the time it takes to execute unquote
    unquote_time = timeit.timeit(measure_unquote, number=10000)

    print(f'Time taken for quote: {quote_time:.6f} seconds')
    print(f'Time taken for unquote: {unquote_time:.6f} seconds')

#!/usr/bin/env php
<?php

$UNQUOTED_STRING = str_repeat('=', 10000);
$QUOTED_STRING = str_repeat('%3D', 10000);


function measure_urlencode() {
    global $UNQUOTED_STRING;
    return urlencode($UNQUOTED_STRING);
}


function measure_urldecode() {
    global $QUOTED_STRING;
    return urldecode($QUOTED_STRING);
}


// measure the time it takes to execute urlencode
$startEncode = microtime(true);

for ($i = 0; $i < 10000; $i++) {
    measure_urlencode();
}

$endEncode = microtime(true);
$encodeTime = $endEncode - $startEncode;

// measure the time it takes to execute urldecode
$startDecode = microtime(true);

for ($i = 0; $i < 10000; $i++) {
    measure_urldecode();
}

$endDecode = microtime(true);
$decodeTime = $endDecode - $startDecode;

echo "Time taken for urlencode: $encodeTime seconds", PHP_EOL;
echo "Time taken for urldecode: $decodeTime seconds", PHP_EOL;

Results (CPython 3.11 vs PHP 8.1 on Pentium T4200):

tux@host ~/pyvsphp> python py.py
Time taken for quote: 14.260578 seconds
Time taken for unquote: 100.511969 seconds

tux@host ~/pyvsphp> php php.php
Time taken for urlencode: 0.56293702125549 seconds
Time taken for urldecode: 1.9787909984589 seconds

This is a microbenchmark and you haven’t even posted the results. How much measurement of web apps have you done, and how much of a problem is URL parsing? I’m not saying it isn’t a problem, but we need to know how much we’re talking about here.

One way to describe the problem is to say how many requests per second you could gain if the time spent in urllib.parse were reduced to zero. So start by figuring out how many r/s you can get, then work out how much of the time spent is inside the URL parsing. As an example of that, I have a game file parser and analyzer that used to take about 80-90 seconds to process one file. It turned out that nearly all of that was the initial parse step, and maybe 4-8 seconds was the analysis; which means that, if the parsing time could be eliminated entirely, it would actually make a very real difference. (Rewriting the parser in C was therefore entirely justified.) However, what if it had been the other way? Reducing a 90 second task by 8 seconds is about a 10% improvement, and that’s definitely something, but far less significant.
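
To make that concrete with entirely made-up numbers (they are only there to show the arithmetic):

# Hypothetical figures, purely to illustrate the calculation.
request_time = 0.005          # 5 ms to serve one request
url_parse_time = 0.0002       # 0.2 ms of that spent in urllib.parse

current_rps = 1 / request_time                       # 200 requests/second
best_case_rps = 1 / (request_time - url_parse_time)  # ~208 requests/second

print(f'{current_rps:.0f} r/s now, {best_case_rps:.0f} r/s if parsing were free')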


I’ve edited the post to include the benchmark results on my machine.

I’m sorry, I haven’t had time to compile comprehensive measurements, if that’s what you’re expecting.

I’m simply hoping for a performance boost, not just for URLs but for request bodies too, so I can send slightly larger payloads with application/x-www-form-urlencoded.

I wasn’t expecting fully comprehensive results, but ANY result is more informative than NO result. Thank you.

So, the next thing to figure out is whether this is even meaningful. How much of a web app’s time is spent parsing URLs? And if you can, use real-world data, not strings of encoded equals signs (people don’t usually send that kind of thing).

IMO, in the wild, people can send anything.

But optimizing this seems a long way off, and I haven’t researched it much either. Thanks for the feedback.

Yes, they CAN, which means that correctness has to be ensured even for these situations. But unless you’re worried about a denial-of-service attack, performance doesn’t matter as much for the unusual cases. In any case, all I’m saying is that a more real-world input string (one containing multiple variables, each with its associated value, only some of which need further escaping) would make a better test.
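
Something along these lines would be a more representative input than a wall of encoded equals signs (the keys and values here are made up, of course):

from urllib.parse import urlencode, parse_qs

# A query string with several variables, only some values needing escaping.
params = {
    'q': 'python url parsing',
    'page': '3',
    'lang': 'en-US',
    'redirect': 'https://example.com/path?x=1&y=2',
}
query = urlencode(params)
# 'q=python+url+parsing&page=3&lang=en-US&redirect=https%3A%2F%2Fexample.com%2Fpath%3Fx%3D1%26y%3D2'

decoded = parse_qs(query)   # what a web app would actually do per request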

Optimizing is HARD. Don’t get me wrong, that doesn’t mean it’s useless - far from it! - but it takes a lot of work to squeeze a bit more performance out of something. And often, it means extra work forever afterwards, since the code is more complicated, harder to read, or in some other way less easy to work with. So it’s important to focus optimization efforts where it’s going to make a real difference. See my example from further above; it took me a good bit of work to make that happen (and it added code duplication), but it was worth it because it really was the bottleneck.

Ultimately, there will always be bottlenecks you CAN’T control, like network speed, database speed, or the speed of light. The first step in optimization is figuring out exactly how much you CAN control, and how much benefit that will give.

That’s completely true. I know you’re a realist and I wasn’t trying to deny it :slight_smile:

There have been some changes in 3.12 to reduce the memory use of unquote: python/cpython#96763

Also, newer versions of Python are faster.
The improvement isn’t as big as the roughly two orders of magnitude separating it from PHP, but, for example, running the microbenchmark on a Mac, quote is 67% faster and unquote is 38% faster between 3.8 and 3.13:

❯ python3.8 1.py
Time taken for quote: 2.984720 seconds
Time taken for unquote: 15.092996 seconds

❯ python3.9 1.py
Time taken for quote: 2.953921 seconds
Time taken for unquote: 14.810386 seconds

❯ python3.10 1.py
Time taken for quote: 2.480250 seconds
Time taken for unquote: 12.271389 seconds

❯ python3.11 1.py
Time taken for quote: 2.610697 seconds
Time taken for unquote: 11.858244 seconds

❯ python3.12 1.py
Time taken for quote: 1.782201 seconds
Time taken for unquote: 11.197185 seconds

❯ python3.13 1.py
Time taken for quote: 1.783183 seconds
Time taken for unquote: 10.944934 seconds
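
For reference, the percentages above come directly from comparing the 3.8 and 3.13 timings:

print(f'quote: {2.984720 / 1.783183 - 1:.0%} faster')      # quote: 67% faster
print(f'unquote: {15.092996 / 10.944934 - 1:.0%} faster')  # unquote: 38% faster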

For what it’s worth, the “main” part of urllib.parse, from what I can tell, looks something like this (adapted to be self-contained):

_hextobyte = None
_hexdig = '0123456789ABCDEFabcdef'

def _libparse(string):
    bits = string.split(b'%')
    if len(bits) == 1:
        return string
    res = bytearray(bits[0])
    append = res.extend
    # Delay the initialization of the table to not waste memory
    # if the function is never called
    global _hextobyte
    if _hextobyte is None:
        _hextobyte = {(a + b).encode(): bytes.fromhex(a + b)
                      for a in _hexdig for b in _hexdig}
    for item in bits[1:]:
        try:
            append(_hextobyte[item[:2]])
            append(item[2:])
        except KeyError:
            append(b'%')
            append(item)
    return res

Here I have used the name string from the original code, but it has actually been preprocessed to a bytes object at this point and various other checks have been done for fast paths etc.

So this is cutting the string up into pieces at each % sign, and then each piece except the first potentially starts with a % escape (with the % removed by the splitting), so its first two characters are looked up in a dictionary. All the pieces are then strung together into a bytearray using an imperative loop. It seems there has been some attempt at optimization here already; in particular, the lookup table is lazily created but persistent, and res.extend is aliased to avoid repeated method lookups.
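
A quick, made-up call just to show the shape of the input and output (assuming _libparse from the snippet above is in scope):

print(_libparse(b'key%3Dvalue%26x%3D1'))  # bytearray(b'key=value&x=1')
print(_libparse(b'100%25 legit%'))        # bytearray(b'100% legit%'); stray '%' pieces pass through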

It’s still stringing together a bunch of potentially tiny pieces that have to be created as separate objects, of course. I thought perhaps I could avoid that by using a bytes regex. After working through several corner cases, I came up with something that seemed to pass informal tests:

import re

pct = re.compile(b'%([^%]{,2})')
_hextobyte = None
_hexdig = '0123456789ABCDEFabcdef'

def _myparse(string):
    global _hextobyte
    if _hextobyte is None:
        _hextobyte = {(a + b).encode(): bytes.fromhex(a + b)
                      for a in _hexdig for b in _hexdig}
    return pct.sub(lambda m: _hextobyte.get(m.group(1), m.group(0)), string)

but it’s actually slower this way. My best guess: re.sub has the overhead of creating a bunch of match objects (which need a view onto the original string), and then function calls on those are needed to get the hex code for lookup.
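
A rough way to compare the two on the degenerate input from earlier, assuming both definitions above are in the same module (exact numbers will of course vary by machine):

import timeit

data = b'%3D' * 10000   # same degenerate input as the earlier benchmark

for fn in (_libparse, _myparse):
    t = timeit.timeit(lambda: fn(data), number=1000)
    print(f'{fn.__name__}: {t:.3f} s for 1000 calls')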

A slightly modified benchmark perhaps gives a clearer picture:

#!/usr/bin/env python
import timeit
import sys

from urllib.parse import quote, unquote


UNQUOTED_STRING = '=' * 10000
QUOTED_STRING = '%3D' * 10000


def measure_quote():
    return quote(UNQUOTED_STRING)


def measure_unquote():
    return unquote(QUOTED_STRING)


if __name__ == '__main__':
    number = int(sys.argv[1])
    repeat = int(sys.argv[2])

    # measure the time it takes to execute quote
    quote_times = timeit.repeat(measure_quote, number=number, repeat=repeat)
    best_quote_time_avg = min(quote_times) / number

    # measure the time it takes to execute unquote
    unquote_times = timeit.repeat(measure_unquote, number=number, repeat=repeat)
    best_unquote_time_avg = min(unquote_times) / number

    print(f'quote, {number} loops, best of {repeat}: {best_quote_time_avg * 1000:.3f} msec per loop, {len(QUOTED_STRING) / 1024 / 1024 / best_quote_time_avg:.2f} MB/s')
    print(f'unquote, {number} loops, best of {repeat}: {best_unquote_time_avg * 1000:.3f} msec per loop, {len(UNQUOTED_STRING) / 1024 / 1024 / best_unquote_time_avg:.2f} MB/s')

This gives locally (AMD Ryzen 7 5825U):

$> python3.12 unparse_benchmark.py 1000 10
quote, 1000 loops, best of 10: 0.263 msec per loop, 108.59 MB/s
unquote, 1000 loops, best of 10: 1.068 msec per loop, 8.93 MB/s

I think this is roughly what @Rosuav was looking for – while it doesn’t directly give a ratio of quote/unquote time vs. full request time, it helps inform it. I’d posit that, for the most part, URL requests take much more than a millisecond to be satisfied, and at the same time they don’t contain such degenerate cases, so maintaining an optimised, C-written version of this function might not be worth it.

@nggit you mentioned in your opening message that you worry about speed. Be mindful of these results: the unquote number is in the single-digit MB/s range, while JSON encoding/decoding often runs at hundreds of MB/s, if not GB/s, depending on the library you use. If you managed to optimise urllib.parse to be competitive with PHP’s implementation, you’d probably reach the same order of magnitude of performance you get now with JSON; until then you’ll be paying a hefty price.
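
If you want to see the gap for yourself, a quick sketch along these lines (with a made-up, more form-like payload) should do it; treat the numbers as indicative only:

import json
import timeit
from urllib.parse import urlencode, parse_qs

payload = {f'field{i}': 'some value with spaces & symbols' for i in range(50)}
form = urlencode(payload)
blob = json.dumps(payload)

n = 2000
form_dec = timeit.timeit(lambda: parse_qs(form), number=n) / n
json_dec = timeit.timeit(lambda: json.loads(blob), number=n) / n

print(f'parse_qs:   {len(form) / 1e6 / form_dec:.1f} MB/s')
print(f'json.loads: {len(blob) / 1e6 / json_dec:.1f} MB/s')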

I’m sure there are some places it could be faster, but it’s already pretty fast and clever. Part of the reason I switched Werkzeug away from our own old copy of urllib to using urllib directly is that it was about 35% faster (although some of that came from dropping encoding shenanigans left over from 2/3 compat).


For what it’s worth, there’s ada-python, which is a CFFI wrapper around the ada URL parsing library. I’ve not done thorough benchmarking, but I expect it’s faster for many parsing tasks.

System Software Overview:

System Version: macOS 14.2.1
Kernel Version: Darwin 23.2.0

python 3.11.7

Time taken for quote: 2.896042 seconds
Time taken for unquote: 14.939389 seconds

python 3.12.1

Time taken for quote: 1.920144 seconds
Time taken for unquote: 10.274296 seconds