Add a URL builder class to urllib

Currently, when building URLs using the Python standard library, it is necessary to combine functions such as urljoin and urlencode and define a custom helper class. This requires additional boilerplate and effort.

from urllib.parse import urljoin, urlencode 
base_url = urljoin("https://example.com", "search") 
params = {"q": "python", "page": 1} 
full_url = f"{base_url}?{urlencode(params)}" 

I propose providing a dedicated class that makes it easy to construct URLs in a clear and structured way.

Would you be able to provide an example of how such a class might behave?

I’m no urllib expert, but either of the following seems pretty straightforward to read and write:

from urllib.parse import urlunsplit, urlencode
full_url = urlunsplit(('https', 'example.com', '/search', urlencode({"q": "python", "page": 1}), ''))

or

from urllib.parse import SplitResult, urlencode
full_url = SplitResult(scheme='https', netloc='example.com', path='/search', query=urlencode({"q": "python", "page": 1}), fragment='').geturl()

For example, I propose adding a new function like this:

import urllib.parse

urllib.parse.build(
    scheme="https",
    userinfo=None,
    hostname="example.com",
    port=None,
    path=["search"],
    params={"q": "python", "page": 1},
).get_url()

The return value could be one of the existing classes.
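
To make the idea concrete, here is a rough sketch of how such a function could be layered on top of what urllib.parse already offers. Everything about build here (its name, its parameters, and its defaults) is an assumption for illustration, not an existing API. Because the sketch returns the existing SplitResult, it uses that class's geturl() method rather than a new get_url().

```python
from urllib.parse import SplitResult, quote, urlencode


def build(scheme, hostname, path=(), params=None, userinfo=None, port=None, fragment=""):
    """Hypothetical sketch of the proposed build function (not a stdlib API)."""
    # Assemble netloc as [userinfo@]hostname[:port].
    # IPv6 literals and userinfo escaping are deliberately not handled here.
    netloc = hostname
    if userinfo is not None:
        netloc = f"{userinfo}@{netloc}"
    if port is not None:
        netloc = f"{netloc}:{port}"
    # Escape each path segment on its own, so a "/" inside a segment is encoded too.
    encoded_path = "".join("/" + quote(str(segment), safe="") for segment in path)
    query = urlencode(params) if params else ""
    return SplitResult(scheme, netloc, encoded_path, query, fragment)


url = build(
    scheme="https",
    hostname="example.com",
    path=["search"],
    params={"q": "python", "page": 1},
)
print(url.geturl())  # https://example.com/search?q=python&page=1
```

Returning a SplitResult keeps the result usable anywhere the existing named tuples are accepted, which is one way the new function could coexist with current code.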

httpx ships with a URL class that I find incredibly handy. I use that library over requests in a lot of projects because of that class and the built-in async client support.

Their implementation is in pure Python, and I think it could be a good model for anything like this that would end up in the standard library.

Indeed SplitResult (and ParseResult) can be constructed manually and that’s what I use for now. But I think there’s value in a dedicated urllib.parse.build function that also takes care of percent-escaping and related transformations intelligently.
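
To illustrate the escaping point, here is a small sketch using only existing urllib.parse functions. The example path and query values are made up. A manually constructed SplitResult passes its components through unchanged, so any escaping has to be done by the caller:

```python
from urllib.parse import SplitResult, quote, urlencode

# Built by hand, SplitResult does not escape anything.
raw = SplitResult("https", "example.com", "/docs/release notes", "q=a&b", "").geturl()
print(raw)  # https://example.com/docs/release notes?q=a&b

# Escaping each component explicitly with the existing helpers.
path = quote("/docs/release notes")   # "/" is treated as safe by default
query = urlencode({"q": "a&b"})       # the "&" inside the value gets escaped
escaped = SplitResult("https", "example.com", path, query, "").geturl()
print(escaped)  # https://example.com/docs/release%20notes?q=a%26b
```

A dedicated build function could fold these per-component steps into a single call.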

One big problem I often encounter is that at least two third-party URL classes are in common use:

  • yarl.URL
  • httpx’s URL class

These are not interchangeable; yarl can’t be used with httpx, for example. So I propose at least including a Protocol, URLLike (name derived from os.PathLike, just as an example), to ensure some kind of interoperability. This could be done via a third-party package, but a minimum standard is far more likely to be used if it comes from a central authority. Defining a basic URL protocol in the stdlib is also less likely to cause the xkcd 927 situation.

It probably only has to be as complex as os.PathLike, i.e. just one method that returns a string version of the URL object. Libraries can then keep using their own class internally.
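
As a rough sketch of what that could look like, modeled on how os.PathLike relies on __fspath__: the protocol name URLLike comes from the suggestion above, while the __url__ method name, the to_url_string helper, and the MyURL class are all assumptions made up for this example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class URLLike(Protocol):
    """Hypothetical protocol; the __url__ method name is an assumption."""

    def __url__(self) -> str:
        """Return the URL as a string."""
        ...


def to_url_string(url) -> str:
    """Accept a str or any URLLike object, similar in spirit to os.fspath()."""
    if isinstance(url, str):
        return url
    if isinstance(url, URLLike):
        return url.__url__()
    raise TypeError(f"expected str or URLLike, got {type(url).__name__}")


class MyURL:
    """Stand-in for a third-party URL class with its own internals."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __url__(self) -> str:
        return self._text


print(to_url_string(MyURL("https://example.com/search")))  # https://example.com/search
```

A library accepting URLs would call something like to_url_string at its boundary and then parse the string into its own class.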

I don’t think a protocol really solves the problem. URL implementations do things slightly differently. On the other hand, using str still leaves parsing and unparsing each time the URL crosses an API boundary, with all the performance and incompatibility penalties.

Ideally, if we were to start all over again, there would be one and only one fully tested, standards-conforming implementation (the industry standard now is the WHATWG URL spec), provided by the standard library or as a first-party upgradable package. This may still be possible if there is enough interest to pick an existing implementation (or write one), maintain it, soft-deprecate urllib.parse, and discourage third-party implementations from deviating from it without very good reasons. The stakes are high, though: URLs are so critical that users deserve a better implementation that can be shared by different packages. The same goes for datetime, which should see a modern, calendar-aware replacement similar to ECMAScript’s Temporal[1].


[1] Which I have tried implementing in Python, but it is just very complicated to implement.

I apologize for the delayed response due to illness.

In this proposal, I would like to focus first on improving the URL construction API. To avoid ambiguity in behavior, I propose aligning the implementation with the WHATWG URL specification.

As an initial step, and in order to limit the scope of impact, I suggest introducing a function tentatively named build. The goal would be to allow it to coexist with existing types such as ParseResult, enabling incremental adoption with minimal disruption.

In the longer term, if appropriate, these capabilities could potentially be integrated into a dedicated class.