Currently, when building URLs using the Python standard library, it is necessary to combine functions such as urljoin and urlencode and define a custom helper class. This requires additional boilerplate and effort.
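To make the boilerplate concrete, here is a minimal sketch of the kind of helper this usually ends up requiring. The name make_url and the example.com URL are made up for illustration; only the urllib.parse functions are real.

```python
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit


def make_url(base, path, params):
    """Hypothetical helper: join `path` onto `base` and attach a query string."""
    joined = urljoin(base, path)
    parts = urlsplit(joined)
    # urlencode percent-escapes the values (spaces become "+").
    return urlunsplit(parts._replace(query=urlencode(params)))


url = make_url("https://example.com/api/", "users", {"q": "a b", "page": 2})
print(url)  # https://example.com/api/users?q=a+b&page=2
```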
Indeed, SplitResult (and ParseResult) can be constructed manually, and that’s what I use for now. But I think there’s value in a dedicated urllib.parse.build function that also takes care of percent-escaping and related transformations intelligently.
One big problem I often encounter is that at least two kinds of third-party URL classes are in common use:
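As a rough illustration of the manual approach, assuming a made-up example.com URL: SplitResult is a named tuple, so it can be built directly, but each component has to be escaped by hand before it goes in.

```python
from urllib.parse import SplitResult, quote, urlencode

# Each component must be escaped by the caller; SplitResult itself
# performs no percent-escaping.
parts = SplitResult(
    scheme="https",
    netloc="example.com",
    path=quote("/files/my report.txt"),  # "/" is kept, the space is escaped
    query=urlencode({"lang": "en"}),
    fragment="",
)

print(parts.geturl())  # https://example.com/files/my%20report.txt?lang=en
```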
yarl.URL
httpx’s URL class
These are not interchangeable; yarl can’t be used with httpx, for example. So I propose at least including a Protocol, URLLike (name derived from os.PathLike, just as an example), to ensure some kind of interoperability. This could be done via a third-party package, but a minimum standard is far more likely to be used if it comes from a central authority. Defining a basic URL protocol in the stdlib is also less likely to cause the xkcd 927 situation.
It probably only has to be as complex as os.PathLike, that is, a single method that returns a string version of the URL object. Libraries can then just use their own class internally.
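A minimal sketch of what such a protocol might look like, modeled on os.PathLike and os.fspath. Everything here is an assumption for illustration: the method name `__url__`, the helper to_url_string, and the MyURL class do not exist in the stdlib or in any library.

```python
from typing import Protocol, runtime_checkable
from urllib.parse import urlsplit


@runtime_checkable
class URLLike(Protocol):
    """Hypothetical protocol: anything that can render itself as a URL string."""

    def __url__(self) -> str: ...


def to_url_string(url):
    """Hypothetical counterpart of os.fspath(): accept a str or a URLLike object."""
    if isinstance(url, str):
        return url
    if isinstance(url, URLLike):
        return url.__url__()
    raise TypeError(f"expected str or URLLike, got {type(url).__name__}")


class MyURL:
    """Stand-in for a library's own URL class; it keeps its internal representation."""

    def __init__(self, raw):
        self._parts = urlsplit(raw)

    def __url__(self):
        return self._parts.geturl()


print(to_url_string(MyURL("https://example.com/a?b=1")))  # https://example.com/a?b=1
```

A library accepting URLs would call to_url_string at its API boundary and then parse the result into its own class, the same way path-accepting functions call os.fspath.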
I don’t think a protocol really solves the problem. URL implementations do things slightly differently. On the other hand, using str still means parsing and unparsing each time the URL crosses an API boundary, with all the performance and incompatibility penalties.
Ideally, if we were to start all over again, there would be one and only one fully tested, standards-conforming implementation (the industry standard now is the WhatWG URL spec), provided by the standard library or as a first-party upgradable package. This may still be possible if there is enough interest to pick an existing implementation (or write one) and maintain it (and soft-deprecate urllib.parse, and discourage third-party implementations from deviating from it without very good reasons). The need is very real, though: URLs are so critical that users deserve a better implementation that can be shared by different packages. The same goes for datetime, which should see a modern, calendar-aware replacement similar to ECMAScript’s Temporal[1].
[1] Which I have tried implementing in Python, but it is just very complicated.
I apologize for the delayed response due to illness.
In this proposal, I would like to focus first on improving the URL construction API. To avoid ambiguity in behavior, I propose aligning the implementation with the WhatWG URL specification.
As an initial step, and in order to limit the scope of impact, I suggest introducing a function tentatively named build. The goal would be to allow it to coexist with existing types such as ParseResult, enabling incremental adoption with minimal disruption.
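For discussion purposes, here is a rough sketch of one possible shape for such a function. The signature, the parameter names, and the escaping rules are all assumptions, not a settled design, and the sketch makes no attempt at WhatWG conformance; it only shows how a build function could return an existing SplitResult so that it coexists with the current types.

```python
from urllib.parse import SplitResult, quote, urlencode


def build(scheme, host, path_segments=(), query=None, fragment="", port=None):
    """Hypothetical urllib.parse.build: assemble a SplitResult from raw parts.

    Callers pass unescaped values; escaping happens here.
    """
    netloc = host if port is None else f"{host}:{port}"
    # Escape each segment separately so a literal "/" inside a segment
    # is not mistaken for a path separator.
    path = "".join("/" + quote(segment, safe="") for segment in path_segments)
    query_string = urlencode(query, doseq=True) if query else ""
    return SplitResult(scheme, netloc, path, query_string, quote(fragment, safe=""))


result = build("https", "example.com", ["a b", "c/d"], {"q": ["x", "y"]})
print(result.geturl())  # https://example.com/a%20b/c%2Fd?q=x&q=y
```

Returning a SplitResult rather than a new type keeps the result usable with geturl(), _replace(), and any code that already accepts the existing named tuples.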
In the longer term, if appropriate, these capabilities could potentially be integrated into a dedicated class.