urlparse() can sometimes raise an exception. Should it?

For (2), we do document that urlparse raises a ValueError in some situations. I doubt we exhaustively list every such situation, and I do not think we should try to. Perhaps all you’re asking is for this mention of an exception to be moved further up in the doc? Personally, I would not assume that a Python API never raises something like ValueError when given an unreasonable value, even if that is not explicitly documented. An API that is meant never to raise at all should document itself as such, and should have regression tests for that behavior if it is an important trait.
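As a small illustration of the documented ValueError, here is a minimal sketch of a caller that chooses to treat a rejected input as "not a URL". The helper name parse_or_none is hypothetical; the input "http://[::1" (an unbalanced IPv6 bracket) is one case where urlparse() raises ValueError in CPython 3.x.

```python
from urllib.parse import urlparse


def parse_or_none(value):
    """Return urlparse(value), or None if urlparse() raises ValueError.

    Hypothetical helper: whether to swallow the error or let it
    propagate is the caller's decision, not urlparse()'s.
    """
    try:
        return urlparse(value)
    except ValueError:
        # e.g. an unbalanced IPv6 bracket in the network location
        return None


print(parse_or_none("http://[::1"))
print(parse_or_none("https://example.com/").netloc)
```

The point is that the exception is part of the contract a caller should plan for, rather than something to be designed away.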

We also state that urlunparse(urlparse(value)) is not guaranteed to return the original value. It is unreasonable to assume that it does: we specifically document that the result may not be identical to value.
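A quick way to see the round trip failing, assuming current CPython behavior with default arguments: an empty query marker ("?") is dropped, and the scheme is lower-cased by the parser. The round_trips helper is hypothetical and only for illustration.

```python
from urllib.parse import urlparse, urlunparse


def round_trips(value):
    """True if urlunparse(urlparse(value)) reproduces value exactly."""
    return urlunparse(urlparse(value)) == value


samples = [
    "http://example.com/path?q=1",  # survives the round trip
    "http://example.com?",          # empty query: the trailing '?' is dropped
    "HTTP://example.com/",          # scheme is lower-cased by the parser
]
for s in samples:
    print(repr(s), "->", repr(urlunparse(urlparse(s))), round_trips(s))
```

Both failures are lossless in meaning for most consumers, which is why the docs only promise an equivalent URL rather than an identical string.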

We cannot do (1) and should not try. Loosening the API would hurt many existing applications that depend on what little non-guaranteed validation it already performs. Raising ValueError makes sense for input that does not appear to be a valid URL. Never raising, and always returning a ParseResult even for nonsense or maliciously crafted input, is fundamentally at odds with what I’d call secure API design best practices. Always returning would basically tell each and every user: “you’re on your own; you all need to anticipate everything malicious and every possible way it might pass through our internal parsing implementation, intentionally or not, reinvent your own validation logic, and repeat all of the security bugs in your own application.” While we already advise people to check the results, going further down that path and doing less for them is not good for the world.
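For readers wondering what "check the results" can look like in practice, here is a minimal sketch. The allowed-scheme policy and the is_acceptable_url name are hypothetical application choices, not anything urllib.parse provides; note that the .port attribute can itself raise ValueError (for example, for a port outside 0 to 65535 in Python 3.6+), so it is read inside the try block.

```python
from urllib.parse import urlparse

# Hypothetical application policy: only plain web URLs are accepted.
ALLOWED_SCHEMES = {"http", "https"}


def is_acceptable_url(value):
    """Parse value and check the components this application relies on."""
    try:
        parts = urlparse(value)
        parts.port  # accessing .port raises ValueError for a bad port
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    # Require an actual host; "https:///path" parses but has none.
    if not parts.hostname:
        return False
    return True
```

The parser's ValueError and the application's own policy checks complement each other; removing the former would push more of that burden onto every caller.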

We’re realistically stuck with these URL-related APIs. Not breaking existing users is our top priority, unless the existing behavior is an outright security failure in the common, real-world use cases found in people’s existing application and library code. So we tighten things a bit when feasible, given that this code sits on a legacy, never-really-designed 1990s implementation of URL parsing and on public APIs that are both widely used and abused. Even that isn’t always feasible.

Some previous relevant discussions to be aware of and a huge pile of existing bugs.

If someone wants a URL parsing library with substantially different behavior or design, they’re best off building it on PyPI.
