Add URI normalization functions to the urllib.parse module

Recently I was desperately looking for a Python URI normalization library providing syntax-based normalization (case normalization, percent-encoding normalization, and path segment normalization) and scheme-based normalization, as specified in RFC 3986.

I could only find this Gist by Mark Nottingham:

It was great but outdated, so I updated it in this fork (Python 3, RFC 3986 compliance, the unittest framework, and a few corrections):

I think it could be a nice addition to the Python standard library, so I contacted Mark, and he is fine with that too. More precisely, we could add the normalizing functions defined in my Gist to the urllib.parse module:

  • normalizes: normalize a URI;
  • normalize: normalize URI components;
  • remove_dot_segments: remove the dot segments in a URI path component.
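For illustration, here is a rough sketch of what the syntax-based part could look like on top of urllib.parse. The names normalizes and remove_dot_segments follow the list above, but the bodies are my own guess at the idea, not the actual Gist code; the UNRESERVED table and the default-port handling in particular are assumptions:

```python
import re
from urllib.parse import urlsplit, urlunsplit

# RFC 3986 section 2.3: unreserved characters never need percent-encoding.
UNRESERVED = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz"
              "0123456789-._~")

# Assumed table for the scheme-based step (dropping the default port).
DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path):
    """Remove '.' and '..' segments, per RFC 3986 section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = "/" + path[3:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (up to, not including, the next "/")
            # from the input to the output buffer.
            i = path.find("/", 1)
            if i == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:i])
                path = path[i:]
    return "".join(output)


def _normalize_percent_encoding(component):
    """Decode unreserved octets, uppercase the remaining hex digits."""
    def fix(match):
        octet = chr(int(match.group(1), 16))
        if octet in UNRESERVED:
            return octet
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, component)


def normalizes(uri):
    """Normalize a URI: case, percent-encoding, dot segments, default port."""
    scheme, netloc, path, query, fragment = urlsplit(uri)
    scheme = scheme.lower()
    host = netloc.lower()           # case normalization of the authority
    if ":" in host:                 # scheme-based: drop the default port
        name, _, port = host.rpartition(":")
        if port.isdigit() and DEFAULT_PORTS.get(scheme) == int(port):
            host = name
    path = _normalize_percent_encoding(remove_dot_segments(path))
    if host and not path:           # empty path with an authority -> "/"
        path = "/"
    return urlunsplit((scheme, host, path, query, fragment))
```

With that sketch, normalizes("HTTP://Example.COM:80/a/./b/../c") comes out as "http://example.com/a/c", and a percent-encoded unreserved character such as %7e is decoded back to "~".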

What is your opinion on this?


For URI handling nowadays, I think everyone uses the code pulled out of Twisted:


I dunno about everyone, but hyperlink’s certainly one of the better options. I may be biased though; I added the normalize() method to hyperlink, the docs of which are here:

If there’s real demand for this in the stdlib, I’d be happy to help. It can get a little contentious at times, especially around balancing the fundamental URL behaviors versus the creative interpretations browsers make.