Recently I was desperately looking for a URI normalization Python library providing syntax-based normalization (case normalization, percent-encoding normalization and path segment normalization) and scheme-based normalization, as specified in RFC 3986.
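To make the first two kinds of syntax-based normalization concrete, here is a minimal sketch of case normalization and percent-encoding normalization (RFC 3986, sections 6.2.2.1 and 6.2.2.2). It is not the Gist's code: `normalize_component` and `normalize_syntax` are hypothetical helpers, and the sketch assumes `urllib.parse.urlsplit`/`urlunsplit` are good enough for splitting and rebuilding the URI (`urlunsplit` drops an empty `?` or `#`, for example). It lowercases only the scheme and host, decodes percent-encoded unreserved characters, and uppercases the hex digits of every other percent-encoding.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters (RFC 3986, section 2.3).
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
# Capturing group so that re.split() keeps the percent-encodings at odd indices.
PCT_TOKEN = re.compile(r"(%[0-9A-Fa-f]{2})")


def normalize_component(component, case_insensitive=False):
    """Hypothetical helper: percent-encoding (and optionally case) normalization."""
    pieces = PCT_TOKEN.split(component)
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            # A percent-encoding: decode it if it stands for an unreserved
            # character, otherwise keep it with uppercase hex digits.
            char = chr(int(piece[1:], 16))
            if char in UNRESERVED:
                pieces[i] = char.lower() if case_insensitive else char
            else:
                pieces[i] = piece.upper()
        elif case_insensitive:
            # Literal text of a case-insensitive component (e.g. the host).
            pieces[i] = piece.lower()
    return "".join(pieces)


def normalize_syntax(uri):
    """Hypothetical helper: case + percent-encoding normalization of a URI.

    Dot-segment removal is deliberately left out of this sketch.
    """
    parts = urlsplit(uri)
    # Userinfo is case-sensitive; host (and port) are not.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = (normalize_component(userinfo) + at
              + normalize_component(hostport, case_insensitive=True))
    return urlunsplit((
        parts.scheme.lower(),
        netloc,
        normalize_component(parts.path),
        normalize_component(parts.query),
        normalize_component(parts.fragment),
    ))
```

For example, `normalize_syntax("HTTP://User@Example.COM/%7euser/a%2fb?q=%3d#Frag")` gives `"http://User@example.com/~user/a%2Fb?q=%3D#Frag"`: `%7e` is decoded to `~` because `~` is unreserved, while `%2f` stays encoded because decoding it would change the path structure.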
I could only find this Gist by Mark Nottingham: https://gist.github.com/mnot/246089
It was great but outdated, so I updated it in this fork (Python 3, RFC 3986 compliance, unittest framework and a few corrections): https://gist.github.com/maggyero/9bc1382b74b0eaf67bb020669c01b234
I think it could be a nice addition to the Python standard library, so I contacted Mark and he is fine with that too. More precisely, we could add the normalizing functions defined in my Gist to the urllib.parse module:
- normalizes: normalize a URI;
- normalize: normalize URI components;
- remove_dot_segments: remove the dot-segments in a URI path component.
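For reference, dot-segment removal is fully specified by the algorithm in RFC 3986, section 5.2.4. The sketch below is a direct transcription of that algorithm's steps A to E, written for illustration only; the `remove_dot_segments` in the Gist may be implemented differently. It assumes the input is just the path component, already split off from the rest of the URI.

```python
def remove_dot_segments(path):
    """Remove "." and ".." segments from a URI path (RFC 3986, section 5.2.4)."""
    input_buffer = path
    # Each output entry is one segment, including its leading "/" (if any),
    # so popping an entry also removes the "/" that preceded it.
    output = []
    while input_buffer:
        # A. Drop a leading "../" or "./".
        if input_buffer.startswith("../"):
            input_buffer = input_buffer[3:]
        elif input_buffer.startswith("./"):
            input_buffer = input_buffer[2:]
        # B. Replace a leading "/./" or a complete "/." with "/".
        elif input_buffer.startswith("/./"):
            input_buffer = input_buffer[2:]
        elif input_buffer == "/.":
            input_buffer = "/"
        # C. Replace a leading "/../" or a complete "/.." with "/" and
        #    remove the last segment from the output.
        elif input_buffer.startswith("/../"):
            input_buffer = input_buffer[3:]
            if output:
                output.pop()
        elif input_buffer == "/..":
            input_buffer = "/"
            if output:
                output.pop()
        # D. A lone "." or ".." is removed.
        elif input_buffer in (".", ".."):
            input_buffer = ""
        # E. Move the first segment (with its leading "/", if any) to the output.
        else:
            start = 1 if input_buffer.startswith("/") else 0
            end = input_buffer.find("/", start)
            if end == -1:
                end = len(input_buffer)
            output.append(input_buffer[:end])
            input_buffer = input_buffer[end:]
    return "".join(output)
```

With the two examples from the RFC, `remove_dot_segments("/a/b/c/./../../g")` returns `"/a/g"` and `remove_dot_segments("mid/content=5/../6")` returns `"mid/6"`.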
What is your opinion on this?