Mimetypes: Allow overriding the media type to file extension mapping

It currently seems rather hard to override the result of mimetypes.guess_extension, even though the mimetypes.init docs say:

Each file named in files or knownfiles takes precedence over those named before it.

I suggest allowing overrides of the mapping from MIME types to file name extensions. I realize that this change may break things for some users, and I would be happy if somebody proposed a better solution.

My idea is that a MimeTypes.add_type call should give priority to the passed extension, so that the next guess_extension call returns it. In addition, a fallback=False argument may be added to the add_type and read methods, so that an association can be added only if there is no current mapping.
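A rough sketch of these semantics as a MimeTypes subclass, assuming only the documented types_map and types_map_inv attributes. The class name OverridingMimeTypes and the fallback parameter are hypothetical, not part of the stdlib API, and this is not the patch mentioned below:

```python
import mimetypes


class OverridingMimeTypes(mimetypes.MimeTypes):
    """Hypothetical MimeTypes subclass sketching the proposed add_type rules.

    By default, an explicitly added extension becomes the preferred one for
    its type. With the hypothetical fallback=True, the association is added
    only when the extension has no mapping yet, and with the lowest priority.
    """

    def add_type(self, type, ext, strict=True, fallback=False):
        if fallback and ext in self.types_map[strict]:
            return  # fallback mode: keep the existing mapping untouched
        self.types_map[strict][ext] = type
        exts = self.types_map_inv[strict].setdefault(type, [])
        if ext in exts:
            exts.remove(ext)
        if fallback:
            exts.append(ext)  # new association, lowest priority
        else:
            exts.insert(0, ext)  # explicit call: ext becomes preferred


db = OverridingMimeTypes()
db.add_type("application/x-demo", ".aaa")
db.add_type("application/x-demo", ".bbb")
print(db.guess_extension("application/x-demo"))  # .bbb
db.add_type("application/x-demo", ".ccc", fallback=True)
print(db.guess_all_extensions("application/x-demo"))  # ['.bbb', '.aaa', '.ccc']
```

Since readfp calls add_type once per extension, this sketch alone would make the last extension of a mime.types record the preferred one; read/readfp would need a matching change if the first extension in a record should win. If the base constructor routes the built-in defaults through add_type (as CPython 3.11 does), their relative order is reversed as well.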

I have a patch, but it is recommended to create an issue for each pull request, and feature requests should be discussed first.

Consider the following case. At first glance, the issue may seem rather specific, applicable only to a particular combination of Python version and Linux distribution. It cannot be reproduced with Python 3.12 (#97646), but since the IANA media types registry receives updates, I anticipate similar trouble in the future.

Debian 12 bookworm was released with Python 3.11, which has the

'.js': 'application/javascript'

mapping. On the other hand, the media-types package contains

/etc/mime.types:text/javascript                                 es js mjs

due to the changes recommended in RFC 9239, Updates to ECMAScript Media Types (https://datatracker.ietf.org/doc/rfc9239/).

Because the two projects adopt registry changes at a different pace, the result is

python3 -m mimetypes --extension text/javascript
.es

It cannot be changed to “.js” by calling mimetypes.init with a custom mapping file, or by an additional call to add_type. From my point of view, this contradicts the documented behavior.
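A minimal reproduction of the add_type part, using a fresh MimeTypes instance so it does not depend on the local /etc/mime.types. The type application/x-demo and its extensions are made up for illustration, and the in-memory record stands in for a mime.types file:

```python
import io
import mimetypes

# A fresh instance starts from Python's built-in defaults only; it does not
# read /etc/mime.types unless file names are passed to it.
db = mimetypes.MimeTypes()

# Simulate a system mime.types record listing several extensions for one type.
db.readfp(io.StringIO("application/x-demo  aaa bbb\n"))
print(db.guess_extension("application/x-demo"))  # .aaa

# Try to make ".bbb" the preferred extension with another add_type call.
db.add_type("application/x-demo", ".bbb")
print(db.guess_extension("application/x-demo"))  # still .aaa
```

add_type only appends an extension that is not yet listed for the type, so an extension that is already known keeps its position.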

The ultimate goal is to make guess_type and guess_extension reciprocal, at least for canonical mappings, and to avoid the following:

python3 -m mimetypes 'test.js'
type: text/javascript encoding: None

python3 -m mimetypes --extension text/javascript
.es
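The reciprocity goal can be checked mechanically. A small sketch, assuming the caller supplies the list of extensions meant to be canonical (the helper name non_reciprocal and the records below are hypothetical):

```python
import io
import mimetypes


def non_reciprocal(db, extensions):
    """Return the extensions whose round trip
    extension -> guess_type -> guess_extension does not lead back to them."""
    failures = []
    for ext in extensions:
        mime_type, _encoding = db.guess_type("file" + ext)
        if mime_type is None or db.guess_extension(mime_type) != ext:
            failures.append(ext)
    return failures


db = mimetypes.MimeTypes()
# Made-up records standing in for a system mime.types file.
db.readfp(io.StringIO("application/x-demo  aaa bbb\ntext/x-single  one\n"))
print(non_reciprocal(db, [".bbb", ".one"]))  # ['.bbb']
```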

For the code, see the diff on GitHub.

P.S. I hope I will not be considered a spammer for posting a third link in this thread.

Is this an issue to take up with Debian?

So the problem is: if there is a set of extensions, how do you pick a preferred extension from the set?

I think both the Debian package and the Python module should be improved.

On the Python side, the API should provide a means to override any
mapping, and it should not matter what its source is: _types_map_default,
/etc/mime.types, or the Windows registry.

.es for text/javascript is just a real-life example. Such cases are
unavoidable in the future, and I would prefer to have a reliable workaround.
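With the current API, one possible workaround is to reorder the documented types_map_inv attribute of a MimeTypes instance directly. A minimal sketch; the helper prefer_extension is hypothetical, it only affects the instance it is given (the module-level functions use an internal database it does not touch), and mime_type should be lowercase because guess_extension lowercases its query:

```python
import io
import mimetypes


def prefer_extension(db, mime_type, ext, strict=True):
    """Make ext the first (preferred) extension for mime_type on a MimeTypes
    instance by editing its documented types_map_inv attribute."""
    db.add_type(mime_type, ext, strict)  # ensures both mappings exist
    exts = db.types_map_inv[strict][mime_type]
    exts.remove(ext)
    exts.insert(0, ext)


db = mimetypes.MimeTypes()
db.readfp(io.StringIO("application/x-demo  aaa bbb\n"))  # made-up record
prefer_extension(db, "application/x-demo", ".bbb")
print(db.guess_extension("application/x-demo"))  # .bbb
```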

It is unclear whether earlier or later lines have higher priority in case
of ambiguity, and the same applies to multiple extensions in the
record for a media type.

The proposed change gives priority to later lines and to the first extension
in a record. This may be discussed with the maintainers of mime.types in
the Debian and Fedora projects.
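The proposed rule can be stated as a small resolver over mime.types-style lines. This is a sketch of the rule only (the function preferred_extensions is hypothetical and not part of mimetypes); comment handling mirrors readfp, which drops everything from the first word starting with "#":

```python
def preferred_extensions(lines):
    """Hypothetical resolver for the proposed rule: for each media type,
    a later line overrides earlier ones, and within a line the first
    listed extension is preferred."""
    preferred = {}
    for line in lines:
        words = []
        for word in line.split():
            if word.startswith("#"):  # rest of the line is a comment
                break
            words.append(word)
        if len(words) < 2:
            continue  # blank line, comment, or type without extensions
        mime_type, first_ext = words[0], words[1]
        preferred[mime_type] = "." + first_ext
    return preferred


system_file = ["text/javascript  es js mjs"]
user_file = ["text/javascript  js mjs  # local override"]
print(preferred_extensions(system_file + user_file))
# {'text/javascript': '.js'}
```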

In any case, the behavior of the current code is rather inconsistent: the
earliest pair is taken for guess_extension, the latest one for guess_type.
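Both halves of this can be observed by reading two made-up mime.types files through MimeTypes(filenames), which calls read (and thus readfp and add_type) for each file in order. The file names and types here are illustrative only:

```python
import mimetypes
import os
import tempfile

# Two made-up mime.types files; the second one is named later.
first = "application/x-old  demo\ntext/x-multi  one\n"
second = "application/x-new  demo\ntext/x-multi  two\n"

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, content in (("first.types", first), ("second.types", second)):
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        paths.append(path)
    db = mimetypes.MimeTypes(paths)

print(db.guess_type("file.demo"))  # ('application/x-new', None): latest wins
print(db.guess_extension("text/x-multi"))  # .one: earliest wins
# Stale reverse entry: .demo now maps to application/x-new.
print(db.guess_extension("application/x-old"))  # .demo
```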

The current documentation of mimetypes.init:

Each file named in files or knownfiles takes precedence over those named before it.

contradicts the documentation of mimetypes.MimeTypes.add_type:

When the extension is already known, the new type will replace the old one. When the type is already known the extension will be added to the list of known extensions.

Since mimetypes.init internally calls mimetypes.MimeTypes.read, which calls mimetypes.MimeTypes.readfp, which in turn calls mimetypes.MimeTypes.add_type repeatedly, it seems that the documentation of mimetypes.init should be revised.

I wonder whether mimetypes.MimeTypes.read, mimetypes.MimeTypes.readfp, and mimetypes.MimeTypes.add_type are mimicking the behavior of other applications that consume mime.types-like files. It would probably be better to first survey how those applications, such as Apache and nginx, deal with duplicate mappings.

For better interoperability with other applications and to preserve backward compatibility, I don’t think we should change the behavior unless other applications do so. Adding a new parameter or method to support an alternative behavior is OK, though.
