Gettext API extension

jeanas · August 12, 2023, 2:04am

Sphinx is running into the following limitation: if gettext(msg) == msg, the application has no way to know whether msg was returned because there is no translation for msg or because the translation for msg is identical to msg (e.g., the paragraph to translate is `PyPI <https://pypi.org>`_, which should remain the same in the translation).
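The ambiguity can be reproduced with a small sketch. DictTranslations is a hypothetical dict-backed stand-in for a catalog loaded from a .mo file; it is not part of the gettext module and is only here to make the example self-contained.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical dict-backed catalog (stand-in for a loaded .mo file)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # No entry: defer to NullTranslations, which echoes the message back.
        return super().gettext(message)


pypi = "`PyPI <https://pypi.org>`_"
missing = "A paragraph nobody has translated yet."

# The PyPI link is "translated" to itself; the other message has no entry.
catalog = DictTranslations({pypi: pypi})

identical_case = catalog.gettext(pypi) == pypi
untranslated_case = catalog.gettext(missing) == missing
```

Both comparisons evaluate to True, so the caller cannot tell the two situations apart from the return value alone.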

The message catalog classes do support an add_fallback method, but it takes another catalog object as its argument. It’s obviously intended for adding an actual catalog, and it’s not clear from the documentation whether adding anything else is allowed or what minimal interface the fallback object is required to have (does it need to provide gettext? pgettext? and so on?). Nor is it clear that returning a non-string from a NullTranslations subclass used as the fallback is allowed, which would be needed to reliably detect that the fallback is being used. Another problem is that there is no documented way of unregistering the fallback, and mutating the translation catalog may not be acceptable, for example if it is passed to you by a library.

The popular Babel internationalization library reuses gettext’s API for message catalogs. So does flufl.i18n.

An obvious way to solve this would be to add keyword-only fallback parameters, e.g., gettext(str, fallback=None), where a failed lookup returns None, which makes the two cases distinguishable.
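A minimal sketch of what such a keyword-only parameter could look like follows. Nothing here exists in the gettext module: KwargTranslations, the _MISSING sentinel, and the fallback= keyword are all assumptions made for illustration, and the sketch deliberately ignores chained fallback catalogs when fallback= is passed.

```python
import gettext

_MISSING = object()  # sentinel meaning "the caller did not pass fallback="


class KwargTranslations(gettext.NullTranslations):
    """Hypothetical dict-backed catalog with a keyword-only fallback."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message, *, fallback=_MISSING):
        if message in self._messages:
            return self._messages[message]
        if fallback is _MISSING:
            # Stock behaviour: defer to a chained catalog or echo the message.
            return super().gettext(message)
        # Simplification: chained fallback catalogs are not consulted here.
        return fallback


catalog = KwargTranslations({"PyPI": "PyPI"})
identical = catalog.gettext("PyPI", fallback=None)       # "PyPI"
untranslated = catalog.gettext("Sphinx", fallback=None)  # None
default = catalog.gettext("Sphinx")                      # "Sphinx"
```

Existing callers that never pass fallback= keep the current behaviour, which is the main appeal of making the parameter keyword-only.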

However.

Another wish I have for the gettext module is the ability to list the contents of a MO catalog. The GNUTranslations class stores the catalog content as a dict (the _catalog attribute), so this would be just a matter of doing one of these:

- Make NullTranslations a subclass of dict (thereby also making GNUTranslations a dict subclass), so that it exposes the full mapping interface.
- Expose its _catalog attribute as a public attribute, or create a public method to return it.
- Make NullTranslations (and GNUTranslations) iterable, yielding the internal dict’s items().

The reason I’m bringing this up in a thread about fallbacks is that this would trivially obviate the need for fallback kwargs. If you wanted to distinguish between no translation and an identical translation, you could just do catalog.get(msg) (taking the first bullet point as an example).

This should be very easy to implement.

Thoughts?

CC @AA-Turner

AA-Turner · August 12, 2023, 2:20am

We could add a NullTranslations.__contains__ or NullTranslations.has_message() API for this, which I think would be an easier compromise than making NullTranslations a dict subclass.

I originally tried implementing the feature by subclassing NullTranslations, having gettext() always return '', and adding that new EmptyTranslations class as the fallback to all translators, but that caused runtime issues as all _(...) calls now returned '', as by default English doesn’t use a message catalogue.

My preferred API would therefore be the membership test style, if possible.

A

cc: @Barry @malemburg @merwok as internationalisation experts.

storchaka · August 12, 2023, 10:22am

If gettext(msg) is not msg, it was translated, even if gettext(msg) == msg. Does it help?

The converse is not true: gettext(msg) is msg can hold even for a translated message (for example, for the empty string or a single Latin-1 character), but that is hardly a problem.

jeanas · August 12, 2023, 10:38am

It does not help, sorry. As long as CPython does the small string interning optimization, this is not a reliable way to tell an untranslated message from a translated one.

jeanas · August 12, 2023, 10:46am

Could you elaborate? In what framing is it a compromise?

Being able to list the contents of a message catalog is definitely something I’ve wanted to do at times. For example, I wish I could do pprint.pprint(catalog.asdict()) to quickly check that the catalog is the one I expected (e.g., that it was loaded from the right directory).

However, now that I think about it, the full mapping interface may be too much because of fallbacks: if you do del catalog[msg] and msg is not in the catalog but is in the fallback, it’s not clear to me whether that should delete it from the fallback.

Maybe the best option is to keep _catalog private but add a method returning a read-only mapping, like types.MappingProxyType.
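As a rough sketch of the read-only idea, assuming a hypothetical dict-backed catalog class and the asdict() name floated above (neither exists in the gettext module):

```python
import gettext
from types import MappingProxyType


class DictTranslations(gettext.NullTranslations):
    """Hypothetical dict-backed catalog (stand-in for a loaded .mo file)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        return super().gettext(message)

    def asdict(self):
        # Hypothetical method: a live, read-only view of this catalog only.
        # Entries reachable through a fallback are intentionally not included.
        return MappingProxyType(self._messages)


catalog = DictTranslations({"Hello": "Bonjour"})
view = catalog.asdict()

found = view.get("Hello")     # "Bonjour"
not_found = view.get("Bye")   # None, so "no translation" is detectable

try:
    view["Hello"] = "Salut"   # mappingproxy rejects item assignment
    writable = True
except TypeError:
    writable = False
```

Because the view cannot be mutated, the question of what del catalog[msg] should do with fallbacks never arises.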

barry · August 12, 2023, 5:15pm

I like the idea of adding NullTranslations.__contains__() as a simple API for checking whether a translation catalog has a particular source string.

Note that it’s not required to implement a translation catalog as a dictionary, so I wouldn’t want to expose a full dictionary-like API or force NullTranslations to inherit from dict. It’s likely not a burden to add __contains__() though, and probably some API to list or iterate over the source strings in the catalog. For the latter, what about adding __iter__() defined to iterate over the source strings in the catalog?
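To make the proposal concrete, here is a minimal sketch of membership testing and iteration on a hypothetical dict-backed catalog. DictTranslations and its _messages attribute are assumptions for illustration; the real methods would live on NullTranslations and GNUTranslations, and whether they should also consult fallbacks is left open here.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical dict-backed catalog (stand-in for a loaded .mo file)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        return super().gettext(message)

    def __contains__(self, message):
        # Sketch only: checks this catalog, not any chained fallback.
        return message in self._messages

    def __iter__(self):
        # Iterate over source strings, not over translations.
        return iter(self._messages)


catalog = DictTranslations({"PyPI": "PyPI", "Hello": "Bonjour"})

has_identical = "PyPI" in catalog      # True, although gettext() returns the input
has_missing = "Goodbye" in catalog     # False
source_strings = sorted(catalog)       # ["Hello", "PyPI"]
```

Neither method requires the catalog to be a dict internally; an implementation backed by an external service could answer both queries in its own way.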

jeanas · August 12, 2023, 5:41pm

Sorry, I don’t understand. What do you mean by “it’s not required to implement a translation catalog as a dictionary”?

barry · August 12, 2023, 6:00pm

What I mean is that NullTranslations essentially defines the interface for translation catalogs. GNUTranslations happens to keep the internal mapping from source strings to translated strings as an internal dictionary, but that implementation detail isn’t required by the API. I could imagine for example, an implementation which talks to some external catalog service and doesn’t use a dictionary.

storchaka · August 12, 2023, 6:04pm

If you want to know whether there is a translation of the specified string, you can create a class which always fails and add it as a fallback.

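from gettext import NullTranslations

# Assumed definition; the exception class is not defined in the original snippet.
class NoTranslationsError(Exception):
    pass

class NoTranslations(NullTranslations):
    # A lookup only reaches this fallback when no catalog had a translation.
    def gettext(self, message):
        raise NoTranslationsError
    def ngettext(self, msgid1, msgid2, n):
        raise NoTranslationsError
    def pgettext(self, context, message):
        raise NoTranslationsError
    def npgettext(self, context, msgid1, msgid2, n):
        raise NoTranslationsError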

Or you can make it store the input strings and return the result of the default implementation.
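A possible way to use the raising fallback is sketched below. DictTranslations is a hypothetical dict-backed stand-in for a real catalog, the LookupError base class is an assumption, and is_translated is a helper name invented for this example.

```python
import gettext


class NoTranslationsError(LookupError):
    """Assumed exception type; the base class is an arbitrary choice."""


class NoTranslations(gettext.NullTranslations):
    def gettext(self, message):
        raise NoTranslationsError(message)


class DictTranslations(gettext.NullTranslations):
    """Hypothetical dict-backed catalog (stand-in for a loaded .mo file)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        return super().gettext(message)  # delegates to the fallback, if any


def is_translated(catalog, message):
    # Hypothetical helper: the fallback raises only when the lookup failed.
    try:
        catalog.gettext(message)
    except NoTranslationsError:
        return False
    return True


catalog = DictTranslations({"PyPI": "PyPI"})
catalog.add_fallback(NoTranslations())

identical = is_translated(catalog, "PyPI")       # True
untranslated = is_translated(catalog, "Sphinx")  # False
```

Note that add_fallback mutates the catalog, so every other user of the same object would also start seeing the exception.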

jeanas · August 12, 2023, 6:27pm

Raising an exception is an interesting idea, but:

In the specific case of Sphinx, the message catalog objects are shared with all extensions (Sphinx has a large ecosystem of extensions), so we can’t make them raise exceptions on untranslated messages.

barry · August 12, 2023, 9:53pm

You can reach in and set my_translation._fallback = None but of course that tampers with the non-public API. We could codify that by defining add_fallback(None) as the way to spell removal of the fallback.

AA-Turner · August 13, 2023, 12:52pm

Due to the design of chained fallbacks, I think it isn’t entirely obvious which fallback add_fallback(None) would remove (the last in the chain? the fallback of the translator that the method was called on?).

I intend to open an issue and PR to add __contains__ in the next few days unless there are objections, as I think that would be a useful improvement regardless of a potential improvement to the fallback management API.

A

jeanas · August 13, 2023, 1:09pm

I object. I wanted to open the PR myself as an easy first contribution

barry · August 13, 2023, 6:38pm

There are ways to handle that. You can already add_fallback(None) but I think it essentially has no effect. It will chase the fallbacks to the last in the chain, which already has no fallback, and set it to None. Not very useful behavior.

Changing that is technically a backward-incompatible change, but since the above semantics are pretty useless, I don’t know whether they need to be preserved. But if so, then just add a remove_fallback() method to break the topmost fallback link.
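The current behaviour, and one possible shape for the removal, can be sketched as follows. Shouting is a stand-in fallback invented for the example, and remove_fallback is a hypothetical helper that reaches into the private _fallback attribute; it is not an existing gettext API.

```python
import gettext


class Shouting(gettext.NullTranslations):
    """Stand-in fallback that visibly changes the message."""

    def gettext(self, message):
        return message.upper()


def remove_fallback(translations):
    # Hypothetical helper: break the topmost fallback link and return it.
    # It relies on the private _fallback attribute of NullTranslations.
    removed = translations._fallback
    translations._fallback = None
    return removed


primary = gettext.NullTranslations()
primary.add_fallback(Shouting())
before = primary.gettext("hello")         # "HELLO": served by the fallback

# The call is forwarded down the chain, where it sets None over None.
primary.add_fallback(None)
after_add_none = primary.gettext("hello")  # still "HELLO"

remove_fallback(primary)
after_remove = primary.gettext("hello")    # "hello": the fallback is gone
```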

barry · August 13, 2023, 6:39pm

Just tag me in the PR either way

storchaka · August 13, 2023, 9:21pm

You can make it conditional.

    def gettext(self, message):
        if self.flag:
            raise NoTranslationsError
        return super().gettext(message)

You can make it just log untranslated messages.

    def gettext(self, message):
        self.untranslated.add(message)
        return super().gettext(message)
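Put together, the logging variant might be used like this. RecordingFallback and DictTranslations are hypothetical names; the latter is a dict-backed stand-in for a real catalog so the example runs on its own.

```python
import gettext


class RecordingFallback(gettext.NullTranslations):
    """Fallback that records every message that reaches it."""

    def __init__(self):
        super().__init__()
        self.untranslated = set()

    def gettext(self, message):
        self.untranslated.add(message)
        return super().gettext(message)  # still returns the message itself


class DictTranslations(gettext.NullTranslations):
    """Hypothetical dict-backed catalog (stand-in for a loaded .mo file)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        return super().gettext(message)


recorder = RecordingFallback()
catalog = DictTranslations({"PyPI": "PyPI"})
catalog.add_fallback(recorder)

first = catalog.gettext("PyPI")     # identical translation, never reaches recorder
second = catalog.gettext("Sphinx")  # no translation, recorded by the fallback

was_translated = "PyPI" not in recorder.untranslated
was_untranslated = "Sphinx" in recorder.untranslated
```

Callers keep receiving plain strings, so code sharing the catalog is not disturbed, although the catalog itself is still mutated by add_fallback.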

orise · August 14, 2023, 8:54am

I can see the current semantics being useful if you want to add a fallback dynamically (or maybe based on configuration) with an option of adding no fallback at all.
Of course this can be done with a simple if, but there may be code that already relies on it and would break badly with the change.

jeanas · September 9, 2023, 7:53pm

Sorry for the delay (I was on vacation).

I’ve hit a stumbling block: should msgid in catalog return True if msgid is present in catalog but with a context? Or should we make that return False, and instead allow testing whether a message can be translated given a certain context with (msgctxt, msgid) in catalog?
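The second option could be sketched roughly like this. ContextTranslations is a hypothetical dict-backed catalog whose keys are either a plain msgid or a (msgctxt, msgid) tuple; this key layout is an assumption made for the example and is not how GNUTranslations stores its entries.

```python
import gettext


class ContextTranslations(gettext.NullTranslations):
    """Hypothetical catalog keyed by msgid or by (msgctxt, msgid)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        return super().gettext(message)

    def pgettext(self, context, message):
        key = (context, message)
        if key in self._messages:
            return self._messages[key]
        return super().pgettext(context, message)

    def __contains__(self, key):
        # A bare msgid only matches entries stored without a context.
        return key in self._messages


catalog = ContextTranslations({
    ("menu", "Open"): "Ouvrir",
    "Close": "Fermer",
})

with_context = ("menu", "Open") in catalog  # True
bare_msgid = "Open" in catalog              # False: only exists under "menu"
no_context = "Close" in catalog             # True
```

With this choice, msgid in catalog mirrors what gettext(msgid) can translate, and (msgctxt, msgid) in catalog mirrors pgettext(msgctxt, msgid).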

jeanas · September 9, 2023, 7:55pm

Also, there is a bit of an issue with plurals since a message may have some of the forms translated and some not (but I guess that might be rare? I’ve never worked with message catalogs containing pluralized forms).