Wouldn’t you like to write your i18n code using gettext like this?
msg = _(t'Hello {name}')
msg = ngettext(t'{n} snake', t'{n} snakes', n)
flash(_(t'Category "{cat.title}" moved to "{target_cat.title}"'))
instead of this?
msg = _('Hello {name}').format(name=name)
msg = ngettext('{n} snake', '{n} snakes', n).format(n=n)
flash(_('Category "{}" moved to "{}"').format(cat.title, target_cat.title))
Because the latter isn’t super pretty to begin with and is prone to errors, such as calling .format() inside the gettext call instead of outside, or using multiple positional placeholders (some languages may require a different order).
Now to the tricky part: How to convert the expressions from the t-string interpolation to names that are suitable in format strings and translations.
My proposed way to solve this is to derive a format string field name from the expression, and reject anything that’s too complex or too dynamic. My current implementation covers simple cases such as plain variables, accessing attributes and items, and calling a method w/o arguments.
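To make the idea of deriving a field name concrete, here is a minimal sketch of how it could work. It is not the actual implementation: the `field_name` helper and the rule of joining parts with underscores are assumptions made up for illustration, and it parses the expression text with `ast` instead of working on a real t-string, so it runs on Python 3.9+.

```python
import ast


def field_name(expr: str) -> str:
    """Derive a placeholder name from a simple expression (hypothetical rules)."""
    node = ast.parse(expr.strip(), mode="eval").body
    parts = []
    while True:
        if isinstance(node, ast.Name):
            # Plain variable: the innermost part of the chain.
            parts.append(node.id)
            break
        if isinstance(node, ast.Attribute):
            # Attribute access such as cat.title
            parts.append(node.attr)
            node = node.value
        elif (
            isinstance(node, ast.Subscript)
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, (str, int))
        ):
            # Item access with a literal key such as data['key'] or items[0]
            parts.append(str(node.slice.value))
            node = node.value
        elif isinstance(node, ast.Call) and not node.args and not node.keywords:
            # Method call without arguments: use the name of the callee.
            node = node.func
        else:
            raise ValueError(f"expression too complex for a placeholder: {expr!r}")
    name = "_".join(reversed(parts))
    if not name.isidentifier():
        # Reject names that would not be valid format string field names.
        raise ValueError(f"cannot derive a field name from: {expr!r}")
    return name


print(field_name("name"))             # name
print(field_name("cat.title"))        # cat_title
print(field_name("user.get_name()"))  # user_get_name
print(field_name("data['key']"))      # data_key
```

Anything outside these cases (arithmetic, calls with arguments, dynamic subscripts) raises `ValueError`, which is the "reject anything that’s too complex" part.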
PS: Maybe an interesting fact: A less powerful version of this has existed in Jinja2 templates for a very long time. It lets you use plain variables inside {% trans %}, and uses the variable name during extraction but the value during translation.
Yes, it was intended as a bad example that is avoided altogether when using t-strings: “or using positional multiple placeholders (and some languages may require a different order).”
The easiest way to do this correctly (w/o t-strings) would be to simply use named placeholders (or explicit positional placeholders ({0} and {1}), but those would easily confuse translators)…
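A small illustration of why the placeholder style matters once a translation needs a different word order. The "translated" strings below are made-up English stand-ins, not real catalog entries.

```python
cat_title, target_title = "Snakes", "Reptiles"

# Auto-numbered placeholders: the translator cannot reorder the values.
positional = 'Into "{}" was moved "{}"'
broken = positional.format(cat_title, target_title)

# Named placeholders: the order in the translation is free.
named = 'Into "{target}" was moved "{cat}"'
correct = named.format(cat=cat_title, target=target_title)

print(broken)   # Into "Snakes" was moved "Reptiles"  (values swapped)
print(correct)  # Into "Reptiles" was moved "Snakes"
```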
We explored the use of t-strings for i18n back when PEP 750 was under development. We determined it was not a good fit and was outside the scope of t-strings, both of which are totally fine. The right tool for the job, IMHO is my library flufl.i18n, which was originally developed for GNU Mailman and is still used there, but is useful in any other context. It’s built on top of stdlib Template strings, which use $-syntax that is much more friendly to translators than %-strings[1]. GNU gettext supports Python directly, so that toolset is well designed for use in i18n’d Python code.
[1] and was developed exactly because translators understandably got %-strings wrong in translations
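For readers who haven’t used the $-syntax mentioned above, this is roughly what it looks like with the stdlib `string.Template` class on its own. It does not show the flufl.i18n API, and the German string is only an illustrative stand-in for a catalog entry.

```python
from string import Template

source = Template("Hello $name, you have $n new messages")
# Illustrative translation with a different placeholder order.
translated = Template("$n neue Nachrichten für $name")

english = source.substitute(name="Ada", n=3)
german = translated.substitute(name="Ada", n=3)

# safe_substitute() leaves unknown placeholders alone instead of raising,
# which is useful when a translation contains a typo.
typo = Template("Hello $nmae").safe_substitute(name="Ada")

print(english)  # Hello Ada, you have 3 new messages
print(german)   # 3 neue Nachrichten für Ada
print(typo)     # Hello $nmae
```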
What made it not a good fit? Simply the fact that more complex expressions would not work well in there?
and was outside the scope of t-strings
Keeping it outside the scope of PEP750 made perfect sense IMHO. Doesn’t sound like a good reason against adding it independently from this…
The right tool for the job, IMHO is my library flufl.i18n
At least in the Python webapp world I have yet to see any major webapps that use anything but Babel… So realistically, pretty much everyone uses something else.
A quick search on GitHub also confirms this: 32 matches for flufl.i18n vs over 125k matches for babel in common Python dependency files
and was developed exactly because translators understandably got %-strings wrong in translations
Yep, they do. But regardless of the syntax, the main mistake I see is people translating placeholders…
In any case, applications like Transifex (and probably Weblate as well?) tend to highlight placeholders separately to make this less likely to happen.
And in fact, my proposed solution using t-strings actually avoids all the problems that come with more complex format placeholders, because only the name is part of the extracted strings. Any format/conversion specs live only in the code, and thus translators never see them. So in terms of simplicity it’s literally $foo (w/ flufl.i18n / string.Template) vs {foo} (w/ t-strings). Both look equally easy to get right (or wrong) for translators… At least babel also adds python-brace-format to the pot file metadata for these strings, so tools used by translators know what kind of placeholders to expect.
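A rough sketch of that separation, assuming a hypothetical `render` helper and a plain dict standing in for the interpolations a t-string would provide: the translatable string only contains `{price}`, while the `.2f` spec is applied in code before substitution.

```python
def render(translated: str, fields: dict) -> str:
    """Fill a translated message; `fields` maps name -> (value, format_spec).

    Hypothetical helper: the format spec is applied here, in code, so the
    message the translator sees only contains bare {name} placeholders.
    """
    values = {name: format(value, spec) for name, (value, spec) in fields.items()}
    return translated.format(**values)


# What the translator sees (and may reorder): bare names only.
msgid = "Total: {price} for {n} items"
msgstr = "{n} items, total {price}"  # made-up "translation"

fields = {"price": (3.14159, ".2f"), "n": (3, "")}

print(render(msgid, fields))   # Total: 3.14 for 3 items
print(render(msgstr, fields))  # 3 items, total 3.14
```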
You may want to compare what is possible with t-strings with what Fluent does to handle languages that don’t map 1:1 with English, and how it separates what developers need from what localizers need better than extracting strings from source code in any format. I would not recommend t-strings here, nor would I recommend that any large project go with a gettext-based solution for new localization (yes, don’t throw away the existing work if you have already managed to get your app localized).
Yeah, gettext is not as powerful as you sometimes need for perfect translations since you can’t take into account gender, etc. But realistically it’s still what nearly everyone uses… I don’t think it’s really in scope here?
FWIW I completely agree that t-strings make no sense for this type of translation, where you usually just use some identifier for a message, and have the message w/ all the details, conditions, etc. outside the Python code.
Okay, more directly: Even if there weren’t other issues[1] with t-strings for i18n that make it worse than gettext in its current form, I wouldn’t think it’s worth changing gettext to support this, partly because it would be ideal if we weren’t pushing users towards gettext-based solutions.
To me, this is what makes this proposal better for a PyPI package than stdlib. There is no single obvious answer, and in the stdlib we only have one chance to do it right. A package on PyPI, on the other hand, can be much bolder in picking something that works, change it later (with backwards-compatible improvements that don’t require a new Python version, and incompatible changes that won’t hold people back from upgrading all of Python), or even offer customization.