Not 100% sure this is the right forum category, but here goes. I’d like the Python ecosystem to have more accessible multilingual content. Almost all Python projects I’m involved with use the `gettext` module and GNU gettext as the foundation of their translations (of user interfaces, user docs, and contributor docs). And so far, all the projects I’ve done accessibility reviews of share the same issue(s), which seem to come either from a lack of gettext capabilities or from a lack of understanding of accessibility requirements.
The issue is this: projects have content in mixed languages within a single web page, without annotating which language a given word or run of text is in (with the `lang` HTML attribute). This is a problem for users of assistive technology. For example, speech synthesizers use this information to pronounce words correctly, and words are unintelligible if they’re pronounced in the wrong language. It’s a clear accessibility fail, and arguably also an inclusivity issue, since it only affects people who aren’t using the content’s source language.
This is described in Web Content Accessibility Guidelines (WCAG) Success Criterion 3.1.2: Language of Parts (Level AA):
The human language of each passage or phrase in the content can be programmatically determined except for proper names, technical terms, words of indeterminate language, and words or phrases that have become part of the vernacular of the immediately surrounding text.
In addition to the issue for real-world users, failing this WCAG criterion also means falling short of legal requirements: for example, Section 508 for the federal sector in the USA, and the European Accessibility Act for parts of the private sector in the EU, along with many more around the world.
Examples
First off, if you want a good example of this done right, I’d recommend the French translation of WCAG 2.0, for example its CAPTCHA definition. Beyond that, here are examples of real-world content with this issue across multiple Python projects:
- Python 3.14 docs Glossary in French, very first item: The default Python prompt of […].
- PyPI homepage, in the footer in French, “PyPI”, “Python Package Index”, and the Blocks logos […].
- Django French docs: How to customize the `shell` command. Everything but the “Documentation” heading is in the wrong language.
- Choosing a build backend, in the Japanese Python Packaging User Guide: The `requires` key is a list of packages […].
Fair warning, this is pretty cringe: here’s a recording on YouTube of NVDA reading those French docs with mixed English (thanks to Assistiv Labs for making this available for my project!)
What can be done
We tried to consider the options for Wagtail in early 2024 and didn’t get anywhere. We have more pressing accessibility issues to solve, but we still need a plan to address this one, which is why I’m here. I suspect there isn’t much specific to Wagtail here. The need is simple: add a `lang` attribute wherever needed. In practice, this likely means:
- Finding a way to detect whether, for a given string, a translation is available or not.
- If a translation exists, great: that translation matches the language of the overall page and there’s nothing further to do.
- If the content is untranslated, determine the source language.
- In that scenario, add a `lang` attribute on an HTML element around the string, with the source language as the attribute value.
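The steps above can be sketched with the standard library’s public fallback mechanism: `GNUTranslations.gettext()` only consults a fallback (attached with `add_fallback()`) when a message is missing from the catalog, so a fallback placed last in the chain can tag whatever reaches it with the source language. This is a minimal sketch, assuming English is the source language and the page language is known; `UntranslatedText`, `SourceLanguageFallback`, `DictTranslations` (an in-memory stand-in for a compiled `.mo` file) and `render()` are hypothetical names, not part of `gettext`. It avoids the tempting heuristic of comparing the result with the msgid, which would misclassify strings like “Documentation” that are spelled the same in both languages.

```python
import gettext
import html


class UntranslatedText(str):
    """A str that remembers it fell through to the source language."""

    lang: str = ""


class SourceLanguageFallback(gettext.NullTranslations):
    """Hypothetical last link in a gettext fallback chain.

    Catalog lookups only delegate to their fallback when a message is
    missing, so anything reaching this object is still untranslated.
    """

    def __init__(self, source_lang: str) -> None:
        super().__init__()
        self.source_lang = source_lang

    def gettext(self, message: str) -> str:
        text = UntranslatedText(message)
        text.lang = self.source_lang
        return text


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory stand-in for a compiled .mo catalog (demo only)."""

    def __init__(self, catalog: dict[str, str]) -> None:
        super().__init__()
        self.catalog = catalog

    def gettext(self, message: str) -> str:
        if message in self.catalog:
            return self.catalog[message]
        # NullTranslations.gettext() delegates to the fallback, if one is set.
        return super().gettext(message)


def render(text: str, page_lang: str) -> str:
    """Escape text for HTML, wrapping it in a lang span if it is untranslated."""
    escaped = html.escape(text)
    lang = getattr(text, "lang", "")
    if lang and lang != page_lang:
        return f'<span lang="{lang}">{escaped}</span>'
    return escaped


fr = DictTranslations({"Documentation": "Documentation", "Search": "Rechercher"})
fr.add_fallback(SourceLanguageFallback("en"))
print(render(fr.gettext("Search"), "fr"))
print(render(fr.gettext("The default Python prompt"), "fr"))
```

With a real catalog, the fallback would be attached the same way, e.g. `t = gettext.translation("messages", localedir="locale", languages=["fr"], fallback=True)` followed by `t.add_fallback(SourceLanguageFallback("en"))` (the domain and directory are placeholders). Only `gettext()` is overridden here; `ngettext()`, `pgettext()` and `npgettext()` would need the same treatment. Since `msgfmt` omits fuzzy entries by default, those would also be tagged as untranslated.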
Even assuming all of the above is possible, there are still some pretty challenging aspects:
- This will bloat all UI code where translatable strings are spread throughout. For example, Django templates with the `{% translate %}` template tag: imagine if every single use of that tag had to be preceded by an `{% if %}` language check and the output of a `lang` attribute.
- This will mean a lot more forwarding of data from the Python code where translations are often defined (`_()` helper functions), meaning a lot of code changes.
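For the template bloat, one possible direction is to let the string carry its own language and render itself, instead of wrapping each call site in `{% if %}`. Several Python template engines honor an `__html__` method when autoescaping (MarkupSafe, and therefore Jinja2, as well as Django’s `conditional_escape()`), so a `str` subclass implementing it can be output as-is. The sketch below assumes that convention; `LangText` is a hypothetical name, and the `conditional_escape()` defined here is a minimal stand-in for an engine’s escaping step, not Django’s implementation. Whether `{% translate %}` output could carry such an object depends on Django’s own translation machinery, which this does not cover.

```python
import html


class LangText(str):
    """Hypothetical str that knows its language and renders a lang attribute.

    Engines that honor the __html__ protocol call this method instead of
    escaping the value themselves.
    """

    lang: str

    def __new__(cls, value: str, lang: str) -> "LangText":
        obj = super().__new__(cls, value)
        obj.lang = lang
        return obj

    def __html__(self) -> str:
        # Escape the plain text first, then wrap it in a span carrying lang.
        return f'<span lang="{html.escape(self.lang)}">{html.escape(str(self))}</span>'


def conditional_escape(value: object) -> str:
    """Minimal stand-in for a template engine's autoescaping step (demo only)."""
    if hasattr(value, "__html__"):
        return value.__html__()
    return html.escape(str(value))


label = LangText("Python Package Index", "en")
print(conditional_escape(label))
print(conditional_escape("Rechercher"))
```

Because `LangText` is still a `str`, comparisons and most existing code keep working, and the language travels with the value rather than as a separate variable. Two caveats: string operations such as concatenation or `.upper()` return a plain `str` and lose the language, and a `<span>` can’t be used inside attribute values like `title` or `alt`, so those would need a different approach.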
Anyway, since this seems like such a prevalent problem, I’d really like to see it addressed in Python directly rather than having to do a lot of research and devise workarounds for Django or Wagtail alone. I’m not sure, though, whether this would require changes to the `gettext` module, or whether it’s simply a matter of better official docs and community best practices. Or whether it requires even bigger changes, like a switch to MessageFormat 2 or a similar, more modern option.
But for now, I’d love to hear whether others have thought about or solved this, to see examples of projects that might have solved it (in Python or other ecosystems), or just to get feedback on whether people agree with my framing of the problem.