Supporting non-English development in Python

Note: Admittedly I did not search the old mailing list archives. I did take a quick peek at the search results here in Discourse, and didn’t immediately find anything relevant. If this topic has been discussed to death, feel free to gently tell me to bugger off, I won’t be hurt.

Last night, after watching the Python documentary, something kept occupying my mind, and it’s related to this thread by @AlSweigart: Accented letters in non-English function names?. When I first learned Python in 2010, I recall the language barrier being a relatively low point of friction for myself, but much higher for many of my Danish co-students at the time.

When your teacher is explaining programming concepts like “while loops” and “return types”, it adds a mental toll to recall those words in English, rather than the language you’re naturally learning these concepts in.

So, in the interest of encouraging a more diverse set of programmers to adopt Python, it may be worth considering how we can break this language barrier.

I don’t have a concrete proposal today for how we’d achieve this, but before we spend any effort trying to find a meaningful answer to that question, I wanted to broach the topic and see if it’s even something there’s an appetite for solving. I think it would be a great way to continue this community’s longstanding norm of trying to be inclusive of new people in our field.

I will get out of the way the obvious solution which I am not proposing: simply translating each of the existing keywords, and making all of those reserved words in the language as well (giving us ~100x as many reserved words, and arguably a backwards-incompatible Python 4.0 release, given that existing code may use those words as variables).
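To make the backwards-compatibility concern concrete, here is a minimal sketch. The Danish words are hypothetical translations picked purely for illustration, and `is_free_identifier` is a made-up helper, not anything from the standard library:

```python
import keyword

# Hypothetical Danish translations of a few keywords (illustration only).
danish_keywords = {"if": "hvis", "while": "mens", "return": "returner"}


def is_free_identifier(word: str) -> bool:
    """True if `word` can be used as an ordinary name in today's Python."""
    return word.isidentifier() and not keyword.iskeyword(word)


# Perfectly valid today; it would stop compiling if "hvis" became reserved.
hvis = 42

for english, danish in danish_keywords.items():
    print(f"{english} -> {danish}: usable as a name today = {is_free_identifier(danish)}")
```

Any existing program that already uses one of these words as a name would break the moment the word became reserved.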

Some related discussions:

6 Likes

Compatibility issues aside, the bigger problem is that simply translating the existing keywords helps very little. For code to actually read well in a different natural language, the names of all modules, classes, functions, methods, parameters and attributes of the built-ins and stdlib, if not also of popular third-party packages, would need to be translated too, and that is going to be a monumental task.

As unfair to non-English speakers as it may seem, I don’t think it’s overly unreasonable to ask Python learners to also learn a language as popular as English at the same time, which is never going to be a bad investment of time. It’s actually how I learned English myself when I was a kid because I wanted to learn programming so much I was motivated to learn English, and that benefited me in so many more ways in my life than just learning to program.

9 Likes

I just finished watching the documentary, and would like to ask some questions for the community regarding this and I’ll give some of my thoughts at the end.

  1. Is making internationalized versions of the stdlib and builtin attributes like “__dict__” worth doing at all for any reason?
  2. Is making these versions worth doing in the core repository or in separate language packs or third party libraries?
  3. If it’s not worth doing in the core repository, should there be tools in the core repository that help other people do this?
  4. If any of these are worth doing, what would be the best way to implement any of these?

These are my answers:

  1. It’s worth doing to the extent that we value supporting alternative natural languages. It may not be worth it for purely financial reasons, but for artistic/creative reasons, the answer is a definite yes.
  2. It’s probably problematic to do this in the core repository: there are at least performance, security and compatibility issues that have to be dealt with. Optional language packs might work, but then should the languages be layered on top of each other or substituted for one another (or does it depend on which items we are talking about)? Third party libraries could work, but users might not know about them. It is also harder for third party packages to exist when the language itself doesn’t help.
  3. This is probably the best solution: having something similar to gettext. I tried gettext briefly, but it seems I was only able to translate quoted strings. Having standard tools would also make things easier for users.
  4. Maybe some parser that outputs class names, attribute names, etc? Others will definitely give better suggestions than me.
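As a rough idea of what such a parser could look like, here is a minimal sketch built on the standard `ast` module. The `collect_names` helper and the sample class are made up for illustration; a real tool would also need to deal with imports, function parameters, keyword arguments and so on:

```python
import ast


def collect_names(source: str) -> dict[str, set[str]]:
    """Group the identifiers found in `source` by the role they play."""
    found = {"classes": set(), "functions": set(), "attributes": set(), "variables": set()}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            found["classes"].add(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].add(node.name)
        elif isinstance(node, ast.Attribute):
            found["attributes"].add(node.attr)
        elif isinstance(node, ast.Name):
            found["variables"].add(node.id)
    return found


sample = '''
class Animal:
    def __init__(self):
        self.name = "unknown"

    def speak(self):
        return self.name
'''

names = collect_names(sample)
for role, identifiers in names.items():
    print(role, sorted(identifiers))
```

The output is essentially the word list a translator would have to fill in, already sorted into "noun-like" and "verb-like" buckets.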

@blhsing English has a 35 year head start in Python and more in computer programming. I’m definitely not advocating for removing English, but for ways to allow users to use alternative languages. People speed run certain games in Japanese and Gangnam Style was the first video to hit a billion views on YouTube, etc. Maybe there will be a popular Spanish package that will be internationalized to English. It’s a monumental task, but it doesn’t need to be a sprint.

English is made up of approximately:

  • 26% Latin (direct borrowings, especially in science, law, and religion)
  • 29% French (Norman French and Parisian French)
  • 26% Germanic roots (Old English, Old Norse, Dutch, German, etc.)
  • 6% Greek (philosophy, science, medicine)
  • 4% proper names (places, people)
  • 10% from other languages combined

The numbers may vary, but the point is that, most of the time, English is already using the borrowed word, so there’s nothing new to translate.

There is a consensus in your thread that this is best solved using a translator in the editor or a similar tool. It falls outside the scope of the language itself. You can think of it in the same way that natural English has nothing inherently to do with translating into French.

Hi Elis,

I understand your point and the implications of affecting the core language. But to continue your analogy, say there’s a sentence “I want to run tomorrow”. It would be helpful if there were a tool that returned the verbs and nouns for building the dictionary. If in the future you change the structure to “I want tomorrow run”, I will still know how to parse the structure.

It is not anyone’s responsibility here though, which is definitely understandable.

Actually, I think we were mostly in agreement. The transpiler in my thread does some of the work (but not all); the question now is whether something like that should be in the standard library. You seem to say no, which is also understandable.

Disclaimer: English is my first language and I am well aware that this is a privilege/luxury with respect to programming.

I do not believe that this is the type of project that should be pursued within Python. I’m not even sure if any language can support it without it being an explicit design goal of the language. There may be interesting possibilities for transpilation.

The issue I see is that keywords are only a very small part of the language as it is used. The full context in Python covers at least:

  • all keywords
  • all built-in symbol names (dict, isinstance, None, etc)
  • all module names in the stdlib
  • all symbol names in all stdlib modules
  • all attribute names thereof, including dunders (dict.items, dict.__getitem__, etc)
  • all third party package names (numpy, attrs, etc)
  • all module names thereof
  • all symbol names thereof
  • all CLI entry point names (pip, yaml2json, etc)
  • all CLI option names (--extra-index, --output, etc)
  • all special string literals accepted by functions (e.g., open(name, 'rb'))
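For a rough sense of scale for just the first few bullets, the interpreter can be asked to count. This is only a sketch; the numbers depend on the Python version, and `sys.stdlib_module_names` only exists on Python 3.10 and newer, hence the fallback:

```python
import builtins
import keyword
import sys

# Reserved words of the running interpreter.
keyword_count = len(keyword.kwlist)

# Public built-in names (dict, isinstance, None, exceptions, ...).
builtin_count = len([name for name in dir(builtins) if not name.startswith("_")])

# Top-level stdlib module names; the attribute was added in Python 3.10.
stdlib_module_count = len(getattr(sys, "stdlib_module_names", ()))

# Attributes of a single built-in type, dunders included.
dict_attr_count = len(dir(dict))

print("keywords:", keyword_count)
print("public built-ins:", builtin_count)
print("stdlib modules:", stdlib_module_count)
print("attributes on dict alone:", dict_attr_count)
```

Everything below the stdlib bullets (third party packages, CLI names, special string literals) is not countable this way at all, which is part of the point.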

For simplicity, consider the “language” of Reversed-English. Would the following be acceptable?

morf frozendict tropmi frozendict

But maybe the word order is not appropriate here for Reversed-English. Maybe Reversed-English uses post-position particles, not prepositions. Should it not be this?

frozendict tropmi frozendict morf 

Maybe we can and should overlook such details. But even so, are we satisfied with this?

rof (yek, eulav) ni tcid(atad).smeti(): ...

rof (yek, eulav) ni frozendict(atad).items(): ...

This is why I think support for third party packages, and their names, is somewhat essential. frozendict is meant to “rhyme with” dict. A system which breaks that symmetry may actually do harm to users.

And I don’t think we can back off from handling names and attributes from builtins and the stdlib. Many of the mnemonic names, like the built-in all, only make sense in a language-specific context. I would find a keywords-only solution extremely dissatisfying.

Once we add in the CLI context, this becomes extremely difficult. So maybe omit the CLI space, at least at the start. And I don’t even know what to suggest for names like yaml2json, which is a pun.

If you can design a workable and pluggable transpiler which takes translations for 3rd party packages, I think that would be really cool. But I don’t think that’s a project which should be considered for the language itself.
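A minimal sketch of what "pluggable" could mean here, assuming each language pack is nothing more than a dict from translated name to real name. The pack contents and the `translate` function are hypothetical, and the approach only swaps tokens in place, so it does nothing about the word-order question raised above:

```python
import io
import tokenize

# Hypothetical Reversed-English "language packs": translated name -> real name.
KEYWORDS = {"rof": "for", "ni": "in", "morf": "from", "tropmi": "import"}
BUILTINS = {"tcid": "dict", "smeti": "items"}


def translate(source: str, *packs: dict[str, str]) -> str:
    """Replace NAME tokens found in any pack; leave everything else untouched."""
    mapping: dict[str, str] = {}
    for pack in packs:
        mapping.update(pack)

    lines = source.splitlines(keepends=True)
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

    # Walk backwards so that earlier column offsets on a line stay valid.
    for tok in reversed(tokens):
        if tok.type == tokenize.NAME and tok.string in mapping:
            row, start_col = tok.start
            _, end_col = tok.end
            line = lines[row - 1]
            lines[row - 1] = line[:start_col] + mapping[tok.string] + line[end_col:]
    return "".join(lines)


loop = translate("rof (yek, eulav) ni tcid(atad).smeti(): ...\n", KEYWORDS, BUILTINS)
imports = translate("morf frozendict tropmi frozendict\n", KEYWORDS)
print(loop, end="")
print(imports, end="")
```

Because it works on tokens rather than raw text, names inside string literals and comments are left alone. Note that `frozendict` stays untranslated unless someone supplies a pack for it, which is exactly the asymmetry described above.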

My summary opinion on this: although a transpiler could be a very fun project, I doubt it would be really useful. If we want to work on internationalization, why not put effort into documentation? That’s a well understood need, and many Python projects have English-only documentation. Most projects don’t even have the knowledge to bootstrap contributions to translated docs. Direct contributions or meta-doc as guidance could be very valuable here.

5 Likes

Nice point about word order, but a language with a fixed word order is easier for foreigners than one that supports all six word orders.

All of these make perfect sense to an Albanian speaker, who can naturally translate them accurately, preserving their meaning while capturing subtle stylistic nuances:

SVO – I eat apples
SOV – I apples eat
VSO – Eat I apples
VOS – Eat apples I
OVS – Apples eat I
OSV – Apples I eat

For example, ‘Apples eat I’ would be translated as ‘Mollën [e ha] unë’ = ‘I eat apples,’ not ‘Molla [më ha] mua’ = ‘Apples eat me.’ Note that ‘Unë ha mollë’ translates as ‘I eat apples,’ although ‘mollë’ is singular. Using the plural form, as in ‘Unë ha mollat,’ means ‘I eat the apples.’

In contrast to Albanian, English is missing the synthetic features that Old English had, which would have allowed all six word orders to convey the same meaning in English as well.

Who wants Old English in the source code?

3 Likes

Hi Elis,

I think you misunderstood what I said, and in thinking of examples, I thought of a use case which can already happen in existing Python code.

Say I wrote this and you want to use it:

class 動物:
    def __init__(self):
        self.名字 = '名字' # notice the attribute is here

class 貓(動物):
    def 叫(self):
        return '喵叫'  # notice the function name is in here

class 狗(動物):
    def 叫(self):
        return '㕵叫'  # and here

But maybe you want it to read like this?

class Animal:
    def __init__(self):
        self.name = '名字'

class Cat(Animal):
    def speak(self):
        return '喵叫'

class Dog(Animal):
    def speak(self):
        return '㕵叫'
    

and maybe I want to write it like this:

類 動物:
    函 __始__(吾):  # 吾 - self, but fewer collisions
        吾.名字 = '名字'

類 貓(動物):
    函 叫(吾):
        回 '喵叫'

類 狗(動物):
    函 叫(吾):
        回 '㕵叫'

Would you prefer to write

c = Cat().speak() or c = 貓().叫()?

And let’s say one day I want to deprecate the Cat speak function; would you want the English code to pick up that deprecation as well?

Again, I don’t think it’s a priority, but just want to point out that the problem could exist now.

In my earlier message, what I meant was that I’d like to know which 叫 (verb) and 名字 (noun) to translate.
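For code that is already valid Python, like the first version above, one way to sketch that kind of identifier translation is an `ast.NodeTransformer`. The `NAMES` table and the `Renamer` class are hypothetical and only for illustration; string literals are deliberately left alone, and `ast.unparse` needs Python 3.9 or newer:

```python
import ast

# Hypothetical translation table for the example classes above.
NAMES = {"動物": "Animal", "貓": "Cat", "狗": "Dog", "叫": "speak", "名字": "name"}


class Renamer(ast.NodeTransformer):
    """Rename classes, functions, attributes and plain names via a lookup table."""

    def __init__(self, names: dict[str, str]) -> None:
        self.names = names

    def _translate(self, name: str) -> str:
        return self.names.get(name, name)

    def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
        node.name = self._translate(node.name)
        self.generic_visit(node)  # also handles base classes and the body
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._translate(node.name)
        self.generic_visit(node)
        return node

    def visit_Attribute(self, node: ast.Attribute) -> ast.Attribute:
        node.attr = self._translate(node.attr)
        self.generic_visit(node)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._translate(node.id)
        return node


source = '''
class 動物:
    def __init__(self):
        self.名字 = '名字'

class 貓(動物):
    def 叫(self):
        return '喵叫'
'''

english = ast.unparse(Renamer(NAMES).visit(ast.parse(source)))
print(english)
```

The translation table is the part a person (or a tool) would still have to supply; the tree walk only tells you where each verb and noun lives.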

Why should we do that? You should use state-of-the-art algorithms, such as LLMs, for translation and transliteration, because addressing these tasks was a key motivation behind their development.

You can try translating the code above using any popular large language model, and you will get error-free Python code.

I mean that someone could write Python code in another language, then translate it into English to debug the actual Python code. It’s similar to writing in a low-level programming language and checking the assembly code for bugs. This approach ensures there will always be a single source of truth.

1 Like

:+1: That works too!

By the way, in case anyone is offended, I did not mean to be political and I hope no one saw or will see it that way. Please feel free to substitute the awesome language of Reversed-English above in my example.

1 Like