I’m working on a project that aims to remove language barriers in coding. Python, like most programming languages, is based on English and the Latin script, which excludes billions of people who don’t use these systems.
The goal is to allow developers to write Python in their native script while ensuring full compatibility with standard Python codebases. Some ideas include:
A syntax mapping layer that allows code to be written in different scripts.
A function aliasing system for external libraries and tools.
Translating error messages and documentation into different languages.
I know this isn’t an easy problem, but I’d love to hear thoughts from the community. Has this been attempted before? Would anyone be interested in exploring solutions together?
In its base form, yes. But there have been a number of Python variants that make small adjustments (such as one-to-one keyword replacement) and remain directly runnable. Probably the biggest of these would be Chinese (there are just a lot of people there), but I know it’s been done for other languages too.
Surprisingly, it isn’t that hard - or at least, the simple parts of the problem aren’t. You could fairly straightforwardly write a translator that is aware of only a very few syntactic constructs (quoted strings, comments), and outside of those, does a fairly naive transformation from one keyword to another. The result would then be a perfectly runnable Python script. But there are a few harder problems to solve:
Are all your language’s characters valid in a Python identifier? They normally will be, but it’s worth checking.
If your language is written right-to-left (Hebrew, Arabic), what does that do to indentation?
What about libraries - both the standard library and ones you might import?
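The identifier question in the list above can be checked directly from Python. This is a minimal sketch: `is_usable_identifier` is a hypothetical helper name, and the sample words are arbitrary illustrations, not a vetted vocabulary.

```python
import keyword


def is_usable_identifier(name: str) -> bool:
    """True if `name` can be used as a Python variable or function name."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Arbitrary sample words; swap in words from the script you care about.
candidates = [
    "如果",        # CJK
    "متغير",       # Arabic
    "משתנה",       # Hebrew
    "परिभाषा",     # Devanagari, includes vowel signs
    "a\u200db",    # contains a zero-width joiner; worth checking explicitly
    "a-b",         # hyphen is an operator, not part of a name
    "2x",          # starts with a digit
    "if",          # already a keyword
]

for name in candidates:
    print(repr(name), is_usable_identifier(name))
```

Python also applies NFKC normalization to identifiers, so two spellings that look different in source can end up as the same name. Scripts that rely on joiners or compatibility characters are the ones to test first.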
Unfortunately, there’s no way you’ll be able to truly translate the entire Python ecosystem into another language. But you may be able to get a fairly long way by first creating a keyword translator, and then creating wrapper libraries. Some wrappers will be easier than others, though.
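As a rough sketch of the keyword translator described above, the standard `tokenize` module can do the work of skipping strings and comments, since it reports them as separate token types. The mapping below is hypothetical and chosen only for illustration, as are the names `KEYWORD_MAP` and `translate_keywords`; it does not describe any existing Python variant.

```python
import io
import tokenize

# Hypothetical native-script spellings for a few keywords (illustration only).
KEYWORD_MAP = {
    "定义": "def",
    "如果": "if",
    "否则": "else",
    "返回": "return",
}


def translate_keywords(source: str, mapping=KEYWORD_MAP) -> str:
    """Swap mapped NAME tokens for Python keywords.

    String and comment tokens are never NAME tokens, so their contents
    are left exactly as written.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in mapping:
            tok = tok._replace(string=mapping[tok.string])
        tokens.append(tok)
    return tokenize.untokenize(tokens)


native_source = '''\
定义 绝对值(数):
    如果 数 < 0:
        返回 -数  # 如果 in a comment stays as written
    否则:
        返回 数
'''

translated = translate_keywords(native_source)
print(translated)

# The translated text is ordinary Python, so it can be executed directly.
namespace = {}
exec(translated, namespace)
print(namespace["绝对值"](-5))
```

Only the keywords change here. The function and parameter names stay in the native script, which Python already accepts. Tracebacks and syntax errors would still come back in English, which is one of the harder problems mentioned above.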
To clarify, my goal isn’t just about using non-Latin variable names or translating error messages. It’s about enabling full programming in native scripts. That includes:
Translating keywords (e.g., def, if, else, for) into native scripts.
Allowing built-in functions (print(), len(), etc.) to have native script aliases while staying compatible with standard Python.
Ensuring that external libraries and APIs can be used seamlessly without requiring English-based syntax.
Essentially, the idea is to make Python readable and writable in any language while maintaining full compatibility with existing Python codebases.
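For the built-in aliases, one possible approach is to add extra names to the `builtins` module so they resolve everywhere without an import. This is a sketch under assumptions: the alias spellings and the `install_builtin_aliases` helper are hypothetical, and mutating `builtins` affects the whole process, so it may not be acceptable outside a teaching environment.

```python
import builtins

# Hypothetical native-script aliases for two built-ins (illustration only).
BUILTIN_ALIASES = {
    "打印": "print",
    "长度": "len",
}


def install_builtin_aliases(aliases=BUILTIN_ALIASES):
    """Bind each alias to the same object as its English built-in."""
    for native_name, english_name in aliases.items():
        setattr(builtins, native_name, getattr(builtins, english_name))


install_builtin_aliases()

打印(长度("abc"))       # same call as print(len("abc"))
打印(长度 is len)       # the alias is the very same function object
打印(长度.__name__)     # the function still reports its English name
```

The last line hints at the compatibility cost: the alias is only a second name for the same object, so anything that introspects the function (`help()`, tracebacks, `repr`) still shows the English name.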
You’re absolutely right. Basic keyword replacement is the easy part, and it has been done before. But the real challenge is making Python truly usable in a non-Latin script while maintaining full compatibility with existing codebases and libraries.
Some key questions I’m exploring:
Libraries & Imports: Wrapper libraries could help, but have there been any serious efforts to make Python’s standard library functions callable in native scripts?
Right-to-Left Scripts: For languages like Arabic or Hebrew, do you see feasible workarounds for handling indentation and syntax alignment?
Would love to hear any thoughts on these! Also, if you know of any past projects that attempted library translation or ecosystem-wide support, I’d love to study them.
A worthy goal, but extremely difficult. For example:
Maintaining compatibility when making a wrapper around the sys module means handling assignment to sys.displayhook. That’s some extra complication right there.
And if you want to maintain full compatibility, every single method and attribute has to be simultaneously accessible via its original English name and also its translated name. Again, not too hard in the simple cases (you create properties that forward everything to the original name), but there are going to be edge cases. What’s in an object’s __dict__? What happens when you iterate over that? Do you see the original or the translated version in help(obj)? Etcetera.
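To make the two examples above concrete, here is a minimal sketch of both the simple case and where it starts to leak. `TranslatedModule`, `alias`, `Account`, and the translated names are all hypothetical; a real wrapper would need to cover far more than this.

```python
import sys


class TranslatedModule:
    """Proxy that maps translated attribute names onto a real module."""

    def __init__(self, real_module, name_map):
        # object.__setattr__ stores these on the proxy itself instead of
        # forwarding them through our own __setattr__.
        object.__setattr__(self, "_real", real_module)
        object.__setattr__(self, "_name_map", name_map)

    def __getattr__(self, name):
        # Called only when normal lookup fails; unmapped names pass through.
        return getattr(self._real, self._name_map.get(name, name))

    def __setattr__(self, name, value):
        # Assignment must reach the real module, or sys never sees the hook.
        setattr(self._real, self._name_map.get(name, name), value)


系统 = TranslatedModule(sys, {"显示钩子": "displayhook"})


def quiet_hook(value):
    """Stand-in display hook that ignores its argument."""


saved = sys.displayhook
try:
    系统.显示钩子 = quiet_hook
    print(sys.displayhook is quiet_hook)   # True: assignment was forwarded
finally:
    sys.displayhook = saved                # restore the original hook


def alias(original_name):
    """Property that forwards reads and writes to `original_name`."""
    return property(
        lambda self: getattr(self, original_name),
        lambda self, value: setattr(self, original_name, value),
    )


class Account:
    余额 = alias("balance")   # translated name for `balance`

    def __init__(self, balance):
        self.balance = balance


account = Account(100)
account.余额 = 250
print(account.balance)          # 250
print(vars(account))            # {'balance': 250}
print("余额" in dir(account))   # True
```

Both names work for reading and writing, but `vars(account)` only contains the English key, while `dir(account)` lists both names. Those are the kinds of inconsistencies that a full-compatibility wrapper would have to decide how to handle.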
You can probably get away with a cut-down job if the purpose is to introduce people to the language, but if your goal is to let people have the full power of Python while never once reading or writing an English word, it’s an incredibly daunting task.
Even for native speakers of English, there is some learning curve to understand what the keywords actually do. print does not require you to own a printer, and lambda being a Greek letter has nothing to do with what it actually does. Sure, an English speaker does not need to learn what if means, but you do not need to be able to speak English to learn its meaning.
For beginners, some translated Python-like language would be fine. But if you want to do more serious work, you will need the real Python. At that stage, you will probably also need to be able to read English, since most libraries don’t have translated documentation.
That is very true, but it’s worth remembering that you and I have keyboards that can easily and conveniently hammer out Latin letters. I’d like to hear from people whose native languages involve other alphabets, and how they go about writing code - does it involve switching keyboard modes, or is there a convenient way to enter those keywords? (Unless, as is common, they write code in the Latin script.)
Computer users in regions that use non-Latin alphabets typically have a Latin keyboard layout configured alongside their native one, and switch between the two as needed. E-mail addresses are effectively Latin-only, and most URLs are too, so you can’t use a computer without the Latin alphabet.
I see your point that learning the meaning of keywords like print() isn’t necessarily harder for non-English speakers since even native speakers need to memorize their function. That makes sense. But given that aliasing keywords in native scripts wouldn’t change functionality, do you think adding them would really be an obstacle?
You mentioned that people already have to switch to Latin script for emails and URLs - which is true. But does that mean we should keep that as the standard, or should we explore ways to make coding more seamless for those who prefer their native script?
Your point about keyboard layouts is really important. I’m in discussions with programmers in Sri Lanka to hear their real-world experiences: how they type code and whether switching layouts is a barrier for them. If anyone here codes in a non-Latin script language, I’d love to hear how you handle this. Do you switch keyboards, or have you fully adapted to Latin script for coding?
The thing to focus on is the target audience. If this is about teaching people the basics of programming, then a Python-derived language would be a fine way to do it. But if this is to be more serious, this will end up building silos and making it harder for people from all over the world to collaborate, and there will be limitations of the translation that people will hit, sooner or later.
There might already exist programming languages aimed at teaching programming that use your desired natural language for their syntax.
I don’t code in a non-Latin script language, but I learned English by reading The C Programming Language, so it was worth the effort. Translated books often feel out of touch.
I continue to work exclusively in English, even for documentation, comments, and everything else. The cognitive overhead of translating and rewriting from English to my native language is too high.
There are precedents of programming languages supporting multiple natural languages, but I don’t see how that would help if you can only replace the programming language’s reserved keywords. Everything else would still be written in the Latin script, including the standard library, almost all external libraries, the Python implementation, and more.