Keywords and builtins in alternate natural languages like Chinese

I am aware of the Zen of Python line “There should be one-- and preferably only one --obvious way to do it”, so I do expect to get a bunch of downvotes. However, I hope to get some constructive feedback even if the suggestion is not welcomed. I have written some code to test adding keywords and builtins in Chinese by changing Grammar/python.gram and rebuilding (the specific keywords and builtins are subject to change or a poll):
make regen-keyword
make regen-pegen
make

I read this discuss.python.org/t/lets-make-as-alternative-keyword-for-lambda/22225/3 regarding some drawbacks of having alternate keywords. Adding keywords does break some existing code, but because Chinese uses a different script, there is less chance of collision. Collisions would be more likely for languages that use a Latin-based script (e.g., if may mean something else in another language). It might also introduce additional security concerns, but I have not investigated these implications. There are also better ways to implement what I did (e.g., a mapping of English to a list of alternate languages while going through the rules). Last, there are probably performance implications, but this is just a test.
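To make the collision concern concrete, here is a minimal sketch that checks whether a candidate keyword is already usable as an ordinary identifier (and so could appear as a name in existing code). The candidate words are illustrative only and are not taken from the diff.

```python
import keyword


def could_collide(word: str) -> bool:
    """True if `word` is a valid identifier today and not already a keyword."""
    return word.isidentifier() and not keyword.iskeyword(word)


# Illustrative candidates: Chinese, Spanish and German words for "if".
for word in ["如果", "若", "si", "wenn"]:
    print(word, could_collide(word))
```

Any word for which this returns True could already be a variable or function name somewhere, which is why a Latin-script word such as "si" is riskier than a Chinese one in a mostly English codebase.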

The diff is here: https://pastebin.com/wPN9ZA6Z

A simple unittest (which passes, but is not comprehensive) is here: https://pastebin.com/m73gtdgw

(I know i and j can be in unicode, but this would be how I prefer to write that part of the code)

I’m not sure if this will change usage patterns, as currently all Python developers know the English keywords. However, I do somehow like reading the new code.

If this (as I expect) gets rejected, is it still possible to develop this on a fork, as this is (mostly) a superset of Python, and see if there is any interest in usage?


A related discussion: Making Python Accessible in Non-Latin Scripts


The idea of promoting Python to people whose progress is hindered by their lack of familiarity with English is good, but I don’t think the demand is big enough (unless you can show evidence otherwise) to justify the enormous cost of making the keywords fully configurable (with indirect mappings) and making all the Python libraries and tooling alternative-keyword-aware. After all, we are talking about only a few tens of English words to learn to get started. Maintaining your own forks of CPython and other tooling is theoretically possible but is likely to be unsustainable if you want to keep the forks in sync with CPython all the time.

I think your best bet for making this actually happen is to write a preprocessor that translates all imported code written with alternative keywords into code with normal keywords. This preprocessor can be implemented with a meta path finder installed via a module imported by a custom sitecustomize module.
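A rough sketch of what such a preprocessor could look like, assuming a hypothetical `.zhpy` file suffix and a tiny hypothetical keyword mapping (neither comes from the OP’s diff). It only swaps keyword tokens and does not try to handle builtins or reordered grammar.

```python
import importlib.abc
import importlib.util
import io
import os
import sys
import tokenize

# Hypothetical mapping and file suffix, chosen only for this sketch.
KEYWORDS = {"如果": "if", "否則如果": "elif", "否則": "else"}
SUFFIX = ".zhpy"


def translate(source: str) -> str:
    """Replace alternate keywords with the standard ones, token by token."""
    lines = io.StringIO(source).readlines()
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    # Go backwards so that column offsets of earlier tokens stay valid.
    for tok in reversed(tokens):
        if tok.type == tokenize.NAME and tok.string in KEYWORDS:
            row, start = tok.start
            end = tok.end[1]
            line = lines[row - 1]
            lines[row - 1] = line[:start] + KEYWORDS[tok.string] + line[end:]
    return "".join(lines)


class TranslatingLoader(importlib.abc.Loader):
    """Reads a source file, translates it, and executes the result."""

    def __init__(self, path: str) -> None:
        self.path = path

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module) -> None:
        with open(self.path, encoding="utf-8") as handle:
            source = handle.read()
        exec(compile(translate(source), self.path, "exec"), module.__dict__)


class TranslatingFinder(importlib.abc.MetaPathFinder):
    """Looks for <name>.zhpy on the search path before the normal finders."""

    def find_spec(self, fullname, path, target=None):
        name = fullname.rpartition(".")[2]
        for entry in path or sys.path:
            candidate = os.path.join(entry or os.getcwd(), name + SUFFIX)
            if os.path.isfile(candidate):
                return importlib.util.spec_from_loader(
                    fullname, TranslatingLoader(candidate), origin=candidate
                )
        return None


def install() -> None:
    sys.meta_path.insert(0, TranslatingFinder())
```

A sitecustomize.py on the path could then import this module and call `install()`, after which `import foo` would pick up a `foo.zhpy` file. Because only NAME tokens are replaced, the same words inside strings and comments are left alone.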

As a side note, I personally find most of your choices of Chinese keywords rather unfitting (such as elif being replaced with 然, which literally means “affirmative” and has no meaning that I know of that remotely resembles “else if”). But that’s going to be an easy fix if/when the support for alternative keywords is in place.


Maybe not.

A related problem is using Unicode identifiers. Sometimes it’s handy to translate some pseudocode literally from a paper instead of inventing new names or using Latin transcriptions. CPython applies NFKC normalization while parsing identifiers. Other languages allow more (e.g. Julia).
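A small illustration of the normalization, using MATHEMATICAL ITALIC SMALL X (U+1D465) as the sort of symbol one might copy from a paper. The helper name `stored_name` is made up for this example.

```python
import unicodedata


def stored_name(identifier: str) -> str:
    """Assign to `identifier` and return the name Python actually stored."""
    namespace = {}
    exec(f"{identifier} = 1", namespace)
    # exec() also adds "__builtins__" to the namespace, so skip it.
    return next(name for name in namespace if name != "__builtins__")


math_x = "\U0001d465"  # MATHEMATICAL ITALIC SMALL X
print(unicodedata.normalize("NFKC", math_x))  # the plain ASCII letter
print(stored_name(math_x) == "x")
```

So the italic symbol and a plain `x` end up as the same variable, which is what gets in the way of keeping a paper’s notation distinct.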

Yes, something like I did for unnormalized unicode identifiers: Unnormalized unicode — ideas 0.0.38 documentation. André Roberge’s ideas project can be used to play with such things.

Though I doubt the OP can bundle such a preprocessor with a package so that it is enabled automatically on import.


Hmm right, to make Python code truly read as if it were written in a different natural language, all identifiers in builtins and the stdlib need to be accessible in that language as well. That involves several hundred rather than a few tens of English words, which makes the effort to realize this idea slightly easier to justify, because learning several hundred unfamiliar words is more often a significant barrier.

I suggested performing the import of the preprocessor module in sitecustomize/usercustomize so that even the __main__ module can be preprocessed. But one can always make the __main__ module a two-liner script that imports the preprocessor and then the real main module (albeit using the normal import statement).

I never found a reliable way of doing this without a wrapper executable.

What might make sense is for Python to expose a reliable preprocessor hook somewhere instead of requiring workarounds.


Yes, but that’s already being done (via translated docs). I don’t know of a single language where keyword and/or function translation didn’t split its community.

Take MS Excel for example, where function names are set by the localization setting, which in turn forces the translated version onto a subgroup of users. They must seek help in that translated version or translate the functions in an answer themselves. And as with the “else” example, such translation isn’t direct and therefore isn’t trivial. And Excel didn’t even touch the grammatical discrepancies across the globe. How would you translate a if c else b in a non-cryptic way?


Can’t IDEs do this upon opening a python file, but still save the file with English keywords? Then it only affects the way it’s displayed. And it would also work with older Python versions.

If it’s done as a preprocessor module as I suggested then the new community it forms is no more than those of many other frameworks built on top of Python, such as Jinja2, where the grammar is both a subset and a superset of Python. The users of that community will be equipped with knowledge specific to that framework to help each other out.

I think the OP was trying too hard to represent many of the English keywords with single Chinese characters, most likely in the interest of reducing the number of keystrokes necessary to write the code, when in all cases those keywords and constructs can be unambiguously and uncontroversially translated into Chinese with multiple-character phrases.

For the elif example I would translate it as 否則若, where 否則 literally means else and 若 literally means if.

In the few special cases like a if c else b where the structure of the grammar doesn’t work in another language, I think it’s fine for the preprocessor to alter the order of the keywords or even come up with new keywords for the translation to read well, so I would translate a if c else b into 若 c 則 a 否則 b, where 若 again means if, 則 means then and 否則 means else.
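As a toy illustration of the reordering, here is a sketch that rewrites a single, non-nested 若 c 則 a 否則 b expression into Python’s order. It uses a regular expression purely for brevity and assumes whitespace around each keyword; a real preprocessor would need proper parsing to cope with nesting, strings and comments.

```python
import re

# Matches "若 <cond> 則 <then> 否則 <other>" for one flat expression only.
PATTERN = re.compile(
    r"若\s+(?P<cond>.+?)\s+則\s+(?P<then>.+?)\s+否則\s+(?P<other>.+)"
)


def reorder_conditional(expr: str) -> str:
    """Rewrite the Chinese-order conditional into `a if c else b` order."""
    match = PATTERN.fullmatch(expr.strip())
    if match is None:
        return expr  # not in the expected shape, leave it untouched
    return f"({match['then']}) if ({match['cond']}) else ({match['other']})"


print(reorder_conditional("若 c 則 a 否則 b"))
```

The parentheses are added so that operator precedence inside each part is not changed by moving it.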

Of course the result is technically a new programming language but one that can map directly to and from the original Python.


Thanks for the feedback. I’m thinking through some of the ideas I’m reading about.

It does seem that completely supporting alternate natural languages in the core code would be quite a lot of work, and that it could be better to just process the file into English. But regarding @Nineteendo’s point, I think there would still need to be some grammar parsing involved, rather than this being just a display issue.

@blhsing and @Alex-Wasowicz I do agree that demand may not be big now, but I disagree that this would split the community. I think users who want to learn English will still do so, but users who prefer their native language should have the option to use it. I personally enjoy reading both English and Chinese and learning other languages. Translation should also get easier over time.

Regarding the keywords, as an aside, @blhsing I agree that there could be better characters. You are right that I was just trying to see if I could find single characters that best represent the keywords without collisions. As for your specific example, I guess my thought was 如果…然而…另外 and taking the first character of each. You are right that 然 means the affirmative, but it does seem like it could also mean a sort of “but” in some contexts https://baike.baidu.com/item/然/5874772. I was less concerned with “saving keystrokes” than with a simpler structure that uses the higher information density of a character.

To be fair, a 若 c 否則 b or a 如 c 另 b reads the same to me as a if c else b. The first two are actually easier to read due to the different scripts (it feels like it’s script highlighting).

As a native English speaker who speaks zero other languages (not counting some casual interest in Japanese) I would wholeheartedly support the ability to customise otherwise-unchangeable names (i.e. keywords and builtins), if it could be implemented in a way that’s viable in practice.

In my case, it’s because I frequently want to use a name that’s either a keyword (and therefore I must choose a different name) or a builtin (and therefore I opt to choose a different name to avoid causing confusion by shadowing). But I’d really rather just rename the keyword/builtin (to something else in English) so that I can use the most natural name for the thing in question.

Naturally, anything that would let us arbitrarily rename keywords/builtins from one English name to another English name would let us give them names in other languages too, so I think our interests are aligned here.

I’ve never raised this idea myself since I pretty much expected it to be laughed off of the table, but the use case for renaming things into other languages is much more significant.

At this point the question becomes: where should Python look for these customisations?

I don’t think it’s sensible to include them in the script itself: that’d surely be too cluttered, and you wouldn’t be able to import one from another script without an import keyword (and even if you did that, it’d be unnecessarily onerous to the compiler to have to change its customisations upon hitting that import statement). So what strikes me as a better idea is to have a separate flat file at project root, simply mapping the original names to the new names, like so: (to blindly copypaste an excerpt from your diff)

for = 為
async = 離
try = 試
while = 當
...
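For what it’s worth, a flat file like the excerpt above would be easy to read. Here is a minimal sketch of a parser for it; the function name and the validation rules are my own assumptions, not an existing Python feature.

```python
import builtins
import keyword


def parse_aliases(text: str) -> dict[str, str]:
    """Parse lines of the form `original = alias` into a mapping."""
    aliases = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        original, sep, alias = (part.strip() for part in line.partition("="))
        if not sep:
            continue  # skip lines without "=", such as an elided "..."
        if not (keyword.iskeyword(original) or hasattr(builtins, original)):
            raise ValueError(f"{original!r} is not a keyword or builtin")
        if not alias.isidentifier():
            raise ValueError(f"{alias!r} is not a valid identifier")
        aliases[original] = alias
    return aliases


sample = """\
for = 為
async = 離
try = 試
while = 當
...
"""
print(parse_aliases(sample))
```

Whether the old name is freed up or kept as an alias would need an extra per-name flag, which this sketch leaves out.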

Hmm… a separate flat file at project root. Maybe it could go in a new, dedicated section in pyproject.toml, which the compiler would look for?

Importantly [to me], if you did rename something in this way, it would have to have the result of freeing up the old name (since my whole use case is to free up a keyword/builtin by renaming it), rather than just making both names do the same thing. Maybe there could be a way of optionally specifying whether the old name should be freed up or not, on a name-by-name basis, if you wanted the ability to keep both as aliases of each other.

Yeah, to be perfectly unambiguous we should stick to multi-character phrases at all times, because almost all single characters have multiple meanings. Even 如 can mean like as an adverb, even though it’s used more commonly as if and we can easily tell by context when it’s meant to be used as a conjunction, while the phrase 如果 has absolutely no meaning other than if.

Hmm, in antiquated Chinese indeed you’re right that 然 can mean but, though today it has to be used in a phrase like 然而 or 雖然 to mean but. And then but is still quite different in meaning from else if, so 然 is unfit to represent elif at any rate, and I would still stick to 否則若 or 否則如果 for elif. Anyway, the point is that we should for the most part stick to multi-character phrases to be clear.

I think those who are going to write Python in an alternative language are also going to name variables in that language so there likely won’t actually be much benefit of “script highlighting”.

And again a 若 c 否則 b doesn’t really follow the Chinese grammar, which is why I suggested making it 若 c 則 a 否則 b instead (or 如果 c 則 a 否則 b).

Furthermore, changing keywords is going to be the easy part. The more difficult task is to properly identify names that belong to builtins and the stdlib, along with their attributes and methods. It will involve a lot of static analysis of the scope of each name, and it won’t be able to translate attribute names obtained via getattr calls or even the __dict__ attribute, though that can be documented as a known limitation. That is, unless you’re willing to maintain your own fork of CPython. :)
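To show the kind of static analysis involved and where it stops, here is a deliberately crude sketch using the ast module. It treats any name assigned anywhere in the module as shadowed and ignores function parameters, imports and nested scopes, so take it as an illustration rather than a usable analyzer.

```python
import ast
import builtins


def builtin_references(source: str) -> set[str]:
    """Names that are read in `source` and resolve to builtins (roughly)."""
    names = [node for node in ast.walk(ast.parse(source)) if isinstance(node, ast.Name)]
    # Crude scope handling: anything assigned anywhere counts as shadowed.
    assigned = {node.id for node in names if isinstance(node.ctx, ast.Store)}
    return {
        node.id
        for node in names
        if isinstance(node.ctx, ast.Load)
        and hasattr(builtins, node.id)
        and node.id not in assigned
    }


sample = 'print(len("abc"))\nname = "upper"\nprint(getattr("abc", name)())\n'
print(sorted(builtin_references(sample)))
```

The method name "upper" only ever appears as a string that is passed to getattr at runtime, so nothing in the syntax tree marks it as an attribute that would need translating.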


Hmm, in antiquated Chinese indeed you’re right that 然 can mean but, though today it has to be used in a phrase like 然而 or 雖然 to mean but. And then but is still quite different in meaning from else if, so 然 is unfit to represent elif at any rate, and I would still stick to 否則若 or 否則如果 for elif. Anyway, the point is that we should for the most part stick to multi-character phrases to be clear.

As you mentioned, the keywords are the trivial part. I think knowing that it is fairly simple to change (or add to) the grammar for another natural language, given what has been written in CPython, has already scratched an itch of mine. (Thanks to the writers of the code!) What people want to do with it would definitely be optional.

(Aside: keywords can make as much sense as elif anyway :) )

Edit:

I think those who are going to write Python in an alternative language are also going to name variables in that language so there likely won’t actually be much benefit of “script highlighting”.

Right, I just meant in that particular case.

Thanks for your input. I first tried substituting the keywords for testing, but then you would have to change how CPython itself is built, since the existing keywords are used to build Python. In my experience it was easier to add an alternate keyword than to switch them out.

Simply translating the source code on the fly would be enough. For example, you write in your preferred language, and it gets translated to Python on the fly. You’d still be able to see the original code as well. That’s what I do when reading code written in other languages. So why should it be hardcoded?

Hello, @chemelnucfin , and thank you for your idea.

You might be interested in looking into the other times this idea has been explored. See Non-English-based programming languages (Wikipedia), and ZhPy (project website).


That’s a plausible approach indeed, though it will limit the translation to only the tools that support it. If one writes code in Chinese via a translation-enabled IDE but pushes English code to, say, GitHub, then the code and diffs as displayed on the site won’t be quite comprehensible to the authors themselves.

Thanks for the links to prior art. As can be seen from the version of Python 2.1 that ZhPy is based on, the approach of translating Python as a fork is easy to implement but difficult to keep up to date, which may actually be OK if the main purpose is education, though at some point it’s going to be so outdated that it won’t be useful even for education.


That’s actually the approach I’ve kept talking about in this thread (translating on the fly via a preprocessor module). :)
