Python 101: enabling a restricted subset of Python

aroberge · January 5, 2022, 9:54pm

Summary: I propose that a new compile time directive be available to restrict the Python syntax to a strict subset. This would facilitate the teaching of Python to beginners as well as the work of people that write tools intended to help beginners learning Python.

= = =

Note: in what follows, I refer to text (blog posts, tweets, comments on mailing lists) by various people. In doing so, I may not accurately represent their opinion. Any such misrepresentation is definitely not intentional.

~ ~ ~

Since its creation, the popularity of Python has been steadily increasing. According to some measures, such as the Tiobe index, Python is now considered to be the most popular programming language in the world. For many years, Python has been a favourite first language to teach to beginners, even famously replacing Scheme as the first programming language taught at MIT in 2009.

However, Python is also becoming more complex, as can be seen by reading this post by Brett Cannon. As more and more features are added, Python becomes harder to learn for beginners. To be fair, Brett Cannon does point out in his post how the introduction of advanced features and of more helpful features to beginners can sometimes be interrelated. However, I would argue that it might be possible to decouple the two as described below.

One of the first concepts one encounters when learning programming for the first time is that (most) languages have reserved words (keywords) that cannot be used as names for objects in a user’s program.

With the introduction of Structural Pattern Matching in PEP 635 (see also PEP 636), Python introduced two “soft keywords” (namely ‘match’ and ‘case’) that can be thought of as “context dependent keywords”, something that is not as straightforward as the idea of reserved words.
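To make the distinction concrete, here is a small sketch using only the standard library. The helper name `parses` is made up for this illustration, and the last line assumes Python 3.10 or later (where the match statement exists); on older versions it simply reports that the statement does not parse.

```python
import ast
import keyword


def parses(source):
    """Return True if Python's own parser accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# "for" is a reserved word: it can never be used as a name.
print(keyword.iskeyword("for"), parses("for = 1"))

# "match" is not reserved, so it still works as an ordinary name...
print(keyword.iskeyword("match"), parses("match = 1"))

# ...yet the same word opens a statement when followed by a subject and a colon.
MATCH_STATEMENT = "match command:\n    case 'go':\n        pass\n"
print(parses(MATCH_STATEMENT))
```

The same identifier is accepted as a variable in one place and read as a keyword in another, which is the context dependence described above.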

Due to the addition of these soft keywords, parsing Python code became much more complex and required a change to a new type of parser (PEG parser) which even experienced programmers find challenging to master.
Bolz-PEG

This ever-increasing complexity can make it more difficult to create and maintain tools designed to help beginners understand what went wrong in their program or to help them avoid errors. In addition to well-known linters, such as flake8 and pylint, such tools include Pyta, Pedal, the editor Thonny with its integrated “Assistant” which can be further extended using error-explainer, friendly, and undoubtedly many others I am not currently aware of.

While there is no doubt that many more features will be added to Python in the future, I propose that a subset of Python’s syntax, which I will refer to as Python 101 hereafter, be identified and made available via a directive: with a Python 101 mode activated, any syntactical construct outside of this restricted subset would give rise to a SyntaxError.

Something like Python 101 would flag advanced syntax as invalid and help educators teach their students the basics of Python programming before moving on to more advanced topics.

As Greg Wilson wrote:

It’s easy to say, “Just ignore decorators and async I/O and the := operator in class,” but that’s disingenuous. Newcomers will bump into these things as soon as they search online for help, because they actually are helpful for people who are programming in the large; that’s just not my use case.

The idea of having a reduced syntax for a given language as a teaching help is not new. For example, Racket lists five different language subsets designed to gradually introduce different concepts in a teaching context. As an aside: in addition, Racket includes other dialects such as a version that includes type annotations.

I do not propose to go as far as Racket, but simply to have one reduced subset of Python as a useful tool in a teaching context.

Proposed syntax to enable Python 101

With its from __future__ import ..., Python already has a mechanism in place for conditional parsing of different syntactical constructs. So, from that point of view, having a restricted subset of Python’s syntax recognized as valid is not something that would be, strictly speaking, new.

However, instead of requiring an actual “normal” line of code to be added to a program to restrict its grammar, I suggest that a directive enabled by a top comment, like what is done for specifying an encoding, would be preferable. Such a directive could look as follows:

# syntax: Python-101

While beginners are taught that “comments are ignored by Python”, this is certainly not the case for encoding declarations. These encoding declarations are one of the first topics covered in the official Python tutorial. I believe that having a second small inconsistency would be worthwhile, as it would enable people writing books and tutorials today to tell beginners to write such a directive at the top of their program, letting them know that, while this comment has no effect on existing Python versions, future versions of Python will make use of it.
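As a rough sketch of how a tool could recognise such a directive today, the following checks only the first two lines of a file, which is where PEP 263 allows an encoding declaration to appear. The regular expression and the function name `has_python101_directive` are assumptions made for this illustration; the proposal does not specify the exact spelling or placement rules.

```python
import re

# Hypothetical pattern, modelled on the placement rule for encoding comments.
DIRECTIVE = re.compile(r"^[ \t\f]*#\s*syntax:\s*Python-101\s*$")


def has_python101_directive(source):
    """Look for the directive on the first two lines only."""
    return any(DIRECTIVE.match(line) for line in source.splitlines()[:2])


print(has_python101_directive("# syntax: Python-101\nprint('hello')\n"))
```

Allowing the second line leaves room for a shebang line above the directive, as is the case for encoding comments.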

What to include in Python 101

Even for those that agree with the idea of enabling a restricted subset of Python available via a directive, I suspect that there might be significant difference of opinions as to what should be included in Python 101. Below, I simply offer an opinion. I am much less attached to any individual suggestion mentioned below than to the principle of having something like Python 101 made possible.

As a first principle, given a specific Python version (say Python 3.12), any program that is syntactically valid with a Python 101 mode enabled by a top comment directive should remain so, and have an identical runtime behaviour, if that top comment directive were to be removed. This means that no additional syntactical construct would be introduced, such as the repeat keyword found in TigerJython or my own Reeborg’s World. While it might be pedagogically useful to have additional keywords such as nobreak or noexcept instead of using else in loops or try/except blocks, such keywords should not be part of Python 101 unless they were valid in a “full” Python version.

As a second principle, there should be no ambiguity as to what a keyword is. Thus, Python 101 would not include context-dependent (soft) keywords such as match and case .

Going through the first 10 sections of the official Python tutorial, prior to the quick tour of the Standard Library, I did not see any mention of the assignment expression operator, :=, introduced in PEP 572. Given the risk of confusion with the normal assignment operator for beginners, I would argue that Python 101 should not include this relatively new operator.

Similarly (and somewhat to my surprise), I did not see any mention of decorators in the official Python tutorial, even in the context of discussions of classes. This is consistent with my perception of decorators as a topic beyond the scope of what would be expected to be included in Python 101; they would thus be excluded.

I would also argue that the keywords “async” and “await” should be flagged as SyntaxError, with a custom message, when Python 101 mode is enabled. The relevant usage of these keywords requires a level of understanding beyond what should be expected of beginners taking a first programming course in Python.

I feel somewhat the same, but not as strongly, with the keyword “yield” and creating generators.

While the keyword “lambda” is thought to be “confusing and non-intuitive”, I would argue that it should be included in Python 101 as it is often required to make simple Tkinter based applications, which may be covered near the end of a beginner’s class.

Finally, I would argue that type annotations should not be allowed as valid syntax in Python 101. While there is no doubt that type annotations are useful for advanced programmers, especially those working with large code bases, I would argue that they add unnecessary complications to Python’s syntax for beginners. For example, see “Miss annotation, but no NameError?” (Ideas, Discussions on Python.org) and https://twitter.com/reuvenmlerner/status/1290317124997648386. Also, see The current state of typing PEPs, a long discussion on the Python-dev mailing list, including, in particular, this comment by Christopher Barker.
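The exclusions suggested in this section can be prototyped outside the interpreter by walking the tree produced by the standard ast module. The sketch below is one possible reading of the list above, not a specification: the `EXCLUDED` table and the function name `find_violations` are invented here, lambda is deliberately allowed, and yield is left alone because the proposal is undecided about it. Asynchronous comprehensions are not covered.

```python
import ast

# AST node types for constructs the proposal would exclude (one reading of it).
EXCLUDED = {
    ast.NamedExpr: "assignment expression (:=)",
    ast.AsyncFunctionDef: "async def",
    ast.AsyncFor: "async for",
    ast.AsyncWith: "async with",
    ast.Await: "await",
    ast.AnnAssign: "annotated assignment",
}
if hasattr(ast, "Match"):  # the match statement only exists in Python 3.10+
    EXCLUDED[ast.Match] = "match statement"

DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
FUNCTIONS = (ast.FunctionDef, ast.AsyncFunctionDef)


def find_violations(source):
    """Return a sorted list of (line number, description) pairs."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        label = EXCLUDED.get(type(node))
        if label is not None:
            violations.append((node.lineno, label))
        if isinstance(node, DEFINITIONS) and node.decorator_list:
            violations.append((node.decorator_list[0].lineno, "decorator"))
        if isinstance(node, FUNCTIONS) and node.returns is not None:
            violations.append((node.lineno, "return annotation"))
        if isinstance(node, ast.arg) and node.annotation is not None:
            violations.append((node.lineno, "parameter annotation"))
    return sorted(violations)


SAMPLE = "def double(n: int):\n    return 2 * n\n"
for line_number, description in find_violations(SAMPLE):
    print(f"line {line_number}: {description} is outside the subset")
```

Because the check only reads the tree, a program that passes it behaves identically with or without the check, which is in line with the first principle stated earlier.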

Respectfully yours,

André Roberge

TobiasHT · January 5, 2022, 10:54pm

This is an interesting detail of your concern. I do agree with the fact that Python is changing really fast, and sadly the changes are not going to stop. However, the very basic syntax of Python has not changed. Python 3.8 added the walrus operator but very many developers write interesting code without it. Python 3.7 added the async and await keywords but I still know of students of mine who didn’t know Python had async/await functionality until recently when I brought it to light, and they were comfortable writing the Python code that they already knew.

So basically, I would not advise having one Python variant for learners and another for advanced users: at some point the learners have to advance in their knowledge, and keeping them locked into a limited variant of Python might lead them to think that this is all there is to Python, when there are more useful features.

I generally think one can teach a course in Python without dwelling too much on advanced stuff if one structures the course properly. One has to know where to draw the line for students that are just getting into the language. If that were not possible, then we’d be hearing of cases where a limited version of Java is presented to students to learn, or C/C++ for that matter. So if I am to structure my course in Python, I wouldn’t think about teaching how to write CPython extensions in C or how to embed Python in C for a beginner course, even though you can do that in Python; I just draw the line between beginner, intermediate and advanced content.

Anyways, that’s in my opinion. I’m sure other people might have different points of view.
One question though: how should the limited version of Python respond if a curious student tried writing some valid advanced Python code with it, which isn’t supported by that version? Should it respond with a SyntaxError? Should it tell the student that the code is correct but that the limited version is unable to support it? Should it provide options to unlock the full pack of Python? Should it carry a different name (perhaps PyLearn) from the well known version of Python so that the student knows not to try writing certain kinds of code with it?
Honestly that is what is confusing to me.

TobiasHT · January 5, 2022, 11:04pm

Oh, I’ve seen that you suggested that Python 101 raise a syntax error if unsupported code is written. However, wouldn’t this corrupt the mind of the student into thinking that some kind of syntax is not valid Python code when it actually is? What if they graduate with this stuck in their minds?

I have to agree though, you have this well thought out

apalala · January 6, 2022, 4:04am

PEP-0602 was the right decision after the 2.7->3.0 trauma. Yearly releases and deprecations force us to keep up with Python installations, but they do not force us to use every new feature.

There’s no need to change the language or its environment to teach it one step/feature at a time. Of course, students may just search the Web for all features of the latest Python, and want to use them. You only need to tell them that your linter for evaluation will be set to Python 3.x (tools like Pycharm can be set to check such compatibility).

Raymond was “An excellent driver” while he was only allowed to drive on the driveway.

steven.daprano · January 6, 2022, 10:04am

André Roberge said:

"These encoding declarations are one of the first topics covered in the official Python tutorial."

For many users, that Python tutorial leaves a lot to be desired. For complete newbies, with no experience in programming at all, parts of it are (I think) bewildering. “Unix shell’s search path”? “Environment variables”? “GNU Readline”? And that’s just from the first four paragraphs.

We can interpret that as either:

1. a failure of the tutorial to be simplified for an audience of complete beginners;

2. or reflecting the fact that the official tutorial is not aimed at complete beginners.

I think that 2 is closer to reality than 1. Either way, I don’t think that we should take the python.org tutorial as the Gold Standard for what beginners should be taught.

It is great that Python is used by beginners. So are Javascript, Java, C, and many other languages which are just as complex or even more so than Python.

I don’t think that the Python language needs a cut-down, dumbed-down, half-Python children’s mode built into the language definition. This is not something that every interpreter (Stackless, PyPy, IronPython, Jython, GraalPython, MicroPython…) should be forced to provide in order to call itself a Python interpreter.

We aren’t responsible for filling every niche in the Python ecosystem.

If people are keen on this, I encourage them to create their own tooling to support Python 101, just as Jupyter/IPython, IDEs, linters, mypy etc do. Instead of adding functionality, you can take it away.

Or dig out the source code from Python 1.5 (it’s on the website), and port it to modern OSes, and there you go: Python without decorators, async, with statements, comprehensions, etc.

There are many ways that people can develop your Python 101 idea, and I encourage people to play with it. If people can write LolPython and LikePython, they can come up with ways to develop Python 101 without needing to build it into the interpreter and the language definition.

http://www.dalkescientific.com/writings/diary/archive/2007/06/01/lolpython.html

steven.daprano · January 6, 2022, 5:01pm

Some more comments. Sorry, this is going to be long.

André wrote:

"Due to the addition of these soft keywords, parsing Python code became much more complex and required a change to a new type of parser (PEG parser) which even experienced programmers find challenging to master."

You have the causality backwards there.

The PEG parser was not introduced in order to add the match statement and soft keywords. Soft keywords were included because we had already changed to the PEG parser, which made it much, much easier to include soft keywords instead of the old LL(1) parser.

Which actually wasn’t precisely LL(1).

The old LL(1) parser did, sometimes, use soft-keywords, e.g. at one point “as” was a soft-keyword, only becoming a full reserved keyword in Python 2.6 (with a warning in 2.5).

I’m not terribly impressed by the Twitter thread you linked to. The only concrete complaint made about PEG parsers in general was:

"One complaint about PEGs is that they are very hard to compare against the theoretical framework of the rest of the parsing world."

in other words academics don’t like PEG because it doesn’t fit their theories.

In any case, the difficulty of writing or understanding a PEG parser is irrelevant here. You don’t have to write a PEG parser or understand how CPython’s parser works in order to read Python code.

André said:

"This ever-increasing complexity can make it more difficult to create and maintain tools designed to help beginners understand what went wrong in their program or to help them avoid errors."

Does it? How? These tools are surely not writing their own parsers for Python code, instead of using the stdlib AST libraries to use Python’s own parser.

In any case, do you imagine that these tools will only support your Python 101? Linters, code checkers, IDEs and other tools are extensively used by experienced developers who want to use the full set of Python language features.

Even if Python 101 exists, these tools will still have to support the full set of Python features.

André again:

"Something like Python 101 would flag advanced syntax as invalid and help educators teach their students the basics of Python programming before moving on to more advanced topics."

This sounds like a horrific nightmare scenario. Imagine doing a course where you are forced to program below your capability, punishing the advanced students while not actually helping the under-achievers.

Aside from academic fraud, which is pretty easily handled at this level:

"Hello student Fred, can you please explain what lines 30 and 31 of your script are doing?"

how do you think students are writing code that is beyond their capability? If students haven’t learned about comprehensions, do you think they just pushed keys at random and a working comprehension appeared in their code? Obviously not.

So why punish them for writing comprehensions, if they know how to use them?

André quotes Greg Wilson:

"It’s easy to say, “Just ignore decorators and async I/O and the := operator in class,” but that’s disingenuous. Newcomers will bump into these things as soon as they search online for help, because they actually are helpful for people who are programming in the large; that’s just not my use case."

Right. And they are useful for newcomers too, which is why Python 101 is a terrible idea. Python 101 is essentially a way of punishing students who can and do learn from unapproved sources such as on-line resources.

When the student tries to run some Python code, they find it doesn’t work. If they are an advanced student who understands that they are programming in a straitjacket with Python 101, they will feel frustrated that they have to dumb down their code just to satisfy the compiler and instructor.

And if they are not advanced, they will be utterly perplexed why the code doesn’t work. Have they done something wrong when the internet says it should work?

No. The error comes from the instructor, who claims to be teaching Python, but is actually teaching another language, a subset of Python.

aroberge · January 6, 2022, 6:30pm

Many well thought out comments not included.

Thank you Steven for taking the time to write such a detailed response (in two parts).

I wrote my proposal after reading on various Internet forums many expressions of dissatisfaction about Python becoming more complex and less user-friendly to beginners. I was also impressed by the consideration given to beginners by the creators of Racket.

Rather than simply complaining about this (either on various Internet forums, or simply silently to myself like the grumpy old man I am becoming), I thought it worthwhile to put together a detailed proposal, putting my best arguments forward. If those arguments are not convincing enough to at least elicit some support, then, so be it.

Still, I want to address three specific comments that you wrote.

The first one is about forcing students to write in a “less complete” dialect:

The Racket community has a much stronger focus on the educational and pedagogical aspect of teaching programming than the Python community does. Given their built-in support for five different languages/dialects, to be used at a different stage of learning, I believe that they would strongly disagree with your stated opinion. I am of the opinion that their experience should be given some serious consideration and not simply dismissed with an off-hand comment.

That being said, Python is not Racket, and their general audiences are not the same. And, for the record, I am only interested in contributing to the Python ecosystem.

In an earlier reply, you wrote:

In spite of your choice of examples given, I chose to read this comment as positive rather than dismissive criticism. For the record, I am well aware of this, as I created AvantPy and Ideas.

I did put the AvantPy project temporarily on hold to focus on the error-handling part, which became friendly/friendly-traceback, something that is useful to current Python users and has already been used successfully in other projects even though it is far from complete.

Actually, yes some do: when all you get from Python is “SyntaxError: invalid syntax”, you might have to come up with a different parsing strategy to provide users with useful suggestions as to how to fix their code. I know this because this is what I have done with friendly-traceback. However, I know that my “heuristic parser” cannot cope with the pace of change in Python’s syntax, and I do currently focus on a subset somewhat larger than that which I described as Python 101.
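A short sketch of the limitation described here: when the source does not parse, ast.parse raises SyntaxError and no tree is produced, so a tool that relies only on the AST has nothing left to inspect. The helper name `try_parse` is made up for this illustration, and the exact wording of the message depends on the Python version.

```python
import ast


def try_parse(source):
    """Return (tree, error); exactly one of the two is None."""
    try:
        return ast.parse(source), None
    except SyntaxError as error:
        return None, error


# An unclosed parenthesis: a typical beginner mistake.
tree, error = try_parse("total = (1 + 2\nprint(total)\n")
if error is not None:
    # No tree is available here, only a position and a message.
    print(error.lineno, error.msg)
```

Anything more helpful than the reported position and message has to be worked out from the raw text or the tokens, which is where a separate heuristic analysis comes in.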

brettcannon · January 6, 2022, 8:59pm

That’s somewhat the goal of my syntactic sugar blog posts: identify the minimum viable Python (MVPy) that is still useful. Part of the trick, though, is who decides what is “useful” and thus what should (not) go in.

That’s a big ask if you’re expecting CPython to ship such a thing. This starts to get into having to maintain a massive __future__ statement, any disagreements over what should (not) be included in such a “Python 101” definition, etc.

Ignoring the fact that beginners will very likely forget to include that comment, you could use a file encoding declaration for this. You can then parse the source to AST, walk it, and throw an exception for any syntax used that you wish to prohibit. That wouldn’t require any special support and shouldn’t be too costly to implement.

Otherwise you might be better off with linting rules or even a fork of a Python implementation that doesn’t even have what you’re after so there’s zero chance of something slipping in.

steven.daprano · January 9, 2022, 2:44am

André wrote:

"In spite of your choice of examples given, I chose to read this comment as positive rather than dismissive criticism."

It absolutely was not meant to be dismissive.

I am very fond of LolPython and LikePython, I think they are cute and funny even if not very serious examples of transforming code to Python.

It never occurred to me that others might read it as dismissive. Sorry.

I could have referred to ChinesePython or Teuton

http://reganmian.net/blog/2008/11/21/chinese-python-translating-a-programming-language/

http://www.fiber-space.de/EasyExtend/doc/teuton/teuton.htm

or Mulan:

but I think that they are forks (particularly Mulan) rather than preprocessors.

Or Coconut:

With all the certainty of somebody who doesn’t intend to do the work himself (wink), I can say that it should be easy to write a preprocessor that takes Python source code, looks at the AST, and rejects features that you wish to reject.

Personally, I don’t think that I would use this Python 101, or recommend it to beginners, but I could be wrong in my opinion and I encourage people to experiment.

I think having a big, rich Python ecosystem is great, the more interpreters and tooling and preprocessors the better.

(Aside: I think it is a disturbing sign that we’ve gone from at least four major independent Python implementations back in the Python 2.5 days, CPython, IronPython, Jython and PyPy, to just two supporting Python 3.)

If your Python 101 turns out to be a good idea, then it might end up being to education what Jupyter and IPython are to scientific Python.

And if not, then no harm done to the language and ecosystem.

Julian-O · January 9, 2022, 9:21am

To be overly frank, I am not convinced by the idea. Raymond Chen, a blogger from Microsoft, talks about new features starting with -10 priority points. The default is to reject them, unless the reasons to implement them are very strong. I haven’t been convinced the value is worth it. The good news: I am just an Internet random. My opinion is value-less, and there is no need to convince me!

However, I will make a suggestion: the use of “101” to indicate a basic level of training is a very US College-centric concept. While there are exceptions, it doesn’t seem to be widely used in universities elsewhere, and many learners won’t be familiar with tertiary education idioms in any case. I suggest finding a more universal description for the feature. (Not “primer” though - that is another word that only seems to be used by US writers!)

cben · January 16, 2022, 9:51pm

See also the https://hedycode.com/ research project. It builds a progression of (much) simpler languages that are NOT subsets of Python but build up concepts and syntactic skills to eventually arrive at a proper subset.
https://www.felienne.com/wp-content/uploads/2020/07/Hedy_paper_website_draft.pdf

EDIT: I think any effort in similar direction should invest heavily in error messages. Just enforcing a subset by raising SyntaxError would not give much value — tailored messages are how you reap benefits from the restrictions. Something like friendly only more customized…

aroberge · January 16, 2022, 10:50pm

I (obviously) agree. For clarity: I’m the person who started this thread, and also the creator of friendly. friendly was an offshoot of AvantPy, which I mentioned in a previous message. AvantPy, currently put on hold, has even more custom error messages (see “Friendly error messages” in the AvantPy 0.0.15a documentation). Currently, AvantPy is “broken” due to changes to friendly, but I plan to restart it soon, making sure that it can use friendly/friendly-traceback.

davidfstr · April 25, 2022, 1:16pm

Racket’s language levels are definitely quite interesting, and I could see an attempt being made to define particular subsets of Python that would be useful for learning Python.

However, I don’t think the Python language itself should be in the business of defining what the subset is or enforcing it at the language level (via a __future__ or similar mechanism). It seems to me it would be more straightforward to create your own custom “linting”-like tool that would flag use of Python constructs outside of whatever Python sublanguage was in use.

You may also be able to use an import hook to do the checking at import time.
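One way this could look, as a sketch only: a loader derived from importlib’s SourceFileLoader that inspects the tree before compiling. The class `Python101Loader` and the helper `load_checked` are hypothetical names, the single rule shown (rejecting :=) stands in for a fuller rule set, and the module is loaded explicitly rather than through a globally installed hook to keep the example small.

```python
import ast
import importlib.machinery
import importlib.util


class Python101Loader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader that rejects excluded syntax before compiling."""

    def source_to_code(self, data, path, *, _optimize=-1):
        tree = ast.parse(data, filename=path)
        for node in ast.walk(tree):
            if isinstance(node, ast.NamedExpr):  # stand-in for a fuller rule set
                raise SyntaxError(
                    "':=' is outside the teaching subset",
                    (path, node.lineno, node.col_offset + 1, None),
                )
        return compile(tree, path, "exec", dont_inherit=True, optimize=_optimize)


def load_checked(name, path):
    """Load one source file through the checking loader and return the module."""
    loader = Python101Loader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

A caveat with this approach: if a valid cached bytecode file already exists for the module, the import machinery uses it and source_to_code is not called, so the check is skipped. Registering the loader for every import (for example through FileFinder.path_hook) is possible but needs more care than shown here.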

The tool itself would presumably parse the Python source code, perhaps using the builtin ast module, and just flag lines that contain code outside the desired Python subset.