Make `class` a soft keyword

Since the introduction of soft keywords, I have been wondering more and more whether some of the existing keywords could be made soft keywords.

One benefit of a soft keyword is that it can still be used as an identifier. One existing keyword that I would really like to be able to use as an identifier is class. It is a very frequently used term and is meaningful in all sorts of ways. Since it is almost unavoidable, many programmers resort to alternate spellings, such as:

regex used

language:Python /[^a-zA-Z_]{alternate class spelling}\b/

I could not find a proper regex to search for cls, but I assume its usage count is quite high.

While it is not a debilitating issue, it does kill the charm in many cases. e.g. in fastHTML it breaks the 1:1 mapping with HTML:

@app.route("/")
def get():
    return (Title("Hello World"), 
            Main(H1('Hello, World'), cls="container"))

Other potential candidates

I am not opposed to making more existing keywords into soft keywords, e.g. I would find the following quite useful as well:

  • if: do(something, if=predicate)
  • async: some_func(*args, async=True)
  • from: case_converter_factory(from="pascal", to="snake")
  • in: find(needle, in=haystack)
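As a quick check of the status quo, every call in the list above is rejected at compile time today, before any function is even looked up. The sketch below only uses the built-in compile(); the function names come from the examples in this post and do not need to exist.

```python
# Each snippet uses a hard keyword as a keyword-argument name.
snippets = [
    "do(something, if=predicate)",
    "some_func(*args, async=True)",
    'case_converter_factory(from="pascal", to="snake")',
    "find(needle, in=haystack)",
    'Main(H1("Hello, World"), class="container")',
]


def is_valid_syntax(source: str) -> bool:
    """Return True if the expression compiles; names are not resolved here."""
    try:
        compile(source, "<snippet>", "eval")
        return True
    except SyntaxError:
        return False


results = {snippet: is_valid_syntax(snippet) for snippet in snippets}
for snippet, ok in results.items():
    print("valid      " if ok else "SyntaxError", snippet)

# The dict-unpacking spelling is accepted, because the name is only a string.
unpacking_ok = is_valid_syntax('find(needle, **{"in": haystack})')
print(unpacking_ok)
```

The last line is the workaround available today: the keyword travels as a dictionary key rather than as a name in the call syntax.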

I am advocating for only class here, because that is the one I find myself needing most often. And I don’t feel there’s any chance of it being confusing if it were made a soft keyword.

In terms of backwards compatibility, the only concern in my opinion is existing educational material. Here, I am a bit unsure whether the tradeoff is worth it. If this change were made, I don’t think out-of-date educational material will have any adverse impact in practice.


+1

I’d advocate for from too for its common usage as a name for the beginning of a range, which happens to me often when mapping existing table columns of date ranges to object models and when wrapping third-party API calls.


Maybe case could be useful too, e.g. some fn(arg, case=some_case_identifier).

Generally I’m in favor of adding class, though; I’ve found myself writing class_ before.

case is already a soft keyword since it was introduced when the language was already mature.
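For anyone who wants to check which words fall into which category, the standard library's keyword module exposes both lists. A small sketch; the contents of softkwlist depend on the Python version you run it on.

```python
import keyword

# "class" is a hard keyword: it can never be used as an identifier.
class_is_hard = keyword.iskeyword("class")

# "case" and "match" are not hard keywords, so they remain valid names.
case_is_hard = keyword.iskeyword("case")
case = "snake_case"   # ordinary assignment to a name that is a soft keyword
match = None          # same for "match"

print(class_is_hard, case_is_hard)
print(keyword.softkwlist)  # version dependent
```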

Oh yeah I remember.

The whole point of soft keywords is to be able to use as a keyword something that may have already been used as an identifier.

Trying to use as an identifier something that is already a keyword adds unnecessary confusion. Personally, I don’t find your example motivating enough.

Sorry to be a downer, but my personal opinion is that Python should be going the other way: In an ideal world, we would slowly make it impossible to use soft keywords as identifiers (match, etc.) But I’m not advocating for that either.


Why is that the whole point of soft keywords? Why does whether a word is made a keyword first or an identifier first matter at all in determining the worthiness of a soft keyword? If type, match and case had been keywords since day 1, you would’ve still seen requests like this thread to make them available as identifiers, and the end result should’ve been the same, that they would’ve become soft keywords, because these words are that common.

Unless you deliberately write confusing code like match match: or class class:, I fail to see how soft keywords can make code confusing. As keywords and as identifiers, they are used in distinctly different syntactic constructs, after all.

One of the main draws of Python is that it reads like a natural language by making use of many English words as keywords in a way that makes sense in a sentence. If variables and parameters have to be named with awkward spellings because of conflicts with keywords that quality is diminished. Soft keywords are a good solution so long as they are backwards-compatible.


Soft keywords have a performance penalty because the parser might need to backtrack & try different grammar rules. Personally, I don’t think it’s worth changing semantics of the language for pure aesthetics: special cases aren’t special enough to break the rules.

In the case you cite, you could use dict-unpacking, uppercase, or pluralisation:

Title('spam', **{'class': 'viking café'})
Title('spam', classes='viking café')
Title('spam', CLASS='viking café')

A


The performance penalty won’t be that bad because in this case it takes only reading one token after class to decide whether it’s a class definition or an identifier. The penalty is far greater for constructs like a list comprehension, where the for keyword may take many tokens to reach after a bracket token.

It isn’t pure aesthetics. Hard keywords make it hard to write generic wrappers, and we consequently have to special-case names that conflict with keywords when mapping them, which is the opposite of Python’s philosophy.
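A rough sketch of the kind of special-casing I mean. The tag() helper and the alias table are hypothetical, written only for illustration; this is not how fastHTML or any particular library implements it.

```python
# Hypothetical alias table: keyword-safe Python names mapped to HTML names.
KEYWORD_ALIASES = {"cls": "class", "class_": "class", "for_": "for"}


def tag(name: str, *children: str, **attrs: str) -> str:
    """Render an HTML element, translating keyword-safe aliases back."""
    parts = []
    for key, value in attrs.items():
        html_name = KEYWORD_ALIASES.get(key, key)  # the special case
        parts.append(f' {html_name}="{value}"')
    return f"<{name}{''.join(parts)}>{''.join(children)}</{name}>"


html = tag("main", tag("h1", "Hello, World"), cls="container")
print(html)
```

Every wrapper that maps Python keyword arguments onto an external vocabulary needs some table like this, and each library tends to pick its own spelling.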


Backward compatibility.

I have said this many MANY times, and it does seem that not everyone respects it properly. Python does not break valid, working code without a very good reason. Even when the code is demonstrably broken (e.g. bad backslash escapes, or relying on dictionary iteration order in Python 2.7, or relying on certain semantics of locals()), there is a strong push to avoid breaking people’s code and, as much as possible, to provide ways to make porting easier.

That’s why the order matters. If something’s been a keyword since forever, like class, there is no code out there that will suddenly start working by making it a soft keyword. But when there’s code out there using match as a name (like, maybe, in the standard library itself), it’s safer to make it a soft keyword and allow that to continue to work.

Soft keywords have a price. In a language with no keywords whatsoever, it’s harder to be sure what you’re reading. That impacts both humans and computers - backtracking and changing your mind about how to interpret something can result in misinterpretation. This happens in English, not just in programming languages, and it’s a problem (though it can be used for good effect too, such as in WS Gilbert’s “I love, alas, above my station” // “He loves a lass above his station” in HMS Pinafore). Hard keywords give a point of certainty. There’s no way to interpret the word class other than as a keyword. If you see elif, you can’t possibly misinterpret it as a variable name, so you can be sure that it is connected with an existing if statement. Etcetera.

I suppose, if it WERE just performance, we could argue that the ambiguity is fine, and computers are made for this sort of work. But it’s not - this sort of ambiguity results in worse error messages, as something could be interpreted very differently. As a very simple example, the hard keyword class results in an error right there, whereas match doesn’t become a problem until later:

>>> x = match(status):
  File "<python-input-0>", line 1
    x = match(status):
                     ^
SyntaxError: invalid syntax
>>> x = class(status):
  File "<python-input-1>", line 1
    x = class(status):
        ^^^^^
SyntaxError: invalid syntax
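The same comparison can be reproduced without a REPL by asking compile() where it reports the error. A small sketch; the helper name is made up, and the exact columns could vary between Python versions, so only the relative order is relied on here.

```python
def syntax_error_column(source: str):
    """Return the 1-based column at which the compiler reports a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.offset
    raise ValueError("source compiled without error")


class_col = syntax_error_column("x = class(status):")
match_col = syntax_error_column("x = match(status):")

# The hard keyword is flagged where it appears; the soft keyword is accepted
# as a name, so the error only surfaces later in the line.
print(class_col, match_col)
```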

Every flexibility comes at some sort of price. The question is, is it a price worth paying? And that often depends on how much previously-existing code will be affected.


What you say may be true if the soft keyword being proposed here were something that can occur in the middle of an expression, but it isn’t.

class can currently only occur at the beginning of a statement, and if it were made a soft keyword, reading just the next token would be enough to tell which construct it is in. It would hardly impact either humans or computers.
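Roughly what I have in mind, as a sketch built on the standard tokenize module. The classification rule is my own assumption about how a soft-keyword class could be disambiguated; it is not taken from CPython's grammar, and the function name is hypothetical.

```python
import io
import itertools
import keyword
import tokenize


def classify_leading_class(statement: str) -> str:
    """Guess how a statement starting with 'class' would be read under a
    hypothetical soft-keyword rule, looking only at the following token."""
    tokens = tokenize.generate_tokens(io.StringIO(statement).readline)
    first, second = itertools.islice(tokens, 2)
    if first.string != "class":
        return "not applicable"
    # A class definition needs a plain name next. Operators such as "=" or
    # "(" and hard keywords such as "in" cannot be a class name.
    if second.type == tokenize.NAME and not keyword.iskeyword(second.string):
        return "class definition"
    return "identifier"


for line in ("class Foo(Base): pass", "class = HTTPSConnection",
             "class(host, port)", "class in registry"):
    print(f"{line!r}: {classify_leading_class(line)}")
```

The tokenizer does not distinguish keywords from names, which is why the keyword check on the second token is needed.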


That’s BECAUSE it’s a hard keyword. What you’re describing is the exact advantage of hard keywords: You know for sure that it can’t occur in the middle of an expression.

So, if you’re NOT at the beginning of a statement, you currently know that you have an error. However, if it were made a soft keyword, you would no longer know that. That’s the problem. That’s the cost.

There’s very little advantage to be gained here; all existing code already copes with the fact that class is a keyword. It’s completely different from match in that way.

I don’t see a problem here. Starting the statement with x = already clearly signals that the statement is meant to be an assignment, so an error message pointing at the extra colon at the end is perfectly reasonable, while reporting class as the point of error is actually a consequence of class being a hard keyword, not a benefit.


I don’t think performance is that important in this case; I suspect any parsing inefficiency of the class statement is dwarfed by the actual work of calling the metaclass, creating the class object and filling its namespace. Anyway I find the general argument of making things into soft keywords unpersuasive.

There always will be special cases, be they hard keywords or some other syntax. Even if you have a language where every keyword is a soft keyword, still no one would expect the programming language to be able to pass non-alphanumerics, percent-escapes, &amp; escapes etc. unscathed in its syntax. Domain-specific languages are specific to a domain for a reason.

match/case/type/_ became soft keywords so that new features could be added to the language, since pattern matching and type hints have become common, and match and co. were plain identifiers before. class started out as a keyword, not a valid identifier. Only if there were proposed new uses of class in Python that made it act like a non-keyword (and I can’t imagine any) would there be a compelling reason to make class soft from the other direction.

I don’t fully agree with any of the statements made in this comment (except the one you preface with “Personally”). Yes, soft keywords were introduced to not break backward compatibility, but I don’t think that precludes other possibilities.

It can be confusing, but that is not by definition. The reason I proposed class and not others is because it can appear in a very narrow context. OTOH, when used as an identifier, I think depending on the usage, it can be completely obvious.

As for your last statement, I could agree with it if you presented some reason (e.g. a performance penalty), but without one I don’t understand this hard line. The primary job of keywords in a language is to lower the parsing cost. But if the parser (for example, the PEG parser) can work around that at negligible performance cost (I don’t know if it does), then I don’t think this is a valid counterpoint.

That’s a valid point, and if the performance cost is noticeable, that may be too much to justify this proposal. But I simply have no idea how much this would cost in terms of performance. I would assume negligible.

I cannot define this in the function signature. In any case, all the examples have the same drawbacks as the one I cited, these are ways in which you work around the language design.

Those are fantastic references! Thanks!


I am looking for the same answer, but only restricted to class. If we leave the parsing penalty aside, this change, at least, should not break any existing code. Then the question is the cognitive load. It is not possible to speak for all python users, but personally, do you feel this will be ambiguous?

I expect to see things like this code from pigshell, by replacing klass with class:

        class = HTTPSConnection if urlcomps.scheme == 'https' else HTTPConnection
        headers = self.headers.headers
                headers.insert(0, cookie)
        target = class(host, port)

It’s a matter of opinion, of course. I find it easier to read code where one identifier has exactly one meaning everywhere.

It also has the benefits of:

  • making search-and-replace work much more easily
  • making grep work (like the grep in this idea itself)
  • making syntax highlighting work even when you don’t have LSP

First, you may not initially write code like that, but eventually, you need a match statement with your variable named match, and so either you have to go back and rename the variable or else not use the match statement. I think it’s better to just avoid the keyword from the beginning.


As has been discussed in previous iterations of this discussion, from is not available because of yield from. A line like yield from (a, b) would be ambiguous.
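The ambiguity can be seen by comparing how the two readings parse today. In this sketch, frm is a placeholder name standing in for what from would be if it were allowed as an identifier.

```python
import ast


def yielded_node(body: str) -> ast.expr:
    """Parse a one-statement generator body and return its yield expression."""
    module = ast.parse(f"def gen():\n    {body}\n")
    statement = module.body[0].body[0]
    return statement.value


delegating = yielded_node("yield from (a, b)")  # delegation to an iterable
calling = yielded_node("yield frm(a, b)")       # yields the result of a call

print(type(delegating).__name__)
print(type(calling).__name__, type(calling.value).__name__)
```

With from as an identifier, the first line could be read either way, since a call may have whitespace before its parentheses.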


Yes, but not the parsing cost of the compiler (which I think is what you’re saying), but rather the parsing cost of humans. When I see cls in Python code, I’m pretty sure it’s a variable whose type is a subclass of type. When I see class, I expect to be declaring a class. Overloading the use of the keyword makes it harder for humans to parse.
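A minimal illustration of that convention, with made-up class names: inside a classmethod, cls is the class the method was called on, so readers can rely on it being a class object.

```python
class Shape:
    def __init__(self, sides: int) -> None:
        self.sides = sides

    @classmethod
    def triangle(cls):
        # "cls" is the class the method was called on, so subclasses
        # get an instance of their own type.
        return cls(3)


class Polygon(Shape):
    pass


shape = Polygon.triangle()
print(type(shape).__name__, shape.sides, isinstance(Polygon, type))
```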

100% agree with you. That is the question.


This is unfortunately an example of “this should have been decided at the beginning”. In addition to what Adam and Chris (in some sense) said, I’d like to mention that making class a soft keyword will:

  • prevent any code using this new feature from being compatible with previous Python versions (parser changes can’t be guarded with sys.version_info; we could do it if we had pre-processing macros as in C, where the if-guard is evaluated at parsing time and not at runtime, but that’s another topic). Future statements wouldn’t help either.
  • even if libraries don’t care about this (namely, they assume their users use the latest Python version), they would still need to replace the function signatures (e.g., changing the parameter klass to class; any code using func(klass=) is either broken or some deprecation period will be needed, and thus more maintenance burden).
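To illustrate the first point: a whole module is parsed before any of it runs, so a runtime version check cannot hide syntax that the running interpreter does not accept. In this sketch, make_tag is a hypothetical function and the (99, 0) version is a placeholder for "some future release".

```python
guarded_source = '''
import sys

if sys.version_info >= (99, 0):
    result = make_tag(class="container")
else:
    result = make_tag(**{"class": "container"})
'''

portable_source = '''
result = make_tag(**{"class": "container"})
'''


def compiles(source: str) -> bool:
    """Return True if the module source is accepted by the current parser."""
    try:
        compile(source, "<module>", "exec")
        return True
    except SyntaxError:
        return False


guarded_ok = compiles(guarded_source)    # rejected despite the version guard
portable_ok = compiles(portable_source)  # accepted on every version
print(guarded_ok, portable_ok)
```

The guarded branch never gets a chance to be skipped, because compilation fails before the if statement is evaluated.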

We add features to solve existing issues, or to make the code more readable. class is also a reserved word in JS (ES6) and thus it’s still not possible to use it as a variable name. It would be possible to use it as a property x = {class: '...'} but there’s, AFAIK, no realistic way to create a class variable. It’s also not possible to use it as a function parameter.

Since JS is directly tied to web development, I don’t necessarily think it’s better to do it on our side. Both languages suffer from the same “drawback” but both of them have a longstanding history of working around this. Now, it’s still possible to annotate the function signature using typed dicts (they must still be created using the functional syntax, e.g., K = TypedDict('K', {'class': str}), and so it’s possible to indicate the expected keywords, though I admit this appears to be too complex). The variable used would still have a different name.
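A minimal sketch of the functional TypedDict syntax mentioned above, which is the only way to declare a key named class. The Attrs type and the div() helper are hypothetical; type checkers that support PEP 692 could additionally annotate the signature as **attrs: Unpack[Attrs], which is left out here to keep the example runnable on older versions.

```python
from typing import TypedDict

# The class-based syntax cannot declare a field called "class",
# so the functional form with a dict of field names is used instead.
Attrs = TypedDict("Attrs", {"class": str, "id": str}, total=False)


def div(text: str, **attrs: str) -> str:
    """Render a <div> with the given HTML attributes."""
    rendered = "".join(f' {key}="{value}"' for key, value in attrs.items())
    return f"<div{rendered}>{text}</div>"


attrs: Attrs = {"class": "container"}
html = div("Hello", **attrs)
print(html)
```

As noted, the caller still has to build the dictionary under a different variable name and unpack it.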


I’ve played a bit with the parser, and it appears that it shouldn’t pose too much of a problem. Performance drops also seem negligible:

$ ./python -m pyperf timeit -s 's = "class a: pass"; from ast import parse' 'parse(s)' --compare-to=python3.15 --python-names ref:new
Mean +- std dev: [ref] 2.06 us +- 0.04 us -> [new] 2.09 us +- 0.04 us: 1.02x slower

$ ./python -m pyperf timeit -s 's = "def a(): pass"; from ast import parse' 'parse(s)' --compare-to=python3.15 --python-names ref:new
Mean +- std dev: [ref] 2.57 us +- 0.05 us -> [new] 2.50 us +- 0.04 us: 1.03x faster

./python and python3.15 are locally compiled with HEAD@23adbf53c5b and --with-lto=yes and --enable-optimizations. When combining classes and function definitions:

$ ./python -m pyperf timeit -s 's = "class abc:\n\tdef meth(self): pass"; from ast import parse' 'parse(s)' --compare-to=python3.15 --python-names ref:new
Mean +- std dev: [ref] 4.68 us +- 0.13 us -> [new] 4.53 us +- 0.07 us: 1.03x faster

$ ./python -m pyperf timeit -s 's = "def abc(outer=None):\n\tinner = None"; from ast import parse' 'parse(s)' --compare-to=python3.15 --python-names ref:new
Mean +- std dev: [ref] 6.12 us +- 0.10 us -> [new] 5.89 us +- 0.18 us: 1.04x faster

Considering the magnitude of the standard deviation, I think we can simply say that it’s just noise (I however don’t know how it can be faster). I doubt we will be way faster or way slower and I think it’s a matter of tuning. I also don’t know a reliable way to benchmark just the parsing phase, and as was already said, the bottleneck will not be the parsing itself but likely the code generation. I tried to play with some corner cases (like having class everywhere) but I couldn’t find a way to break the parser (it’s essentially the same as with match). Maybe there is a corner case I didn’t consider, but the performance impact would not be my first argument for rejection, as adoption of this feature would put too much burden on maintainers (and possibly on users too!)
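For readers without pyperf, a rough equivalent of the measurements above can be sketched with the standard timeit module. This is only an approximation: ast.parse() covers tokenizing, parsing and building the AST but not bytecode generation, and timeit gives much less reliable statistics than pyperf. The helper name is made up.

```python
import ast
import timeit

SOURCE = "class abc:\n\tdef meth(self): pass"


def best_parse_time(source: str, repeat: int = 5, number: int = 10_000) -> float:
    """Return the best observed time in seconds for one ast.parse() call."""
    timer = timeit.Timer(lambda: ast.parse(source))
    return min(timer.repeat(repeat=repeat, number=number)) / number


print(f"{best_parse_time(SOURCE) * 1e6:.2f} us per parse")
```

Comparing two interpreter builds would mean running the same script under each of them and comparing the printed numbers by hand.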


Yes, all of that makes sense. It seems clear that the cost outweighs the benefits. Thanks for the detailed response! Thanks everyone!
