Should Python allow reserved keywords as property names?

"""This is not ok because from is a keyword. My proposal is to allow this:"""
class Test:
  from: int = 10  # Throws a SyntaxError
  to: int = 10

test = Test()
print(test.from)

This would be especially useful for dataclasses built from API responses, which often use from as a field name, for example in time intervals:

import dataclasses
import datetime

@dataclasses.dataclass
class TimeInterval:
  id: int
  from: datetime.datetime
  to: datetime.datetime

I ran into a similar situation many years ago, when implementing an API to an external simulation program. The modeling language had an element with various properties, two of which were ‘print’ and ‘return’.

In our ORM, you could do stuff like this, but those last two forced us to use setattr instead of the obvious reserved-word naming.

part = model.Part()
part.mass = 100
part.location = 10,0,20
debug = model.Debug()
debug.print = True  # Oops.  Works after 3.0, but in the 2.x days it failed.
debug.return = "OnError"  # Still fails.

If I’m remembering right, your idea has come up before, now that the PEG parser would fairly easily allow all these traditionally reserved words to become soft keywords.


Thank you for the input. If I understand you correctly this idea has been brought up before and is also something you would have found useful in a project of yours a while back? And it is probably easier to implement now that CPython uses a PEG parser?

I tried searching for an active issue about this in the CPython GitHub repository, but couldn’t find any. I thought of creating an issue about this myself. The only thing holding me back is any potential obvious caveats that I have not thought of, so I decided to ask here first :grin:

Yes to all three of your questions.

My recollection is that there was discussion on the python-ideas mailing list back when PEG was formative, this being one of the arguments to go ahead with that project. I don’t recall if it predated the svn → git conversion, but it may very well have (or been coincident with the switch), so I’m not surprised there’s nothing on GitHub.

Could be something in here: “PEG Parsing Series Overview”, Guido van Rossum’s series of blog posts about PEG on Medium.

The tradition is to append a _ to the name, e.g. from_ when something clashes with a keyword or built-in object name.

Yes, but that isn’t actually a good thing. We purposefully only use that for extreme situations where new syntax strongly makes sense, like with match, where there was a chance of clashing with pre-existing variable names. We still prefer to keep the grammar relatively simple.


Sure, but (in my cases at least) it’s likely you’ll be de/serializing the class into JSON or whatever, so now you have to configure your serialization library to do a rename. Which is generally possible, but frustrating when it’d be so much easier if the language let you use the right name from the start :wink:
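To make the rename step concrete, here is a minimal hand-rolled sketch of what it looks like without a serialization library. The helper names to_wire and from_wire are made up for illustration, and the timestamps are kept as plain strings to keep the example short; a real library such as cattrs has its own configuration for this.

```python
import dataclasses
import json
import keyword


@dataclasses.dataclass
class TimeInterval:
    id: int
    from_: str  # stands in for the wire field "from"
    to: str


def to_wire(obj) -> dict:
    """Hypothetical helper: drop the trailing underscore from keyword-like fields."""
    out = {}
    for field in dataclasses.fields(obj):
        name = field.name
        if name.endswith("_") and keyword.iskeyword(name[:-1]):
            key = name[:-1]
        else:
            key = name
        out[key] = getattr(obj, name)
    return out


def from_wire(cls, payload: dict):
    """Hypothetical helper: append an underscore to keys that are Python keywords."""
    kwargs = {
        (key + "_" if keyword.iskeyword(key) else key): value
        for key, value in payload.items()
    }
    return cls(**kwargs)


raw = '{"id": 1, "from": "09:00", "to": "17:00"}'
interval = from_wire(TimeInterval, json.loads(raw))
print(interval.from_)
print(json.dumps(to_wire(interval)))
```

It works, but every class that touches such a payload needs to go through helpers like these, which is the friction being described.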

I’m assuming that’s because you don’t control one of the endpoints and thus the format isn’t under your control. In which case that sucks and I’m sorry that Python puts you through that if you’re converting this into object attributes (which I will assume the maintainer of cattrs is doing :wink:), but that’s not going to convince the SC to make all keywords soft for the entire language (I was a part of that discussion around match and bringing in the PEG parser so I’m not speculating here).


I would like to see the ability to specify an arbitrary string as attribute name for purposes like this.

AFAIK there is no keyword that can ever directly follow ., so I think it should not be too hard to argue that . would force the following word to always be parsed as an identifier rather than a keyword.
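A quick way to check how the current parser treats a keyword after a dot is to feed small snippets to compile(). This is only a sketch of today's behavior, not of the proposal; the helper name compiles is made up. One case that may be worth keeping in mind is the relative import form, where the import keyword does come directly after a dot.

```python
def compiles(source: str) -> bool:
    """Return True if the snippet parses, False on SyntaxError (nothing is executed)."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


snippets = [
    "obj.from_",                        # the usual trailing-underscore workaround
    "obj.from",                         # keyword after a dot
    "f(from=1)",                        # keyword as a kwarg name
    "class T:\n    from: int = 10\n",   # keyword as an annotated attribute
    "from . import x",                  # relative import: 'import' follows '.'
]
for snippet in snippets:
    print(repr(snippet), compiles(snippet))
```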

For cases where there is no . before the identifier name, such as class attribute annotations as mentioned above, but also kwarg names at call sites (e.g. when constructing a dataclass), perhaps it could be allowed to use a leading . to likewise parse the following word as an identifier.

For consistency, it would also be allowed to use . to indicate an identifier even if there wasn’t a keyword collision. This would allow people to use . for every name in a context where at least one name required it, which’d be a bit tidier than only using this workaround for keyword collisions. So this would become valid:

@dataclass
class Foo:
    .from: Bar
    .to: Bar

foo = Foo(.from=some_bar, .to=some_other_bar)
print((foo.from, foo.to))

An alternative is to use a syntax including single or double quotes to specify “treat this string as an identifier name”. The Zig language uses the syntax @"name", as in foo.@"from". In Zig this syntax allows any valid string to be used, so you could have foo.@"bar$baz" (which accesses an attribute named bar$baz, and might be helpful if you’re somehow interfacing with something in Java or JavaScript, where $ may appear in an identifier) or even things like foo.@"a.(very ; strange\"name". If Python were to use similar syntax, I’m not sure whether it should restrict the content to still match /[a-zA-Z_][a-zA-Z0-9_]*/ or allow any arbitrary string like Zig does, but it should at least make it possible to overcome keyword collisions.

@dataclass
class Foo:
    from_: Bar
    to_: Bar

foo = Foo(from_=some_bar, to_=some_other_bar)
print((foo.from_, foo.to_))

Have you seen getattr and setattr? :slight_smile:
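For anyone who hasn't used them this way, a small sketch: the attribute machinery itself doesn't care about keywords, only the parser does. SimpleNamespace is used here as a stand-in for whatever object you're working with.

```python
from types import SimpleNamespace

debug = SimpleNamespace()

# debug.return = "OnError" is a SyntaxError, but the attribute name itself is legal.
setattr(debug, "return", "OnError")
setattr(debug, "from", 10)

print(getattr(debug, "return"))
print(vars(debug))

# Keyword-named arguments can also be passed through ** unpacking.
interval = SimpleNamespace(**{"from": 1, "to": 2})
print(getattr(interval, "from"), interval.to)
```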


@apalala @Rosuav

Yes, to both of you, those are the solutions I use.

The first doesn’t work when the attributes you’re dealing with come from some external source that isn’t bound by Python’s keyword list, and the second is somewhat untidy.

I’m really just putting my 2 cents out here for consideration.

You can preprocess the names using the keyword module (import keyword).
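Something along these lines, as a minimal sketch. The function name safe_identifier and the sample payload are made up for illustration; keyword.iskeyword is the stdlib call doing the work.

```python
import keyword


def safe_identifier(name: str) -> str:
    """Append an underscore when the external name is a Python keyword."""
    return name + "_" if keyword.iskeyword(name) else name


payload = {"id": 7, "from": "2024-01-01", "to": "2024-01-31", "class": "A"}
renamed = {safe_identifier(key): value for key, value in payload.items()}
print(renamed)
```

Note that this sketch does not guard against a payload that already contains both from and from_; that would need an extra check.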

The new PEG parser in Python can actually manage the meaning of keywords depending on context, but having worked with languages that do that (COBOL, NATURAL, PERL), and considering the small set of Python keywords, I’d opt for keeping the set of keywords small, and avoiding them as identifiers in programs.

Because the original example uses Foo and Bar, it’s difficult to recommend alternate naming, but usually a longer and more explicit name resolves the collision with keywords, for example from_position, or initial.

In the end, a language should be designed for ease of reading (by humans, many times over) rather than ease of writing.

I agree that making all keywords soft is probably too disruptive and can easily make code unreadable (like, imagine a comprehension with variables named for or if).

However, in the specific case of from, maybe it can be relevant? I’ve found myself several times in the exact situation of the original post, needing an additional rename step to serialize/deserialize data using “from” and “to” fields.

Since from can only be used in a very specific context (from ... import ...), maybe making it a soft keyword has a positive balance?

REXX has no keywords whatsoever, and thus you really could do that sort of thing. It’s a blessing and a curse! The blessing is that you can have extremely situational keywords (eg the “PARSE VALUE x AS y” statement, in which “value” is a keyword - and keywords are case insensitive, so it’s great that that doesn’t stop you from using “value” as a variable name), but the curse is exactly what you say: it’s entirely possible to write extremely unreadable code.

Having a number of hard keywords helps a lot with hard error messages. When there are multiple ways you could potentially parse something, it’s possible - and all too common - to have “garden path” sentences where you have to back up a long way and reinterpret what you thought you already understood. (You’ve probably heard that time flies like an arrow, and fruit flies like a banana.) That then leads to errors being reported a long way from the actual bug, since everything prior to that was perfectly legal and grammatical, but might not have had the interpretation you intended.

Good use of keywords can help to prune the grammatical tree early, enforcing what is to be interpreted. Yes, sometimes it excludes an otherwise-valid interpretation, but that’s the price paid for the 99% of the time when it’s beneficial. Languages with almost no keywords tend to have a lot more grammar words to them (like REXX that I mentioned earlier), to help guide that interpretation.


I totally agree with your conclusion. I had never heard of REXX, really interesting! Do you have an opinion on the case of from in particular?

It’s a language that I spent many years working with, and have no regrets about! Great language. Imagine a shell scripting language that gets enhanced with some more features to make it a general-purpose language. Think like bash scripts, but even more so. Now add in the ability for extension libraries (where shells generally just spawn subprocesses for everything), some GUI libraries (VREXX, VPREXX, VX-REXX), and a few things like that, and you have a quite viable scripting language. Plus, OS/2 made it really easy to call on the REXX interpreter from another program (think like embedding CPython, only the interpreter’s actually provided by the OS), so REXX scripting was the single most popular embed language on OS/2, making it the universal language.

The strongest opinion I have is “don’t change it”, and that’s not a particularly strong opinion. But there needs to be a strong case for the change, and I think the reasons given here are not much stronger for ‘from’ than for any other keyword.

And I do have a strong opinion on “make them ALL soft”, which is what I said above. So while I wouldn’t stand in your way if you think ‘from’ is special, I’d also not get behind that argument without a bit more explanation of what makes this, in particular, either more beneficial for non-keyword use or more harmful as a keyword.

From personal experience, it is close to the only one I have ever encountered. This is also supported by a quick count in the stdlib. The most common usage as an identifier is assert_, next is from_, followed by class_. The test suite also uses import_ a lot.

  • assert_ is basically only used in a single file: it’s a helper function in wsgiref/validate.py that behaves like a normal assert statement except that it doesn’t get optimized away. Not sure why it exists.
  • class has a well-known, OK-enough spelling, cls. Specifically, class_ is used about as much as from_ in the stdlib.
  • import_ is, in the stdlib at least, only used by tests or importlib, which are kind of specialized use cases.
  • from_ is used a bit more all over the place: tkinter (and its family, idlelib and turtle) and mailbox use it, as well as tests for unicode.
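For anyone who wants to reproduce that kind of count, here is a rough sketch of one way to do it with tokenize. It is not the exact method used for the numbers above; the function name and the sample source are made up, and walking over the stdlib files is left out.

```python
import collections
import io
import keyword
import tokenize


def count_keyword_underscores(source: str) -> collections.Counter:
    """Count NAME tokens of the form <keyword>_ in a piece of Python source."""
    counts = collections.Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string.endswith("_"):
            if keyword.iskeyword(tok.string[:-1]):
                counts[tok.string] += 1
    return counts


sample = "def scale(from_, to, class_=None):\n    return from_ + to\n"
print(count_keyword_underscores(sample))
```

Counting tokens rather than grepping text avoids matches inside strings and comments, at the cost of only working on files that tokenize cleanly.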

from, OTOH, doesn’t have a clear alternative spelling that I am aware of, and it is quite a common name for a parameter in more abstract definitions of protocols. In Python it is only used for the from ... import statement, which means it’s only ever used in a very clear context and in relation to another keyword, so typo detection wouldn’t really be hampered by turning it into a soft keyword (although, of course, relative imports such as from.module import ... come to mind as slightly conflicting).

Not that I am necessarily in favor of changing its status. But it is IMO a bit special in contrast to most other keywords.


Okay, out of curiosity I did a quick GitHub search of all keyword_ usages in Python files, which is the canonical way to bypass the hard-keyword limitation :grin:

Here are the results (sorted from highest count to lowest count):

| Keyword | Count | Notes |
| --- | --- | --- |
| class_ | 158k | |
| in_ | 99.1k | Notably used by SQLAlchemy |
| from_ | 89.6k | 85.5k results with from_ and to (or to_) in the file |
| or_ | 79.6k | Notably used by SQLAlchemy |
| assert_ | 72.4k | |
| and_ | 69.6k | Notably used by SQLAlchemy |
| is_ | 44.8k | Notably used by SQLAlchemy |
| lambda_ | 32.1k | |
| not_ | 26.1k | Notably used by SQLAlchemy |
| raise_ | 23.3k | |
| import_ | 14.5k | |
| else_ | 10.6k | |
| async_ | 10k | |
| as_ | 9.3k | |
| with_ | 9.2k | |
| if_ | 7.3k | |
| pass_ | 6.8k | |
| True_ | 6.8k | |
| global_ | 6.3k | |
| return_ | 6.1k | |
| def_ | 5.9k | |
| except_ | 5k | |
| yield_ | 3.8k | |
| False_ | 3.7k | |
| continue_ | 3.4k | |
| for_ | 3.2k | |
| break_ | 2.8k | |
| while_ | 1.6k | |
| await_ | 907 | |
| elif_ | 640 | |
| finally_ | 414 | |
| nonlocal_ | 147 | |
| try_ | 104 | |

So, we can see that from_ is really high in this list, and most of the time indeed in a from/to context.

I don’t know if that’s enough to justify making it special, but I think it shows that this is a fairly common problem!


It’s also part of raise and yield. But in the case of import, the word from is at the START of the statement. That makes it a bit trickier to replace, since that’s a good way to end up with a garden-path statement. Currently, if you see the word from at the start of a parsing context (say, when you’re expecting a new statement), you expect it to be followed by an importable thing, then import, and a thing to import from it. But if from were a soft keyword, the word on its own would be a valid expression, something like this:

try:
    raw_input # an expression with just a name
except NameError:
    raw_input = input

So, if I have statements like these, where are the bugs?

from (spam) import ham
from spam; import ham

It’s entirely possible to interpret these as from-imports, but also as function calls or perhaps assignments. Where do you pinpoint the error, and how much confusion will it cause?
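As a baseline, here is a small sketch for checking what happens with those two lines today, while from is still a hard keyword. The helper name syntax_error_position is made up, and the reported positions may differ between Python versions, so none are quoted here.

```python
def syntax_error_position(source: str):
    """Return (lineno, offset) of the SyntaxError, or None if the source parses."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError as exc:
        return exc.lineno, exc.offset
    return None


for line in ("from (spam) import ham", "from spam; import ham"):
    print(repr(line), syntax_error_position(line))
```

With a hard keyword, both lines are rejected outright. The open question is where the error would be pinned if the leading from could also be read as a plain name.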

Soft keywords make a lot of sense in contexts where they can’t possibly occur at the start of a statement. For example, “await” can’t (normally) appear outside of a function declared with “async def”, which means that the keyword was able to be introduced as a soft keyword - at top level, you could say “await = 1” or “def await(x)” without issues, but if you wanted to say “await thing()”, that would happen in the clearly-defined context of an async function. Similarly, “as” could be made a soft keyword, I think (although there’s not a lot of call for it), since every use of it follows some other keyword (import, with, except, match), so it would be unambiguous.
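For comparison, this is roughly how the existing soft keywords behave. It is a minimal sketch; keyword.softkwlist only exists on newer Python versions, so it is looked up defensively, and its contents vary by version.

```python
import keyword

# match and case are soft keywords, so they remain usable as ordinary names.
namespace = {}
exec("match = 1\ncase = 2\ntotal = match + case\n", namespace)
print(namespace["total"])

# The keyword module reports hard keywords only through iskeyword().
print(keyword.iskeyword("match"), keyword.iskeyword("from"))

# softkwlist is version-dependent; fall back to an empty list if it is missing.
print(getattr(keyword, "softkwlist", []))
```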

I’m not really surprised about class in your list. Even with the common abbreviation cls, it’s still going to be extremely common. But it also would be a poor choice for a soft keyword IMO, since it always starts a statement. in might be a better choice, and from as mentioned is somewhere in between. It all depends on what kinds of confusion it would cause by permitting it, compared to what kinds of confusion you get by rejecting it.

Oh right, yield from and raise ... from. I forgot those; I almost never use them. Those do actually make me not want to have from as a soft keyword. Writing yield from and forgetting the rest of the statement does not seem like an impossible mistake, nor does yield from seem like an unlikely intended statement. Your two examples with from ... import seem less likely to me, but do of course exist. Also, “beginning of a statement” is clearly not an indicator, since we have match and case as soft keywords [1]. Quite the opposite IMO: if soft keywords could be part of expressions as keywords, I would find that more confusing and harder to produce good error messages for (so I like in even less than from; from in expressions is at least a context-dependent special form).


  1. TBF, those lines also end with :, which makes it a bit clearer what the user meant. The same goes for class.
