`keys` as a special method name and part of the data model

Gouvernathor · July 13, 2023, 6:56pm

See also: Why the ** syntax uses keys() instead of __iter__

In Python, the methods used internally by operators are all dunders described in the datamodel documentation, with one exception.
When using the unary * operator, for *args, Python calls __iter__. However, for the unary ** operator, for **kwargs, it calls the method keys, expected to return an iterable of keys to the mapping (which themselves are expected to be strings, in the case of a function call).
As mentioned in the thread, the keys() method is also called by dict.update(d), and the **expression syntax can also be found in {**d}, not restricted to string keys this time.
That makes keys treated the same way as reserved method/attribute names such as __iter__, __neg__ or __divmod__ (edited after initial post).
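A minimal sketch of the difference described above, assuming CPython. The Tracer class and the collect function are hypothetical names used only for illustration; they record which methods the interpreter calls for each kind of unpacking.

```python
class Tracer:
    """Hypothetical class that records which of its methods get called."""

    def __init__(self):
        self.calls = []

    def keys(self):
        self.calls.append("keys")
        return ["a", "b"]

    def __getitem__(self, key):
        self.calls.append(f"__getitem__({key!r})")
        return key.upper()

    def __iter__(self):
        self.calls.append("__iter__")
        return iter(["a", "b"])


def collect(*args, **kwargs):
    """Return the arguments exactly as the call machinery delivered them."""
    return args, kwargs


# Keyword unpacking goes through keys() and __getitem__, not __iter__.
kw_tracer = Tracer()
_, kwargs = collect(**kw_tracer)
print(kwargs)            # {'a': 'A', 'b': 'B'}
print(kw_tracer.calls)

# Positional unpacking goes through __iter__.
pos_tracer = Tracer()
args, _ = collect(*pos_tracer)
print(args)              # ('a', 'b')
print(pos_tracer.calls)
```

Note that the object never has to be a dict or inherit from one: having keys and __getitem__ is enough for ** to accept it.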

This poses a series of issues. An object with a keys attribute or method that is mistakenly used with ** (or passed to dict.update, although that’s probably less likely to happen) raises a different exception than **object() would. Worse, if that method returns an iterable (or a string, since strings are iterable themselves), the interpreter looks up these keys with __getitem__ on the object and only then reports an error if that doesn’t work. And if the keys method returns an empty iterable, no error is reported at all.
No part of the documentation (to my knowledge) warns about this or treats keys as a reserved name. (see below for a more detailed discussion about this)
Some existing code seems to use keys not as a dict method :
(1), (2), (3) which inherit google.protobuf.Message which doesn’t implement the mapping interface, also they are properties and not methods ;
(4) ;
(5), (6) which implement some of the dict methods but not __getitem__ which is necessary for ** to work correctly ;
(7) which is in the stdlib (!) and doesn’t implement __getitem__ either.
I only skimmed over uses of keys as a function, but a class or instance attribute would be just as problematic and is likely even more common.
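The failure modes listed above can be reproduced with a few toy classes. This is only a sketch, assuming CPython; every class name here is hypothetical, and the exact error messages may vary between versions.

```python
class NoKeys:
    """No mapping interface at all."""


class KeysAttribute:
    """Uses ``keys`` as plain data, not as a method."""
    keys = ("a", "b")


class StringKeys:
    """``keys()`` returns a string; there is no __getitem__."""
    def keys(self):
        return "ab"


class EmptyKeys:
    """``keys()`` returns an empty iterable; there is no __getitem__."""
    def keys(self):
        return []


def try_unpack(obj):
    """Return the dict built by {**obj}, or the exception it raised."""
    try:
        return {**obj}
    except Exception as exc:
        return exc


results = {
    cls.__name__: try_unpack(cls())
    for cls in (NoKeys, KeysAttribute, StringKeys, EmptyKeys)
}
for name, outcome in results.items():
    print(f"{name}: {outcome!r}")
```

On CPython the first three cases all end in a TypeError, but each with a different message (not a mapping, not callable, not subscriptable), while EmptyKeys silently produces an empty dict.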

I see two solutions to this : ~~changing the behavior to using __iter__, or only~~ (removed after discussion) changing the documentation to officially reserve the name.
I imagine maintainers will prefer the low-risk second version, but I think the merits of the first deserve to be discussed. I suppose other solutions could be found, I only processed this one but I’m open to alternatives.

What would be the behavior ? It would call __iter__, which in the case of mappings is documented as iterating over the keys, and test that they are all strings. It would then test for the presence of __getitem__, and in its absence raise the same exception that it currently raises in the absence of keys, and that it would now raise in the absence of __iter__. It would then query the values using __getitem__, and from there on the behavior is the same.
What would be the cases where it avoids an error ? Those I described earlier on, using keys as an ordinary method or attribute.
What would be the cases where it would result in unintended behavior ? Passing an iterable would not result in the exact same behavior as before, as it would try indexing with the resulting values instead of just raising an error. But if even one of the iterated values is not a string, the operation would still fail, presumably before index lookup. And even if all the keys are strings, indexing with them will not work: built-in sequences reject string indexes with a TypeError, and non-sequence iterables are not subscriptable at all. It would let **() pass, though, as well as other empty iterables. But I find this to be more acceptable behavior than the current one. It would only let unintended behavior pass without an exception in the case of a class with __iter__ and __getitem__ but without mapping behavior, and that doesn’t sound like something deserving official support.
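To make the proposed behavior concrete, here is a pure-Python emulation of it for the function-call case. It is only a sketch of the idea described above, not interpreter code: unpack_via_iter is a hypothetical name, and the error messages are placeholders.

```python
def unpack_via_iter(obj):
    """Hypothetical emulation of the proposed __iter__-based ``**obj``."""
    type_name = type(obj).__name__
    try:
        keys = list(iter(obj))
    except TypeError:
        # No __iter__: same error category as a missing keys() today.
        raise TypeError(
            f"argument after ** must be a mapping, not {type_name}"
        ) from None

    # In a function call, every key has to be a string.
    for key in keys:
        if not isinstance(key, str):
            raise TypeError("keywords must be strings")

    # The values are then fetched through __getitem__.
    if not hasattr(type(obj), "__getitem__"):
        raise TypeError(
            f"argument after ** must be a mapping, not {type_name}"
        )
    return {key: obj[key] for key in keys}


print(unpack_via_iter({"a": 1, "b": 2}))  # {'a': 1, 'b': 2}
print(unpack_via_iter(()))                # {}

for bad in ([0, 0], ["a"], 5):
    try:
        unpack_via_iter(bad)
    except TypeError as exc:
        print(f"{bad!r}: TypeError: {exc}")
```

The empty tuple passes silently, which is the `**()` case mentioned above; the other non-mappings still fail, either on the string check or on the index lookup.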

We should describe keys as a reserved method name. It should also be flagged by type checkers and linters when a method named keys takes any signature other than just (self) and, if possible, when a class defines keys but not the other mapping methods, __iter__ and __getitem__. But that may be hard to do if and when you don’t have a full view of the base types.
I know PSF/PSC doesn’t enact type checker changes, but better to say it anyway. And an official statement that “keys” is reserved may help checkers and linters take it into account.

tjreedy · July 14, 2023, 7:04pm

The fact that Python itself uses public method names of built-in classes does not and should not make them ‘reserved’, whatever that would mean. ‘keys’ is not especially unique in this regard.

Python is by default duck-typed, and will remain so. This means that objects can partially or wrongfully masquerade as a duck when they will not work as a duck. Optional typing is a solution for this.

Proposal 1: Calling keys eliminates objects that are obviously not mappings. Calling __iter__ directly would be worse.
Proposal 2: Users should be careful about reusing any of the public method names of builtins. The call doc says " If the syntax **expression appears in the function call, expression must evaluate to a mapping, the contents of which are treated as additional keyword arguments." This seems clear enough to me. Passing something that does not have proper keys, __iter__, and __getitem__ methods (the actual requirements here) is a user bug. If one wants specific warnings before execution, use type annotations.
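As a sketch of the type-annotation route, one could describe the methods that ** relies on with a typing.Protocol. The names SupportsDoubleStar and merge_options are hypothetical, and the protocol only lists keys and __getitem__, the two methods the unpacking itself calls according to the first post.

```python
from typing import Any, Dict, Iterable, Protocol, runtime_checkable


@runtime_checkable
class SupportsDoubleStar(Protocol):
    """Hypothetical protocol naming the methods that ** relies on."""

    def keys(self) -> Iterable[Any]: ...

    def __getitem__(self, key: Any) -> Any: ...


def merge_options(
    defaults: Dict[str, Any], extra: SupportsDoubleStar
) -> Dict[str, Any]:
    """Hypothetical helper: a type checker can reject a non-mapping ``extra``."""
    return {**defaults, **extra}


print(merge_options({"a": 1}, {"b": 2}))      # {'a': 1, 'b': 2}
print(isinstance({}, SupportsDoubleStar))     # True
print(isinstance([], SupportsDoubleStar))     # False
```

The isinstance check only looks for the presence of the two names, so it is no stricter than the interpreter; the real benefit is the warning a static type checker can give before execution.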

Because an empty iterable might be correct. The interpreter cannot read minds.

Gouvernathor · July 14, 2023, 9:06pm

I agree. Python may do whatever it wants with methods, documented or not, on its built-in classes. The problem is that the unary ** operator also calls keys on things that are not built-in dicts. Precisely because Python is duck-typed.

Also, what “reserved” means is in the documentation. It explicitly includes dunders (which keys, admittedly, is not), then points to the very datamodel section listing all operator overrides, which keys is.

Do you have another example of an operator (let’s be broad and include ., ( and [ in the definition of operators) which uses a non-dunder method ? For example no part of Python that I know of relies on an object having an append method and then treats it as a mutable sequence, and that’s what makes keys unique in my view. The datamodel’s reliance on dunders for every emulation of a builtin behavior, and others, is a dedication to making these overrides of operators accident-proof, to making sure it was the user’s intent.

Ok, if we’re sticking with the current behavior, I agree. My issue is that the datamodel page doesn’t say that keys is, as you say, required for or specific to being a mapping. It says instead :
“The first set of methods” (meaning the len, getitem, setitem, delitem, missing, iter, reversed and contains dunders) “is used to emulate a sequence or to emulate a mapping.” […] “It is also recommended” (emphasis mine) “that mappings provide the methods keys(), values(), items(), get(), clear(), setdefault(), pop(), popitem(), copy(), and update() behaving similar to those for Python’s standard dictionary objects.”
As you can see, not only is keys only “recommended”, not described as required, it is also put on an equal footing with the other dict methods which to my knowledge aren’t used by any builtin mechanism or operator.

That’s my proposition 2 : making this clear and documented, adding keys to the list of method names described in that page, as the name associated with the unary ** operator. Or at least saying that it’s used by the system to recognize mappings (even without saying in what situation), to tell people that what they write can have unexpected behavior, just like any dunder listed in the page.

Rosuav · July 14, 2023, 9:14pm

I actually can offer one example from Python 2, and that’s the next method on iterators. And in Python 3, that was changed to __next__. Does anyone know whether keys was considered for the same change at the same time, and if it wasn’t, what the reasoning for the difference was?
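For readers who never used Python 2, here is a small sketch of that rename as seen from Python 3. Both classes are hypothetical; the first one only has the old-style next method, the second one exposes the same function under the dunder name.

```python
class Py2StyleCounter:
    """Hypothetical iterator written the Python 2 way: ``next``, no dunder."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def next(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


class Py3StyleCounter(Py2StyleCounter):
    """Same logic, exposed under the name Python 3 looks for."""

    __next__ = Py2StyleCounter.next


print(list(Py3StyleCounter(3)))  # [0, 1, 2]

try:
    iter(Py2StyleCounter(3))
except TypeError as exc:
    # Python 3 does not treat a plain ``next`` method as the iterator hook.
    print(f"TypeError: {exc}")
```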

da-woods · July 14, 2023, 10:16pm

The dunder reserved methods tend to match up with the C API type slots (which __next__ does but keys() doesn’t). Although I’m sure there’s exceptions to that rule.

So in my mind “reserved” means “we may associate this name with a C type slot in future”, which tends to come with restrictions on signatures and other slightly unusual behaviour. While just using the method in some duck-typed Python API is much less drastic.

TeamSpen210 · July 14, 2023, 11:44pm

For match, using non-dunder names wasn’t a concern since it requires objects to inherit from Mapping. I wonder if it’d be a good idea to change **kwargs to also check for that, and deprecate then remove just looking for keys()? Might not really have sufficient improvement though to justify the churn.

Gouvernathor · July 15, 2023, 12:33am

My thinking was this :

If we add a new function to do it, like __keys__, we need to support the current way at least during the transition period and possibly for ever. The new mapping classes are prettier, sure, but the old name stays effectively reserved since ** keeps using it as long as it doesn’t find __keys__. Not good enough.
Playing with only what we have in current mappings led me to proposition 1. __iter__ is supposed to exist and return the keys, and only mappings (not sequences) take their iterated values as indexes. That should work with existing mappings.
Checking for something other than a method’s absence, presence or behavior, checking the inheritance directly, would make the ** unary operator be the single one operator not to only rely on that : the only operator not to be duck-typed, as Terry put it. I don’t think that’s a good idea. I’ll concede that it’s not unheard of in Python though : the raise/except mechanism relies on the objects being actual (not even virtual) subclasses of BaseException. But in this context, I still think we shouldn’t go that way.

That’s not what it means. There are a lot of uses of dunders, in the stdlib usually, which have nothing to do with type methods or even attributes. For example, Signature’s __validate_parameters__ parameter, the __main__ and __init__ module/file names…
Also, to my limited understanding of the code linked in the other thread as the implementation of **, it seems to expect something called “PyMapping_Keys”, which sounds an awful lot like a C type slot name to my uneducated ears.

ajoino · July 15, 2023, 8:21am

Is this an issue in practice? I have never seen it brought up before which leads me to think the current behaviour is fine.

Gouvernathor · July 15, 2023, 4:37pm

If the current behavior is fine, then let’s document it.

Gouvernathor · July 28, 2023, 5:13pm

I did a PR and an associated issue.

tjreedy · July 29, 2023, 1:06am

Because accessing a.__iter__ or the equivalent as a.keys().__iter__ avoids iterating at least some non-mappings.

There is no unary ** operator in the sense of something that maps an object to an object. By itself, **something is a SyntaxError rather than an expression that can be evaluated. In particular contexts, it is a special syntax construct. In function definitions, it specifies a particular handling of call argument objects. Elsewhere (only in calls and displays I believe), it is equivalent to **{}.update(something) where the ** indicates ‘unpacking’.
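Both points can be checked quickly. In this sketch, is_valid_expression is a hypothetical helper that simply asks compile() whether a string parses as an expression; the second half compares a dict display using ** with successive update() calls.

```python
def is_valid_expression(source):
    """Return True if ``source`` compiles as a standalone expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# On its own, ** is not an expression; inside a display it is fine.
print(is_valid_expression("**{'a': 1}"))    # False
print(is_valid_expression("{**{'a': 1}}"))  # True

base = {"a": 1, "b": 2}
extra = {"b": 20, "c": 30}

# A dict display with ** behaves like successive update() calls.
display = {**base, **extra}
via_update = {}
via_update.update(base)
via_update.update(extra)
print(display == via_update)  # True
```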

The linked text begins

2.3.3. Reserved classes of identifiers
Certain classes of identifiers (besides keywords) have special meanings. These classes are identified by the patterns of leading and trailing underscore characters:

It then discusses the 4 cases of things beginning with '_'. The __*__ entry links to the Special method names section of the Data model chapter (3.3). This section is about dunder names.

Method keys() is ***not*** an operator override. If we think of '**' as a special syntax operator, then the override is the special method keys().__iter__. As said above and discussed on the initially linked thread, the indirection is for duck typing, not for any actual computation.

They are free to do so because ‘keys’ is not a reserved word.

[keys] is also put on an equal footing with the other dict methods which to my knowledge aren’t used by any builtin mechanism or operator.

.items( and .values( occur about 980 and 310 times in the Python-coded part of the 3.12 stdlib. (I don’t know the proper C equivalents to search the C code.)

Gouvernathor · July 29, 2023, 2:00am

By that, you mean that that the keys method doesn’t actually provide a computational feature inaccessible by other means, but is rather a marker of “being a mapping”/“not being a mapping” ? (or “supporting/not supporting keyword unpacking”)
Yes, sure. But I don’t think it changes much about the problem : the interpreter will use .keys().__iter__() to implement unpacking, so using that method has an impact on the behavior of the object in the context of the ** kw-unpacking syntax. Much like __iter__ changes how an object behaves in pos-unpackings and in the for ... in syntax, which I would almost consider to be an operator (I understand why you disagree and why it’s not true strictly speaking).

In other words, if keys (or keys().__iter__ if you prefer) is not an operator override, then neither is __iter__ ; but if __iter__, because of its role in native Python syntax, deserves to be a reserved name, then so does keys.
I understand the distinction you made between operator overrides and non-operator overrides, but practically speaking there is a reason why __divmod__ and __sub__ are cited very closely to each other, as well as __floor__ and __neg__, despite the first of each pair being a function override and the second being a proper operator override. They are reserved names for behaviors native to Python which act in a duck-typed way. As the doc puts it, they are “special method names”.
So practically speaking, the -a override has the same status as divmod override, which has the same status as the for ... in and *expression override, which should have, in my view, the same status as the **expression override.

That’s as much a problem as if they were using an __iter__ attribute to print the GDP of Saskatchewan in the console and return None. I don’t understand why it would be a problem in the case of __iter__ but not in the case of keys (or keys().__iter__).

Ok, but that doesn’t mean the methods are used in order to assume whether or not something is a mapping. Consider the difference between these two different examples:

def fop(d: dict, **kwargs):
    """
    Uses items because kwargs is necessarily a dict
    Only supports d being a dict or a subclass of dict (and knowingly fails otherwise, even in the case of mappings)
    """
    for k, v in kwargs.items():
        for val in d.values():
            ...

def bor(par):
    """
    Tests whether something is a mapping based on the presence of unreserved methods
    Also poorly manages exceptions
    """
    if hasattr(par, "items"):
        treat_as_mapping(par)
    
    try:
        treat_values(par.values())
    except:
        pass

The non-dunder methods on builtin types, to my knowledge, are used for the former, and keys is apparently the only one being used for the latter.
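A sketch of keys acting as that kind of marker outside of the ** syntax, assuming CPython: the dict constructor and dict.update decide between "mapping" and "iterable of pairs" based on whether the argument has a keys attribute. Pairs and PairsWithKeys are hypothetical classes.

```python
class Pairs:
    """Hypothetical iterable of (key, value) pairs."""

    def __iter__(self):
        return iter([("a", 1), ("b", 2)])


class PairsWithKeys(Pairs):
    """Same iterable, plus a ``keys`` method and ``__getitem__``."""

    def keys(self):
        return ["x"]

    def __getitem__(self, key):
        return 42


# Without keys(), the argument is read as an iterable of pairs.
print(dict(Pairs()))          # {'a': 1, 'b': 2}

# With keys(), __iter__ is ignored and the argument is read as a mapping.
print(dict(PairsWithKeys()))  # {'x': 42}

target = {}
target.update(PairsWithKeys())
print(target)                 # {'x': 42}
```

Adding a method named keys to an existing class can therefore change how instances are interpreted by dict() and update(), without any error being raised.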

dmoisset · August 3, 2023, 4:59pm

The proposal of using __iter__ may lead to running code that used to fail, and should be reasonably expected to fail.

Consider the following

a = [0, 0, 0, 0]
d = {**a}

This code fails today. If the proposal to stop using keys() were implemented, this would set d to {0: 0}, which is useless, unexpected, and most likely hiding a bug.
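Both halves of that claim can be checked with a short sketch. The dict comprehension below is only an emulation of what an __iter__-based unpacking would compute; it is not how the interpreter works today.

```python
a = [0, 0, 0, 0]

# Current behavior: a list has no keys() method, so the display fails.
try:
    d = {**a}
except TypeError as exc:
    current_error = exc
    print(f"TypeError: {exc}")

# Emulation of the __iter__-based proposal: iterate the object,
# then index it with each iterated value.
emulated = {key: a[key] for key in a}
print(emulated)  # {0: 0}
```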

Gouvernathor · August 3, 2023, 6:47pm

Oh, fair point. Ok, I removed that from my original post, in favor of the documentation update.

Gouvernathor · August 4, 2023, 3:17pm

@tjreedy (and @rhettinger if you both agree), you didn’t clarify your reasons to disagree with what I wrote a week ago, so I’m summing up the different points.

Yes, ** / keys / keys().__iter__ is not an operator override, but it is a syntax override much like __iter__, __divmod__, __neg__ and many others. Why do you think some syntax or operators should be handled by methods with protected names, but others not ?
(NB: I didn’t say “with dunder names”, I said “with protected names”)
Do you believe it is fine to let a name used as a syntax override be used for methods with random purposes ? Do you think it would be fine if what was done using keys in the examples I linked earlier, was done using __iter__ ?
You say people are free to use keys as a name to do just about anything because it’s not a reserved keyword. But if you’re saying that because it doesn’t start and end with underscores, then I think it’s circular logic. The dunders reservation rule exists because dunders’ (main) use is for special method names (do you disagree with that ?), so a syntax method with a non-dunder name should be protected just as well and for the same rationale. If you’re unhappy with that discrepancy in the data model, then I wholeheartedly agree, that’s why I wanted to change it in the first place ; but if you don’t want to change it, why don’t you want to fully support and document it ? Or, if it’s not going to change anytime soon, at the very least warn about its existence ?
You mentioned values and items being present often in the base code, but as I had said earlier I have no problem with builtin code calling methods, documented or not, on builtin types. I just don’t see how that’s relevant. You have described keys as “an indirection for duck typing”, which if I understand correctly (and you didn’t correct me if I didn’t) means it serves as a marker of whether something should be considered a mapping or not. That’s what makes keys unique among other methods of builtin types like append, intersection, open, close, values or items. They are used more than 1290 combined times in the base code, sure, but is any of the others treated as a duck-typing marker, even once ?
In the Issue you said “OP claims that ‘is it a mistake’ to use ‘keys’ with meanings other than the one intended for mappings. Not in python. It is routine to reuse builtin names and attribute names.” Is it “routine” to do that with names which are used for duck-typing by the syntax interpreter itself ? Can you find one example ? @Rosuav, earlier in this thread, did not.

Someone in the issue told me to “listen carefully to arguments, concerns and objections made by others”, and I have, that’s why I modified my mentions of “operator override”, renamed this thread, and later reversed my stance on changing the behavior, for instance. But I can’t listen to arguments if you don’t make them, and in that case it’s not very fair to ask me to do that.
If you only stonewall with simple “for/against” comments, or only answer when people are moving forward and opening issues/PRs, that’s not encouraging people to play by the rules and engage in productive discussion down here.