Supporting (or not) invalid identifiers in **kwargs

I have only Python 3.10.4 at hand and here it throws a (not unexpected) exception:

>>> "my.domain.name".split(**{S('xxx'):'.'})
sep
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: null argument to internal routine

The interpreter does not crash.

…ok I did not have experience with SystemError. According to the documentation it should not appear:

https://docs.python.org/3/library/exceptions.html#SystemError

Aha. You should totally file a bug report on GitHub for that. Thanks for finding this!

3 Likes

I meant that I just get a === RESTART: Shell === in 3.8.

However, in 3.10.2 with the same S in a file, after printing sep, I get this traceback:

Traceback (most recent call last):
  File " ... \kwargsbomb.py", line 19, in <module>
    c = u.split(**{S('xxx'):'.'})
SystemError: null argument to internal routine

Amazing how much re-work this code is getting from one version to the next.

And the same on 3.11.0b4

I can do that. (Custom str subclass as keyword argument leads to System Error · Issue #94938 · python/cpython · GitHub)

2 Likes

(Custom str subclass as keyword argument leads to System Error · Issue #94938 · python/cpython · GitHub) has been fixed and closed. Jeff’s code above now gives

TypeError: invalid keyword argument for split()
3 Likes

Glad that bug got fixed! – but where are we on the documentation question. It seems there are two options:

  1. This is a supported feature of Python

or

  1. This is an implementation detail that may or may not be supported in other versions / implementations of Python.

I think not documenting it is a clearly bad idea.

If (1) then it should be documented so other implementations know that it needs to be supported. And some clarity on exactly what is supported would be good.

if (2) then it needs to be documented so users are warned that they are relying on an implementation detail.

Given that there is code in the wild that uses this feature, and has for years, I think (1) is probably the way to go.

That leaves a clear definition of what “this” is. Here’s a stab at it:

In all instances where a python name is specified/stored in a container or variable, the name must be a string, but may be a non-valid python identifier.

Examples:

__dict__ and **kwargs may not have non-string keys, but may have non-valid strings as identifiers (e.g. "this-name").

getattr() and setattr() will except only strings, but those may not be valid identifiers. (e.g. setattr(obj, "this-name", value)

As for the “wild side of Unicode” – I have no idea how to define or describe that – but it would be good to do so – at least to say that the results may be undefined.

Along those lines – I recall a fairly recent conversation somewhere (maybe python-dev?) about the normalization of Unicode in identifiers – e.g. various versions of letters that are essentially a font different (like Blackboard Bold) are normalized to the same letter by the interpreter:

In [6]: ℂhris = 5

In [7]: Chris
Out[7]: 5

(that C is "\u2102" in the first instance, and a regular C in the second)

But this transformation is not done by setattr(), nor I assume (Not tested) by any direct setting to dict and the like.

In [14]: setattr(obj, "ℂhris", 5)

In [15]: getattr(obj, "Chris")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-ddf0ae72b865> in <module>
----> 1 getattr(obj, "Chris")

AttributeError: 'Dummy' object has no attribute 'Chris'

In [16]: getattr(obj, "ℂhris")
Out[16]: 5

This should probably be documented as well.

1 Like

I completely agree about the need for a clarification.
I’m not convinced by any of the arguments for allowing and documenting the current behavior, your option (1). I understand the argument for leaving the code as it is, but not for pledging support for it in the future.

I don’t think we ought to support the code that currently uses it, and we should warn of its unstability and that it’s an implementation detail as soon as possible.
To my knowledge, such things have already been done. For example, the following function signature worked in py2 (and possibly later) def place(label, (x, y), color="#f00"), while in py3 it doesn’t. Granted, it’s the py2-3 transition, but I never saw any warning about it in various things I read about that transition, or in the PEPs describing the current arguments syntax (pos-only arguments, keyword-only required arguments…). It was never documented, and deserved to go without a proper funeral when it was deemed necessary.

I think __dict__ would probably be the only exception, I’d even be fine with it accepting any and all keys, much like the globals() dict.
In any other case, it should be left unguaranteed, as an implementation detail, that 1) non-indentifiers and 2) non-strictly-string objects (meaning string subclasses and other classes) are supported. Even if my point 1) goes as documented, we should leave 2) as an implementation detail.

Lastly, I believe the last part about Unicode normalization is another issue. It should be adressed, and probably be deemed an implementation detail, but I don’t think this thread is the place for it - though it’s a good illustration of the dangers of allowing non-identifiers.