Normalize attribute names in default object get/set attribute

thatbirdguythatuknownot · October 14, 2024, 1:29pm

Attribute names are treated as identifiers when accessed or set directly, so non-ASCII (Unicode) names are NFKC-normalized by the compiler. However, getattr(), setattr(), and the default object attribute getter/setter don’t normalize the strings they receive. This results in surprising behavior like this:

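class Foo:
    µ = 0  # identifier: stored under the normalized name (U+03BC)

print(Foo.µ)  # 0

setattr(Foo, 'µ', 1)  # string: stored as-is under U+00B5, a separate attribute
print(Foo.µ)  # 0
print(getattr(Foo, 'µ'))  # 1

Foo.µ = 2  # identifier again: updates the U+03BC attribute
print(Foo.µ)  # 2
print(getattr(Foo, 'µ'))  # 1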

[Character taken from this weird identifier normalization post.]

As demonstrated above, the direct identifier attribute refers to a different object from the dynamic string attribute, which can be a source of unexpected behavior.
Therefore, why not normalize the strings passed to getattr(), setattr(), or the default attribute getter/setter?
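
For illustration, here is a minimal sketch of what the proposed normalization could look like if done in user code today. The helpers `normalized_getattr` and `normalized_setattr` are hypothetical names, not part of Python; the sketch assumes NFKC, the same normalization form the compiler applies to identifiers. The strings are written with escapes so the code points are unambiguous.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # the character as typed in the example above
GREEK_MU = "\u03bc"    # what NFKC normalization turns it into

def normalized_getattr(obj, name, *default):
    """Hypothetical helper: getattr() that NFKC-normalizes the name first."""
    return getattr(obj, unicodedata.normalize("NFKC", name), *default)

def normalized_setattr(obj, name, value):
    """Hypothetical helper: setattr() that NFKC-normalizes the name first."""
    setattr(obj, unicodedata.normalize("NFKC", name), value)

class Foo:
    µ = 0  # identifier, so the compiler stores it under the normalized name

# Only the normalized name exists in the class namespace.
print(GREEK_MU in vars(Foo), MICRO_SIGN in vars(Foo))  # True False

normalized_setattr(Foo, MICRO_SIGN, 1)
print(Foo.µ)                                 # 1
print(normalized_getattr(Foo, MICRO_SIGN))   # 1
print(MICRO_SIGN in vars(Foo))               # False: no second attribute created
```

With these wrappers, the identifier form and the string form always resolve to the same attribute, which is the behavior the question above asks for.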

MegaIng · October 14, 2024, 1:44pm

“Why not” is easy to answer: it’s less work and doesn’t affect 99.9% of programs. Also, these functions accept far more than identifiers; in fact, they accept any string. Checking or normalizing would add a probably non-trivial amount of work to every program, just to improve semantics in one specific edge case.
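
A small sketch of the “any string” point: on an ordinary object, getattr() and setattr() happily use names that could never be written as identifiers. The class `Bag` and the attribute names are made up for the example.

```python
class Bag:
    pass

bag = Bag()

# Neither of these names is a valid identifier, yet both work as attribute names.
setattr(bag, "has space", 1)
setattr(bag, "1st", 2)

print("has space".isidentifier())  # False
print(getattr(bag, "has space"))   # 1
print(getattr(bag, "1st"))         # 2
print(sorted(vars(bag)))           # ['1st', 'has space']
```

Because such names are already accepted, normalizing inside these functions would have to run on arbitrary strings, not only on things that look like identifiers.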

But I also agree that this is confusing and surprising behavior, although I would almost say it would be more reasonable to reject identifiers in source code that aren’t already normalized.
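
As a rough idea of what such a check could look like outside the compiler (for example in a linter), here is a sketch that scans source text with the standard `tokenize` module and reports names that NFKC normalization would change. The function name `find_unnormalized_names` is hypothetical, and this is only an approximation of what a real rejection in the parser would do.

```python
import io
import tokenize
import unicodedata

def find_unnormalized_names(source):
    """Yield (line, column, name, normalized) for names that NFKC would change."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # ASCII-only names are never changed by NFKC, so skip them cheaply.
        if tok.type != tokenize.NAME or tok.string.isascii():
            continue
        normalized = unicodedata.normalize("NFKC", tok.string)
        if normalized != tok.string:
            yield tok.start[0], tok.start[1], tok.string, normalized

# The first name uses U+00B5 (not normalized); the second starts with U+03BC (already normalized).
source = "class Foo:\n    \u00b5 = 0\n    \u03bc_ok = 1\n"

for line, col, name, normalized in find_unnormalized_names(source):
    print(f"line {line}, col {col}: {name!r} normalizes to {normalized!r}")
```

Only the U+00B5 name on line 2 is reported; the name that is already in normalized form passes.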