Can this bug be detected? + Discussion about duplicate keys

Nineteendo · November 9, 2024, 1:23pm

Is there a linter that can find the bug in this code? I’m using flake8, mypy, pylance (pyright), pylint and ruff. (at the strictest settings)

class DuplicateKey(str):
    """A key that can appear multiple times in a dictionary.

    >>> k1, k2 = DuplicateKey('key'), DuplicateKey('key')
    >>> {k1: 'value 1', k2: 'value 2'}
    {'key': 'value 1', 'key': 'value 2'}
    """
    def __eq__(self, value):
        return self is value

    def __hash__(self):
        return id(self)

Spoiler, this is the bug:

>>> k1, k2 = DuplicateKey(), DuplicateKey()
>>> k1 == k2
False
>>> k1 != k2
False

saaketp · November 9, 2024, 2:01pm

Probably subclass-builtin (FURB189) | Ruff
It is in preview, so you’d have to run with --preview for now.
ruff check --select=FURB189 --preview file.py

Nineteendo · November 9, 2024, 2:22pm

Ruff for VS Code is still on v0.7.1, so this rule isn’t yet available for me, but this issue isn’t exclusive to builtins:

class FooBase:
    def __init__(self, bar=0):
        self.bar = bar

    def __ne__(self, value):
        return not isinstance(value, FooBase) or self.bar != value.bar

class Foo(FooBase):
    def __eq__(self, value):
        return self is value

    def __hash__(self):
        return id(self)

>>> f1, f2 = Foo(), Foo()
>>> f1 == f2
False
>>> f1 != f2
False

Note that a user writing a subclass can’t easily determine which of the 2 methods is implemented. Let alone be aware of this problem.

saaketp · November 9, 2024, 2:31pm

Having that not isinstance(value, FooBase) condition in __ne__ probably just means that the class doesn’t support subclassing if this was in real code. If you mark FooBase as @final, then mypy and pyright will catch it.

Nineteendo · November 9, 2024, 2:39pm

That condition just ensures the bar attribute exists for the second check. (which is fine) You could also return NotImplemented:

def __ne__(self, value):
    if not isinstance(value, FooBase):
        return NotImplemented
    return self.bar != value.bar

saaketp · November 9, 2024, 2:43pm

Ah sorry, yes that condition doesn’t necessarily mean it should be @final.
Though making it @final does prevent this .

JamesParrott · November 9, 2024, 3:29pm

Interesting. I think it’s a carefully crafted run-time bug in the program’s logic, not detectable by static typing analysis or enforcing linting rules. Is that your point?

If you’re going to add customised weird implementations of methods like __eq__, or any other hacks, extensive tests also need to be added to guarantee all behaviours are as intended. Don’t rely on Ruff et al. to check for every possible bug under the sun.

Or is there some compiler, e.g. Rust or other tool that would catch this before run-time, in any language?

Nineteendo · November 9, 2024, 3:38pm

Not per se, I believe you can just check that if you overwrite __eq__(), __ne__() should also be overwritten if defined in a super class.

Very true It’s just that this could have slept under my radar if I hadn’t used != in my tests. Although Ruff should probably warn me if __hash__() is overwritten, but not __eq__().

JamesParrott · November 9, 2024, 3:46pm

Have I not understood the need to override __ne__? If that method is deleted, the equality operations work more normally:

class FooBase:
    def __init__(self, bar=0):
        self.bar = bar


class Foo(FooBase):
    def __eq__(self, value):
        return self is value

    def __hash__(self):
        return id(self)

f1, f2 = Foo(), Foo()

assert not f1 == f2
assert f1 != f2

Nineteendo · November 9, 2024, 3:49pm

That’s because if it’s not defined it falls back on not obj1 == obj2.

JamesParrott · November 9, 2024, 4:03pm

Ah. So, I’m not saying duplicate keys are a good idea (far from it! ) , but was the original issue in the str sub class because str does something custom in str.__ne__ ?

This allows k1 == k2 and k1 != k2 (hopefully that was the intent!):

class DuplicateKey(str):
    """A key that can appear multiple times in a dictionary.

    >>> k1, k2 = DuplicateKey('key'), DuplicateKey('key')
    >>> {k1: 'value 1', k2: 'value 2'}
    {'key': 'value 1', 'key': 'value 2'}
    """
    
    _count = 0
    _counted = False
    
    
    def __eq__(self, value):
        return self is value

    def __hash__(self):
        if not self._counted:
            self._counted = True
            self._count = DuplicateKey._count
            DuplicateKey._count += 1
        return hash((self._count, str(self))) 
        
    __ne__ = object.__ne__
        

k1, k2 = DuplicateKey(), DuplicateKey()

k1, k2 = DuplicateKey('key'), DuplicateKey('key')


assert not k1 == k2
assert k1 != k2
print({k1: 'value 1', k2: 'value 2'})  # Expected: {'key': 'value 1', 'key': 'value 2'}

Nineteendo · November 9, 2024, 4:19pm

It uses rich_compare (See python - Should __ne__ be implemented as the negation of __eq__? - Stack Overflow):

github.com

python/cpython/blob/6293d00e7201f3f28b1f4512e57dc4f03855cabd/Objects/unicodeobject.c#L11250-L11254


      
          else if (op == Py_EQ || op == Py_NE) {
              result = unicode_eq(left, right);
              result ^= (op == Py_NE);
              return PyBool_FromLong(result);
          }

Nice, I thought I had to define my own function (I can do the same for __hash__().

Nineteendo · November 9, 2024, 5:00pm

The stack overflow answer says this:

The __ne__ method follows automatically from __eq__ only if __ne__ isn’t already defined in a superclass. So, if you’re inheriting from a builtin, it’s best to override both.

Rosuav · November 9, 2024, 6:47pm

The bug is here:

class DuplicateKey(str):

You’re doing something completely unnecessary and it’s giving you behaviours you don’t need.

Solution:

class DuplicateKey:
    def __init__(self, lbl):
        self.label = lbl
    def __repr__(self):
        return repr(self.label)

I wouldn’t expect a linter to detect all bugs though.

elis.byberi · November 9, 2024, 7:12pm

This is true only if the value is the object itself.

There is no bug in your subclass. The only issue is with the statement: ‘A key that can appear multiple times in a dictionary.’ These instances may have the same string values, but they are two separate instances. I am not aware of any case where two different instances share the same memory, except for primitive data types.

saaketp · November 9, 2024, 7:28pm

Yeah, I feel this is the right answer. What OP is considering a bug is that __ne__ is not the opposite of __eq__, but the language allows defining both, so this can happen if you are not careful in how you define both (or maybe by design in some libraries).

Nineteendo · November 9, 2024, 7:37pm

I should have used an abstract example instead of a hack to have duplicate keys (the discussion wasn’t about that):

class FooBase:
    def __ne__(self, other):
        return not isinstance(other, FooBase)

class Foo(FooBase):
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Foo) and self.value == other.value

But no your solution isn’t ideal for me. By subclassing string I don’t need special handling in the encoder of my json library and it works with a DuplicateKey type from another library without needing to register it somehow.

elis.byberi · November 9, 2024, 7:44pm

Where do you see these duplicates? Could you elaborate further?

Nineteendo · November 9, 2024, 8:11pm

There’s no syntactic restriction that prevents JSON objects from having duplicate keys.
But programming languages usually don’t support this, to preserve the exact structure of the file additional keys are converted to DuplicateKeys, which are a duplicate of the regular key:

'{"k": "v1", "k": "v2"}' -> {"k": "v1", DuplicateKey("k"): "v2")}

Rosuav · November 9, 2024, 8:11pm

So what you’re saying is, you want to be able to generate malformed JSON without having to specially code it. That’s a pretty weird thing to do, so I don’t really see why it’s a problem to have to explicitly write a method to do it (rather than just messing with equality and subclassing).