Expression `x in dict` causing an endless loop - a flaw in Python design?

ptsl · July 4, 2024, 5:01pm

Hi,

I’ve seen the following code written by a colleague:

class MyDict:
    def __init__(self, d):
        self._d = d

    def __getitem__(self, key):
        return self._d[key]

It’s clear that he forgot to implement the __contains__ method.

Later, he used an x in mydict test, which caused the entire application to free.

Here is what the documentation says regarding this topic:

Lastly, the old-style iteration protocol is tried: if a class defines __getitem__(), x in y is True if and only if there is a non-negative integer index i such that x is y[i] or x == y[i], and no lower integer index raises the IndexError exception.

Correct me if I’m wrong, but I don’t see any end condition here.
Is this a flaw in Python design?

kknechtel · July 4, 2024, 5:12pm

This could be counted as a bug, or at least an enhancement request, I suppose. For example, this part of the process could also require __len__ to be defined (by EAFP and converting the resulting exception), and only looping as far as that value specifies.

It’s true that contains is allowed to work on any container, and containers don’t need to have a __len__ - for example, it could be a text file, where the length (as counted in lines of text) isn’t known in advance, but the in operator is defined (it will check whether any line of the file is equal to the specified value). However, in these cases one would expect __iter__ to be defined, so it should be handled by the previous step described in the documentation.

zware · July 4, 2024, 6:01pm

I think there’s another bug somewhere in your example program causing the endless loop; Python returns a KeyError for me:

$ cat contains_test.py 
class MyDict:
    def __init__(self, d):
        self._d = d

    def __getitem__(self, key):
        return self._d[key]


md = MyDict({})

'test' in md
$ python3.12 contains_test.py 
Traceback (most recent call last):
  File "/tmp/contains_test.py", line 11, in <module>
    'test' in md
  File "/tmp/contains_test.py", line 6, in __getitem__
    return self._d[key]
           ~~~~~~~^^^^^
KeyError: 0

Perhaps the d passed to MyDict was a defaultdict?

$ cat contains_test.py 
from collections import defaultdict

class MyDict:
    def __init__(self, d):
        self._d = d

    def __getitem__(self, key):
        if isinstance(key, int) and key > 2**10:
            raise RuntimeError('infinite loop detected')
        return self._d[key]


md = MyDict(defaultdict(int))

'test' in md
$ python3.12 contains_test.py 
Traceback (most recent call last):
  File "/tmp/contains_test.py", line 15, in <module>
    'test' in md
  File "/tmp/contains_test.py", line 9, in __getitem__
    raise RuntimeError('infinite loop detected')
RuntimeError: infinite loop detected

I won’t argue that this isn’t a bit of a sharp edge, but I’m not sure there’s a reasonable fix for it.

jamestwebber · July 4, 2024, 8:46pm

Not without straight-up removing that protocol, which seems problematic.

Here’s another fun one:

>>> import itertools
>>> c = itertools.count()
>>> 1 in c
True
>>> 'a' in c
[ignores Ctrl-C, I had to manually kill the process]

cameron · July 5, 2024, 12:14am

I’d take the line that things are not necessarily mappings or sequences.
On this basis, the OP’s code is just an “overly vague” definition of
some kind of container. You can’t write “sanity check this class” stuff
because you do not magicly know what the coder is trying to implement.

But what the OP could be doing is subclassing one of the Mapping or
Sequence classes in types. Then the abstract class stuff will kick
in and require more specifics from the coder.

But if you’re writing something which is container-ish which doesn’t
obey these particualr specifications for a special purpose, having
something prevent you writing this stuff would be a bug.

A couple of relevant design rules from my sig quote file:

UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever things. - Doug Gwyn

and:

There’s a fine line between cleverness and stupidity.
Nigel Tufnel, Spinal Tap

ptsl · July 5, 2024, 9:44am

I didn’t test the code in isolation, I just copied it from an application we’re working on.

In reality, the MyDict class is a façade to another dict-like object, which implements its own __contains__ method. The fact that MyDict didn’t have this method was an obvious bug.

ptsl · July 5, 2024, 9:47am

Do you, guys, think that I should escalate this issue? Or just leave it as it is?

JamesParrott · July 5, 2024, 10:03am

Not yet. Neither Zachary nor myself can reproduce an endless loop. If yourself or anyone else comes up with an MRX that causes an infinite loop, that’s worth posting here at the very least.

But there are a lot of gotchas when overriding dunder methods. In particular when defining __getattr__ naively for example, it’s incredibly easy to create an endless loop. Perhaps this is just one more.

My attempt:

\loop_bug.py", line 11, in <module>
    print(1 in m)
          ^^^^^^
  File "\loop_bug.py", line 6, in __getitem__
    return self._d[key]
           ~~~~~~~^^^^^
KeyError: 0

JamesParrott · July 5, 2024, 10:13am

This produces an endless loop, albeit only by doing something silly:

class OmniDict:
    def __init__(self, d):
        self._d = d
    def __getitem__(self, key):
        return "Omnivalue"

class MyDict:
    def __init__(self, d):
        self._d = d

    def __getitem__(self, key):
        return self._d[key]
        
        
m = MyDict(OmniDict(dict(a=1,b=2)))

print(1 in m)

ptsl · July 8, 2024, 8:53am

In the legacy code, my original example involves an inner object (referenced by self._d), which is itself a container. Its __contains__ method either accesses another dictionary or delegates the task to a callback if the dictionary does not exists. It’s quite complex, but it has worked for many years until I tried to use the class in another context.

Thank you all for a fruitful discussion!

JamesParrott · July 8, 2024, 9:25am

You’re welcome Peter - it’s an interesting problem, and presuming the container is a Sequence to check membership is a potential source of bugs, that could be prevented.

But if __contains__ is defined, Python should never try the old style iteration protocol. It all depends on the callback.