I am aware that keys must be unique (set-like), and that values need not be unique.
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
d3 = {'a': 1, 'b': 2, 'c': 3}
d4 = {'a': 1, 'b': 222}
# Comparing whole dictionaries
print(d1 == d2) # True, as I expected
print(d1 == d3) # False, as I expected
print(d1 == d4) # False, as I expected
# Comparing keys
print(d1.keys() == d2.keys()) # True, as I expected
print(d1.keys() == d3.keys()) # False, as I expected
print(d1.keys() == d4.keys()) # True, as I expected
# Comparing values. Answer always False. Why?
print(d1.values() == d1.values()) # False; I expected True. Or NotImplemented
print(d1.values() == d2.values()) # False; I expected True. Or NotImplemented
print(sorted(d1.values()) == sorted(d2.values())) # True, as I expected
This behaviour was surprising to me. Particularly noting that comparison of entire dicts works on keys and values.
Would it be better for dict.values.__eq__ to throw NotImplemented, rather than the (surprising, to me) result of always False (even on comparison to self)? Or to do a logical comparison, as dicts as a whole seem to do?
This seems to me to be a possible source of errors.
@methane wrote “There is no reasonable semantics for values view. Keep it unimplemented.”
I’m not clear on why there are no reasonable semantics? Perhaps they or someone could shed some light on that for me? (Noting it does seem to work with reasonable behaviour on dicts as a whole)
Or why it doesn’t throw NotImplemented? Which may be a safer option?
# Comparing values. Answer always False. Why?
print(d1.values() == d1.values()) # False; I expected True. Or NotImplemented
print(d1.values() == d2.values()) # False; I expected True. Or NotImplemented
I think you are misunderstanding ==. It is designed to work for any objects, even if they don’t support the == operator. In other words, when both operands answer NotImplemented, the result becomes False.
>>> "0" == 0
False
So d1.values() == d2.values() being False matches your expectation of “Or NotImplemented”.
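A small sketch of what happens under the hood, in case it helps. It assumes the values view has no comparison of its own and simply inherits the default from object, which matches the behaviour described in this thread:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}

v1 = d1.values()
v2 = d2.values()

# Calling the method directly shows the "unsupported" signal.
direct_result = v1.__eq__(v2)
print(direct_result)        # NotImplemented

# The == operator asks both operands, gets NotImplemented from each,
# and then falls back to an identity check, which yields a plain bool.
operator_result = (v1 == v2)
print(operator_result)      # False

# The same fallback explains the str/int example.
str_int_direct = "0".__eq__(0)
print(str_int_direct)       # NotImplemented
print("0" == 0)             # False
```

So NotImplemented is a return value that the operator machinery consumes, not something the caller of == normally gets to see.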
keys(), values(), items() have consistent order (ditto in Python 3).
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print(d1.values() == d2.values()) # What returned here?
What do you expect here? In Python 2 semantics, this is 50% True and 50% False.
I believe no one thinks that “True with probability 1/len(d), False otherwise” is a good semantics.
There are some possible semantics:
Compare as sequence
Compare as sets
Compare as multisets
I don’t think any of the above is a good default semantics.
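To make the difference between the three candidate semantics concrete, here is a minimal sketch using explicit conversions. The dicts are made-up illustration data, and the Counter version assumes the values are hashable:

```python
from collections import Counter

# Hypothetical data: same distinct values, different multiplicities.
d1 = {'a': 1, 'b': 2, 'c': 2}
d2 = {'x': 2, 'y': 1, 'z': 1}

# Compare as sequence: order and multiplicity both matter.
as_sequence = list(d1.values()) == list(d2.values())

# Compare as sets: only the distinct values matter.
as_set = set(d1.values()) == set(d2.values())

# Compare as multisets: multiplicity matters, order does not.
as_multiset = Counter(d1.values()) == Counter(d2.values())

print(as_sequence)   # False
print(as_set)        # True
print(as_multiset)   # False
```

The three answers disagree on the same input, which is the reason a single built-in meaning for == is hard to pick.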
I think the documentation is misleading to the point that it is wrong.
dict.values equality eventually falls back on the default semantics for equality, which tests for identity.
>>> d = {'key': 1}
>>> x = d.values()
>>> x == x
True
The reason d.values() != d.values() is not because values objects always compare unequal, but because each time you call the method, it returns a new (distinct) object.
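A quick way to check this for yourself (a sketch; the variable names are only for illustration):

```python
d = {'key': 1}

first = d.values()
second = d.values()

print(first is second)      # False: two separate view objects
print(first == second)      # False: default equality is identity
print(first == first)       # True: the same object compared with itself

# Both objects are live views of the same dict.
d['other'] = 2
print(list(first))          # [1, 2]
print(list(second))         # [1, 2]
```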
two values objects are equal if they contain the same elements, the same number of times, in any order;
or two values objects are equal if they both are views of the same dict.
The second test is almost as cheap as an identity test on the values objects themselves. Instead of:
return self is other
we return self.owner is other.owner (except that the dict owning the values object is not exposed to Python code, so I guess this test can only be written in C).
The first test is more expensive. I cannot think how to do the test efficiently. If they were hashable, we could use Counter() but they aren’t always hashable.
Only the first way guarantees the expectation that if d1.items() == d2.items() then d1.values() == d2.values().
The computation is O(n**2) in the general case, but the algorithm can try the cheaper common cases first and fall back to slower algorithms if needed.
Step 1: len(v1) != len(v2) -> False # O(1) screen out trivial mismatches
Step 2: Counter(v1) == Counter(v2) # O(n) if the values are hashable
Step 3: sorted(v1) == sorted(v2) # O(n log n) if the values are orderable
Step 4: slow way # O(n**2) using only an equality relation
    c = list(v1)
    for v in v2:
        try:
            c.remove(v)
        except ValueError:
            return False
    return not c
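Putting the four steps together as one runnable function might look like the sketch below. The name values_equal_as_multisets is hypothetical, not an existing API. One deliberate deviation from the step list: the sorted() result is only trusted when it says "equal", because some orderable types (sets, for example) define only a partial order, so an unequal sorted result falls through to the slow path.

```python
from collections import Counter

def values_equal_as_multisets(v1, v2):
    """Hypothetical helper: same elements, same counts, any order."""
    v1, v2 = list(v1), list(v2)

    # Step 1: O(1) screen for trivial mismatches.
    if len(v1) != len(v2):
        return False

    # Step 2: O(n) when every value is hashable.
    try:
        return Counter(v1) == Counter(v2)
    except TypeError:
        pass  # at least one unhashable value

    # Step 3: O(n log n) when the values are orderable.
    try:
        if sorted(v1) == sorted(v2):
            return True
    except TypeError:
        pass  # values cannot be ordered against each other

    # Step 4: O(n**2) using only an equality relation.
    remaining = list(v1)
    for v in v2:
        try:
            remaining.remove(v)
        except ValueError:
            return False
    return not remaining

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print(values_equal_as_multisets(d1.values(), d2.values()))  # True
```

This is only an illustration of the staged approach; the cost of the fallback path is part of why it was not proposed as the default.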
That is, order is not important and equality works (on each and all elements) for whole dict, keys() and items(). Understanding values are not necessarily hashable or orderable, and may have duplicates, it still seems to me that this behaviour is surprising (it was for one of my students, anyway).
How is comparison for whole dict working?
A question is, should this return True? Following principle of least surprise, I suggest that it should.
The problem is matching up values to check for equality. For keys() and items(), you have the keys, which have to be hashable, so you can loop over one and then look up the pair in the other. But for values(), there can be duplicates, and the values may be unhashable or even unorderable.
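As a rough illustration of the "loop over one and look up in the other" idea, here is a sketch in pure Python. It is not CPython's actual implementation (that lives in C), and the function name dicts_equal is made up for the example:

```python
def dicts_equal(a, b):
    """Hypothetical sketch of key-driven dict comparison."""
    if len(a) != len(b):
        return False
    for key, value in a.items():
        # The hashable key gives an O(1) way to find the matching value.
        if key not in b or b[key] != value:
            return False
    return True

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
d3 = {'a': 1, 'b': 2, 'c': 3}
d4 = {'a': 1, 'b': 222}

print(dicts_equal(d1, d2))  # True
print(dicts_equal(d1, d3))  # False
print(dicts_equal(d1, d4))  # False
```

With values() alone there is no key to drive the lookup, so nothing tells you which value in one dict should be paired with which value in the other.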
We could reasonably see that your example should return true, but what about these?
Every value in the first is equal to a value in the second and vice versa, but you’d potentially have to match up all 16 pairs to figure out that the item numbers don’t match.
Yes, @TeamSpen210, that is a good example. Yes, coming from the keys() is useful (and “easy”). And I guess that’s what eq on the whole dict or items() is doing.
The values of these dicts are not hashable, but they are orderable, so @rhettinger's algorithm of
sorted(a.values()) == sorted(b.values())
would “work” (that is, compare all the values and check that there were the right number of each)
Good point. I have two dicts of data - keys are the same - have any of the values changed? Old versus new.
The problem I see with equal on values returning False is that it implies that they (the values) are different.
Although, I admit, in this case, a straight compare of the dicts as a whole would work.
I think now you know why I said “no reasonable semantics”.
There are four possible semantics. But none of them are “reasonable”.
Multiset semantics would be the most consistent with .keys() == and .items() == semantics.
But it is slow and complex, because values may be unhashable, may lack a total order, or may not even be comparable.
Unless this semantics is the common use case, I don’t feel it is reasonable.
The user knows about the values in their dict and their use case.
So the user can choose the correct comparison from:
Yes, I can see that. I don’t think my example was good, but I still get your point.
And I get that the default comparison for objects is identity, and that is why this gives False.
I am still concerned about the return of False on comparison, though. I think it’s misleading, as it implies it is doing a “reasonable comparison”. That it always returns False (even on comparison to self) could be confusing. (I understand why that’s happening, I’m just considering whether it is helpful or confusing for it to do that.) I note Julian in Issue 12445 had the same confusion, so I suspect I might not be alone. There is a difference between “not comparable” and “not the same”. But unfortunately == on values returns a False (in my mind, implying not the same) when what it really means is “not comparable”.
I do note that the docs make a point of this, but that might not be the first place someone looks. And I’m putting on my thinking cap for a good use case.
One option could be that it raises NotImplemented, although that is not usual, and might have other issues I am not aware of. I am wondering whether a NotComparable might be relevant?
Alternatively, doing the tests that @rhettinger listed? (Noting a significant performance implication, and a bit of work to enable).
Just saw @rhettinger doing a Twitter poll on “what does this do” - very interested to see the outcome!
Yes. .values() are an unordered bag of objects. You have the same objects in each bag, so they are equal.
“I don’t think that is a useful comparison.”
In that specific example? Maybe not. It depends on why you are comparing them, and what the semantics of your dict is.
Regardless of whether the behaviour of .values() equality is changed or not, the documentation is still wrong if it claims that they always compare unequal. Values equality tests for object identity.
It is quite unfortunate that the documentation literally states:
This also applies when comparing dict.values() to itself:
>>> d = {'a': 1}
>>> d.values() == d.values()
False
The statement literally means that an equivalence comparison of identical objects would return False, which isn’t so. The example code that follows does not constitute a comparison of a dict.values() object to itself. Rather, it compares two objects to each other that have the same content. That documentation ought to be revised.
EDITED 2x on April 6, 2022 to make a slight revision to the above text, and to add the following:
The statement could be revised as follows:
This applies even when comparing two individual dict.values() objects that are derived from the same dictionary:
…
Perhaps a better example. I have some employee data, pulled from different sources, loaded into two dicts. One is by employee id, the other by their initials. Have I got everyone?
The user knows what they want to do. But dict and values_view don’t know.
So manually choosing the comparison is reasonable. (“Explicit is better than implicit”)