Why is it a problem that the values from the second dict are ignored? It's reasonable, and as I showed above there are many use cases that would benefit from the `-` and `-=` operators.
If you look at the examples above you will also notice that this operator works great combined with the existing `|=` operator (for example `feature |= cfeature - feature`).
TBH I don’t think it is a problem. However, you haven’t shown a good reason for the semantics you’re pitching. I can offer one example of a language with “ignore the value” semantics, which you were unable to find, but it’s still just one language and you’ll need to put in some leg-work finding others if you want to convince people of this.
$ pike
Pike v9.0 release 2 running Hilfe v3.5 (Incremental Pike Frontend)
Ok.
> (["a": 1, "b": 2, "c": 3]) - (["a": 1, "c": 4]);
(1) Result: ([ /* 1 element */
"b": 2
])
> (["a": 1, "b": 2, "c": 3]) - (<"b">); // subtracting a set also works
(2) Result: ([ /* 2 elements */
"a": 1,
"c": 3
])
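For readers more at home in Python, here is a minimal sketch of the same key-based semantics as the Pike session above; `dict_sub` is a hypothetical helper name, not an existing API:

```python
def dict_sub(d, other):
    """Return a new dict with the keys of `other` removed from `d`.

    Values in `other` are ignored, mirroring Pike's mapping subtraction.
    `other` may be a dict, a set, or any container supporting `in`.
    """
    return {k: v for k, v in d.items() if k not in other}

print(dict_sub({"a": 1, "b": 2, "c": 3}, {"a": 1, "c": 4}))  # {'b': 2}
print(dict_sub({"a": 1, "b": 2, "c": 3}, {"b"}))             # {'a': 1, 'c': 3}
```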
Start researching!
a = {1:{1:2}}
b = {1:{1:2}}
print(a == b) # True
a = {1:{1:2}}
b = {1:{1:2000}}
print(a == b) # False
If you ignore values, `a - b` would be empty in both cases above, even though `a == b` differs. That can be very confusing.
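To make that concrete, here is what value-ignoring subtraction would produce for both pairs of dicts above (`key_sub` is a hypothetical stand-in for the proposed operator):

```python
def key_sub(d1, d2):
    # Hypothetical value-ignoring subtraction: drop keys present in d2.
    return {k: v for k, v in d1.items() if k not in d2}

a = {1: {1: 2}}
b = {1: {1: 2}}
print(key_sub(a, b))  # {} -- equal dicts, empty difference

a = {1: {1: 2}}
b = {1: {1: 2000}}
print(key_sub(a, b))  # {} -- unequal values, still empty
```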
We do not need other languages that have this feature to be convinced that it is a good feature for Python. The research I have done shows that this feature is extremely useful in Python: about half of the Python files that use dict comprehensions could benefit from it.
That's the same problem as here:
a = {1:{1:2}}
b = {1:{1:2}}
print(a | b) # {1:{1:2}}
a = {1:{1:2}}
b = {1:{1:2000}}
print(a | b) # {1:{1:2000}}
That’s very confusing, but we have this feature anyway because of its upsides.
I think it would be annoying if only dicts were supported, as you would need dummy values to remove a set of keys:
dct = {"a": 1, "b": 2, "c": 3}
dct -= {"a": None, "b": None}
print(dct) # {"c": 3}
And as we don’t need values to begin with (unlike `__or__`), I would focus on subtracting set-likes / iterables.
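A rough sketch of what iterable-based subtraction could look like on a dict subclass; the class and its behaviour are hypothetical, not a concrete proposal:

```python
class SubtractableDict(dict):
    # Hypothetical subclass: `-` removes a collection of keys; values on the
    # right-hand side are never needed.
    def __sub__(self, keys):
        to_remove = set(keys)  # materialise once so lookups are O(1)
        return SubtractableDict(
            (k, v) for k, v in self.items() if k not in to_remove
        )

dct = SubtractableDict({"a": 1, "b": 2, "c": 3})
print(dct - {"a", "b"})  # {'c': 3} -- no dummy values needed
print(dct - ["a"])       # {'b': 2, 'c': 3} -- any iterable of keys works
```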
Try the next composition of `dict`.
No, but we DO need to know what has been done and why. Sometimes the result will be a decision to do something different, but even then, the decision is made in the light of the way other languages have tackled the same problem. PEP 308 is a great example of that.
And when you have multiple languages that do nearly the same thing but with different meanings, it gets extremely confusing to try to explain. Remember, not everyone writes Python code and absolutely nothing else. (I won’t say anything about how many Python programmers also use other languages because I’ve no idea, beyond that it’s a non-zero percentage.) Doing something the same way has some benefit directly; knowing what others have done has huge benefit.
That’s not true, and please try to stay on topic. If you think it is true, please open a thread in the Help section.
If I were to say that the difference of two unequal dicts is an empty dict, it wouldn’t make sense. You should either find another operator or think about a method instead.
I always check if two given dicts are equal because key values matter. If they are not equal, I have a homemade function to show their differences (including value conflicts). This is used only during testing. Please stop thinking of dicts as merely sets of keys; they are more than that.
As all the examples of other languages so far support keys, I would start with that.
The only advantage of allowing dicts is that `dct1 - dct2` is 7 characters shorter than `dct1 - dct2.keys()`, which isn’t sufficient to justify the confusing behaviour.
Some of the supposed use cases involve `d1.update(d2)` but with the priority shifted from `d2` to `d1`; hence `d1.update(d2 - d1)` to remove duplicates before the update. It seems to me that this would be better handled with a new `keep` parameter, defaulting to `keep='new'` and passed as `keep='old'` for the alternate behaviour. This avoids creating a temporary dict. For the non-in-place version (note that `update` returns `None`, so the calls can't be chained): `d3 = d1.copy(); d3.update(d2, keep='old')`.
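Since `dict.update` has no such parameter today, here is a hedged sketch of the proposed behaviour as a plain function; the `keep` name comes from the post above, the function itself is hypothetical:

```python
def update_with_priority(d1, d2, keep="new"):
    """Update d1 with d2 in place and return d1.

    keep="new": d2's values win on conflicts (current dict.update behaviour).
    keep="old": d1's existing values are preserved; only missing keys are added.
    """
    if keep == "new":
        d1.update(d2)
    elif keep == "old":
        for k, v in d2.items():
            d1.setdefault(k, v)
    else:
        raise ValueError(f"unknown keep mode: {keep!r}")
    return d1

print(update_with_priority({"a": 1, "b": 2}, {"b": 20, "c": 3}, keep="old"))
# {'a': 1, 'b': 2, 'c': 3}
```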
I don’t see how you can possibly get O(1) if the iterable is a set or a dict. Can you please elaborate? I’m pretty sure the time complexity would be O(min(len(dict), len(set))) in that case.
I think allowing `dict - iterable` makes a lot of sense, or make it a `dict.exclude(iterable)` method.
I think you misunderstood what Paul was saying. It was probably more obvious as part of the conversation when it actually happened, 6 months ago.
The point is that `key in iterable` is O(n). If `dict - iterable` were a valid operation, it would be tempting to use it, leading to an O(n^2) performance trap as Python iterated over each key in the dictionary and checked for its presence.
I don’t think I misunderstood what Paul was saying. Paul was implying that to perform `dict - iterable` one would have to iterate over the dict and perform an O(n) lookup on the iterable for each key, resulting in O(n^2), when in fact the iterable can be iterated over to perform O(1) lookups on the dict for each item, resulting in O(len(iterable)), which isn’t much worse than O(min(len(dict), len(set))) when the right operand is set-like.
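In code, the difference between the two strategies looks roughly like this (helper names hypothetical; the second version also pays O(len(dict)) for the initial copy):

```python
def sub_iterate_dict(d, iterable):
    # The trap: if `iterable` is a list, `in` scans it for every key,
    # giving O(len(d) * len(iterable)) overall.
    return {k: v for k, v in d.items() if k not in iterable}

def sub_iterate_iterable(d, iterable):
    # The fix: copy the dict, then pop each listed key with an O(1)
    # dict lookup, giving O(len(d) + len(iterable)) overall.
    result = dict(d)
    for k in iterable:
        result.pop(k, None)
    return result

d = {"a": 1, "b": 2, "c": 3}
print(sub_iterate_iterable(d, ["a", "c", "x"]))  # {'b': 2}
```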
But I now see that Alice already responded to Paul with what I wanted to say, so my apologies for repeating what was already said.
Oops I really should’ve read through the thread before responding to the post above right away. I now see why you believed that one should iterate over the dict instead of the iterable.
Instead of copying everything in the dict on the left and then iterating over the iterable on the right to delete keys from the dict copy, which incurs a lot of memory overhead, I think it would cost a lot less to create a temporary set from the iterable with only the items that are in the dict, so as to perform `dict - set` in the usual way:
def dict_exclude(d, iterable):
    # Keep the temporary set small: include only keys actually present in d.
    to_exclude = set(filter(d.__contains__, iterable))
    return {k: v for k, v in d.items() if k not in to_exclude}
I don’t feel qualified to add new insights, but there are use cases both for deleting a given set of unwanted keys (using the word set as a general term, not as the Python type) and for deleting keys NOT present in a given set of allowed keys.
If this proposal produces a result, I think it should come in the form of a pair of functions or operators.
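The pair of operations can be sketched as two small helpers (names hypothetical):

```python
def without_keys(d, unwanted):
    # Drop every key listed in `unwanted` (set-difference style).
    unwanted = set(unwanted)
    return {k: v for k, v in d.items() if k not in unwanted}

def only_keys(d, allowed):
    # Keep only the keys listed in `allowed` (set-intersection style).
    allowed = set(allowed)
    return {k: v for k, v in d.items() if k in allowed}

d = {"a": 1, "b": 2, "c": 3}
print(without_keys(d, {"a"}))    # {'b': 2, 'c': 3}
print(only_keys(d, {"a", "b"}))  # {'a': 1, 'b': 2}
```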
To me a `d1 - d2` operator would make more sense as: union the two dictionaries and subtract the values of the matching keys. So
{"a": 1, "n": 2} - {"n": 1}
would result in
{"a": 1, "n": 1}
You can already achieve this easily by using `d1.items() - d2.items()`.
Also, the common use cases for the new operator would be to replace things like:
{k: v for k, v in d1.items() if k not in d2}
On Saturday, 19 Oct 2024, 14:01, James Campbell via Discussions on Python.org <notifications@python1.discoursemail.com> wrote:
Is a new set being created there?
It seems to me that this isn’t always the best approach if these are collections of millions or even more items. It seems to me that in this case it’s better to have the result in the form of a generator or even just create a difference in place.
If this proposal were to move forward[1], that’d be the equivalent of `d1 - x` and `d1 & x`. That is, the first is difference, the second is intersection with some collection of keys.
We already have that though; that’s a `Counter`. It only applies when you have a dictionary of objects that work with subtraction. It’s much less generally useful than set operations on the keys, in my opinion.
I’m not particularly confident that it will, to be honest ↩︎
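For reference, `collections.Counter` already implements the value-subtracting behaviour proposed earlier in the thread:

```python
from collections import Counter

# Counter subtraction subtracts matching counts and drops keys whose
# result would be zero or negative.
c = Counter({"a": 1, "n": 2}) - Counter({"n": 1})
print(dict(c))  # {'a': 1, 'n': 1}
```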