These functions do the most elementary things so that they stay modular.
Note that the OP’s question is simply solved by:
first_encounter, seen_items = elem not in seen_items, seen_items | {elem}
So the proposal to add a method to set isn’t sufficiently justified.
Exactly: any special, more complicated operations on builtin collections that are specifically required to be atomic would fit nicely in an atomic module. People writing multithreaded code could then use these operations in their code and propose additions, growing that module incrementally (without having to convert the whole collections module to atomic functions!).
Out of curiosity, hasn’t dict.pop been there all along?
I’m not sure what point you are making by suggesting this code. It is not reasonable to use it in single-threaded code, because it makes the linear operation of adding N items to the set quadratic. It also does not help multithreaded code, since two threads can both end up with first_encounter = True.
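To make the quadratic cost concrete, here is a minimal sketch with two hypothetical helpers that deduplicate a sequence. The first runs the one-liner above in a loop: `seen_items | {elem}` builds a brand-new set on every step, so step k copies k elements and N items cost O(N²) in total. The second mutates a single set, which is O(1) on average per item.

```python
def dedupe_by_copying(items):
    """Hypothetical helper: the one-liner in a loop; each step copies the whole set."""
    seen_items = set()
    result = []
    for elem in items:
        # seen_items | {elem} allocates a new set of size len(seen_items) + 1
        first_encounter, seen_items = elem not in seen_items, seen_items | {elem}
        if first_encounter:
            result.append(elem)
    return result


def dedupe_in_place(items):
    """Hypothetical helper: one set mutated in place, linear overall."""
    seen_items = set()
    result = []
    for elem in items:
        if elem not in seen_items:  # average O(1) membership test
            seen_items.add(elem)    # average O(1) in-place insertion
            result.append(elem)
    return result
```

Both return the same result; only the amount of copying differs.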
Sure (dict.pop was added in Python 2.3, in 2003), and list.pop has been there since day one. But the purpose of list.pop is usually to return the value removed (like “pop the stack”), while the purpose of dict.pop is usually to return the value associated with the key being popped. If you merely wanted to get rid of a key and didn’t care about the value, del dict[key] was the way to spell that.
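The three spellings side by side, as a small illustration (the variable names are made up for the example):

```python
stack = [1, 2, 3]
top = stack.pop()                    # list.pop: "pop the stack", returns the last item
assert top == 3 and stack == [1, 2]

config = {"debug": True, "verbose": False}
debug = config.pop("debug")          # dict.pop: remove the key and get its value back
missing = config.pop("color", None)  # a default avoids KeyError for an absent key
del config["verbose"]                # only want the key gone: the classic spelling
assert debug is True and missing is None and config == {}
```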
IIRC, one of the main motivations for dict.pop was thread safety without bothering to add explicit locks (the GIL made the C implementation atomic by magic).
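A sketch of the pattern this enables, assuming CPython: several threads race to "claim" the same key, and because dict.pop with a str key runs as a single C-level call (atomic under the GIL, and guarded by the dict's own lock in free-threaded builds), exactly one thread gets the value. This relies on implementation behavior rather than a language guarantee, and `claim_race` is a hypothetical name.

```python
import threading


def claim_race(n_threads: int = 8) -> list:
    """Let n_threads pop the same key at once; return the values the winners got."""
    tasks = {"job": "payload"}
    winners = []
    winners_lock = threading.Lock()       # guards the winners list explicitly
    start = threading.Barrier(n_threads)  # release all workers at the same moment

    def worker():
        start.wait()
        got = tasks.pop("job", None)      # remove-and-fetch in one call, no explicit lock
        if got is not None:
            with winners_lock:
                winners.append(got)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

Calling `claim_race()` should return `["payload"]`: one winner, no matter how the threads interleave.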
While it’s up to them, I don’t think the OP would like that. Exception-driven control flow is “heavy” for ordinary cases. And I doubt the behavior of set.add() will ever change.
That leaves the possibility of adding a new method, like
set.aartin(element)
Which is obviously short for “add and return True if new”.
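Until something like that exists, a pure-Python stand-in can be sketched as a function. It is a sketch under a stated assumption: the module-level `_aartin_lock` is hypothetical, and the operation is only atomic relative to other callers of `aartin`. A plain `s.add()` elsewhere bypasses the lock entirely.

```python
import threading

_aartin_lock = threading.Lock()  # hypothetical: one lock shared by all aartin() callers


def aartin(s: set, element) -> bool:
    """Add element to s; return True if it was not already present."""
    with _aartin_lock:           # makes check-then-add one step for aartin() callers only
        if element in s:
            return False
        s.add(element)
        return True
```

For example, `seen = set()` followed by `aartin(seen, 1)` returns True, and a second `aartin(seen, 1)` returns False.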
I think the most general blind spot among the mathematical constructs here is a combined union-intersection operation (like divmod computes the quotient and the remainder at the same time, but for union and intersection).
union, inter = setA.unint(setB)
This could run in O(a + b) + O(min(a, b)) time and be atomic, if I am not mistaken.
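A minimal pure-Python sketch of the proposed `unint`, written as a free function because no such set method exists. It probes the smaller set for the intersection, which gives the O(min(a, b)) term, and builds the union in O(a + b). Being Python code, it is not atomic; that part would need a C implementation.

```python
def unint(a: set, b: set) -> tuple[set, set]:
    """Hypothetical helper: return (a | b, a & b) in one call."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    inter = {x for x in small if x in large}  # O(min(len(a), len(b))) on average
    union = a | b                              # O(len(a) + len(b))
    return union, inter
```

For instance, `unint({1, 2, 3}, {2, 3, 4})` gives `({1, 2, 3, 4}, {2, 3})`.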
That is, a triple (elements unique to A, elements in both, elements unique to B).
comm is in honor of the Unix™ command of that name, which does a similar thing for lines in sorted files. str.partition(fence) is also similar in spirit.
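A sketch of `comm` as a free function, returning the triple in the order described above (only in A, in both, only in B). The function is hypothetical and built from existing set operators; the three parts are disjoint, and the first two together rebuild A, while the last two rebuild B.

```python
def comm(a: set, b: set) -> tuple[set, set, set]:
    """Hypothetical helper: return (only in a, in both, only in b)."""
    both = a & b
    return a - both, both, b - both
```

For example, `comm({1, 2, 3}, {2, 3, 4})` gives `({1}, {2, 3}, {4})`, much as `"key=value".partition("=")` splits a string into three pieces around a fence.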
Not exactly; it diverges a bit. The OP is about in-place modification, and the idea I was putting forward is that if a new method is added for aartin, it would probably add more value if it applied to an entire external set rather than only to a single element. It might then fit into the difference_update family of methods. It could be some kind of union_update_returning_intersection, or whatever.
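A sketch of what `union_update_returning_intersection` (a placeholder name from the post, not a real method) might do as a free function: update `s` in place with `other` and return the elements that were already present. Like any pure-Python version, it is two separate operations and not atomic.

```python
def union_update_returning_intersection(s: set, other) -> set:
    """Hypothetical helper: s |= other in place; return elements s already had."""
    other = set(other)            # materialize once, so iterators and duplicates work
    already_present = s & other   # computed before the update, against the old s
    s |= other                    # in-place union, same effect as s.update(other)
    return already_present
```

The single-element case reduces to aartin: `not union_update_returning_intersection(s, [x])` is True exactly when `x` was new.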
A nice name would be comm_update, but I am unsure it can be made semantically consistent with the comm operation, given that comm has three outputs. Maybe with some keyword arguments it could become a flexible method that fills the gaps, idk.
The very reason there are requests for aartin, add_if_exists, add_or_raise_if_exists, etc. is the need for atomic operations; otherwise, the methods can be written as one-liners or three-line functions. While the need for atomic operations exists even under the GIL, the fact that the free-threading project has gained a lot of traction simply means that more people will be trying to write multithreaded code.
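To illustrate the "three-line functions" point, here are naive versions of two of these helpers. The names `aartin_naive` and `add_or_raise_if_exists` and their exact semantics are assumptions for the example. Each is correct single-threaded, but each has a window between the check and the mutation where another thread can interleave.

```python
def aartin_naive(s: set, x) -> bool:
    """Hypothetical: add x, return True if it was new (not atomic)."""
    n = len(s)
    s.add(x)
    # Racy: another thread adding a different element between the two len() calls
    # can make this return True for an x that was already present.
    return len(s) > n


def add_or_raise_if_exists(s: set, x) -> None:
    """Hypothetical: add x, or raise KeyError if it is already present (not atomic)."""
    if x in s:        # another thread may add x right after this check...
        raise KeyError(x)
    s.add(x)          # ...so two threads can both get here without raising
```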
For builtin dicts, lists and sets, we can provide atomic operations on the types themselves, but in general we can’t guarantee that operations on their subclasses will be atomic as well. Atomicity is one of the characteristics of methods that, in general, cannot be inherited. Providing an atomic module with common atomic operations for the exact builtin mutable classes would be one option.
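Here is a sketch of how such a module function might restrict itself to exact builtin types. `add_new` is a hypothetical atomic-module function, and the global `threading.Lock` is a pure-Python stand-in for the per-object lock a C implementation would use. A subclass like `LoggingSet` can run arbitrary Python code inside `add()`, so no C-level guarantee covers it, and the function rejects it.

```python
import threading

_lock = threading.Lock()  # stand-in only: a C implementation would lock the set itself


class LoggingSet(set):
    def add(self, x):
        print("adding", x)  # arbitrary Python code now runs inside add()
        super().add(x)


def add_new(s: set, x) -> bool:
    """Hypothetical atomic-module function: add x to s, return True if it was new."""
    if type(s) is not set:  # exact type only; subclasses such as LoggingSet are rejected
        raise TypeError(f"add_new() requires an exact set, not {type(s).__name__}")
    with _lock:
        if x in s:
            return False
        s.add(x)
        return True
```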
But in such a module we still won’t be able to cover all of the use cases that require atomicity. There will also be cases where people want to synchronize on multiple objects (given they are sufficiently aware of the deadlock risk) or otherwise do something different. Perhaps we can expose the per-object lock in Python in some way to let people write their own atomic functions?
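Since the per-object lock is not exposed to Python today, a sketch of synchronizing on two objects has to pair each container with an explicit lock. `Locked` and `move` are hypothetical names. Acquiring both locks in a fixed global order (here, by `id`) is the usual way to avoid the deadlock where two threads move elements in opposite directions. The move is only atomic relative to code that goes through the same locks.

```python
import threading


class Locked:
    """Hypothetical wrapper: a container plus an explicit lock guarding it."""

    def __init__(self, obj):
        self.obj = obj
        self.lock = threading.Lock()


def move(src: Locked, dst: Locked, x) -> bool:
    """Move x from src.obj to dst.obj as one step; return False if x is absent."""
    if src is dst:
        raise ValueError("src and dst must be different objects")
    # Always lock in the same global order so opposite moves cannot deadlock.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        if x not in src.obj:
            return False
        src.obj.remove(x)
        dst.obj.add(x)
        return True
```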