There are two facts that led to me personally getting interested in this proposal. (1) I needed an object like a List
but that raises an error when I try to append something to the list that is already in that list. And I need it in a few different packages that I’m working on so I don’t just want to spin up a quick helper class every time. (2) I was working in an environment where I did not have access to PyPI so it was very inconvenient to use any of the many (suitable) options on PyPI. So fully disclosure that my motivation is pretty highly “convenience” motivated (as opposed to, e.g. performance).
For now my use case is perfectly suited by (other than the introduction of an external dependency), for example:
setlist
incollections-extended
(setlists — collections_extended 2.0.2 documentation)OrderedSet
insortedcollections
(Ordered Set Recipe — Sorted Collections 2.1.0 documentation)- And many other third party packages.
See the source code in these examples for what might be desired in stdlib.
To adress comments/questions 1-5:
- What new problems can be solved that can’t currently be solved by just using
dict.fromkeys
? The latter is what I use in the rare cases where the need arises. It is not a fully featured set but gives fast membership testing and ordered iteration which is all most people would want.
oset = dict.fromkeys([key_0, key_1, key_2])
needs to know all keys during instantiation, or if you want to add elements to it afterwards you need to do the awkward oset[key_3] = None
(or use a wrapper which does this which is pretty much exactly what I’d like in stdlib). Also you can’t access elements of a dictionary via an index. That is I might want oset[0]
to return key_0
.
- What should be returned by
{1, 2, 3, ..., 200_000} & {2, 1, 0}
? If the answer is{1, 2}
rather than{2, 1}
, how would that be computed without being 100,000 times slower than what we have now?
I do not know the answer to this question. Possibilities are: (1) binary options that exist on sets simply aren’t defined on ordered sets. No binary operations. (2) binary operations are considered to break ordering so that, for example, the union of two ordered sets returns a regular unordered set. (3) The first (or second) set in the binary operations gets precedence for determining ordering of the resulting set. That is, the new set is equivalent to adding one by one, in order, all the elements of the first set followed by adding, one by one, in order, all the elements of the second set. As mentioned above, my use case doesn’t (yet?) require good performance so I’d be willing to pay a, possibly large, performance price for having the ordering I’m looking for. But also what do you mean by “what we have now”? Do you mean merging/updating two dictionaries? If so I’d think/hope an implementation based on existing dictionaries would have similar performance?
- Would OrderedSet get a custom literal notation such as
o{1, 2, 3}
, or would it have an inconvenient notation likeOrderedSet([1, 2, 3])
. The latter is what we have now for frozensets? No one really likes the latter.
No, I don’t think OrderedSet
needs a custom literal notation. OrderedSet
seems ok to me at
least. Similar in status to other collections in the collections
package.
- Would the implementation be a regular set augmented by a doubly-linked list? This is how
collections.OrderedDict
was implemented, making it slower and more memory intensive than regular dicts. Or would the implementation be a wrapper around the compact and ordered regular dictionary? The latter gives memory savings but is algorithmically poor for set applications with frequent deletions and reinsertions.
The thrust of most of the responses in this thread is that the ordered set will just be a wrapper around the existing ordered regular dictionary. See source code in the examples above for ideas about what operations would be included. (We could also hash out those details here if it’s an appropriate place, but I think those questions will be better discussed looking at code, not forum posts).
- Why not just use one of the many choices on PyPI? I presume that if people really needed an ordered set on a regular basis, those packages would be in more popular use. I almost never see them used in my customer’s code repositories.
As I mentioned above, and as as explained much better further up in this thread, occasionally that is, unfortunately, not possible, or is prohibitively costly. Even beyond technical possibility, there are reasons to avoid inclusion of additional dependencies. Perhaps these constraints are a bit contrived (though they do apply to real python users out there…), but to this I respond/ask: Why was it appropriate to OrderedDict
in stdlib at the time, but not OrderedSet
? My personal opinion is that it’s not a question that OrderedSet
would be useful to some users and if it could just “magically” appear in stdlib and be maintained it would be a no-brainer. So the only question is if there’s someone willing to put this in and someone willing to maintain it.