Sure, I’m happy to clarify. The short version is that I’m making a distinction between data model and presentation. I typically only need the properties of a set for modeling, but the additional properties of an oset are helpful for presentation, they don’t interfere with modeling, and I don’t care about the potential performance gap between the two types.
The longer version:
I often find myself working with data that lends itself to set representation, like collections of tags or keywords. I mentioned a recent example where, given some strings s and transformations s->s', I needed both the closure s* and the minimal starting set s0, which are easy to calculate with set union and difference. That naturally lent itself to a set representation:
# excerpt from close(*args):
aliases = set(args)
closure: set[str] = set()
while aliases != closure:  # Stop when the closure is stable.
    aliases |= closure
    for alias in aliases:
        closure |= {alias, alias.casefold(), ...}

# excerpt from __repr__:
minset = set(aliases)
for alias in aliases:
    minset -= close(alias) - {alias}  # Remove the non-root aliases.
That algorithm doesn’t need an oset, just set. Because it doesn’t require ordering semantics, and the set notation is quite clear, I think it would be a mistake to reimplement this with a dict to get ordering semantics. It would obscure the code for no benefit.
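For anyone who wants to run the idea, here is a minimal self-contained sketch of the closure and the minimal starting set. It is not my actual code: the TRANSFORMS tuple (casefold plus strip) and the name minimal_roots are stand-ins I made up for illustration, since the real list of transformations is elided in the excerpt above.

```python
from typing import Callable, Iterable

# Hypothetical transformations, chosen only for illustration;
# the real list is elided in the excerpt.
TRANSFORMS: tuple[Callable[[str], str], ...] = (str.casefold, str.strip)


def close(*args: str) -> set[str]:
    """Return the closure of args under TRANSFORMS."""
    closure = set(args)
    frontier = set(args)
    while frontier:  # Stop when a pass produces no new strings.
        derived = {t(s) for s in frontier for t in TRANSFORMS}
        frontier = derived - closure
        closure |= frontier
    return closure


def minimal_roots(aliases: Iterable[str]) -> set[str]:
    """Return the aliases that cannot be derived from another alias."""
    aliases = set(aliases)
    minset = set(aliases)
    for alias in aliases:
        minset -= close(alias) - {alias}  # Remove the non-root aliases.
    return minset
```

With these stand-in transformations, close(" Foo ") contains four strings, and minimal_roots applied to that closure gets back to the single root " Foo ". Nothing here depends on iteration order, which is the point: plain set union and difference are enough.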
However, when it comes to presentation, then I do care about set ordering, for consistency and ease of reading. The second excerpt there is from a __repr__ method, which “should look like a valid Python expression that could be used to recreate an object with the same value,” and the minset is the smallest set that generates the same closure. I also want a stable repr that is easy for a human to parse. Thus, the ordering matters here.
While writing that code, I hopped over to the Python 3.11 collections docs to see whether collections.OrderedSet or SortedSet exists yet, and finding that they don’t, I used sorted to order the output instead.
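Roughly, the sorted approach looks like the sketch below. The AliasSet class and its constructor are hypothetical, invented just to show the shape: the model stays a plain set, and ordering is applied only at the presentation boundary.

```python
class AliasSet:
    """Hypothetical container; the name and constructor are illustrative."""

    def __init__(self, *aliases: str) -> None:
        self.aliases = set(aliases)  # Plain set: the model needs no ordering.

    def __repr__(self) -> str:
        # sorted() makes the output independent of set iteration order,
        # so the repr is stable across runs and hash seeds.
        args = ", ".join(repr(alias) for alias in sorted(self.aliases))
        return f"{type(self).__name__}({args})"
```

The repr is still a valid expression that rebuilds an equal set of aliases, and it comes out the same no matter what order the arguments were passed in.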
Because this kind of code doesn’t need oset or poset properties for correctness, I’m not willing to sacrifice notational clarity, testability, effort, etc. to get them. However, if I had a well-vetted oset class handy, such as the hypothetical collections.OrderedSet, then I would happily use it for the convenience of stable and readable diagnostic output. I expect that the oset class would sacrifice some performance for this convenience, but that’s a tradeoff I’m fine with.
I hope that clarifies why I would prefer to have an oset class available even for code that doesn’t require it algorithmically, and why I’m not happy with the suggestion to use workarounds more complex than throwing sorted into my output methods.