The sorted() built-in and list.sort() both accept an optional key= parameter to specify the key for the sort and reverse= to reverse the result. But the functions in the heapq module do not. This feels like a curious omission.
Yes, I’ve seen this previous idea. But it feels to me like the scale of the problem isn’t properly appreciated.
As a case in point, look at the second example in the documentation. Even in this most trivial case, populating the heap with (key,value) tuples only works if either:
Keys are guaranteed to be distinct, or
Values are comparable
Otherwise, sooner or later you’re going to get a comparison error.
Currently, the only truly effective workaround I can see for this limitation is to incur the CPU and memory overhead of populating the heap with wrapper objects of a custom type which implements the required ordering. Well, aside from doing without the heapq module entirely, of course,and creating one’s own implementation.
Sorry, which “triple” solution? I’ve not seen one articulated anywhere?
I’m inferring that you’re suggesting putting (key,nonce,item) triples in the heap, where the nonce exists solely to prevent tuples being compared by item? That remains problematic for a few reasons:
It’s cumbersome and ugly
Currently, to achieve a reversed order it is necessary that keys be negatable, though I see that exposed max-heap functionality is coming in 3.13 .
It relies on heap items never being compared with themselves. While in practice (and given the nature of the algorithm) I don’t think they ever can be, I don’t believe there’s a formal guarantee.
The choice of nonce is problematic. id(item) seems a natural option, except that this won’t work if the same item is inserted multiple times into the heap, which is sometimes a defensible action rather than a logic error.
Ooh, sorry - I see it now. It looks like I correctly intuited it, aside from using an indefinitely-incrementing counter as an alternative to id(item).
I suppose that being an officially recommended solution tacitly guarantees that nothing in heapq ever compares an element with itself. And I accept that a counter is an adequate solution to the choice of nonce.
The key objection though - that the solution is cumbersome and ugly - most definitely stands. It seems that something like this has to be done in almost every conceivable use case for heapq, which strongly suggests it should be supported natively?
Yeah I think so, too, and it even says “since no two entry counts are the same, the tuple comparison will never attempt to directly compare two tasks”.
About cumbersome/ugly: Would the key-parameter usage really be less cumbersome/ugly? Would be good to see a comparison of code for both ways.
Hmm. I spiked some proof of concept code and rapidly realised something: what I really feel is missing is a factory for priority queue types.
Either or both of these spellings seem sane:
my_qtype = heapq.heaptype('my_qtype', key=attrgetter('start'))
class MyQType(heapq.HeapQ, key=attrgetter('start')):
... # Other stuff can go here
My ideal parameterisation for the type is:
key=None - Function from item to keying value
reverse=False - min-heap if False, max-heap if True
raw_factory=list - How the constructor should make the underlying container if none is specified
The resulting type would have approximately these methods:
def __init__(self, raw=None, heapify=True):
"""Create a heap-queue.
If raw is specified, it is used (by sharing, not by copy) as the
underlying container. Otherwise, raw_factory() is called to
create it. If heapify is True (the default) then any provided
container is heapified.
"""
...
@property
def raw(self):
"The underlying container"
...
@property
def head(self):
return self.raw[0]
def push(self, item):
...
def pop(self):
...
...
Now, the interesting thing is that I can already implement this to my satisfaction; I probably will, tomorrow! However, as things stand, the implementation would have to wrap every item, which feels horribly inefficient. It still feels preferable for heapq to provide native support, either by providing an interface like the one above, or by providing the necessary underpinnings.
Addendum: I’ve just noticed that, unlike the python implementation, the C implementation of heapq requires a heap to be an actual list, not merely “list-like”. So my dream of a raw_factory parameter should be disregarded.
Yeah, I suspected you wouldn’t actually like to have to pass the key in every call of every heap function :-). A type factory seems overkill, SortedList for example just takes a key at the list construction.
Well, I’ve implemented my own heapq.py which exposes the undocumented _heappop_max and _heapify_max, implements heappush_max in terms of _siftdown_max and then implements MinQ, MaxQ, KeyedMinQ and KeyedMaxQ in terms of them.
It works. And it’s about as efficient as an implementation in Python is ever going to be.
But I’m very mindful that:
Having an instance that wraps a list rather than a C object with an associated C array
Having a list of KeyedItem objects, rather than storing key and item directly inside a C array
…both cause significant unnecessary space to be used, and all the indirection in the Python code takes time.
Maybe I’m the only person who cares, but this feels like a sufficiently fundamental structure that it’s worth implementing efficiently.
(Yes, I’m aware I’m talking myself into a job, here…)
If keys is specified, it shall be a second list the same size as heap, containing the comparison keys. For the push functions, if keys is specified, key is the key for the new item.
It feels to me like that strikes a good balance between implementability, efficiency and elegance.
(If it was felt to be more intuitive, an alternative would be to add values and value parameters and always sift by comparing elements of heap.)