Create new package similar to `heapq` but be able to pass custom comparator through a constructor

The current heap container is okay in Python. It works as intended but you cannot pass a custom comparator and working with it feels “C like”, since you need to pass your list object each time. I’d be willing to make a new package so that we can have a similar heap/priority queue to java and c++.

I know you can override the less-than-operator, but that just doesn’t seem right and is not intuitive.

You are totally free to make such a package and put it on PyPI. I wouldn’t expect it to be added to the stdlib though.

I’ve solved many heap problems on LeetCode and adapted to Python’s heapq, but I can see where you’re coming from. The lack of a custom comparator can make heapq less intuitive and harder to use for certain use cases. Here’s an example to illustrate the issue:

Example: Custom Comparator Workaround in heapq

Let’s say you want to use a heap to process tasks based on their priority, where each task is represented as a tuple (priority, task_name). If you want a max-heap based on priority, you’d typically need to negate the priority:

import heapq

tasks = [(1, "task1"), (3, "task3"), (2, "task2")]
max_heap = []
for priority, task in tasks:
    heapq.heappush(max_heap, (-priority, task))

# Get the task with the highest priority
highest_priority_task = heapq.heappop(max_heap)
print(highest_priority_task)  # Outputs (-3, 'task3')

This approach works, but:

  • High cognitive load
  • Leaking information about the comparator, as the user has to manage it manually, like making a max-heap by negating values. This is also against encapsulation.
  • It’s very inflexible. Why?
    If you need a different custom order (e.g., sort by task length or a combination of factors), you’re forced to restructure the data or override comparison operators, neither of which feels clean.

I don’t know why the first commenter simply discarded the design (I wouldn’t expect it to be added to the stdlib though.). At least counter it. The heapq library is against the design principle of high-level, expressive design.
I have solved a lot of Dijkstra algorithm questions using heapq, and I needed to adapt to the design. It is not at all intuitive.

1 Like

In my opinion, at this point in Python’s lifecycle it’s unlikely we’d add a data structure to the stdlib unless we had an internal use case for it. And even then we might not expose it (see HAMT).

Plus, there are plenty of heap implementations on PyPI.

Thanks, that makes sense. However, what about the option to patch in an optional comparator within the design?

I understand there are plenty of libraries available online, but my concern lies in pair programming interviews where installing third-party libraries isn’t an option. When you try to adapt heapq to solve a Dijkstra’s algorithm problem, it often feels unintuitive to the interviewer. This is especially true when interviewers, who usually work in compiled languages, are accustomed to dedicated and intuitive priority queue implementations.

I respect your perspective and decisions, but this addition could be a significant help for students.

Why does it not seem right? Using Prince Roshan’s example to demonstrate, the problem is trying to reuse existing data types instead of defining one that models what you want. Don’t try to use tuples that don’t compare like you want; do use a custom Task class that does compare properly.

from dataclasses import dataclass
import heapq

tasks = [(1, "task1"), (3, "task3"), (2, "task2")]

@dataclass
class Task:
    priority: int
    name: str

    def __lt__(self, other):
        return self.priority > other.priority

max_heap = []
for t in tasks:
    heapq.heappush(max_heap, Task(*t))
1 Like

But the OP isn’t proposing adding a new data structure to the stdlib. What’s proposed here is adding a custom comparator option, likely in form of a keyword argument key like that in sorted and heapq.merge, to heapq.heapify, heapq.heappush and heapq.heappop.

And we already have an internal use for it in the implementation of asyncio.queues.PriorityQueue, which currently has the limitation of data needing to be sortable due the lack of a key argument (so we can’t use a dict as data for example):

The documentation says the priority queue “retrieves entries in priority order” when in fact it retrieves entries in the order of both priority and data, which to me is an issue worth fixing.

It would be in line with API of bisect.

Yes, heapq and bisect are very alike in that they are both collections of functions that operate on list and rely on the order of list items. So if bisect functions all get to have a key argument, it would bring the heapq API consistent with the bisect API for all heapq functions to also get a key argument.

It is wierd, that only heapq.marge has key and reverse arguments. Finding the reason why, I did find mail proposing this addition, but the discussion is only about the details. Why would someone marge heaps with std. lib, but heapify on their own?

Well, the subject is about adding a “new package”, so I assumed a new data structure.

Not just “new package” but “pass custom comparator through a constructor” and “heap container”. Sounds like a new data structure to me. Constructed like heap = Heap(key=...) (or I guess Heap(cmp=...)).

Those aren’t constructors. And adding that option to them wouldn’t be a new package.

It’s not for merging heaps. It’s for merging sorted iterables.

Ok, that makes more sense

Ah, I had my blinders on and missed the very suggestion in the OP’s proposal about creating a new package.

Still, despite the XY problem, I think we can agree that the actual problem the OP is trying to solve is clear–that the existing heapq functions in the stdlib are missing an option to customize how the items are compared. A key argument for the heapq functions, rather than a new package/module/class, should be all it takes to solve the problem.

1 Like

Hi @ericvsmith,

Suppose you’re solving a Dijkstra’s algorithm problem and need to use a priority queue. Currently, in Python, you might implement it like this:

import heapq

reference_point = (0, 0)

def squared_distance(point):
    x0, y0 = reference_point
    return (point[0] - x0) ** 2 + (point[1] - y0) ** 2


heap = []

points = [(1, 2), (3, 4), (-1, -1), (0, 0), (2, -2)]
for point in points:
    distance = squared_distance(point)
    heapq.heappush(heap, (distance, point))

Suppose, I’ve made changes to the Python implementation of heapq module to allow using a custom comparator function. Now, you can write:

import heapq

reference_point = (0, 0)

def point_comparator(p1, p2):
    x0, y0 = reference_point
    d1 = (p1[0] - x0) ** 2 + (p1[1] - y0) ** 2
    d2 = (p2[0] - x0) ** 2 + (p2[1] - y0) ** 2
    return (d1 > d2) - (d1 < d2)

heap = []
points = [(1, 2), (3, 4), (-1, -1), (0, 0), (2, -2)]
for point in points:
    heapq.heappush(heap, point, comparator=point_comparator)

The latter is very intuitive, and most language containers implementing priority_queue provide a way to pass a comparator function.

I understand that there are many open-source packages offering various implementations in different styles—functional, object-oriented, or other paradigms—but in coding competitions and interview settings, people are often not allowed to use these packages.

yes @Alex-Wasowicz ,I also didn’t find any request. I think we can agree that most competitive programmers use C++, and those who use Python have adapted their approaches accordingly, often starting from high school.

In response to @chepner , yes, the first thing that came to mind was to create an object and define the lt and other comparison methods. However, why should we use an object that takes extra memory when we can use a tuple?

Should I create a ticket for this? I was planning to post a new question about adding an optional comparator to heapq.

Exactly, I took the real problem he is trying to solve, ignored his suggestion of adding a new module, and proposed a solution. What do you think—should I create a ticket and work on it?
A lot of people have supported it.

I read somewhere that at least one core developer needs to support the idea, so I’m waiting for Eric V. Smith to confirm.

Unfortunately, Dijkstra’s algorithm will not benefit from this much, and it will even be a bit slower.

b = (1, 'a')
a = (1, 'b')
%timeit a < b    # 55.6 ns
# vs
IG0 = opr.itemgetter(0)
%timeit IG0(a) < IG0(b)    # 58.6 ns

Proposed functionality is useful for more complex situations nevertheless.

P.S. Standard argument name is key - “key function”.

On the other hand, if implemented correctly, it should be a bit faster as it would only call it once for new item, while use it many times.

Extra mile for maximum performance would be to hold parallel structure of extracted values so that key function is called only once for each item.

Would you be willing to show your implementation? I am quite curious.