Using select.select vs select.poll, select.epoll, select.select?

James_E · March 3, 2025, 7:16pm

My current understanding is that, when it comes to receiving data on multiple sockets in an event loop style, without using async tooling, and without interfering with Ctrl-C:

select.epoll is the best option on Linux;
select.kqueue is the best option on *BSD;
select.poll is the best fallback option on POSIX platforms;
select.select is the only option on Windows.

Have I got this correct?
The documentation for the select module never really spelled it out clearly.

Snippet

import platform
import select
from contextlib import ExitStack, closing
from select import select as select_


if hasattr(select, 'epoll') and not platform.system().endswith('BSD'):
    def _iter_readable_forever(rlist, timeout=None):
        # sizehint must be positive (or -1), so guard against an empty rlist
        with select.epoll(sizehint=max(len(rlist), 1)) as p:
            for obj in rlist:
                p.register(obj, select.EPOLLIN)
            poll_ = p.poll
            while True:
                for result in poll_(timeout):
                    yield result[0]

elif hasattr(select, 'kqueue'):
    def _iter_readable_forever(rlist, timeout=None):
        with closing(select.kqueue()) as kq:
            # bug in kqueue: timeout is ignored when max_ev = 0
            # workaround h/t: https://github.com/python/cpython/blob/v3.13.2/Lib/selectors.py#L541-L545
            max_ev = max(len(rlist), 1)
            timeout = None if timeout is None else max(timeout, 0)

            control_ = kq.control
            # kevent objects are created with select.kevent(); there is no select.kqueue.event
            control_([select.kevent(obj, select.KQ_FILTER_READ, select.KQ_EV_ADD) for obj in rlist], 0, 0)
            while True:
                for result in control_(None, max_ev, timeout):
                    yield result.ident

elif hasattr(select, 'devpoll'):
    def _iter_readable_forever(rlist, timeout=None):
        with closing(select.devpoll()) as p:
            for obj in rlist:
                p.register(obj, select.POLLIN)
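            poll_ = p.poll
            # devpoll.poll() takes milliseconds; the other branches take seconds
            timeout_ms = None if timeout is None else max(timeout, 0) * 1000
            while True:
                for result in poll_(timeout_ms):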
                    yield result[0]

elif hasattr(select, 'poll'):
    def _iter_readable_forever(rlist, timeout=None):
        with ExitStack() as ctx:
            p = select.poll()
            for obj in rlist:
                p.register(obj, select.POLLIN)
                ctx.callback(p.unregister, obj)
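            poll_ = p.poll
            # poll.poll() takes milliseconds; the other branches take seconds
            timeout_ms = None if timeout is None else max(timeout, 0) * 1000
            while True:
                for result in poll_(timeout_ms):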
                    yield result[0]

else:
    def _iter_readable_forever(rlist, timeout=None):
        _empty = []
        while True:
            yield from select_(rlist, _empty, _empty, timeout)[0]

Rosuav · March 3, 2025, 7:19pm

Any particular reason for avoiding asyncio? If it’s for the sake of learning, I wouldn’t worry about the differences here - any of them will be fine, and you can tinker with them without needing to worry about whether it’s going to have the throughput you need.

James_E · March 3, 2025, 7:25pm

The “boring reason” is that the codebase I’m actually using this in at work doesn’t use asyncio.

The “good reason” is that I do plan on eventually publishing this code as part of a larger mini-library on gist.github.com (or maybe PyPI someday, if this goes beyond a didactic exercise) to provide an ultra-flat UDP iteration interface exposing both classic and async iterators, and I happen to be asking about the first of these today.

James_E · March 3, 2025, 7:30pm

actually, that’s the thing: I noticed enough caveats on that assumption:

select.select is documented to fail in certain situations (at least on POSIX platforms)
select.poll is not available on Windows
select.epoll is not available on non-Linux POSIX
select.kqueue is not available on non-BSD POSIX

that I figured I should just find out what the correct answer per se is.
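
For anyone following along, a quick way to see which of these caveats applies on a given machine is to check which names the select module actually exposes there. This is only a sketch; the `CANDIDATES` tuple and the `available` list are names made up for illustration.

```python
import select

# Ordered from most to least specialised; select.select is the universal fallback.
CANDIDATES = ("epoll", "kqueue", "devpoll", "poll", "select")

# Keep only the mechanisms that this build of Python exposes on this platform.
available = [name for name in CANDIDATES if hasattr(select, name)]
print(available)
```

The output differs per platform, which is the whole point of the caveats listed above.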

Rosuav · March 3, 2025, 7:33pm

Ahhh. In that case, I would take the advice at the top of the select module’s docs and use the high level selectors.DefaultSelector which deals with the platform-specific stuff for you. I believe that should handle Ctrl-C correctly, although I haven’t actually tested it on anything other than Linux.
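
A minimal sketch of what that suggestion could look like, assuming the goal is the same "yield whatever is readable, forever" generator as in the original snippet. The helper name `iter_readable` is hypothetical, and the socketpair at the bottom is just a stand-in for real sockets.

```python
import selectors
import socket


def iter_readable(socks, timeout=None):
    """Yield each registered socket whenever it becomes readable (hypothetical helper)."""
    # DefaultSelector picks epoll/kqueue/devpoll/poll/select depending on the platform.
    with selectors.DefaultSelector() as sel:
        for sock in socks:
            sel.register(sock, selectors.EVENT_READ)
        while True:
            # select() returns a (possibly empty) list of (key, events) tuples.
            for key, _events in sel.select(timeout):
                yield key.fileobj


# Small demonstration with a connected pair of sockets.
a, b = socket.socketpair()
b.send(b"hello")
gen = iter_readable([a], timeout=1.0)
ready = next(gen)          # the socket that has data waiting
payload = ready.recv(16)
gen.close()                # closing the generator also closes the selector
a.close()
b.close()
```

Unlike the hand-rolled version, this always yields the registered object itself (via `key.fileobj`) rather than a raw file descriptor on some platforms and an object on others.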

Rosuav · March 3, 2025, 7:38pm

Hmm, what do you mean by that? I’m trying to find it in the docs. Generally, select.select is basic, a bit fiddly to use, and doesn’t scale optimally, but it ought to work. [1] In any case, the high level selectors module is likely the best choice for actual production work, which is why my first guess for the use of the select module was learning about how all these things actually work under the hood. (Which is an excellent exercise for anyone who’s planning on using high level abstractions like asyncio. Implementing those from scratch gives you a great understanding of them. But that isn’t what you were after.)

[1] For sockets, at least; but you already know that Windows won’t handle other FDs.

James_E · March 3, 2025, 7:46pm

I see that DefaultSelector returns (key, events) tuples, where events is a bitmask indicating the subset of listened-for events which have actually become ready; but select.select (only available implementation on Windows) doesn’t yield that information.

I’m guessing that there’s some minor overhead to fetch that information (which in my use-case I’m just turning around and discarding)?
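
To make the (key, events) shape concrete, here is a small sketch that registers one socket for both reading and writing and inspects the mask that comes back. The variable names are illustrative only, and it says nothing about how much the mask costs to compute.

```python
import selectors
import socket

a, b = socket.socketpair()
with selectors.DefaultSelector() as sel:
    # Ask to be told about both readability and writability of `a`.
    sel.register(a, selectors.EVENT_READ | selectors.EVENT_WRITE)
    b.send(b"ping")
    ready = sel.select(timeout=1.0)
    # Map each ready object to the bitmask of events that fired for it.
    masks = {key.fileobj: events for key, events in ready}

for fileobj, events in masks.items():
    print("readable:", bool(events & selectors.EVENT_READ),
          "writable:", bool(events & selectors.EVENT_WRITE))

a.close()
b.close()
```

When only EVENT_READ is registered, as in the use-case described here, the mask carries no extra information and can simply be ignored.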

(emphasis sic:)

WARNING: select() can monitor only file descriptors numbers that are less than FD_SETSIZE (1024)—an unreasonably low limit for many modern applications—and this limitation will not change. All modern applications should instead use poll(2) or epoll(7), which do not suffer this limitation.

– select(2), Ubuntu Manpages, 24.04 LTS “Noble Numbat”
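
A rough way to observe that limit from Python, as a sketch: on POSIX builds, select.select is expected to refuse a descriptor number at or above FD_SETSIZE with a ValueError before it even reaches the operating system. The helper name `select_accepts_fd` and the arbitrary number 5000 are assumptions for illustration; on Windows the range check does not apply in the same way.

```python
import select
import socket


def select_accepts_fd(fd):
    """Return False if select.select() rejects fd as out of range (hypothetical helper)."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        return False   # rejected by the range check
    except OSError:
        return True    # passed the range check; the fd just is not usable
    return True


a, b = socket.socketpair()
small_ok = select_accepts_fd(a.fileno())   # an ordinary, low-numbered descriptor
large_ok = select_accepts_fd(5000)         # arbitrary number above the quoted 1024 limit
a.close()
b.close()
print(small_ok, large_ok)
```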

James_E · March 3, 2025, 7:51pm

Honestly, I kept on hearing that repeated in various online sources as a supposed downside of it, but (as you can see from the snippet I included in the OP) it seems to have an interface no worse than any other selector — heck, it took the least code to deal with of any selector!

My only grudges against it, when it comes to dealing with network sockets, are:

other selectors’ claims to perform better when push actually comes to shove
its own claims to fail in certain situations, at least on POSIX platforms

Rosuav · March 3, 2025, 7:56pm

TBH I’ve no idea; the source code just shows that it’s built on top of select.epoll which has a lot of flexibility, and I haven’t dug into exactly how it’s doing it. But I doubt that it’s enough overhead to warrant switching implementations, as otherwise selectors.DefaultSelector would do exactly that.

Ah. To be quite honest, I had completely forgotten about this, as I’ve only ever used select.select itself for toy projects and study. For anything where I actually want serious throughput (where the possibility of having large numbers of FDs will come up), it’ll always be epoll. So that limitation has never come up. But then, for me personally, limitations like “is not available on Windows” have never come up either, so don’t take my experience for everything.

Yeah it’s fine in Python. I’m not a fan of it in C, though. But you’re absolutely right: others are definitely better for high throughput situations. Except on Windows, where… well, actually, it’s not even the same function, it just has the same name. So on Windows, you use the Windows option, and everywhere else, you use something else. Which is why we have high level tools to hide all those details.

James_E · March 3, 2025, 8:04pm

That kinda circles back around to [the bottom 25% of] my original question: on Windows, when we aren’t breaking out asyncio, is select.select definitely the best tool for getting socket.recv not to step on the toes of KeyboardInterrupt?

(*I misspoke; select.epoll does provide that data, but select.select doesn’t. I edited the post you replied to.)

Rosuav · March 3, 2025, 8:18pm

I think it’s the only tool, so yes, it would be the best. But don’t trust me on Windows matters, I haven’t used it in many many years.
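
One common pattern for this, sketched under the assumption that a blocking call can delay KeyboardInterrupt until it returns: wait in select.select with a short timeout, so control comes back to the interpreter regularly. The helper name `recv_interruptibly` and the `poll_interval` default are made up for illustration, and the Windows behaviour has not been verified here.

```python
import select
import socket


def recv_interruptibly(sock, bufsize=65535, poll_interval=0.5):
    """Wait for data on sock, waking up every poll_interval seconds (hypothetical helper)."""
    while True:
        # A bounded wait returns control to Python regularly, giving a pending
        # KeyboardInterrupt a chance to be raised between iterations.
        readable, _, _ = select.select([sock], [], [], poll_interval)
        if readable:
            return sock.recv(bufsize)


a, b = socket.socketpair()
b.send(b"x")
data = recv_interruptibly(a)
a.close()
b.close()
```

The trade-off is a wake-up every `poll_interval` seconds even when nothing is happening, which is usually negligible for a handful of sockets.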

barry-scott · March 3, 2025, 10:46pm

If the number of file descriptors you need to handle is small (<10), then select is good enough. If you need to handle hundreds or thousands of file descriptors, then epoll is going to be needed (and kqueue on BSD).

James_E · March 3, 2025, 11:17pm

Hmm, looks like this question already came up on the tracker back in 2012:

github.com/python/cpython: Patch selectmodule.c to support WSAPoll on Windows

Opened 18 Nov 2012 by tpn, closed 20 Jun 2013 (type-feature, resolution: rejected)

BPO: https://bugs.python.org/issue16507

the conclusion seems to have been that while WinSock’s WSAPoll does do stuff that WinSock’s select doesn’t, it is semantically different enough from the “POSIX poll()” that it wouldn’t have mapped cleanly into select.poll

barry-scott · March 4, 2025, 7:52am

I have always coded an app specific abstraction for the polling of FD’s to allow me to take advantage of each platform’s strengths (and mitigate weaknesses).

As each polling mechanism has its limitations, this always seems necessary in any non-trivial app.

James_E · March 4, 2025, 4:55pm

semantically different enough

On further research:

It looks like the only “semantic difference” was a singular bug, which was fixed almost 5 years ago (and never affected anything except outbound pending TCP sockets anyway).

So when/if that ever gets added, select.poll would become the definitively-best fallback choice for every operating system.

James_E · March 4, 2025, 5:36pm

Looking into that more / for the record:

It looks like IOCP is supposed to be, more or less, “the Windows alternative to epoll/kqueue”: it goes further and outperforms even poll()/WSAPoll() when you actually need to do high-performance mass socket serving. And it looks like, despite IOCP being completely unavailable through the selectors or select modules, asyncio has used IOCP (via ProactorEventLoop) as its default backend on Windows since Python 3.8, including for things like create_datagram_endpoint.

So the work required to port the real high-performance option to non-asyncio code will/would be high indeed, since you’d need to duplicate/steal/adapt all the work the core team already did binding functions like CreateIoCompletionPort into Python.
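
For comparison, this is roughly what the asyncio route looks like for UDP, as a sketch. The `Collector` protocol class and the `demo` coroutine are hypothetical names; on Windows the default loop would be the IOCP-backed proactor loop, while elsewhere it is selector-based.

```python
import asyncio


class Collector(asyncio.DatagramProtocol):
    """Hypothetical protocol that queues every datagram it receives."""

    def __init__(self, queue):
        self.queue = queue

    def datagram_received(self, data, addr):
        self.queue.put_nowait((data, addr))


async def demo():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    # Receiver bound to an ephemeral loopback port.
    recv_transport, _ = await loop.create_datagram_endpoint(
        lambda: Collector(queue), local_addr=("127.0.0.1", 0))
    port = recv_transport.get_extra_info("sockname")[1]
    # Sender "connected" to the receiver's address.
    send_transport, _ = await loop.create_datagram_endpoint(
        asyncio.DatagramProtocol, remote_addr=("127.0.0.1", port))
    send_transport.sendto(b"ping")
    data, _addr = await asyncio.wait_for(queue.get(), timeout=2.0)
    send_transport.close()
    recv_transport.close()
    return data


result = asyncio.run(demo())
print(result)
```

The event-loop backend is chosen by asyncio itself, which is why none of the platform branching from the original snippet appears here.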

James_E · July 18, 2025, 6:03pm

I went ahead and coded that up as a ponyfill, just in case it never makes it into the standard library: