select.poll is the best fallback option on POSIX platforms;
select.select is the only option on Windows.
Have I got this correct?
The documentation for the select module never really spelled it out clearly.
Snippet
import platform
import select
from contextlib import ExitStack, closing
from select import select as select_

if hasattr(select, 'epoll') and not platform.system().endswith('BSD'):
    def _iter_readable_forever(rlist, timeout=None):
        # epoll.poll() takes its timeout in seconds; sizehint must be >= 1.
        with select.epoll(sizehint=max(len(rlist), 1)) as p:
            for obj in rlist:
                p.register(obj, select.EPOLLIN)
            poll_ = p.poll
            while True:
                for result in poll_(timeout):
                    yield result[0]  # the ready file descriptor
elif hasattr(select, 'kqueue'):
    def _iter_readable_forever(rlist, timeout=None):
        with closing(select.kqueue()) as kq:
            # bug in kqueue: timeout is ignored when max_ev = 0
            # workaround h/t: https://github.com/python/cpython/blob/v3.13.2/Lib/selectors.py#L541-L545
            max_ev = max(len(rlist), 1)
            timeout = None if timeout is None else max(timeout, 0)
            control_ = kq.control
            # Register every object for read events; kevent objects come from select.kevent().
            control_([select.kevent(obj, select.KQ_FILTER_READ, select.KQ_EV_ADD) for obj in rlist], 0, 0)
            while True:
                for result in control_(None, max_ev, timeout):
                    yield result.ident  # the ready file descriptor
elif hasattr(select, 'devpoll'):
    def _iter_readable_forever(rlist, timeout=None):
        # devpoll.poll() takes milliseconds, not seconds.
        timeout_ms = None if timeout is None else timeout * 1000
        with closing(select.devpoll()) as p:
            for obj in rlist:
                p.register(obj, select.POLLIN)
            poll_ = p.poll
            while True:
                for result in poll_(timeout_ms):
                    yield result[0]
elif hasattr(select, 'poll'):
    def _iter_readable_forever(rlist, timeout=None):
        # poll.poll() also takes milliseconds.
        timeout_ms = None if timeout is None else timeout * 1000
        with ExitStack() as ctx:
            p = select.poll()
            for obj in rlist:
                p.register(obj, select.POLLIN)
                ctx.callback(p.unregister, obj)
            poll_ = p.poll
            while True:
                for result in poll_(timeout_ms):
                    yield result[0]
else:
    # Windows: select.select() is the only option in this module.
    def _iter_readable_forever(rlist, timeout=None):
        _empty = []
        while True:
            yield from select_(rlist, _empty, _empty, timeout)[0]
Any particular reason for avoiding asyncio? If it’s for the sake of learning, I wouldn’t worry about the differences here - any of them will be fine, and you can tinker with them without needing to worry about whether it’s going to have the throughput you need.
The “boring reason” is that the codebase I’m actually using this in at work doesn’t use asyncio.
The “good reason” is that I do plan on eventually publishing this code as part of a larger mini-library on gist.github.com (or maybe PyPI someday, if this goes beyond a didactic exercise) to provide an ultra-flat UDP iteration interface exposing both classic and async iterators, and I happen to be asking about the first of these today.
Ahhh. In that case, I would take the advice at the top of the select module’s docs and use the high level selectors.DefaultSelector which deals with the platform-specific stuff for you. I believe that should handle Ctrl-C correctly, although I haven’t actually tested it on anything other than Linux.
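A minimal sketch of what that advice might look like in practice: the same "readable forever" iterator as the snippet in the OP, but built on selectors.DefaultSelector instead of per-platform branches. The name iter_readable_forever is hypothetical, and unlike the OP's version it yields the registered objects themselves rather than raw file descriptors. The Ctrl-C behaviour is untested here; the finite timeout is only there as a hedge for platforms (reportedly Windows) where a blocking wait may not be interrupted by Ctrl-C.

```python
import selectors
import socket
from typing import Iterable, Iterator, Optional


def iter_readable_forever(
    rlist: Iterable[socket.socket], timeout: Optional[float] = None
) -> Iterator[socket.socket]:
    """Yield each registered object whenever it becomes readable.

    DefaultSelector picks kqueue, epoll, devpoll, poll or select depending
    on the platform, so no per-platform branches are needed here.
    """
    with selectors.DefaultSelector() as sel:
        for obj in rlist:
            sel.register(obj, selectors.EVENT_READ)
        while True:
            # With a finite timeout, sel.select() returns periodically even
            # when nothing is ready, giving the interpreter a chance to raise
            # KeyboardInterrupt between calls.
            for key, _events in sel.select(timeout):
                yield key.fileobj
```

Closing the generator (or letting it be garbage-collected) exits the `with` block and closes the underlying selector.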
Hmm, what do you mean by that? I’m trying to find it in the docs. Generally, select.select is basic, a bit fiddly to use, and doesn’t scale optimally, but it ought to work. [1] In any case, the high level selectors module is likely the best choice for actual production work, which is why my first guess for the use of the select module was learning about how all these things actually work under the hood. (Which is an excellent exercise for anyone who’s planning on using high level abstractions like asyncio. Implementing those from scratch gives you a great understanding of them. But that isn’t what you were after.)
[1] For sockets, at least; but you already know that Windows won’t handle other FDs.
I see that DefaultSelector returns (key, events) tuples, where events is a bitmask indicating which of the listened-for events have actually become ready; but select.select (the only implementation available on Windows) doesn’t yield that information.
I’m guessing that there’s some minor overhead to fetch that information (which in my use-case I’m just turning around and discarding)?
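On the overhead question: as far as I can tell from CPython's selectors.py, the select()-based selector doesn't make an extra system call to produce the events mask; it derives the mask from which of select()'s returned lists each descriptor appeared in, so the extra cost is some set bookkeeping per call. A small sketch (ready_events is a hypothetical helper) showing that SelectSelector, the class backed by select.select(), still reports a per-key bitmask:

```python
import selectors
import socket
from typing import List, Tuple


def ready_events(sock: socket.socket, timeout: float = 1.0) -> List[Tuple[object, int]]:
    """Return (fileobj, events) pairs reported by the select()-based selector."""
    # SelectSelector wraps select.select(); it is what DefaultSelector ends up
    # being when no better mechanism exists (e.g. on Windows).
    with selectors.SelectSelector() as sel:
        sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)
        ready = []
        for key, events in sel.select(timeout):
            # events is the subset of the registered mask that is ready now,
            # derived from which of select()'s result lists held the socket.
            ready.append((key.fileobj, events))
        return ready
```

If you only ever register EVENT_READ, the mask carries no extra information and can simply be discarded, as in your use case.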
(emphasis sic:)
WARNING: select() can monitor only file descriptors numbers that are less than FD_SETSIZE (1024)—an unreasonably low limit for many modern applications—and this limitation will not change. All modern applications should instead use poll(2) or epoll(7), which do not suffer this limitation.
Honestly, I kept on hearing that repeated in various online sources as a supposed downside of it, but (as you can see from the snippet I included in the OP) it seems to have an interface no worse than any other selector — heck, it took the least code to deal with of any selector!
My only grudges against it, when it comes to dealing with network sockets, are:
other selectors’ claims to perform better when push actually comes to shove
its own claims to fail in certain situations, at least on POSIX platforms
TBH I’ve no idea; the source code just shows that it’s built on top of select.epoll which has a lot of flexibility, and I haven’t dug into exactly how it’s doing it. But I doubt that it’s enough overhead to warrant switching implementations, as otherwise selectors.DefaultSelector would do exactly that.
Ah. To be quite honest, I had completely forgotten about this, as I’ve only ever used select.select itself for toy projects and study. For anything where I actually want serious throughput (where the possibility of having large numbers of FDs will come up), it’ll always be epoll. So that limitation has never come up. But then, for me personally, limitations like “is not available on Windows” have never come up either, so don’t take my experience for everything.
Yeah, it’s fine in Python. I’m not a fan of it in C, though. But you’re absolutely right: others are definitely better for high-throughput situations. Except on Windows, where… well, actually, it’s not even the same function, it just has the same name. So on Windows, you use the Windows option, and everywhere else, you use something else. Which is why we have high-level tools to hide all those details.
That kinda circles back around to [the bottom 25% of] my original question: on Windows, when we aren’t breaking out asyncio, is select.select definitely the best tool for getting socket.recv not to step on the toes of KeyboardInterrupt?
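For reference, one common pattern for the Windows, no-asyncio case is to never block indefinitely: wait in select.select() with a short timeout, and only call recv() once the socket is reported readable. A minimal sketch, assuming a single UDP socket; recv_forever and its tick parameter are hypothetical names. With this shape, a Ctrl-C should surface as KeyboardInterrupt roughly within one tick.

```python
import select
import socket
from typing import Iterator


def recv_forever(sock: socket.socket, bufsize: int = 65535,
                 tick: float = 0.25) -> Iterator[bytes]:
    """Yield datagrams from sock forever, never waiting longer than tick seconds per call."""
    while True:
        # Waiting in select() with a timeout (instead of blocking in recv())
        # returns control to the interpreter at least every `tick` seconds,
        # so a pending Ctrl-C can be raised as KeyboardInterrupt.
        readable, _, _ = select.select([sock], [], [], tick)
        if readable:
            # select() reported the socket readable, so recv() should not block.
            yield sock.recv(bufsize)
```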
(*I misspoke; select.epoll does provide that data, but select.select doesn’t. I edited the post you replied to.)
If the number of file descriptors you need to handle is small (fewer than 10), then select is good enough. If you need to handle hundreds or thousands of file descriptors, then epoll is going to be needed (and kqueue on BSD).
Hmm, it looks like this question was already raised on the tracker back in 2012:
The conclusion seems to have been that while WinSock’s WSAPoll does do things that WinSock’s select doesn’t, it is semantically different enough from the “POSIX poll()” that it wouldn’t have mapped cleanly onto select.poll.
I have always coded an app-specific abstraction for polling FDs, to let me take advantage of each platform’s strengths (and mitigate its weaknesses).
As each polling mechanism has its limitations, this always seems necessary in any non-trivial app.
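One shape such an app-specific abstraction might take: a small interface with interchangeable backends, chosen by a policy that mirrors the rule of thumb mentioned earlier in the thread (plain select() for a handful of descriptors, the platform's scalable mechanism beyond that). Everything here is hypothetical, including ReadPoller, SelectPoller, SelectorsPoller, make_poller and the threshold of 10; it is a sketch of the idea, not anyone's actual code.

```python
import select
import selectors
import socket
from typing import List, Optional, Protocol


class ReadPoller(Protocol):
    """Hypothetical app-level interface: watch sockets for readability."""

    def add(self, sock: socket.socket) -> None: ...
    def wait(self, timeout: Optional[float]) -> List[socket.socket]: ...
    def close(self) -> None: ...


class SelectPoller:
    """Backend built directly on select.select(); available everywhere."""

    def __init__(self) -> None:
        self._socks: List[socket.socket] = []

    def add(self, sock: socket.socket) -> None:
        self._socks.append(sock)

    def wait(self, timeout: Optional[float]) -> List[socket.socket]:
        # The whole list is handed to the kernel on every call.
        return select.select(self._socks, [], [], timeout)[0]

    def close(self) -> None:
        self._socks.clear()


class SelectorsPoller:
    """Backend delegating to the platform's best selector (epoll, kqueue, ...)."""

    def __init__(self) -> None:
        self._sel = selectors.DefaultSelector()

    def add(self, sock: socket.socket) -> None:
        # Registered once; later waits don't resend the whole set.
        self._sel.register(sock, selectors.EVENT_READ)

    def wait(self, timeout: Optional[float]) -> List[socket.socket]:
        return [key.fileobj for key, _events in self._sel.select(timeout)]

    def close(self) -> None:
        self._sel.close()


def make_poller(expected_fds: int) -> ReadPoller:
    # Hypothetical policy echoing the thread's rule of thumb; tune per app.
    return SelectPoller() if expected_fds < 10 else SelectorsPoller()
```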
It looks like the only “semantic difference” was a singular bug, which was fixed almost 5 years ago (and never affected anything except outbound pending TCP sockets anyway).
So when/if that ever gets added, select.poll would become the definitively-best fallback choice for every operating system.
It looks like IOCP is supposed to be, roughly, “the Windows alternative to epoll/kqueue”, going further and outperforming even poll()/WSAPoll() when you actually need high-performance mass socket serving. And it looks like, even though IOCP is completely unavailable through the selectors or select modules, asyncio uses IOCP by default as its backend on Windows (ProactorEventLoop, the default there since Python 3.8) for things like create_datagram_endpoint.
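For comparison, here is roughly what getting at that IOCP-backed path looks like when you do go through asyncio. A minimal sketch, assuming Python 3.8 or later; the names _Collector and receive_one are hypothetical, and the endpoint sends a datagram to itself only so the example is self-contained. On non-Windows platforms the same code runs on a selector-based loop instead.

```python
import asyncio
from typing import Tuple


class _Collector(asyncio.DatagramProtocol):
    """Pushes every received datagram onto an asyncio.Queue."""

    def __init__(self, queue: asyncio.Queue) -> None:
        self.queue = queue

    def datagram_received(self, data: bytes, addr: Tuple[str, int]) -> None:
        self.queue.put_nowait((data, addr))


async def receive_one(host: str = "127.0.0.1") -> bytes:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    # On Windows the default loop (ProactorEventLoop) services this endpoint
    # through IOCP; elsewhere a selector-based loop is used.
    transport, _protocol = await loop.create_datagram_endpoint(
        lambda: _Collector(queue), local_addr=(host, 0))
    try:
        # Send one datagram to our own bound port to keep the demo standalone.
        port = transport.get_extra_info("sockname")[1]
        transport.sendto(b"ping", (host, port))
        data, _addr = await asyncio.wait_for(queue.get(), timeout=5)
        return data
    finally:
        transport.close()
```

Usage: `asyncio.run(receive_one())` should return `b"ping"`.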
So the work required to port the real high-performance option to non-asyncio code will/would be high indeed, since you’d need to duplicate/steal/adapt all the work the core team already did binding functions like CreateIoCompletionPort into Python.