socket.socket().setsockopt() incompatibility with Alpine Linux (and musl libc)

(Also posted on the sourcehut lists.)

I know that two of the three most suspicious claims are “it is a fault of the interpreter” and “it is a fault of libc”, but I am slowly sliding towards the latter. While investigating problems with time representation on 32-bit platforms, I ran the test suite of M2Crypto on i386/alpine and found problems with the socket.setsockopt call. The code distilled into this short script shows the problem:

import socket
import struct
from M2Crypto import SSL

ctx = SSL.Context()
s = SSL.Connection(ctx)

timeout = SSL.timeout()
print(f"timeout.sec = {timeout.sec}")
print(f"timeout.microsec = {timeout.microsec}")
print(f"timeout.pack() = {timeout.pack()}")
print(f"socket.SOL_SOCKET = {socket.SOL_SOCKET}")
print(f"socket.SO_RCVTIMEO = {socket.SO_RCVTIMEO}")
s.socket.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeout.pack())

When running on i386/debian there is no problem and the script outputs:

$ python3 small_test.py
timeout.sec = 600
timeout.microsec = 0
timeout.pack() = b'X\x02\x00\x00\x00\x00\x00\x00'
socket.SOL_SOCKET = 1
socket.SO_RCVTIMEO = 20
$

while when it is run on i386/alpine it fails:

$ python3 small_test.py
Traceback (most recent call last):
  File "/builds/m2crypto/m2crypto/small_test.py", line 14, in <module>
    s.socket.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeout.pack())
timeout.sec = 600
timeout.microsec = 0
timeout.pack() = b'X\x02\x00\x00\x00\x00\x00\x00'
socket.SOL_SOCKET = 1
socket.SO_RCVTIMEO = 66
OSError: [Errno 22] Invalid argument
$

When studying the problem, I found this comment by @tiran:

SO_RCVTIMEO works only with operating system level sockets. An SSLSocket is not an OS-level socket. It’s a high-level abstraction layer that wraps either a file descriptor or a memory BIO. A read operation on an SSLSocket can perform a write, and a write operation can perform a read. For the initial handshake, it will do both.

This means that SO_RCVTIMEO is not supported. Either you have to use the SSLSocket’s timeout feature, or do your own socket I/O and use a memory BIO. The internal timeout feature is built around the select()/poll() syscalls and low-level OpenSSL calls.
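
As a rough illustration of the “SSLSocket’s timeout feature” mentioned in the quote, here is a minimal sketch using the stdlib ssl module rather than M2Crypto. The host name example.org and the function name are placeholders, and no connection is made. The sketch only shows that the timeout is set on the wrapper object instead of through SO_RCVTIMEO.

```python
import socket
import ssl


def make_tls_socket_with_timeout(seconds):
    """Wrap an unconnected TCP socket and set a timeout on the wrapper."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    raw = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # server_hostname is a placeholder; the handshake only happens on connect().
    tls = ctx.wrap_socket(raw, server_hostname="example.org")
    # The timeout is applied by Python around the TLS reads and writes,
    # not handed to the kernel as a SO_RCVTIMEO option.
    tls.settimeout(seconds)
    return tls


if __name__ == "__main__":
    tls = make_tls_socket_with_timeout(600.0)
    print(tls.gettimeout())
    tls.close()
```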

Is this relevant here? s.socket is not SSLSocket or anything like that, but if I understand correctly it is just the plain socket.socket underlying all those complicated BIO constructs, isn’t it?

Is s.socket an OS socket? What is its repr()?

If you use the OS socket layer do you still see the issue or only via the SSL wrapper?
Can you reproduce the issue using only the stdlib socket module?

Maybe this is not a Python issue but a kernel/libc issue.
Can you reproduce the issue with a C program that does the same calls?
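
A stdlib-only reproduction along the lines asked about above might look like the following sketch. It assumes the timeout payload is two native C longs (struct format "ll"), which matches the 8-byte timeout.pack() output shown for i386 but is not confirmed to be how M2Crypto builds it. The function name try_rcvtimeo is made up for this example.

```python
import errno
import socket
import struct


def try_rcvtimeo(seconds, microseconds=0):
    """Set SO_RCVTIMEO on a plain socket; return None on success, else the errno."""
    # Native "ll" (two C longs) is an assumption about the payload layout;
    # on i386 it yields 8 bytes, like the timeout.pack() output in the report.
    payload = struct.pack("ll", seconds, microseconds)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, payload)
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    result = try_rcvtimeo(600)
    if result is None:
        print("setsockopt succeeded")
    else:
        print(f"setsockopt failed: {errno.errorcode.get(result, result)}")
```

If this fails with EINVAL on i386/alpine without any M2Crypto import, the SSL wrapper can be ruled out as the cause.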

Yes, the repr in the log is

s.socket = <socket.socket fd=3, family=2, type=1, proto=0, laddr=('0.0.0.0', 0)>

Thanks for the repr. Given that, my guess is that you are looking at a C runtime or kernel issue.

I know that “It is a libc bug!” is the most common wrong suspicion in the history of all suspicions, but it truly seems we could have one here.

Thank you for being my sounding board!

This is low level enough that doing the same things in C would be pretty
easy. See if you get EINVAL there also. socket() is a system call and
so is setsockopt(), so aside from the shims in libc you should be
looking at pretty direct “what does the kernel do?” stuff.
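
Following the “what does the kernel do?” angle: the two runs in the original report differ in the value of socket.SO_RCVTIMEO (20 on Debian, 66 on Alpine), while the payload is 8 bytes in both. One possible explanation, which nobody in this thread has confirmed, is that the option number used on Alpine expects a payload with two 64-bit fields (16 bytes) instead of two 32-bit fields. The sketch below simply tries both layouts on a plain socket and reports which one is accepted. The name probe_timeval_layouts and the layout descriptions are made up for this example.

```python
import socket
import struct

# Candidate layouts for the timeout payload. Which one a given platform
# accepts is exactly what this probe tries to find out.
LAYOUTS = {
    "two 32-bit fields (8 bytes)": "=ii",
    "two 64-bit fields (16 bytes)": "=qq",
}


def probe_timeval_layouts(seconds=600, microseconds=0):
    """Return {layout description: None on success, or the errno on failure}."""
    results = {}
    for name, fmt in LAYOUTS.items():
        payload = struct.pack(fmt, seconds, microseconds)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, payload)
                results[name] = None
            except OSError as exc:
                results[name] = exc.errno
    return results


if __name__ == "__main__":
    print(f"socket.SO_RCVTIMEO = {socket.SO_RCVTIMEO}")
    for name, outcome in probe_timeval_layouts().items():
        print(f"{name}: {'ok' if outcome is None else f'errno {outcome}'}")
```

If only the 16-byte layout is accepted on i386/alpine, that would point at the payload size rather than at CPython itself.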

FWIW, PEP 11 does not list your platform as supported by Python upstream. Neither Linux on i686 / x86 nor Linux with musl libc are supported. Therefore any bug is between you and your vendor. I suggest that you file a bug with Alpine instead.

One reason for the lack of musl libc support is the fact that CPython’s regression test suite does not pass on Alpine – and never has. For years, the Alpine Python package has disabled tests like test_os instead of addressing the problem.

Ugh, that’s terrible. I started with a really high opinion of Alpine (and with recommendations from people like Drew DeVault), but the more I learn about it, the more disillusioned I am. Sad.

Judging from its use of musl and busybox, Alpine seems to be aimed at embedded applications.

Is that what you had in mind for it?
If not, you are better off with a fully featured Linux distro.

Alpine is also popular for container images (Docker, podman), because the images are slightly smaller than Debian or UBI-based images. The problem with musl is that it’s not as well tested as glibc.

The locale problem in test_re is easy to detect. Other bugs are more subtle, e.g. when Kerberos/GSSAPI authentication suddenly fails because the DNS SRV records got a bit longer and musl’s code for edns0 with SRV does something wrong.

I am developing M2Crypto, which is supposed to be a universal library working on all operating systems (including Windows, Mac OS X, *BSD and similar), so of course that includes all Linux distributions. For now I will skip this test, and if anybody using Alpine files a bug about it, I will work on a platform-specific solution.
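
One way such a skip could be written is sketched below. It is not taken from the M2Crypto test suite: the test class, the "ll" payload layout and the run() helper are assumptions made for illustration. The test is skipped only when the platform rejects the payload with EINVAL, so any other failure still surfaces.

```python
import errno
import socket
import struct
import unittest


class RcvTimeoutTest(unittest.TestCase):
    """Hypothetical test case; not taken from the M2Crypto test suite."""

    def test_set_rcvtimeo(self):
        payload = struct.pack("ll", 600, 0)  # assumed payload layout
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, payload)
            except OSError as exc:
                if exc.errno == errno.EINVAL:
                    # Skip rather than fail where the platform rejects the payload.
                    self.skipTest("SO_RCVTIMEO payload rejected (EINVAL)")
                raise


def run():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RcvTimeoutTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Skipping on the observed error rather than on the distribution name avoids having to detect Alpine or musl explicitly, at the cost of also hiding the same error on other platforms.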