What would it take to run SocketCAN tests on a Buildbot?

While working on adding J1939 socket support, I discovered that one of the SocketCAN tests broke 3 years ago, but the breakage somehow went unnoticed until I ran the test locally. I would have expected this to show up fairly quickly on the Buildbot, which leads me to believe that these tests are currently not being run.

Here’s the relevant snippet from the test:

# class SocketCANTest(unittest.TestCase):
# ...
    def setUp(self):
        self.s = socket.socket(socket.PF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
        self.addCleanup(self.s.close)
        try:
            self.s.bind((self.interface,))
        except OSError:
            self.skipTest('network interface `%s` does not exist' %
                           self.interface)
# ...

From what I understand, the Buildbot cluster runs the test suite after commits are merged (or on a set schedule). Because these SocketCAN tests need a network interface, they are presumably always skipped there: setUp skips the test whenever binding to the vcan interface fails, since SocketCAN isn't available on every platform.
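To check whether a particular worker would run or skip these tests, the same condition that setUp relies on can be probed directly. This is a minimal sketch, not part of CPython's test suite: the function name `socketcan_interface_usable` is hypothetical, and `vcan0` is assumed as the interface name.

```python
import socket


def socketcan_interface_usable(interface="vcan0"):
    """Return True if a raw CAN socket can bind to *interface*.

    Mirrors the setUp logic quoted above: if bind() raises OSError,
    the SocketCAN test would be skipped on this machine.
    """
    # Platforms (or Python builds) without SocketCAN lack these constants.
    if not all(hasattr(socket, name) for name in ("PF_CAN", "CAN_RAW")):
        return False
    try:
        s = socket.socket(socket.PF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    except OSError:
        # e.g. the kernel has no CAN support or socket creation is refused
        return False
    with s:
        try:
            s.bind((interface,))
        except OSError:
            # e.g. ENODEV when the interface does not exist
            return False
    return True


if __name__ == "__main__":
    print("vcan0 usable:", socketcan_interface_usable("vcan0"))
```

On a worker where this prints `False`, the SocketCAN tests would be reported as skipped rather than failed, which is exactly how a regression can slip through unnoticed.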

Setting up a virtual SocketCAN interface doesn't require any additional hardware on Linux, since the drivers were mainlined and are present in Linux 2.6.25 and later:

# Load the vcan kernel module
sudo modprobe vcan
# Add a vcan device called vcan0
sudo ip link add dev vcan0 type vcan
# Set up the vcan0 device
sudo ip link set up vcan0
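Once vcan0 is up, one way to confirm that it actually carries frames is to send a frame from one raw CAN socket and read it back on a second socket bound to the same interface. This is a hypothetical sketch, not CPython's test code; it assumes the classic 16-byte `struct can_frame` layout (32-bit CAN ID, 8-bit data length, 3 padding bytes, 8 data bytes) and returns None if SocketCAN or the interface is unavailable.

```python
import socket
import struct

# Classic struct can_frame: u32 can_id, u8 length, 3 pad bytes, 8 data bytes.
CAN_FRAME_FMT = "=IB3x8s"


def build_can_frame(can_id, data):
    """Pack a classic CAN frame (at most 8 data bytes, zero-padded)."""
    if len(data) > 8:
        raise ValueError("classic CAN frames carry at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def dissect_can_frame(frame):
    """Unpack a classic CAN frame into (can_id, length, payload)."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, frame)
    return can_id, length, data[:length]


def vcan_roundtrip(interface="vcan0", can_id=0x123, data=b"\x01\x02\x03",
                   timeout=1.0):
    """Send one frame on *interface* and read it back on a second socket.

    Returns the dissected frame, or None if SocketCAN or the interface is
    unavailable. A receive timeout also yields None, because socket.timeout
    is a subclass of OSError.
    """
    if not all(hasattr(socket, name) for name in ("PF_CAN", "CAN_RAW")):
        return None
    try:
        with socket.socket(socket.PF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as rx:
            with socket.socket(socket.PF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as tx:
                # Bind the receiver first so it cannot miss the frame.
                rx.bind((interface,))
                tx.bind((interface,))
                rx.settimeout(timeout)
                tx.send(build_can_frame(can_id, data))
                frame = rx.recv(struct.calcsize(CAN_FRAME_FMT))
    except OSError:
        return None
    return dissect_can_frame(frame)


if __name__ == "__main__":
    print(vcan_roundtrip())
```

A separate receiving socket is used because, by default, a raw CAN socket does not receive the frames it sends itself (`CAN_RAW_RECV_OWN_MSGS` is off), while other sockets on the same host do see them through local loopback.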

Are the SocketCAN tests actually being run on the Buildbot cluster? If not, is there any technical reason a vcan interface can't be set up so that they can run? It would be nice to have these tests covered by the Buildbot as well, to catch breakages like this one in the future.

The buildbot instances are contributed and managed by volunteers. I’ve posted a link to this page to the buildbot-status mailing list in the hopes that at least one of the Linux buildbot owners will be able to configure the device on their buildbot.

Before merging GH-19548, I enabled the vcan0 interface on my ware-gentoo-x86 worker so that it would actually run the SocketCAN tests. However, GH-19548 was only backported as far as 3.9, so 3.8 and 3.7 will now always fail on that worker.

@ambv, @nad, how do you want to handle this? As far as I can see, we have 3 options:

  • We can backport the same fix to 3.8 easily, but 3.7’s failure appears to be different, and it’s decidedly not a security issue anyway.
  • We can remove this worker from the pool for 3.7 and 3.8 only.
  • We can remove 3.7 and 3.8 builders entirely since both branches are now in security-fix-only mode. If buildbot tests are desired for a particular patch on these branches, the test-with-buildbots label can still be used before merge, and the overall buildbot interface will be much less cluttered with 2 fewer branches.

WDYT?

For 3.7, it would not be the end of the world if we no longer ran that builder. OTOH, testing Gentoo is a good thing. I would accept a PR to fix the test on 3.7; we explicitly allow for that: “You should also consider fixing hard-failing tests in open security branches since it is important to be able to run the tests successfully before releasing.”

Please backport to 3.8.

FYI, @zware has taken care of this so that the SocketCAN tests are now running on the ware-gentoo-x86 buildbot worker. Thanks for the ping @karlding!
