Multiprocessing spawn (default on macOS since Python 3.8) is slower than fork method

I am being fairly vague here, but it is mainly because I don’t know the best way to reduce the test cases down to pinpoint the problem.

I maintain Waitress, a pure Python HTTP server, and ever since Python 3.8 there’s been a marked slowdown of the functional tests.

Python 3.6 on my MacBook Pro M1 Rosetta: 15.91s to run the entire test suite, similar slow tests as Py37. So not included.

Python 3.7 on my MacBook Pro M1 Rosetta: 15.57s to run the entire test suite

Slowest tests on Py37 is:

3.10s call     tests/test_functional.py::SleepyThreadTests::test_it
1.01s call     tests/test_channel.py::TestHTTPChannelLookahead::test_client_disconnect
0.41s call     tests/test_wasyncore.py::DispatcherWithSendTests::test_send
0.17s call     tests/test_wasyncore.py::TestAPI_UseIPv6Poll::test_handle_close_after_conn_broken
0.11s call     tests/test_wasyncore.py::TestAPI_UseIPv4Select::test_handle_read
0.11s call     tests/test_wasyncore.py::TestAPI_UseIPv4Select::test_handle_close_after_conn_broken
0.11s call     tests/test_wasyncore.py::TestAPI_UseIPv4Poll::test_handle_close_after_conn_broken
0.11s call     tests/test_wasyncore.py::TestAPI_UseUnixSocketsPoll::test_handle_close_after_conn_broken
0.11s call     tests/test_wasyncore.py::TestAPI_UseIPv6Select::test_handle_close_after_conn_broken
0.11s call     tests/test_wasyncore.py::TestAPI_UseIPv4Select::test_handle_close

Python 3.8 on my MacBook Pro M1 Rosetta: 44.33 seconds to run the entire test suite

3.34s call     tests/test_functional.py::SleepyThreadTests::test_it
1.01s call     tests/test_channel.py::TestHTTPChannelLookahead::test_client_disconnect
0.41s call     tests/test_wasyncore.py::DispatcherWithSendTests::test_send
0.40s call     tests/test_functional.py::TcpInternalServerErrorTests::test_before_start_response_http_11_close
0.36s call     tests/test_functional.py::TcpInternalServerErrorTests::test_before_start_response_http_11
0.35s call     tests/test_functional.py::UnixWriteCallbackTests::test_no_content_length
0.34s call     tests/test_functional.py::TcpInternalServerErrorTests::test_after_write_cb
0.33s call     tests/test_functional.py::TcpInternalServerErrorTests::test_after_start_response_http11_close
0.32s call     tests/test_functional.py::TcpFileWrapperTests::test_filelike_nocl_http10
0.32s call     tests/test_functional.py::TcpFileWrapperTests::test_filelike_http10

Python 3.9 on my MacBook Pro M1 Rosetta: 41.93 seconds

3.28s call     tests/test_functional.py::SleepyThreadTests::test_it
1.01s call     tests/test_channel.py::TestHTTPChannelLookahead::test_client_disconnect
0.41s call     tests/test_wasyncore.py::DispatcherWithSendTests::test_send
0.36s call     tests/test_functional.py::TcpWriteCallbackTests::test_short_body
0.33s call     tests/test_functional.py::TcpFileWrapperTests::test_notfilelike_nocl_http10
0.32s call     tests/test_functional.py::TcpEchoTests::test_chunking_request_without_content
0.32s call     tests/test_functional.py::TcpEchoTests::test_date_and_server
0.31s call     tests/test_functional.py::TcpBadContentLengthTests::test_long_body
0.30s call     tests/test_functional.py::TcpEchoTests::test_broken_chunked_encoding_missing_chunk_end
0.30s call     tests/test_functional.py::TcpTooLargeTests::test_request_body_too_large_with_wrong_cl_http10

This is very visible too in the test times on Github actions:

https://github.com/Pylons/waitress/runs/1335343659?check_suite_focus=true (Python 3.7, running tox takes 40 seconds)

https://github.com/Pylons/waitress/runs/1335343690?check_suite_focus=true (Python 3.8, running tox takes 3m and 42seconds)

Although GitHub actions are not always reliable since it is shared infrastructure and you never know who your neighbors are, but the same issue holds across all of the Mac devices I own (2017 MacBook Pro i7, 2012 Mac Mini i7). Just as an extra data point, my 2017 MacBook Pro runs the py39 tests in 67 seconds, and the py37 tests in 26 seconds.

The problem is that I don’t know where to begin attempting figure out what is going on. So I am shooting my shot here in the bug tracker. How can I help debug this/track this down so that we can get rid of the regression and speed things back up?

Open source project: https://github.com/Pylons/waitress
Uses pytest to run tests, tox is the easiest way to accomplish this.

I’ve also created a bug report, but since it isn’t an actual straight forward report (https://bugs.python.org/issue42471), I figured I’d try posting here too.

I want to help, but need advice on how to try and figure out what caused the slowdown in the first place.

1 Like

@nad correctly deduced it is due to the change in multiprocessing to go from using the fork model to using the spawn model.

1 Like

The solution is probably to avoid spawning new processes and instead use a process pool.

1 Like

Since it is tests and we absolutely want a new instance for each test we run, for now I’ve just gone for using the fork model, which gets us the speed back. I’d rather not try to figure out how build out a process pool and change the tests around that.

We don’t do anything fancy with PyObjC or any macOS frameworks so its not unsafe, nor are we spawning these things from threads.

1 Like

Then I’d suggest to at least use forkserver, which is much less fragile than fork (on any Unix-like OS).

1 Like

I took the liberty of changing the topic title.