How to get an extension module working w/ free threading?

oscarbenjamin · September 5, 2024, 9:33pm

I hadn’t really considered this point. If I am distributing wheels that contain extension modules, do I need to do something so that my package “actually runs in free threaded mode”?

I added a CI job to build and run tests under free-threading, and I checked that cibuildwheel could produce the wheels when I added it to the build matrix. I decided not to distribute the wheels yet, so I then removed it from the matrix.

Since the CI job passes I was assuming that I had tested that the package works in free-threaded mode but maybe it is actually just running in a free-threaded build and disabling the free-threading mode…

ncoghlan · September 5, 2024, 10:27pm

Yeah, the default state turns off free threading: C API Extension Support for Free Threading — Python 3.13.0rc2 documentation

It’s an import time warning so test runners will often miss it, even if they’re otherwise set up to treat unexpected warnings as errors.

It’s possible to force the interpreter to stay in free threaded mode via an environment variable: PEP 703 – Making the Global Interpreter Lock Optional in CPython | peps.python.org

There’s not yet a way to turn the GIL warnings into errors (I haven’t checked if there is a proposal to add one anywhere)
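
As a rough way to notice from inside a test run that an import has switched the GIL back on, the interpreter state can be queried after importing. This is only a sketch: it assumes Python 3.13's sys._is_gil_enabled() and the Py_GIL_DISABLED config variable, and the helper names are made up for illustration.

```python
import importlib
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    # Py_GIL_DISABLED is 1 on free-threaded builds, and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_enabled() -> bool:
    # sys._is_gil_enabled() was added in Python 3.13; older interpreters
    # always run with the GIL, so fall back to True there.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


def import_keeps_gil_disabled(module_name: str) -> bool:
    # Hypothetical helper: import the module, then report whether the
    # process is still running without the GIL.
    importlib.import_module(module_name)
    return not gil_enabled()
```

In a free-threaded CI job, something like assert import_keeps_gil_disabled("flint") would fail as long as the extension has not declared that it can run without the GIL, even if the warning itself goes unnoticed.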

oscarbenjamin · September 5, 2024, 11:15pm

Okay, I see it now:

RuntimeWarning: The global interpreter lock (GIL) has been enabled to load module ‘flint.pyflint’, which has not declared that it can run safely without the GIL. To override this behavior and keep the GIL disabled (at your own risk), run with PYTHON_GIL=0 or -Xgil=0.

oscarbenjamin · September 5, 2024, 11:56pm

Thanks for this. While this page is very informative it has not directly answered my immediate questions like “what should I do to test proper free-threading?”.

My extension modules are all generated from Cython code and so the descriptions in that document relate to things that I don’t control directly. It would be great if there was some regularly updated advice page somewhere for extension module authors because this seems like a fast moving target.

ngoldbaum · September 6, 2024, 12:49am

Maybe you’ll find Porting Extension Modules to Support Free-Threading - py-free-threading more useful?

oscarbenjamin · September 6, 2024, 1:03am

That is exactly what I was looking for. Thanks!

oscarbenjamin · September 6, 2024, 7:07pm

Firstly it might happen by accident. I was briefly building python-flint wheels like this in CI and could have easily pushed the trigger to upload them to PyPI but decided not to distribute them yet. (Probably I would have done at least a bit of local testing if I was actually going to distribute them.)

Secondly, it is not about the “effort” of marking the module with Py_MOD_GIL_NOT_USED. The purpose of Py_MOD_GIL_NOT_USED is to distinguish whether or not the package actually is thread-safe without the GIL. Packages that are not will build but then not set the flag. The “effort” is making the package thread-safe which is many orders of magnitude harder than just setting a build flag and might even be structurally impossible for some packages.

Thirdly, people are going to want to install packages into a free-threaded build regardless of whether importing them disables free-threading. You can install my GIL-using extension modules into your free-threading build and use them sometimes but otherwise not:

# (hypothetically) uses GIL:
python my_script_using_python_flint.py

# benefits from free-threading:
python my_other_script.py

Note that it makes total sense to do this for another reason: python-flint already uses threads internally without any GIL so it can already give you that benefit if you just tell it how many threads to use:

from flint import ctx
ctx.threads = 10
# From here any python-flint operation can use
# up to 10 threads. This is done transparently
# without the user needing to manage threads.

Apparently I haven’t tested it properly yet so in principle it might be difficult for python-flint to run safely without the GIL. Hypothetically setting Py_MOD_GIL_NOT_USED might mean that import flint explodes your computer in a mushroom cloud of segfaults. If that is the case then there will certainly be no point in uploading extension modules marked with Py_MOD_GIL_NOT_USED.

Hypothetically it might be so difficult to fix the thread-safety issues that it would be years before the Py_MOD_GIL_NOT_USED flag could be set. There will however be people who will want to use python-flint with their free-threaded build of CPython regardless of whether it brings the GIL back on a per-process basis. Those people would say:

Why don’t you upload cp313t wheels without setting Py_MOD_GIL_NOT_USED? We don’t mind if python-flint uses the GIL but we have free-threaded builds for other reasons and yet we still want to use python-flint sometimes.

If it seems odd to you that people will want that then remember that if you are like me and just spin up Python versions via pyenv any time you feel like then you are in a small minority of the overall Python userbase. I can happily switch between a free-threaded and a non-free-threaded Python any time I like but most Python users want to have one installation of Python.

As soon as Python 3.13 is released thousands upon thousands of people are going to download the Windows and MacOS installers and tick the “free-threading please” button. Those people are going to be very disappointed when they find that none of their favourite packages are available.

So yes, there may well be packages that upload wheels matching the free-threading ABI but that do not set the Py_MOD_GIL_NOT_USED flag, so that the GIL is then enabled at runtime. There are valid reasons why the flag cannot be set and there are also valid reasons to want to install a package that does not set the flag.

The warning about enabling the GIL that is printed now is useful while free-threading is considered experimental, but in the longer term (e.g. Python 3.14) it will likely need to be removed if the expectation is that users will use the free-threading build by default.

For clarification given my comments about python-flint I don’t expect that any of this applies in python-flint’s case. I think that the situation is going to be that there are some less often used operations that are not thread-safe and I am just going to document as such:

python-flint is thread-safe in a free-threaded build provided that you never mutate an object that is shared between multiple threads. Very few operations mutate objects so a complete list is …

That approach will work for python-flint because the unsafe operations are ones that most people will not use anyway. It means that we will upload wheels with the cp313t ABI and Py_MOD_GIL_NOT_USED flag set. If you happen to mutate a polynomial that is shared between threads then you may get a segfault or other corrupted data. If I was going to improve on this then I would firstly consider just making all objects immutable (basically deprecate non-thread-safe features).

ncoghlan · September 7, 2024, 4:33pm

This is actually an area that helps explain why the “free-threading or subinterpreters?” parallel execution question ended up being resolved as “Let’s pursue both ideas, since their respective strengths can be used to compensate for the other’s weaknesses”

The relevant capability for this discussion is the fact that a free-threaded main interpreter can spawn a GIL-protected subinterpreter to manage the problematic modules, or else a GIL-protected main interpreter (in the free threaded build) can push the CPU bound computation threads off to a dedicated subinterpreter that runs in free threaded mode and either limits itself to thread-safe dependencies, or else adds some form of external thread safety protection.

oscarbenjamin · September 8, 2024, 4:43pm

There are different meanings of thread safe and of “safety” in general. Python can have thread-safety issues but is in general a memory-safe language. The purpose of the GIL is to provide that memory safety. Python programmers are not generally used to segfaults or totally undefined behaviour, which is the kind of thread-safety problem that applies when talking about extension modules and Py_MOD_GIL_NOT_USED.

The situation now is that with python-flint you can in one thread set an element of a matrix:

M[0,0] = 2

In another thread you can compute the determinant of the matrix:

d = M.det()

Currently the GIL ensures that these two operations take place one after another although the order is undefined. This means that you might get d = 10 or you might get d = 20 but exactly one of those possibilities will occur and Python won’t crash or anything.

Without the GIL what can happen is that M[0,0] = 2 might trigger a call to realloc under the hood and then the code in the det function finds itself wandering around deallocated memory. This is now completely undefined behaviour: you can get pure garbage bytes from memory or you can get segfaults occurring either immediately or possibly much later as a result of memory corruption. It is impossible for me to give any real constraints on exactly what sort of bad things may or may not result from this although usually the outcome is either that it happens to work fine or that you get a segfault.
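
To make the two well-defined outcomes concrete, here is a small pure-Python sketch of what serializing those two operations with an explicit lock looks like. LockedMatrix is a hypothetical stand-in written for this illustration, not python-flint's API; in a real extension the locking would have to live in the compiled code.

```python
import threading


class LockedMatrix:
    """Hypothetical 2x2 stand-in for a mutable extension-module matrix."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]
        self._lock = threading.Lock()

    def __setitem__(self, index, value):
        i, j = index
        # The writer holds the lock, so a reader never sees a half-updated matrix.
        with self._lock:
            self._rows[i][j] = value

    def det(self):
        # The reader holds the same lock while it walks the entries.
        with self._lock:
            (a, b), (c, d) = self._rows
            return a * d - b * c


M = LockedMatrix([[1, 0], [0, 10]])
results = []


def writer():
    M[0, 0] = 2


def reader():
    results.append(M.det())


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# results[0] is 10 or 20 depending on which thread won; never garbage.
```

The order is still undefined, so the reader may see either determinant, but with the lock in place that is the only uncertainty left.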

There absolutely will be an expectation that extension module authors do not generally allow that kind of unsafe behaviour regardless of whether the user is doing something that clearly involves data races. For now though free-threading is experimental and so I am happy to put out the wheels and let people try them even if I know that they might crash with segfaults. I also feel reassured that this is a reasonable approach since it is mentioned in the only document I have seen that provides meaningful advice for what extension module authors should do.

Does simply setting Py_MOD_GIL_NOT_USED and then uploading the wheels qualify for a “free-threading supported badge” even if it is expected that some multithreaded usage can result in completely undefined behaviour?

dpdani · September 8, 2024, 6:17pm

I would discourage you from doing that, but it is completely up to you, because it’s your package.

A package that doesn’t work for Python 2 is not prevented from checking the Python 2 classifier.
Of course, Python 2 users would complain, and the same reasoning applies here.

mikeshardmind · September 8, 2024, 8:53pm

Extensions that set this value for the Py_mod_gil slot and know that they are not actually safe are erroneous in setting it. There should not be a case where it is expected that multithreaded use can result in undefined behavior.

The documented meaning is

The module is safe to run without an active GIL.

And the interpreter relies on this value during import to determine if interpreters need to be paused and the GIL re-enabled for this to be safe to use.

oscarbenjamin · September 8, 2024, 9:56pm

Okay well see above where I quoted the intention for NumPy as stated in the only document that I have seen that gives any meaningful advice to extension authors for how to handle this:

For NumPy, we are generally assuming users will not do pathological things like resizing an array while another thread is reading from or writing to it and do not explicitly account for this.

What this means is: total undefined behaviour if an array is resized in one thread while any other thread uses that same array for anything.

If someone wants to provide some better advice for how extension module authors should handle this then go ahead and please also do all the work to actually make it supported like:

Adding object lock helpers in Cython/CPython
Creating utilities that can test likely issues in multithreaded contention.
Documenting how to use all of these things.
Demonstrating how to benchmark potential performance impact.
etc.

In the meantime I and others will ship versions of packages that work fine for most users on the new free-threaded builds. Putting the packages out now while free-threading is explicitly considered experimental gives a chance to collect early feedback.

mikeshardmind · September 8, 2024, 9:59pm

That makes your module less useful to users, not more. The interpreter has this safeguard for a reason during the transition period.

You can observe by contrast, the other definition there:

Py_MOD_GIL_USED
The module depends on the presence of the global interpreter lock (GIL), and may access global state without synchronization.

The part that applies here is accessing state without synchronization.

oscarbenjamin · September 8, 2024, 10:11pm

I think it makes the module very useful to most users who will not use the features that are not thread-safe (vast majority of users) and would otherwise be unable to install the module or to benefit from free-threading.

jack1142 · September 8, 2024, 11:36pm

Isn’t it enough for those users to set the PYTHON_GIL=0 env var / -X gil=0 flag to force disable the GIL despite the module not claiming to support it as mentioned earlier? This way you’d actually allow the users to make the decision of force-disabling the GIL by themselves which they’ll be informed they need to do:

(venv) [root@83157cc472c0 /]# python
Python 3.13.0rc1 experimental free-threading build (main, Aug  6 2024, 00:00:00) [GCC 14.1.1 20240701 (Red Hat 14.1.1-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cffi
>>> cffi.FFI()
<frozen importlib._bootstrap>:488: RuntimeWarning: The global interpreter lock (GIL) has been enabled to load module '_cffi_backend', which has not declared that it can run safely without the GIL. To override this behavior and keep the GIL disabled (at your own risk), run with PYTHON_GIL=0 or -Xgil=0.
<cffi.api.FFI object at 0x5b31c503490>
>>> 

(venv) [root@83157cc472c0 /]# PYTHON_GIL=0 python
Python 3.13.0rc1 experimental free-threading build (main, Aug  6 2024, 00:00:00) [GCC 14.1.1 20240701 (Red Hat 14.1.1-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cffi
>>> cffi.FFI()
<cffi.api.FFI object at 0x41d48503610>

You’d still need to build a variant of the wheels with a 3.13t tag (which you can do regardless of whether you depend on GIL or not) if you don’t want the users to have to fall back to building from source but at least you’re not lying to the interpreter and deciding for the user.
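
Because the GIL state is decided per process at import time, one way to check it from a script is to run the import in a child interpreter, optionally with PYTHON_GIL=0 set. This is a sketch under assumptions: it relies on Python 3.13's sys._is_gil_enabled(), the function name is invented here, and forcing the GIL off is only meaningful on a free-threaded build.

```python
import os
import subprocess
import sys


def gil_state_after_import(module_name: str, force_gil_off: bool = False) -> str:
    """Import module_name in a child interpreter and report the GIL state.

    Returns "enabled" or "disabled". Hypothetical helper for illustration.
    """
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({module_name!r})\n"
        "check = getattr(sys, '_is_gil_enabled', None)\n"
        "print('enabled' if check is None or check() else 'disabled')\n"
    )
    env = dict(os.environ)
    if force_gil_off:
        # Same effect as running "PYTHON_GIL=0 python ..." by hand.
        # Only use this with a free-threaded build.
        env["PYTHON_GIL"] = "0"
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,  # the RuntimeWarning, if any, lands on stderr
        text=True,
        check=True,
    )
    return proc.stdout.strip()
```

On a free-threaded build one would expect gil_state_after_import("cffi") to report "enabled" for the session shown above and "disabled" with force_gil_off=True, matching the two interpreter sessions.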

oscarbenjamin · September 9, 2024, 12:05am

Maybe. I’m not sure I understand all the mechanics here yet but:

Our users won’t want to set an environment variable. That sort of thing is not a good option. That’s fine though because as I said before free-threading is experimental: I don’t expect typical users to use it and if they do then I will advise them against it.

I’m not lying to anyone. The package is thread safe if used correctly.

The distinction between “thread safe” and “thread safe if used correctly” is the main problem with this whole thread. What use is it having a trove classifier if people don’t know that they have to use the library correctly in order to have thread safety? How does the trove classifier help them to know what usage of the library is actually safe?

mikeshardmind · September 9, 2024, 12:17am

You would be if you set the Py_mod_gil slot to say you weren’t relying on the gil. While you may not explicitly acquire it, you are currently relying on it, per your above examples. The correct thing to do here is not to tell the Python interpreter that you don’t rely on the GIL. If your users can’t set an environment variable, having Python segfault because you intentionally told the interpreter something inaccurate is something your users won’t be able to handle.

And I don’t think that this being experimental excuses intentionally exposing your user base to that problem. If you aren’t free-threading ready, then either don’t publish a free-threading wheel, or publish one that marks that it isn’t threadsafe in a no-gil world until you can achieve being ready.

The intention of the Py_mod_gil slot is that users aren’t exposed to this. Libraries are supposed to do the right thing here. If you require people to use it correctly or segfault, you should probably just remove the ability for it to be used incorrectly.

oscarbenjamin · September 9, 2024, 12:40am

That sounds easy when you say it in the abstract.

Many libraries have been developed for many years on top of the GIL.

The GIL now goes so your suggestion is that we no longer push binaries until absolutely 100% of all GIL-removal related problems are fixed?

I’m done with this discussion. People who would like extension module authors to take a different approach should start a thread about that and actually look at what different libraries are going to do and what the issues are. Here it feels like I am talking to people who have no idea why I can’t just magically make everything threadsafe which is beyond tedious.

jack1142 · September 9, 2024, 12:47am

What about the interpreter’s command line option? You probably already have to change what executable you start, when using free-threading build so adding -X gil=0 arguments doesn’t really seem like something that would be a problem. As you said, it’s experimental so the users already have to do things they typically wouldn’t have to - install an experimental build of Python (in some cases needing to build it themselves or change how they install it, if the way they used to use doesn’t support free-threading builds, e.g. Python Docker images) so they already have to perform additional steps - why is adding a command line option (or env var) the deal breaker here?

mikeshardmind · September 9, 2024, 2:48am

That’s not what I suggested at all. You can continue to ship a wheel that is importable on 3.13t, but does not tell the interpreter it is safe with free threading, and only inform the interpreter it is safe to keep the gil disabled once all of those issues have been ironed out.