Extension modules with static types and Python embedding and subinterpreters

Historically Python leaked a lot of references at exit and this wasn’t a problem as the kernel will clear the memory when the process exits. However, this is a problem for embedders and multiple interpreters aka subinterpreters which in embedding case will leak memory whenever Python is initialized and finalized one or more times in the same process or in subinterpreters case if an object survives these initializations, it can cause crashes or create worse two singletons and one of them can be dangling.

There have been prior work to fix this by porting over static types from extension modules to heap types. See _multiprocessing leaks references when initialised multiple times · Issue #94382 · python/cpython · GitHub for a recent issue but it was suggested to discuss this broadly. PEP 687 was accepted to isolate modules in the stdlib but it does not explicitly apply to this issue. So the main point of the discussion is that “Is it okay to cover this issue under PEP 687 and port extension modules static types to heap types” ? Any feedback or comments are appreciated.

See bug reports related to this issue:
https://bugs.python.org/issue45691
https://bugs.python.org/issue46417

See prior work on this issue:

2 Likes

As far as I can see, this is a one-time memory leak: it happens once per process (even with multiple Py_Finalize/Py_Initialize), so the memory leak is not a problem for embedders and multiple interpreters aka subinterpreters.

So, the reason for this change is a bit weaker (though I still think it’s worth it):

  • it makes tools like -X showrefcount and valgrind more useful, by removing false reports.
  • it avoids objects held by the class leaking into unrelated interpreters – see gh-45691 – though arguably this could also be solved by making more objects static.
3 Likes

Let me clarify the issues caused by this issue:

  • Python leaks memory at exit even if it is initialized only once. The multiple initialization leak is less of an issue now as Python finalizes even static types in main branch. (although that has it’s own issue, what happens if there is a static type inheriting a built-in static type?)

  • If I am a c extension author, I cannot rely on valgrind or python -X showrefcount as its count is unstable as the memory blocks allocated by it is random. You can try it with the reproducer I posted on issue.

  • it causes issues with sub interpreters, if a static types holds a ref to a per interpreter object, then in case of multiple initialization it can cause crash or worse two singletons or freed memory. (there are even more issues when gc is involved)

  • For per interpreter GIL, in order to avoid races the static types must be per interpreter which means heap types.

  • There are more issues… I’ll update this list as I get some more time.

  • it avoids objects held by the class leaking into unrelated interpreters – see gh-45691 – though arguably this could also be solved by making more objects static.

It can be and that’s what Eric (and I) have done by statically allocating these singletons but that solution is only for builtin objects or caches by making them effectively immortal and Python must take over their initialization and finalization. Without Immortal objects there is some book-keeping required making that option a bit hard to understand (at least on first try).

It doesn’t “leak”. It just allocates a fixed amount of memory that’s never freed – but (as has been argued over and over in the thread already) it’s a fixed amount (O(1)) so you can just add this to the total amount of memory Python needs. Regarding a static type deriving from a built-in static type, why would finalization of the subclass affect the base class? If you think there is an unwanted effect (other than more unfreed memory), can you find an example

This is a real concern and to me this is the main reason why I think we will have to do something.

But clearly static types should never hold references to per-interpreter objects. If you know of any occurrences, feel free to open bugs for them and send PRs to fix those bugs.

Or immortal, right? I’m quite hopeful that immortal objects (PEP 683) will be accepted. We shouldn’t do anything to fix issues that would be fixed by immortal objects, at least not before that PEP has been rejected.

3 Likes

I am also hopeful about immortal objects, but it would not solve the this problem. Consider this sequence:

  • Create main interpreter
  • Main interpreter initializes static type named X
  • Create two sub-interpreter as A and B
  • Both sub-interpreters try to subclass X
  • While creating subclasses both try to to mutate X’s tp_subclasses slot of the main interpreter, here is the data race. Also note that this isn’t the only race condition, it is the most obvious one.

Immortal objects will only solve the issue for refcnt not other races. If this static type were to be converted to heap type then both sub-interpreter can subclass the type independently without race condition as each sub-interpreter will have its own independent copy of the type.

2 Likes

I’ll not use the term leak anymore. Let’s assume that the fixed memory is by itself not a huge problem, but it is a problem when in an embedded application you configure the memory allocator. See PEP 445
Consider this sequence of a machine learning software which uses Python libraries and allows the user to change the allocator for performance (any other reason maybe to be use huge pages on linux).

  • Initialize Python to create main interpreter.

  • Main interpreter initializes static types with the default memory allocator.

  • Read a Python config file, exec it to load the configuration.

  • Finalize Python.

  • User wants to change the allocator to something different, Note Python is currently holding memory allocated by the previous allocator in static types.

  • Initialize Python again but this time main interpreter will not initialize the static types as it is already initialized.

  • The application tries to create subclass with the static types, Python want to resize tp_subclasses (again most obvious slot, there could be more) to create some space for more weakrefs.

  • Python reallocs with the new allocator. <— Crash as the memory is not allocated by the new allocator.

If the static type was converted to heap type, then the user can change memory allocators as all the memory from the previous allocator will be freed, but currently this can cause crash or worse undefined behavior.

1 Like

While there’s only one GIL, that race is prevented by the GIL. We should reconsider this once we introduce a GIL per interpreter. @ericsnowcurrently What are your plans for tp_subclasses?

2 Likes

Ah, this is a good point, thanks for bringing it up!
Neither PEP 445 nor the current docs mention when it is safe to change the allocator. It makes sense that it can be called between Py_Finalize and Py_Initialize. The Py_Initialize caveats hint that it’s a bad idea (in production), but that’s arguably the bug we’re trying to fix here.
IMO we should add notes to PyMem_Set*Allocator saying that the functions must not be called when an interpreter is active (between Py_Finalize and Py_Initialize), and that calling it after Py_Finalize is dangerous (with the same caveats as in Py_Finalize – it should theoretically work, but support for it is buggy in CPython and many third-party extensions won’t support it – except unlike Py_Finalize, crashes are likely rather than just memory leaks).
(Now just to word that affirmatively…)
@vstinner, does that sound right?


That’s a more general issue, and as Guido says it doesn’t need solving until we move to per-interpreter GIL.
When it does come up, it will need to be solved for object, and if it’s solved by converting object to a heap type, then we need to get rid of static types entirely (as they can’t inherit from heap types).

Is one of the less obvious races possible even with the global GIL lock?

Python 3.8 added the concept of “Python pre-initialization”. A Python process is pre-initialized exactly once. It means that Py_PreInitialize() can only set memory allocators exactly once.

Well, you can call PyMem_SetAllocator() between Py_Finalize() and Py_InitializeEx(), but IMO it doesn’t work because Python leaks memory and objects in Py_Finalize(). Once Python will no longer leak memory/objects, it will be fine.

I think so although the chances are very low because of the GIL. Even with GIL, there are race conditions possible when gc, finalizers and threading is involved. See Crash when inspecting frame of generator · Issue #94262 · python/cpython · GitHub for an unrelated case to this issue but how gc finalizers can corrupt the internal Python VM state. It is kind of a race condition when a finalizer initializes a frame whereas the VM expects it to be uninitialized.

The plan is to basically drop tp_subclasses and store that list in PyInterpreterState instead. Then __subclasses__ for static types would be a getter that looks it up on the interpreter.

There shouldn’t be any other problems like this with static types. IIRC, we don’t need to worry about tp_dict (already effectively immutable).

This will only fix the race condition when the static type is ready.

Consider this sequence:

  • Create main interpreter.
  • Create two sub-interpreters A & B.
  • Both try to import a module with static types. (Note: the module is not imported in main interpreter so types are uninitialized)
  • Both interpreters call PyType_Ready on the static type. <— Data race

This race would not be solved by making tp_subclasses per-interpreter.

We’ve come back around to your original question, more or less. :slight_smile: Extension modules must use heap types to work properly under multiple interpreters. This is discussed in PEP 684 and PEP 630.

Per PEP 684, we will keep most/all the built-in types (incl. exception types) as static; they are exposed in the C-API and keeping them static (and making them immortal) is so much simpler than the alternatives.

1 Like

FYI, I’ve created gh-94673 for this and am working on it. (I thought I’d create an issue already but apparently not.)

1 Like

+1, for builtins and exceptions static types, heap types is not the solution and I am against such a change myself because of complication and backwards compatibility issues.

1 Like

The discussion seems conclusive that heap types is the only viable solution here which fixes all the issues discussed here.

@encukou Do you have any other comments/feedback ?

1 Like

Sounds like the discussion ran its course.
I recommend reviving the itertools PR, for which Raymond had objections before. The code churn & performance situation should be better now, but it still might uncover issues to keep in mind for all the other modules.

3 Likes