Hi all,
@eric.snow and I would like to share some updates on the PEP, following up on the last update we sent out. For context, this is the reference implementation: https://github.com/python/cpython/pull/19474
Since then, PEP 683 has been accepted with conditions. We've been able to successfully address all the required conditions that are within the control of the PEP implementation. Some edge cases may not be fully handled yet; these are noted below and are already being tracked as separate issues that can be addressed upstream.
PEP Conditions Addressed:
- Reset refcount in tp_dealloc: Partially done. We've added the tp_dealloc checks to as many of the existing immortal objects as we could. Some cases, such as deep-frozen objects, are not addressed yet. This is already being tracked in: Do Not Allow Static Objects to be Deallocated · Issue #101265 · python/cpython · GitHub. Once that issue is resolved, we can fully address this point.
- Types without tp_dealloc checks may not be immortalized: Partially done. Refer to the point above.
- Benchmark results: 1.03x slower geometric mean on the pyperformance suite using GCC 11.1 (results might vary, especially with older compilers).
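The tp_dealloc condition above can be illustrated with a small toy model (illustrative Python, not CPython's actual C implementation; Obj, decref, dealloc, and IMMORTAL_REFCNT are hypothetical names): if legacy code decrements an immortal object's refcount all the way to zero, the deallocator detects immortality, restores the refcount, and returns instead of freeing.

```python
from dataclasses import dataclass

IMMORTAL_REFCNT = 0x3FFFFFFF  # hypothetical sentinel refcount for immortal objects

@dataclass
class Obj:
    refcnt: int
    immortal: bool = False  # toy stand-in for the immortal refcount bit pattern
    freed: bool = False

def dealloc(obj: Obj) -> None:
    # The guard required by the PEP's acceptance conditions: if an immortal
    # object ever reaches deallocation (e.g. an extension built against old
    # headers decremented it to zero), reset the refcount and skip the free
    # instead of leaving a dangling pointer behind.
    if obj.immortal:
        obj.refcnt = IMMORTAL_REFCNT
        return
    obj.freed = True

def decref(obj: Obj) -> None:
    # Legacy-style decref that is unaware of immortality.
    obj.refcnt -= 1
    if obj.refcnt == 0:
        dealloc(obj)

im = Obj(refcnt=1, immortal=True)
decref(im)                           # would free a mortal object...
assert not im.freed                  # ...but the guard resets it instead
assert im.refcnt == IMMORTAL_REFCNT

m = Obj(refcnt=1)
decref(m)
assert m.freed                       # mortal objects still deallocate normally
```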
Details on Performance Measurements:
The 1.03x slower result is valid as of 01/28/2023; the measurement compared the immortal hash a748e80 against the baseline hash 666c084, compiled on a Linux machine using GCC 11.1. Note that the previous measurement, from October 2022, showed a 1.02x slower geometric mean.
Regarding the increase from 1.02x → 1.03x, there are two things to consider. First, the new checks in tp_dealloc add a slight performance cost that was not present in the earlier measurement. Second, between the start of this PR and today there have been many improvements to runtime performance (particularly in the interpreter). Since IncRef and DecRef carry a constant overhead, that overhead becomes more pronounced as the runtime becomes more efficient. As a result, the improvements made to reduce the runtime cost of immortalization have, so far, served mostly to hold the regression steady, making this a moving target.
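The constant per-operation overhead can be seen in a toy immortal-aware IncRef/DecRef (illustrative Python, not the actual C macros; IMMORTAL_REFCNT is a hypothetical sentinel): every refcount operation gains one extra comparison, and that fixed cost looms larger as the surrounding interpreter gets faster.

```python
IMMORTAL_REFCNT = 0xFFFFFFFF  # hypothetical sentinel marking immortal objects

def incref(refcnt: int) -> int:
    # One extra compare per operation: this is the constant overhead that
    # stays fixed while the rest of the interpreter speeds up around it.
    if refcnt == IMMORTAL_REFCNT:
        return refcnt            # immortal: the refcount is never mutated
    return refcnt + 1

def decref(refcnt: int) -> int:
    if refcnt == IMMORTAL_REFCNT:
        return refcnt
    return refcnt - 1

assert incref(IMMORTAL_REFCNT) == IMMORTAL_REFCNT  # no-op on immortals
assert decref(IMMORTAL_REFCNT) == IMMORTAL_REFCNT
assert incref(1) == 2                              # mortal objects behave as before
```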
Future Work:
While there is still a performance regression, more can be done (and each option has to be analyzed with its pros and cons). Here's a list of three potential follow-ups that can happen after this PR. I will avoid giving specific numbers on the potential improvement, as the end result will vary greatly once all the considerations for each of these opportunities are taken into account.
- Make Code Objects immortal and remove the expensive DecRef in the interpreter loop.
- Immortalize the startup heap, reducing the refcounting cost of all the objects known to be alive before any user code is executed.
- Create a GC “Gen3”, by immortalizing and moving objects that have lived in Gen2 for a while to the permanent generation. This will improve the performance of large/long-lived applications that end up using a constant set of common objects that live throughout the entire execution of the runtime.
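The "Gen3" idea builds on machinery that already exists: gc.freeze() moves every currently tracked object into the permanent generation, which the collector never scans. A quick demonstration using the real gc API (the proposal would effectively do this automatically for long-lived Gen2 survivors):

```python
import gc

gc.collect()                       # settle the young generations first
assert gc.get_freeze_count() == 0  # nothing frozen yet in a fresh interpreter

gc.freeze()                        # move all tracked objects to the permanent generation
frozen = gc.get_freeze_count()
print(f"objects in permanent generation: {frozen}")

gc.unfreeze()                      # undo, so this demo leaves no lasting effect
assert gc.get_freeze_count() == 0
```

gc.freeze() was originally added for copy-on-write-friendly forking; the Gen3 proposal would pair it with immortalization so the frozen objects also stop paying refcount traffic.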
Unicode Object Leaks:
Currently, the runtime does not have a strict guarantee that it does not leak objects (more specifically, unicode objects). There's already an open bug about it: Leaks on Python's standard library at runtime shutdown. Unfortunately, fixing this is not trivial, as we need to track down the source of 1000+ refcount mismatches.
Under this condition, it is impossible to correctly deallocate immortal string objects at runtime shutdown. This is only an issue in embedded programs that re-initialize the runtime multiple times: if a reference is not released at runtime shutdown (the leak), then cleaning up immortal string objects means that reference becomes invalid in the next runtime initialization, causing a segmentation fault.
Therefore, in the PR's implementation, we have guarded the cleanup of immortal string objects behind an ifdef until the bug is closed and we have a strict guarantee around leaks from the core runtime and standard library.
Instagram Usage:
As an extra note, we have been able to successfully run Instagram on the current upstream patch (sans the recent tp_dealloc changes). We did this in a two-step approach: first, updating only the core runtime (without recompiling the C extensions against the new headers); second, recompiling our thousands of extensions. Note that our extensions are used as-is, without explicit use of the Limited API. Even in this scenario, and without the tp_dealloc fixes, we never hit an accidental deallocation of an immortalized object.
Context on Implementation Details:
These are meant as helper notes for people reading through the implementation details who might be puzzled by some of the introduced changes.
- Include/Python.h: string.h had to be included because we now use memcpy in object.h
- Modules/gcmodule.c: These changes are to make sure that we correctly handle objects that reach the maximum (immortal reference count). These will be excluded from the cycle detection (which requires a correct reference count value to work) and moved into the permanent generation.
- Objects/bytes_methods.c: For some reason, single-character istitle checks were not working in Windows applications with this change. The change removes that one special case while maintaining correctness.
- Lib/test/_test_embed_structseq.py: Given the unicode object leaks (e.g. from import unittest), running this test embedded multiple times would crash the application. By using manual assertions, all the tests pass with the same checks while avoiding the leaking unittest import.
- Lib/test/test_venv.py: In address-sanitizer mode, given that we now "leak" due to the unicode problems, the check_return assertion causes the test to fail. This updates the assumption while maintaining the essence of the test until we fix the unicode leaks.
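The manual-assertion change in _test_embed_structseq.py above follows a simple pattern: replace unittest's assertion methods with plain comparisons so the test never imports the leaking module. A hypothetical sketch of the pattern (names are illustrative, not the actual test code):

```python
# Instead of:
#     import unittest                      # leaks unicode objects at shutdown
#     self.assertEqual(actual, expected)
# the embedded test performs the same checks with a bare helper:

def assert_equal(actual, expected):
    if actual != expected:
        raise AssertionError(f"{actual!r} != {expected!r}")

# Same coverage, but no unittest import, so nothing extra leaks at shutdown.
assert_equal(len("abc"), 3)
assert_equal([1, 2] + [3], [1, 2, 3])
```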