Is it time to remove TraceRefs special build? (Remove `PYTHONDUMPREFS` and `sys.getobjects()`.)

vstinner · October 11, 2024, 9:54am

Hi,

Since immortal objects were added in Python 3.12, we have been facing more and more issues with immortal objects shared between (sub)interpreters in the special TraceRefs build (./configure --with-trace-refs).

Eric Snow, Neil Schemenauer, Petr Viktorin, I, and others are doing our best to make sure that TraceRefs continues to work, but the implementation keeps getting more complex. See the latest bug: gh-125286.

In 2022, when I asked who uses the PYTHONDUMPREFS env var, no one replied. I take that to mean that nobody uses the TraceRefs build.

Are you OK with removing this TraceRefs debug feature in Python 3.14? There are other, better tools for debugging memory issues.

Victor

pitrou · October 11, 2024, 10:01am

I don’t think I’ve ever found this option useful, so +1 for removing it.

ZeroIntensity · October 11, 2024, 12:45pm

+1. I don’t use TraceRefs when debugging reference leaks; there’s too much noise for it to be useful. Generally, -X showrefcount or some external tool (such as Memray) is much better at finding them. TraceRefs has also been problematic for the free-threaded build, because it’s not thread safe.

FWIW, I’ve been working with @Eclips4 on designing a better tool for finding leaks, one that doesn’t rely on object lifetimes (so no silly immortality and cross-interpreter problems).

Eclips4 · October 11, 2024, 1:44pm

+1. I think that continuing to support trace-refs is not worth the effort.

kumaraditya303 · October 11, 2024, 1:47pm

+1 to remove it; it’s too much complexity for little gain.

barry-scott · October 11, 2024, 1:52pm

Is TraceRefs needed to support sys.getobjects()?
I have used sys.getobjects() to track down object leaks in the past.
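
For readers who have not used it, here is a minimal sketch of how sys.getobjects() can be used to compare live objects before and after a suspect operation. It assumes the behaviour described in CPython's special-builds notes (sys.getobjects(limit), where a limit of 0 means no limit), and the function only exists on a --with-trace-refs build. The helper names live_object_counts and diff_counts are hypothetical; on a regular build the sketch falls back to gc.get_objects(), which only sees GC-tracked objects.

```python
import gc
import sys
from collections import Counter


def live_object_counts(limit=0):
    """Count live objects per type name.

    Uses sys.getobjects() when the interpreter was built with
    --with-trace-refs (assumption: limit=0 means "no limit").
    Otherwise falls back to gc.get_objects(), which misses objects
    that the GC does not track.
    """
    getobjects = getattr(sys, "getobjects", None)
    if getobjects is not None:
        objects = getobjects(limit)
    else:
        objects = gc.get_objects()
    return Counter(type(obj).__name__ for obj in objects)


def diff_counts(before, after):
    """Return only the type names whose live count grew."""
    return {
        name: after[name] - before[name]
        for name in after
        if after[name] > before[name]
    }
```

Typical use is to take one snapshot, run the suspect code, take a second snapshot, and look at what diff_counts() reports.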

pablogsal · October 11, 2024, 3:16pm

Yes. Removing it would indeed take away the linked list of all objects, which is what makes it possible to find objects not tracked by the GC. I have also used this in the past to track down leaked empty dictionaries, as these are not tracked by the GC by default.
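
To make the "not tracked by the GC" point concrete, the sketch below uses gc.is_tracked() to check which sample objects would be visible through gc.get_objects(). The helper name is_invisible_to_gc is hypothetical, and the exact tracking rules for containers such as dicts are a CPython implementation detail that can differ between versions, so the loop prints the result rather than assuming it.

```python
import gc


def is_invisible_to_gc(obj):
    """True if obj is not GC-tracked, so gc.get_objects() will not list it."""
    return not gc.is_tracked(obj)


samples = {
    "empty dict": {},
    "dict holding a list": {"k": []},
    "list": [],
    "int": 0,
    "str": "a",
}

for label, obj in samples.items():
    # Report what this interpreter actually does instead of guessing.
    print(f"{label}: invisible to gc.get_objects() = {is_invisible_to_gc(obj)}")
```

Any object reported as invisible here can leak without ever showing up in a gc.get_objects() based search.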

pablogsal · October 11, 2024, 3:19pm

If we remove this mode, I would like a replacement mechanism for finding out which objects are being leaked when reference leaks are detected. If these objects are not tracked by the GC, that is very difficult to do, especially at finalisation.

nas · October 11, 2024, 4:04pm

Would -fsanitize=address cover that case? It should tell you when memory is leaking, whether it be objects or some other memory. To actually look at the object type or attributes you would have to load a trace into a debugger (I would use rr).

Making Python run cleanly with -fsanitize=address would take some additional work but perhaps we are not too far from it running without warnings.

vstinner · October 11, 2024, 4:12pm

So far, I have only used the python -X showrefcount counter, and then modified the source code to bisect until the amount of code that I have to read is small enough. I basically read the code to understand which objects are leaking. It’s not an automated method, but it’s better than nothing (and it worked for me).
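
The same counter-watching idea can be sketched from inside Python. This is an illustration, not the workflow described above: it assumes sys.gettotalrefcount() is available (debug builds only) and otherwise falls back to sys.getallocatedblocks(). The helper names pick_counter and measure_growth are hypothetical.

```python
import sys


def pick_counter():
    """Return (name, callable) for a process-wide counter.

    sys.gettotalrefcount() only exists on debug builds, so fall back
    to sys.getallocatedblocks() elsewhere.
    """
    total_refs = getattr(sys, "gettotalrefcount", None)
    if total_refs is not None:
        return "refs", total_refs
    return "blocks", sys.getallocatedblocks


def measure_growth(func, warmups=3, runs=5):
    """Call func repeatedly and record the counter delta of each run."""
    name, counter = pick_counter()
    for _ in range(warmups):
        func()  # let caches and lazy initialisation settle first
    deltas = []
    for _ in range(runs):
        before = counter()
        func()
        deltas.append(counter() - before)
    return name, deltas
```

Deltas that stay clearly positive across runs suggest a leak. Small non-zero values are noise, since the measurement itself allocates. Bisecting which part of func causes the growth is still manual.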

I tried multiple times to use TraceRefs to understand which objects leak at finalization. But PYTHONDUMPREFS didn’t help me: the output is long and not useful. It doesn’t tell me where memory was allocated. I don’t know how to use sys.getobjects() to identify leaks at finalization. At least, it never helped me.

Do you have a concrete example showing how TraceRefs can be used to fix leaks at finalization? Which kind of information does it provide? Statistics per memory address or type name?

Sometimes, tracemalloc helped me to identify which code leaked which memory: identify where (filename + line number) the leak occurred.
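
A minimal sketch of that tracemalloc approach: take a snapshot before and after the suspect code and compare them grouped by line number. The helper name top_growth and its parameters are hypothetical; the tracemalloc calls are the standard library API.

```python
import tracemalloc


def top_growth(func, limit=5, nframes=1):
    """Run func under tracemalloc and return the lines whose memory grew most."""
    tracemalloc.start(nframes)
    try:
        before = tracemalloc.take_snapshot()
        func()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() returns StatisticDiff entries, biggest changes first.
    stats = after.compare_to(before, "lineno")
    return [stat for stat in stats if stat.size_diff > 0][:limit]


def describe(stat):
    """Format one StatisticDiff as 'filename:lineno +N bytes'."""
    frame = stat.traceback[0]
    return f"{frame.filename}:{frame.lineno} +{stat.size_diff} bytes"
```

Printing describe(stat) for each returned entry gives the filename and line number of the allocations that were still alive after func returned.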

barry · October 11, 2024, 5:06pm

+0 from me, but I ask that if you do remove it, to please ensure there’s documentation about alternatives and replacements for users who may need to debug memory issues.

pitrou · October 11, 2024, 8:42pm

Doesn’t that imply that any third-party extension module has to be recompiled with -fsanitize=address as well? Or am I misremembering?

ngoldbaum · October 12, 2024, 10:11am

No, I don’t think so, since NumPy is happily running clang ASAN in CI using a normal Python build. You just need to e.g. LD_PRELOAD the ASAN runtime library. Meson made this trivial to set up for NumPy.

pitrou · October 12, 2024, 2:32pm

I’m a bit skeptical. ASAN relies on compile-time instrumentation (otherwise -fsanitize=address would not need to exist). If you merely preload the ASAN runtime library without recompiling CPython, I’m afraid you could potentially miss some errors involving C-level interactions between NumPy and CPython.

I’m not an expert though, and I cannot find any authoritative information on this.

ngoldbaum · October 12, 2024, 2:53pm

Sorry for being unclear: all that’s needed on top of building NumPy with ASAN support is the runtime library.

See the CI workflow: .github/workflows/linux_compiler_sanitizers.yml on the main branch of numpy/numpy on GitHub.

This is only to test for issues inside of NumPy itself; I agree this would miss issues that involve interaction with the CPython C API.

Still, that alone is very useful: it’s caught a number of issues while working on C or C++ code inside of NumPy during the PR stage.

nas · October 13, 2024, 8:44pm

Since Pablo has a definite use for the TraceRefs build and we don’t have (at least at this time) a direct replacement for it, I think we should keep it for now. Some issues with ASAN as a replacement:

- The platform compiler needs to support it. In contrast, TraceRefs should work anywhere Python builds.
- Python is not yet clean in terms of ASAN warnings, and 3rd party extensions are unlikely to be clean either.

I see that --with-trace-refs and --disable-gil are not compatible. So unless we fix that, if free-threading is the future, TraceRefs will eventually go away.

colesbury · October 13, 2024, 10:31pm

I think we can support sys.getobjects and PYTHONDUMPREFS in the free threading build. I haven’t done it yet because I’m lazy and, like Victor, I never found it very useful for identifying leaks, but if Pablo yells at me, I’ll do it.

pablogsal · October 15, 2024, 6:38pm

I’m not opposed to the change if this is a maintenance pain. I just want to be clear that there are use cases for this functionality, since a lot of other people were +1 on the basis that they have not used the mode. The key difference from sanitisers is being able to find which specific objects are leaked (including at finalisation), instead of just “memory blocks”, and without being limited to things that appear in gc.get_objects().

If we collectively think this is not a good use of our time to maintain, I am OK with removing it, but I wanted us to make an informed decision and not remove it based on “nobody uses it”.

pablogsal · October 15, 2024, 6:40pm

Oh, I would never! I can be slightly annoying at most.

encukou · October 23, 2024, 11:22am

PR #125709 (which I think should go in) proposes a change to -X showrefcount and sys.getobjects(): interpreters that share the object allocator state also share the list of objects. This reflects the actual situation: due to some existing C API, there are some objects for which we can’t tell which interpreter they belong to.

Generally, if you use an object from a different interpreter, bad things can happen. But, examining the list to debug leaks should be relatively safe.