Distinguish application PyObject from VM PyObject?

Hi, I’m wondering if there’s a way to distinguish between Python application-created PyObjects and interpreter-created (or VM) PyObjects? I am doing some system-related work and only want references to the application objects. My initial thought is that there isn’t, given the following example:

The PyList_New(Py_ssize_t size) API is called when you create a list object in a Python program (an application obj):

>>> list_obj = []

But it’s also called to create a lot of internal VM objects, such as GCState->garbage, when the interpreter is initialized. In other words, the garbage collection module tracks its own internal objects as well, while making sure they are not freed until Python finishes execution.

Given this, neither call to PyList_New() tells you what the object’s “affiliation” is. And under the hood, the memory management module always assigns the first free_block to the requested PyObject during an allocation.
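A small way to see this from Python, as a sketch: gc.garbage is the list the interpreter creates for the collector during initialization, and nothing on the object marks it as different from a list created by application code. The variable names here are only for illustration.

```python
import gc

# A list created by application code.
app_list = []

# A list created by the interpreter itself while setting up the collector.
vm_list = gc.garbage

# Both are plain list objects and both are tracked by the garbage collector;
# neither carries any marker saying who created it.
same_type = type(app_list) is type(vm_list)
both_tracked = gc.is_tracked(app_list) and gc.is_tracked(vm_list)
print(same_type, both_tracked)
```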

Thus, I would like to ask if there’s really no such way to distinguish application objs and VM objs, either physically (I doubt) or logically?

Your initial thought is right. I don’t think there is any clear boundary between “application” and “VM” objects.

What are you doing to gather all the object references you want? If there’s a specific API you are using then you could use that API right after runtime initialization has finished, right before any of your app code starts running. Then you subtract that collection from what you gather later. Of course, that only helps if you want to ignore just objects created during runtime init.
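A minimal sketch of that subtraction idea, assuming gc.get_objects() is the gathering API (it only reports objects tracked by the garbage collector, so ints, strings and similar are not covered). The helper names are made up for illustration. The baseline keeps a reference to every object it saw so that their ids cannot be reused by later objects.

```python
import gc

def take_baseline():
    """Map id -> object for everything the GC tracks right now.

    Holding the objects keeps them alive, so no later object can
    reuse one of these ids by accident.
    """
    return {id(obj): obj for obj in gc.get_objects()}

def objects_created_since(baseline):
    """Return GC-tracked objects that were not present in the baseline."""
    return [obj for obj in gc.get_objects() if id(obj) not in baseline]

# Take the baseline as early as possible, before application code runs.
baseline = take_baseline()

# Stand-in for "application" code.
app_list = [1, 2, 3]
app_dict = {"key": app_list}

new_objects = objects_created_since(baseline)
print(len(new_objects))
```

Note that the result also contains bookkeeping objects created after the baseline (for example the baseline dict itself), and any objects the runtime creates later on its own, so it is a filter for runtime-init objects only.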

I want to obtain all application objects and then do some filtering to identify those that are frequently accessed. Thus, I want to pre-filter out the “VM” objects (which I shouldn’t touch). What you suggested does make sense. Regarding the VM objects created after interpreter initialization, do you think there are a lot of them? Or does it depend on the application (e.g., PyThreadState objects)?

While it doesn’t directly answer your question, the tracemalloc API has an optional “domain” argument. It’s possible to categorize memory allocations into “domains”. For example, you can keep memory blocks allocated on a GPU in a separate domain.
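As a rough sketch of what domains look like from the Python side: tracemalloc records allocations made by Python itself in domain 0, and a snapshot can be filtered with tracemalloc.DomainFilter. Other domains only show up if a C extension registers allocations in them, so in a plain interpreter the second filter below is expected to come back empty. This separates kinds of memory, not “application” from “VM” objects.

```python
import tracemalloc

tracemalloc.start()

# Stand-in for application allocations.
data = [bytes(1000) for _ in range(100)]

snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Keep only traces from domain 0 (allocations made by Python itself).
python_only = snapshot.filter_traces(
    [tracemalloc.DomainFilter(inclusive=True, domain=0)]
)

# Keep only traces from every other domain (e.g. set by C extensions).
other_domains = snapshot.filter_traces(
    [tracemalloc.DomainFilter(inclusive=False, domain=0)]
)

print(len(python_only.traces), len(other_domains.traces))
```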

Python 3.13 has a new optional mimalloc allocator, and PEP 703 relies on it to “filter objects”. mimalloc makes it possible to list Python objects, whereas tracemalloc doesn’t allow that “directly”.