Debugging a possible memory leak

I think there might be a memory leak when using PyDict_SetItemString rather than PyDict_SetItem (specifically this line, which has a dubious comment next to it).

I suspect a leak because of what the Memory Profiler package reported while I was using it to investigate a horrendous C extension I was writing.

I have created a minimal example which tests this out, and if I comment out the above line, the apparent leak does go away. However, this isn’t a particularly robust approach, and it doesn’t help with writing tests to confirm that a fix does indeed solve the issue. All tests currently pass with and without that line.

I tried using Valgrind but it didn’t seem to show any differences in the various cases I tried, not that I know what I’m doing on that front.

I would appreciate any tips on how to debug this, confirm whether or not it is a memory leak, create some tests, and also check there are no side effects to removing that line. Thanks!

I would run the code under a debugger and check that the refcounts make sense.
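If a debugger session is too heavy, a rough first check of refcounts can be done from Python itself with sys.getrefcount. This is only a sketch: refcount_delta, set_and_discard and keep_forever are hypothetical helpers made up for illustration, and the function you would really pass in is the one from your extension that ends up calling PyDict_SetItemString.

```python
import sys


def refcount_delta(func, obj, repeats=1000):
    """Return how much obj's reference count changed after calling func(obj) repeatedly."""
    before = sys.getrefcount(obj)
    for _ in range(repeats):
        func(obj)
    after = sys.getrefcount(obj)
    return after - before


# Build the key at runtime so it is an ordinary, non-interned string object.
key = "".join(["some_", "key"])


def set_and_discard(k):
    # Stand-in for the extension call under test (hypothetical).
    d = {}
    d[k] = 1
    # d is dropped on return, which releases its reference to k.


leaked = []


def keep_forever(k):
    # Deliberately "leaks" one reference per call, for comparison.
    leaked.append(k)


print(refcount_delta(set_and_discard, key))
print(refcount_delta(keep_forever, key, repeats=10))
```

A delta that grows with the number of calls points at a missing Py_DECREF; a delta that stays flat suggests the refcounting is balanced.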

I don’t think there is one. Looking at that code there is only a single object created via PyUnicode_FromString() and I don’t see a code path where that function exits without that object getting a Py_DECREF called on it.

Now, my bet is that the memory profiler you’re using reports a leak because interning a string basically makes it live forever. So with a ton of distinct keys in dicts, you can end up with a lot of strings being kept around. Whether that is the best behaviour is an open question, hence the comment.
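To see what interning does from the Python side, here is a small sketch using sys.intern. It only illustrates the mechanism; it is not the code path in the C function being discussed, and exactly how long an interned string lives differs between CPython versions, so treat the lifetime part as an assumption to check on your build.

```python
import sys

# Two equal strings built at runtime; they compare equal but are
# created as separate objects.
a = "".join(["leak", "_", "check"])
b = "".join(["leak", "_", "check"])
print(a == b)

# Interning maps equal strings to one shared entry in the interpreter's
# intern table, so both calls return the same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```

Every distinct key that gets interned adds an entry to that table, which is why a profiler can show memory growing even when every reference is released correctly.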

  1. Remove the line
  2. Recompile
  3. Run the test suite
  4. Profit! :wink:

Basically there’s not going to be a better way to verify there aren’t any adverse effects. But you will probably want to run it before and after the change to see how it affects things.
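For the before/after comparison, something along these lines could work as a starting point for a test. It uses tracemalloc from the standard library; net_allocated_bytes, clean and leaky are hypothetical names, and you would swap in a call into your rebuilt extension where the stand-in functions are.

```python
import gc
import tracemalloc


def net_allocated_bytes(func, repeats=10_000):
    """Return the net growth in traced memory after calling func(i) repeats times."""
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for i in range(repeats):
            func(i)
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


def clean(i):
    # Stand-in for a call that releases everything it allocates.
    d = {f"key_{i}": i}


kept = []


def leaky(i):
    # Stand-in for a call that keeps one new string alive per invocation.
    kept.append(f"key_{i}")


print("clean:", net_allocated_bytes(clean))
print("leaky:", net_allocated_bytes(leaky))
```

Run it once against the build with the line and once against the build without it. Growth that scales with repeats is the thing to look for; a small constant offset is usually just caches and free lists.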

Also, take into account that using memory profilers with pymalloc activated will yield, at the very least, confusing results. Python by default does not return memory to the OS until the coarsest internal structures that it uses for managing memory (arenas) are completely free. This means that technically you may see the allocation but only see the deallocation much later (or never).
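One way to take pymalloc out of the picture is the PYTHONMALLOC environment variable: setting it to malloc makes CPython use the C library allocator directly, so external tools see each allocation and free. A minimal sketch of launching a check that way from Python follows; run_without_pymalloc is a hypothetical helper, and the snippet passed to it is just a placeholder for your minimal example.

```python
import os
import subprocess
import sys


def run_without_pymalloc(code):
    """Run a snippet in a child interpreter with pymalloc disabled and return its stdout."""
    # PYTHONMALLOC=malloc routes Python's allocations through the C malloc/free.
    env = dict(os.environ, PYTHONMALLOC="malloc")
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Placeholder snippet: replace it with the code that exercises the extension.
print(run_without_pymalloc("import os; print(os.environ['PYTHONMALLOC'])"))
```

The same environment variable can be set on the command line when running under Valgrind or a profiler, which should make the reported numbers easier to interpret.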