I’m doing analytics on multiple images in parallel (around 100, then another 100…)
If I use standard object references I fill my RAM and then start swapping, which is OK for a while… but I do want to access some of the first 100 again. I set the images (NumPy arrays) as weakrefs, and now Python enthusiastically chucks them away when less than 30% of my RAM is used (I have 16 GB of RAM).
Can I tell it to hang onto weakrefs until much more of my RAM is used?
Weak references are references that do not count toward an object's reference count, which Python uses to decide when to free the object. Thus your images are garbage collected as soon as nothing references them (other than the weak references). Memory pressure plays no part in the decision on whether to garbage collect objects. If you want to prevent the objects from being garbage collected, you need to hold onto them via a (non-weak) reference.
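A minimal sketch of that behaviour, in case it helps. The `Image` class and the name "frame_000" are hypothetical stand-ins for a NumPy image array. On CPython the object is freed the moment its last strong reference goes away; the `gc.collect()` call is only there for other implementations.

```python
import gc
import weakref


class Image:
    """Hypothetical stand-in for a NumPy image array."""

    def __init__(self, name):
        self.name = name


img = Image("frame_000")
ref = weakref.ref(img)

# While a strong reference exists, the weak reference resolves to the object.
alive_before = ref() is img

del img        # drop the only strong reference
gc.collect()   # not needed on CPython, harmless elsewhere

# With no strong references left, the weak reference now returns None,
# no matter how much RAM is free.
alive_after = ref() is not None

print(alive_before, alive_after)
```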
The Python Ideas area is for exploring possible future changes to Python itself. General questions should go in the Users category (I’ve moved it over).
Ah! I misunderstood the point of weakref. Looks like I need an LRU cache, but one based on object size…
What would it then do if you happen to still need an image while it has already been collected?
The normal approach would be to keep a reference to an image just as long as you need it. Then you’ll always use just as much of the RAM as you need.
Well, I’m running interactively in JupyterLab, and I don’t always know where I’m going next.
I have found a good tool though - based on functools.lru_cache. https://gist.github.com/wmayner/0245b7d9c329e498d42b
I’ve found it holds the last 150 images in memory without needing to use swap space.
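For anyone who wants the size-based idea spelled out, here is a rough sketch of an LRU cache bounded by total bytes instead of entry count. This is not the code from the linked gist; `SizedLRUCache`, `load_image`, and the 2500-byte budget are all made up for illustration, and bytearrays stand in for images. On a miss the cache calls the loader again, which is one possible answer to the question of what happens when an evicted image is needed later.

```python
from collections import OrderedDict


class SizedLRUCache:
    """LRU cache bounded by total bytes rather than entry count (sketch)."""

    def __init__(self, max_bytes, loader, sizeof):
        self.max_bytes = max_bytes
        self.loader = loader    # key -> object, called on a cache miss
        self.sizeof = sizeof    # object -> size in bytes
        self._items = OrderedDict()
        self._sizes = {}
        self.total_bytes = 0

    def __contains__(self, key):
        return key in self._items

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)   # mark as most recently used
            return self._items[key]
        value = self.loader(key)           # miss: load (or reload) the object
        size = self.sizeof(value)
        self._items[key] = value
        self._sizes[key] = size
        self.total_bytes += size
        # Evict least recently used entries until under budget,
        # but always keep the entry that was just loaded.
        while self.total_bytes > self.max_bytes and len(self._items) > 1:
            old_key, _ = self._items.popitem(last=False)
            self.total_bytes -= self._sizes.pop(old_key)
        return value


def load_image(key):
    # Hypothetical loader: a real one would read the image from disk.
    return bytearray(1000)


cache = SizedLRUCache(max_bytes=2500, loader=load_image, sizeof=len)
for key in ("a", "b", "c"):
    cache.get(key)   # loading "c" pushes the total over budget and evicts "a"
```

With NumPy arrays, `sizeof` could be `lambda arr: arr.nbytes`, and `max_bytes` would be set to whatever share of RAM you are happy to give the cache.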