While using the Python program Xpra, we came across a bug that might be a Python or a NumPy issue.
Perhaps some of you can help us understand some of the internals.
Calling `import numpy` at the same time in two different threads of a Python program can lead to a race condition. This happens, for example, with Xpra when loading the nvjpeg encoder:
2022-03-20 12:54:59,298 cannot load enc_nvjpeg (nvjpeg encoder)
Traceback (most recent call last):
  File "<pythondir>/lib/python3.9/site-packages/xpra/codecs/loader.py", line 52, in codec_import_check
    ic = __import__(class_module, {}, {}, classnames)
  File "xpra/codecs/nvjpeg/encoder.pyx", line 8, in init xpra.codecs.nvjpeg.encoder
  File "<pythondir>/lib/python3.9/site-packages/numpy/__init__.py", line 150, in <module>
    from . import core
  File "<pythondir>/lib/python3.9/site-packages/numpy/core/__init__.py", line 51, in <module>
    del os.environ[envkey]
  File "<pythondir>/lib/python3.9/os.py", line 695, in __delitem__
    raise KeyError(key) from None
KeyError: 'OPENBLAS_MAIN_FREE'
Here the environment variable OPENBLAS_MAIN_FREE is set in the NumPy code:
numpy/core/__init__.py#L18
and shortly after that it is deleted:
numpy/core/__init__.py#L51
But this deletion fails …
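
To make the suspected failure mode concrete, here is a minimal sketch of how two threads running the same set-then-delete sequence could produce this KeyError. It is not NumPy's code: it assumes the initialization only sets the variable when it is missing and later deletes what it set, it uses a plain dict as a stand-in for os.environ, and the names run_race and init_module are made up for illustration. A threading.Barrier forces the unlucky interleaving so the outcome is reproducible.

```python
import threading

ENVKEY = "OPENBLAS_MAIN_FREE"


def run_race():
    """Run the set-then-delete pattern in two threads and collect the errors."""
    env = {}  # stand-in for os.environ, so the real environment stays untouched
    barrier = threading.Barrier(2)
    errors = []

    def init_module():
        # Step 1: both threads check for the key before either one sets it.
        missing = ENVKEY not in env
        barrier.wait(timeout=5)

        # Step 2: both threads believe they are responsible for the key.
        added = []
        if missing:
            env[ENVKEY] = "1"
            added.append(ENVKEY)
        barrier.wait(timeout=5)

        # Step 3: both threads clean up; only the first deletion can succeed.
        for key in added:
            try:
                del env[key]
            except KeyError as exc:
                errors.append(exc)

    threads = [threading.Thread(target=init_module) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


if __name__ == "__main__":
    print(run_race())
```

If the module body really were executed by two threads at once, the second deletion would fail in exactly this way, which matches the traceback above.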
Xpra uses multiple threads here. Perhaps running `import numpy` at the same time in two threads leads Python to execute the module initialization twice.
Shouldn’t Python protect us by design?
So, my current hypothesis (and I have only briefly checked the Python code) is that Python does not do manual locking, but that it effectively locks because the import goes into C and thus holds the GIL. Somewhere during the import of NumPy, however, NumPy probably releases the GIL briefly, and that could allow the next thread to enter the import machinery.
[..]
NumPy may be doing some worse than typical stuff here, but right now it seems to me that Python should be protecting us.
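
One way to probe this hypothesis without NumPy is a small diagnostic like the following sketch. It writes a throwaway module whose body sleeps (which releases the GIL in the middle of the import), imports it from two threads, and counts how often the body actually ran. The module name slow_probe_module and the function count_module_executions are hypothetical. Since importlib is documented to use per-module import locks from Python 3.3 on, I would expect a count of 1 for a pure-Python module; a higher count would support the double-initialization theory.

```python
import importlib
import os
import sys
import tempfile
import threading

MODULE_NAME = "slow_probe_module"  # hypothetical throwaway module


def count_module_executions():
    """Import a slow module from two threads; return how often its body ran."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "runs.log")
        mod_path = os.path.join(tmp, MODULE_NAME + ".py")
        with open(mod_path, "w") as fh:
            fh.write(
                "import time\n"
                f"with open({log_path!r}, 'a') as log:\n"
                "    log.write('run\\n')\n"
                "time.sleep(0.2)  # sleeping releases the GIL mid-import\n"
            )

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            threads = [
                threading.Thread(
                    target=importlib.import_module, args=(MODULE_NAME,)
                )
                for _ in range(2)
            ]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)

        # Each execution of the module body appended one line to the log.
        with open(log_path) as log:
            return len(log.readlines())


if __name__ == "__main__":
    print("module body executed", count_module_executions(), "time(s)")
```

This only exercises a pure-Python module; the import in the traceback above is triggered from a Cython extension (encoder.pyx), so that path may behave differently and would need its own check.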
Can anyone comment on this?
Best regards,
Jens Henrik