I am trying to download a bunch of files from AWS S3 and then process them, using concurrent.futures.ThreadPoolExecutor.
Here is my code:
```python
import concurrent.futures
from typing import Dict


def _process_futures(futures: Dict[concurrent.futures.Future, str]) -> None:
    # Wait for every outstanding future, dropping each one as it finishes.
    for completed_future in concurrent.futures.as_completed(futures):
        futures.pop(completed_future)


# Inside the method that kicks off the downloads:
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    futures_to_filename: Dict[concurrent.futures.Future, str] = {}
    for filename in _glob_files_for_seven_days():
        futures_to_filename[
            executor.submit(self.process_file, filename)
        ] = filename
        if len(futures_to_filename) > _MAX_FILES_TO_PROCESS_CONCURRENTLY:
            # Drain all in-flight futures before submitting more.
            _process_futures(futures_to_filename)
    # Drain whatever is still pending after the last submission.
    _process_futures(futures_to_filename)
```
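For context, `process_file` boils down to something like the sketch below. The bucket name, paths, and boto3 usage here are a simplified stand-in, not my exact code:

```python
import boto3


def process_file(self, filename: str) -> None:
    # Method on the class that owns the executor loop above.
    # Simplified stand-in: download the object from S3, then parse it.
    s3 = boto3.client("s3")
    s3.download_file("my-bucket", filename, f"/tmp/{filename}")
    # ... parse the downloaded file ...
```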
`_MAX_FILES_TO_PROCESS_CONCURRENTLY` is a cap I use on the number of in-flight futures so that memory doesn't blow up.
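For reference, here is a minimal, self-contained version of the same pattern with a dummy task standing in for the S3 download. This is a sketch to make the structure easy to run, not my production code:

```python
import concurrent.futures
import time
from typing import Dict

_MAX_FILES_TO_PROCESS_CONCURRENTLY = 100  # illustrative value


def _process_futures(futures: Dict[concurrent.futures.Future, str]) -> None:
    for completed_future in concurrent.futures.as_completed(futures):
        futures.pop(completed_future)


def process_file(filename: str) -> None:
    time.sleep(0.01)  # dummy stand-in for the download + parse


with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    futures_to_filename: Dict[concurrent.futures.Future, str] = {}
    for i in range(1000):
        filename = f"file-{i}"
        futures_to_filename[executor.submit(process_file, filename)] = filename
        if len(futures_to_filename) > _MAX_FILES_TO_PROCESS_CONCURRENTLY:
            # Once the cap is hit, wait for all in-flight work to finish.
            _process_futures(futures_to_filename)
    _process_futures(futures_to_filename)
```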
When I simply increase `_MAX_FILES_TO_PROCESS_CONCURRENTLY` and `max_workers`, I get this warning:

```
/usr/local/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
```
Any ideas on how I can debug this? It only happens on Linux machines; I haven't seen the issue locally on my Mac.
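For what it's worth, one way I could try to trace this is to patch `multiprocessing.resource_tracker.register` at startup so every registration logs a stack trace, which should show where the semaphore is being created. This is a rough, untested sketch:

```python
import traceback
import multiprocessing.resource_tracker as resource_tracker

_original_register = resource_tracker.register


def _logging_register(name, rtype):
    # Print a stack trace every time something registers a tracked
    # resource, to find where the leaked semaphore is created.
    print(f"resource_tracker.register({name!r}, {rtype!r})")
    traceback.print_stack()
    _original_register(name, rtype)


resource_tracker.register = _logging_register
```

Would something like this be reliable, or is there a better way?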