I’ve written some code that pulls a list of IPv4 subnets from a config file and then, using Python’s ipaddress module, iterates over each IP in those subnets, trying to connect via SNMP to get the device’s hostname. I’m looking for routers, switches and the like. This works well, but it is very slow to iterate over the 4,500+ IPs in those ranges.
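To illustrate the expansion step, here is a minimal sketch, assuming the config file yields CIDR strings. The function name expand_subnets and the sample ranges (documentation prefixes) are hypothetical placeholders, not the real script.

```python
import ipaddress


def expand_subnets(subnets):
    """Return every usable host address in the given CIDR strings."""
    addrs = []
    for cidr in subnets:
        # strict=False tolerates entries whose host bits are set, e.g. 192.0.2.1/30
        network = ipaddress.ip_network(cidr, strict=False)
        # hosts() skips the network and broadcast addresses for ordinary prefixes
        addrs.extend(str(host) for host in network.hosts())
    return addrs


# Hypothetical sample input; the real list comes from the config file.
addrs = expand_subnets(["192.0.2.0/30", "198.51.100.8/30"])
```

The result is a flat list of address strings, which is the shape the pool call further down expects.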
So I figured I’d use the multiprocessing module to run many of these in parallel. My first problem was that I didn’t want Python attempting to create thousands of subprocesses all at once. Right now I’m using a pool to accomplish this, and it works, but it doesn’t appear to be any faster than running the tasks in serial. The intent was to run only a fixed number of subprocesses at any given time while the list of IPs still to be queried whittles down to none.
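The bounded-pool idea can be sketched roughly as follows. This is only an illustration under assumptions: probe is a hypothetical stand-in for the real SNMP query, and the worker count of 2 is arbitrary.

```python
import multiprocessing as mp


def probe(ip):
    """Hypothetical stand-in for the real SNMP query; returns (ip, hostname)."""
    return ip, None


def run_bounded(ips, workers=4):
    """Process every IP while keeping at most `workers` processes alive."""
    with mp.Pool(processes=workers) as pool:
        # imap_unordered hands out work as workers free up and yields
        # results as they complete, not necessarily in input order.
        return list(pool.imap_unordered(probe, ips, chunksize=1))


if __name__ == "__main__":
    sample = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # placeholder addresses
    for ip, hostname in sorted(run_bounded(sample, workers=2)):
        print(ip, hostname)
```

The pool never holds more than the requested number of worker processes; the remaining addresses wait in its internal queue until a worker is free.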
My current code looks something like this:
import multiprocessing as mp
from itertools import product
and then later…
pool = mp.Pool()
results = pool.starmap(getHostname, product(addrs, cfg))
In this case, getHostname is the function, and addrs is a list of IPs that I want to iterate over. The cfg object has to be passed to the function because it contains the SNMP community name that the getHostname function needs to make the query. The getHostname function returns a tuple of both the original IP and the acquired hostname (or None if the action was unsuccessful).
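A small sketch of how the arguments get paired up, assuming cfg is a dict-like object (the keys and values here are hypothetical). itertools.product iterates over its second argument, so a dict there yields its keys, while zip with itertools.repeat hands the whole object to every call.

```python
from itertools import product, repeat

# Hypothetical config and addresses, for illustration only.
cfg = {"community": "public", "timeout": 2}
addrs = ["192.0.2.1", "192.0.2.2"]

# product() iterates over cfg itself, so each address is paired with each key.
pairs_product = list(product(addrs, cfg))

# zip() with repeat() pairs each address with the whole cfg object exactly once.
pairs_zip = list(zip(addrs, repeat(cfg)))
```

Either list could be passed as the second argument to starmap; the second form produces one call per address, each receiving the complete cfg object.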
I’ve fiddled with parameters to mp.Pool() such as processes, but it doesn’t seem to make a difference. I should note that this is running on a Linux VM. Again, the code runs fine, but it is SLOW: it takes many hours to complete. I would expect it to be noticeably faster than the serial version of the same script.
Am I on the right track? Should I be approaching this from an entirely different perspective? I don’t profess to be a guru in this area, so if there’s a better way, I’m all ears.
Thanks!