Using multiprocessing to query a bunch of IPs for hostname

I’ve written some code that pulls a list of IPv4 subnets from a config file and then, using Python’s ipaddress module, iterates over every IP in those subnets, trying to connect via SNMP to get each device’s hostname. I’m looking for routers, switches, and the like. This works well, but it is very slow to iterate over the 4,500+ IPs in those ranges.

So I figured I’d use the multiprocessing module to run many of these queries in parallel. My first problem was that I didn’t want Python attempting to create thousands of subprocesses all at once, so right now I’m using a pool to cap that. It works, but it doesn’t appear to be any faster than running the tasks serially. The intent was to run only so many subprocesses at a time until the list of IPs still waiting to be queried whittles down to none.

My current code looks something like this:

import multiprocessing as mp
from itertools import product

and then later…

pool = mp.Pool()
results = pool.starmap(getHostname, product(addrs, cfg))

In this case, getHostname is the function, and addrs is a list of IPs that I want to iterate over. The cfg object has to be passed to the function because it contains the SNMP community name that the getHostname function needs to make the query. The getHostname function returns a tuple of both the original IP and the acquired hostname (or None if the action was unsuccessful).
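
getHostname itself isn’t shown above; for context, here is a minimal sketch of what such a function might look like, assuming pysnmp’s synchronous high-level API and a hypothetical cfg['community'] key holding the community string:

from pysnmp.hlapi import (
    SnmpEngine, CommunityData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, getCmd,
)

SYSNAME_OID = '1.3.6.1.2.1.1.5.0'  # SNMPv2-MIB::sysName.0

def getHostname(addr, cfg):
    # Query sysName.0 with a short timeout; each dead host costs
    # timeout * (retries + 1) seconds, which is what makes the scan slow.
    errorIndication, errorStatus, _, varBinds = next(getCmd(
        SnmpEngine(),
        CommunityData(cfg['community']),  # hypothetical cfg layout
        UdpTransportTarget((str(addr), 161), timeout=1, retries=0),
        ContextData(),
        ObjectType(ObjectIdentity(SYSNAME_OID)),
    ))
    if errorIndication or errorStatus:
        return (str(addr), None)          # unreachable or SNMP error
    return (str(addr), str(varBinds[0][1]))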

I’ve fiddled with params to mp.Pool() such as processes, but it doesn’t seem to make a difference. I should note that this is running on a Linux VM. Again, the code runs fine, but it is SLOW – it takes many hours to complete. I would expect it to be noticeably faster than the serial version of the same script.

Am I on the right track? Should I be approaching this from an entirely different perspective? I don’t profess to be a guru in this area, so if there’s a better way, I’m all ears.

Thanks!

product(addrs, cfg) produces every possible combination of one element of addrs and one element of cfg. If cfg is, for example, a dictionary:

>>> from itertools import product
>>> addrs = ['example.com', 'example.org', 'example.net']
>>> cfg = {'x': 1, 'y': 2, 'z': 3}
>>> list(product(addrs, cfg))
[('example.com', 'x'), ('example.com', 'y'), ('example.com', 'z'), ('example.org', 'x'), ('example.org', 'y'), ('example.org', 'z'), ('example.net', 'x'), ('example.net', 'y'), ('example.net', 'z')]

It iterated over the keys of the dict.

If you want cfg to be a complete object that is the same for each call, you can wrap it in a 1-tuple:

>>> list(product(addrs, (cfg,)))
[('example.com', {'x': 1, 'y': 2, 'z': 3}), ('example.org', {'x': 1, 'y': 2, 'z': 3}), ('example.net', {'x': 1, 'y': 2, 'z': 3})]

Or you can use one of several techniques to “bind” the cfg value to getHostname, and then just use pool.map since there is only one iterable to map. Note that a lambda will not work here – multiprocessing has to pickle the function it sends to the worker processes, and lambdas can’t be pickled – so use functools.partial instead:

from functools import partial

results = pool.map(partial(getHostname, cfg=cfg), addrs)

(This assumes getHostname’s second parameter is named cfg; if not, a small module-level wrapper function works too.)
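
Putting that together, here is a sketch with an explicit worker count – 64 is an arbitrary number, tune it to your network – and imap_unordered with chunksize=1, so one worker stuck on a run of dead hosts doesn’t strand a whole pre-assigned chunk of addresses the way map’s default chunking can:

import multiprocessing as mp
from functools import partial

def scan(addrs, cfg):
    # For I/O-bound probes the worker count, not the CPU count, is the
    # ceiling on concurrency, so size the pool well above core count.
    with mp.Pool(processes=64) as pool:
        # chunksize=1 hands out one address at a time as workers free up.
        return list(pool.imap_unordered(
            partial(getHostname, cfg=cfg), addrs, chunksize=1))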

MOST of your time is spent waiting for SNMP – or, more likely, waiting for responses that will never come. Parallelism can help, but multiprocessing mostly adds overhead here: by default Pool() starts one worker per CPU core, which is the right sizing for CPU-bound work but leaves only a handful of probes in flight at once on a small VM, while each worker sleeps through multi-second timeouts. I would recommend asyncio here if you can use it, as it scales well to huge numbers of parallel requests; but you may simply find that you are being rate-limited by something on your network, and nothing you do will speed it up.
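
A minimal sketch of the asyncio approach, assuming pysnmp’s asyncio flavor of the same high-level API (pysnmp.hlapi.asyncio) and a community string in hand; the semaphore caps how many probes are in flight at once:

import asyncio
from pysnmp.hlapi.asyncio import (
    SnmpEngine, CommunityData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, getCmd,
)

SYSNAME_OID = '1.3.6.1.2.1.1.5.0'   # SNMPv2-MIB::sysName.0

async def get_hostname(engine, sem, ip, community):
    async with sem:                  # cap the number of in-flight probes
        errorIndication, errorStatus, _, varBinds = await getCmd(
            engine,
            CommunityData(community),
            UdpTransportTarget((str(ip), 161), timeout=1, retries=0),
            ContextData(),
            ObjectType(ObjectIdentity(SYSNAME_OID)),
        )
        if errorIndication or errorStatus:
            return (str(ip), None)   # no answer, or SNMP-level error
        return (str(ip), str(varBinds[0][1]))

async def scan(addrs, community, limit=200):
    engine = SnmpEngine()            # one engine shared across all queries
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(
        *(get_hostname(engine, sem, ip, community) for ip in addrs))

# e.g. results = asyncio.run(scan(addrs, cfg['community']))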

Not sure how you are probing, but I saw this on Stack Overflow: How to auto-detect snmp devices using C/C++?
It shows how to use nmap to find out which hosts have the SNMP port open – something like nmap -sU -p 161 --open <subnet>.

I would send a UDP probe to port 161 of each IP using asyncio and watch to see whether it gets a response. You can send to lots of hosts at the same time. Of course, since UDP is not reliable, you would need to retry hosts that do not respond.
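
A sketch of that idea using asyncio’s datagram support. The probe bytes below are a hand-encoded SNMPv1 GET for sysName.0 with community "public" – agents silently drop requests with the wrong community, so if yours differs, build the packet with your SNMP library instead:

import asyncio

# Hand-encoded SNMPv1 GetRequest, community "public"; rebuild (e.g.
# with pysnmp) if your community string is different.
PROBE = bytes.fromhex(
    '302602010004067075626c6963'        # message header + community
    'a019020101020100020100'            # GetRequest PDU header
    '300e300c06082b060102010105000500'  # varbind: sysName.0 = NULL
)

class SnmpProbe(asyncio.DatagramProtocol):
    def __init__(self, responders):
        self.responders = responders

    def datagram_received(self, data, addr):
        self.responders.add(addr[0])    # any reply means an agent answered

async def probe_all(addrs, wait=2.0):
    responders = set()
    loop = asyncio.get_running_loop()
    transport, _ = await loop.create_datagram_endpoint(
        lambda: SnmpProbe(responders), local_addr=('0.0.0.0', 0))
    for ip in addrs:
        transport.sendto(PROBE, (str(ip), 161))
    await asyncio.sleep(wait)           # collect stragglers, then give up
    transport.close()
    return responders                   # retry non-responders before trusting this

# e.g. alive = asyncio.run(probe_all(addrs))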

Thanks everyone – I appreciate the feedback. I decided to take the multiprocessing code out of the script and instead tweaked the timeout on the SNMP polling. That has proven to be a bit more reliable than what I had, and actually a bit more performant.