Adding get_many() to Mapping


(Daniel Pope) #1

Reading PEP 584 reminded me that I have several times wished for bulk operations on dict (indeed, on collections.abc.Mapping) that PEP 584 doesn’t cover.

Might something like this fly as a PEP?

Proposal

d.get_many(keys) would return a new mapping containing the items of d whose keys are in keys; keys that are not in d are skipped. It is equivalent to:

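from typing import Iterable, Mapping, TypeVar

K = TypeVar('K')
V = TypeVar('V')

def get_many(self: Mapping[K, V], keys: Iterable[K]) -> Mapping[K, V]:
    found = {}
    missing = object()  # sentinel, so keys whose stored value is None are still found
    for k in keys:
        v = self.get(k, missing)
        if v is not missing:
            found[k] = v
    # assumes the concrete mapping type can be constructed from a plain dict
    return type(self)(found)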

This generalises to .pop_many() and .del_many() on MutableMapping. .update() is already available as a bulk set/insert operation, and most mapping constructors accept a mapping, so together these would complete a bulk CRUD API.
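The two mutating operations could look something like this. It is a minimal sketch written as free functions rather than MutableMapping methods; the names pop_many and del_many come from the proposal, but their behaviour on missing keys (skipped silently, mirroring get_many()) is an assumption.

```python
from collections.abc import Iterable, MutableMapping
from typing import TypeVar

K = TypeVar("K")
V = TypeVar("V")

_MISSING = object()  # sentinel distinguishing "absent" from a stored None


def pop_many(m: MutableMapping[K, V], keys: Iterable[K]) -> dict[K, V]:
    """Remove the given keys from m and return the removed items.

    Assumption: keys absent from m are skipped, as in get_many().
    """
    removed: dict[K, V] = {}
    for k in keys:
        v = m.pop(k, _MISSING)
        if v is not _MISSING:
            removed[k] = v
    return removed


def del_many(m: MutableMapping[K, V], keys: Iterable[K]) -> None:
    """Delete the given keys from m, ignoring keys that are absent (assumption)."""
    for k in keys:
        m.pop(k, None)


d = {"a": 1, "b": 2, "c": 3}
print(pop_many(d, ["a", "c", "z"]))  # {'a': 1, 'c': 3}
print(d)                             # {'b': 2}
del_many(d, ["b", "missing"])
print(d)                             # {}
```

Raising KeyError for missing keys instead would match dict.pop() without a default; which behaviour is better is an open design question.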

Rationale

Filtering

On dicts, get_many() would often be useful, much as set intersection is useful on sets. Filtering a dict down to a subset of its keys comes up frequently:

# Retrieve global config section from flat dict
global_config = conf.get_many(k for k in conf.keys() if k.startswith('global.'))
# Run subprocess with clean environment
subprocess.run(..., env=os.environ.get_many(['PATH', 'HOME', 'LANG', 'LC_ALL']))
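For comparison, here is what these two filters look like in current Python, without get_many(). The sample conf dictionary is invented for illustration; note that both results are plain dicts.

```python
import os

# Hypothetical flat config dict
conf = {
    "global.timeout": "30",
    "global.retries": "3",
    "db.host": "localhost",
}

# Equivalent of conf.get_many(k for k in conf.keys() if k.startswith('global.'))
global_config = {k: v for k, v in conf.items() if k.startswith("global.")}
print(global_config)  # {'global.timeout': '30', 'global.retries': '3'}

# Equivalent of os.environ.get_many([...]); variables that are unset are skipped
wanted = ["PATH", "HOME", "LANG", "LC_ALL"]
clean_env = {k: os.environ[k] for k in wanted if k in os.environ}
# clean_env can then be passed as subprocess.run(..., env=clean_env)
```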

Duck-typing with network services

The get_many operation offers an important performance optimisation for Mappings backed by network services. Multiple keys can often be requested in a single query, reducing n round-trips to 1. Memcached and Redis, for example, both provide such an operation.

redis = Redis(...)
print(redis.get_many(queries))    # fast, uses MGET operation

Services that do not support this operation might still be able to use pipelining or concurrency to speed up get_many(). For implementations where no performance improvement is possible, the generic implementation would suffice.
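A toy illustration of the round-trip argument. FakeRemoteStore is hypothetical: a local dict plays the server, and each simulated request counts as one round-trip. Its get_many() override stands in for a batched command such as Redis MGET, while generic_get_many() follows the proposal's generic fallback (returning a plain dict) and costs one round-trip per key.

```python
from collections.abc import Iterable, Iterator, Mapping


class FakeRemoteStore(Mapping):
    """Hypothetical Mapping whose data lives behind a simulated network service."""

    def __init__(self, data: dict) -> None:
        self._server = dict(data)  # stands in for the remote server's data
        self.round_trips = 0

    def _request(self, keys: list) -> dict:
        # One simulated round-trip that can fetch several keys at once.
        self.round_trips += 1
        return {k: self._server[k] for k in keys if k in self._server}

    def __getitem__(self, key):
        found = self._request([key])
        if key not in found:
            raise KeyError(key)
        return found[key]

    def __iter__(self) -> Iterator:
        return iter(list(self._server))

    def __len__(self) -> int:
        return len(self._server)

    def get_many(self, keys: Iterable) -> dict:
        # Batched override: a single round-trip however many keys are requested.
        return self._request(list(keys))


def generic_get_many(m: Mapping, keys: Iterable) -> dict:
    # Generic fallback: one m.get() per key, i.e. one round-trip per key here.
    missing = object()
    found = {}
    for k in keys:
        v = m.get(k, missing)
        if v is not missing:
            found[k] = v
    return found


store = FakeRemoteStore({"a": "1", "b": "2", "c": "3"})
print(generic_get_many(store, ["a", "b", "z"]), store.round_trips)  # {'a': '1', 'b': '2'} 3
store.round_trips = 0
print(store.get_many(["a", "b", "z"]), store.round_trips)  # {'a': '1', 'b': '2'} 1
```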

A Redis driver could provide its own method for this operation, but that prevents duck-typing. A dict cannot stand in for the Redis client, and the method in a Redis library is likely to have a different name from the equivalent in an Elasticsearch library, so the two cannot be exchanged for one another.
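To make the duck-typing point concrete: if local and network-backed mappings shared one method name, calling code could be written once against it. In this sketch SupportsGetMany, LocalStore, BatchingStore and load_settings are all hypothetical names; since dict has no get_many() today, LocalStore adds one by hand.

```python
from collections.abc import Iterable
from typing import Protocol


class SupportsGetMany(Protocol):
    """Hypothetical structural type: anything with a get_many() method."""

    def get_many(self, keys: Iterable[str]) -> dict[str, str]: ...


class LocalStore(dict):
    """A plain dict with the proposed method added by hand."""

    def get_many(self, keys: Iterable[str]) -> dict[str, str]:
        return {k: self[k] for k in keys if k in self}


class BatchingStore:
    """Stand-in for a network-backed client; counts batched requests."""

    def __init__(self, data: dict[str, str]) -> None:
        self._data = data
        self.requests = 0

    def get_many(self, keys: Iterable[str]) -> dict[str, str]:
        self.requests += 1  # one simulated round-trip for the whole batch
        return {k: self._data[k] for k in keys if k in self._data}


def load_settings(store: SupportsGetMany) -> dict[str, str]:
    # Written once against the shared method name; works with either backend.
    return store.get_many(["host", "port"])


data = {"host": "localhost", "port": "6379", "user": "admin"}
print(load_settings(LocalStore(data)))  # {'host': 'localhost', 'port': '6379'}
remote = BatchingStore(data)
print(load_settings(remote), remote.requests)  # {'host': 'localhost', 'port': '6379'} 1
```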


(Paul Moore) #2

{k: d[k] for k in keys} seems like a pretty readable way of writing your get_many, which works in Python now.


(Daniel Pope) #3

That’s not quite the same: it raises KeyError for any key in keys that is not in d. This would be closer:

{k: d[k] for k in keys if k in d}
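The difference between the two comprehensions, using a small made-up dict. The last expression is a variation not taken from the thread: it does one lookup per key by calling get() with a sentinel, which also keeps keys whose stored value is None.

```python
d = {"a": 1, "b": 2}
keys = ["a", "z"]

# Unfiltered version: fails on the first key that is absent from d.
try:
    {k: d[k] for k in keys}
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'z'

# Filtered version: skips absent keys, but does two lookups per present key.
filtered = {k: d[k] for k in keys if k in d}
print(filtered)  # {'a': 1}

# Single-lookup variation using a sentinel and an assignment expression (3.8+).
_MISSING = object()
single_lookup = {k: v for k in keys if (v := d.get(k, _MISSING)) is not _MISSING}
print(single_lookup)  # {'a': 1}
```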