Adding get_many() to Mapping

mauve · March 7, 2019, 2:43pm

Reading PEP 584 reminds me that I have several times wished for some bulk operations on dict (indeed, on collections.abc.Mapping), operations that PEP 584 doesn't cover.

Might something like this fly as a PEP?

Proposal

d.get_many(keys) would return a new mapping that contains any items in d whose keys are in keys, equivalent to

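from typing import Iterable, Mapping, TypeVar

K, V = map(TypeVar, 'KV')  # equivalent to K = TypeVar('K'); V = TypeVar('V')

def get_many(self: Mapping[K, V], keys: Iterable[K]) -> Mapping[K, V]:
    found = {}
    missing = object()  # sentinel, so that stored None values are still found
    for k in keys:
        v = self.get(k, missing)
        if v is not missing:
            found[k] = v
    # Assumes the mapping's constructor accepts a dict (true for dict and
    # OrderedDict, but not for every Mapping, e.g. os.environ)
    return type(self)(found)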

This generalises to .pop_many() and .del_many() on MutableMapping. .update() is already available as a bulk set/insert operation, and the constructors of most mappings accept a mapping. Together these would complete a bulk CRUD API.
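The post names .pop_many() and .del_many() without spelling out their behaviour. A minimal sketch follows, written as free functions and assuming they mirror get_many() by silently skipping absent keys; that semantics, and returning a plain dict, are assumptions rather than part of the proposal.

```python
from collections.abc import Iterable, MutableMapping

_MISSING = object()  # sentinel distinguishing "absent" from a stored None


def pop_many(self: MutableMapping, keys: Iterable) -> dict:
    """Remove the items whose keys are in `keys` and return them as a dict.

    Assumed semantics: keys not present in `self` are skipped.
    """
    popped = {}
    for k in keys:
        v = self.pop(k, _MISSING)
        if v is not _MISSING:
            popped[k] = v
    return popped


def del_many(self: MutableMapping, keys: Iterable) -> None:
    """Delete the items whose keys are in `keys`, ignoring absent keys (assumed)."""
    for k in keys:
        self.pop(k, None)
```

Using pop() with a default keeps each removal to a single lookup and avoids a separate membership test.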

Rationale

Filtering

get_many() on dicts would often be useful, much as set intersection is. Filtering down a dict comes up frequently:

# Retrieve global config section from flat dict
global_config = conf.get_many(k for k in conf.keys() if k.startswith('global.'))

# Run subprocess with clean environment
subprocess.run(..., env=os.environ.get_many(['PATH', 'HOME', 'LANG', 'LC_ALL']))
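For comparison, the two examples above can be approximated today with a free function. The helper name get_many and the sample conf/env data are hypothetical; with the real environment you would pass os.environ in place of the env dict.

```python
def get_many(mapping, keys):
    """Hypothetical free-function stand-in for the proposed d.get_many(keys)."""
    return {k: mapping[k] for k in keys if k in mapping}


# Hypothetical flat config dict
conf = {'global.debug': True, 'global.name': 'demo', 'db.host': 'localhost'}
global_config = get_many(conf, (k for k in conf if k.startswith('global.')))

# Hypothetical environment; os.environ would work the same way
env = {'PATH': '/usr/bin', 'HOME': '/home/me', 'SECRET_TOKEN': 'xyz'}
clean_env = get_many(env, ['PATH', 'HOME', 'LANG', 'LC_ALL'])
```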

Duck-typing with network services

The get_many operation offers an important performance optimisation for Mappings that are backed by network services. Requesting multiple keys at the same time can often be done in one query, reducing the operation from n round-trips to 1 round-trip. Memcached and Redis, for example, provide this operation.

redis = Redis(...)
print(redis.get_many(queries))    # fast, uses MGET operation

Services that do not support this operation might at least be able to use pipelining or concurrency to optimise a get_many(). The generic implementation would suffice for implementations where no performance improvement is possible.

A Redis driver could have its own method for this operation. The problem is that a driver-specific method prevents duck-typing: a dict cannot stand in for the driver, and the method in a Redis library is likely to have a different name from the one in an Elasticsearch library, so the two cannot be exchanged for one another either.
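To illustrate the duck-typing argument, here is a sketch under stated assumptions: BulkGetMapping, LocalMapping, FakeRemoteStore and pick are hypothetical names, and FakeRemoteStore only simulates a network service by counting "round trips". The generic get_many() costs one lookup per key, while the store overrides it with a single batched fetch, in the spirit of Redis MGET; calling code is identical for both.

```python
from collections.abc import Iterable, Iterator, Mapping


class BulkGetMapping(Mapping):
    """Hypothetical Mapping base carrying a generic get_many() fallback."""

    def get_many(self, keys: Iterable) -> dict:
        # One lookup per key; keys absent from the mapping are skipped.
        missing = object()
        found = {}
        for k in keys:
            v = self.get(k, missing)
            if v is not missing:
                found[k] = v
        return found


class LocalMapping(BulkGetMapping):
    """In-memory implementation that simply inherits the generic get_many()."""

    def __init__(self, data: dict):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self) -> Iterator:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


class FakeRemoteStore(LocalMapping):
    """Stand-in for a network-backed store; counts simulated round trips.

    Like Redis MGET, a batched fetch reports absent keys as None, so this
    sketch cannot store None values.
    """

    def __init__(self, data: dict):
        super().__init__(data)
        self.round_trips = 0

    def _fetch(self, keys: list) -> list:
        # One simulated request for the whole batch of keys.
        self.round_trips += 1
        return [self._data.get(k) for k in keys]

    def __getitem__(self, key):
        (value,) = self._fetch([key])
        if value is None:
            raise KeyError(key)
        return value

    def get_many(self, keys: Iterable) -> dict:
        # Override: a single round trip instead of one per key.
        keys = list(keys)
        return {k: v for k, v in zip(keys, self._fetch(keys)) if v is not None}


def pick(source: BulkGetMapping, keys: Iterable) -> dict:
    # Caller code depends only on get_many(), so either backend fits.
    return source.get_many(keys)
```

With three keys, the generic path on FakeRemoteStore makes three simulated requests (Mapping.get goes through __getitem__), whereas the override makes one.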

pf_moore · March 7, 2019, 10:21pm

{k: d[k] for k in keys} seems like a pretty readable way of writing your get_many, which works in Python now.

mauve · March 18, 2019, 3:00pm

That’s not quite the same, as it raises KeyError for any key in keys not in d. This would be closer:

{k: d[k] for k in keys if k in d}
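A small illustration of the difference between the two comprehensions, using made-up data:

```python
d = {'a': 1, 'b': 2}
keys = ['a', 'z']

# Filtering version: absent keys are silently skipped.
filtered = {k: d[k] for k in keys if k in d}

# Direct-indexing version: the first absent key raises KeyError.
try:
    strict = {k: d[k] for k in keys}
    missing_key = None
except KeyError as exc:
    strict = None
    missing_key = exc.args[0]
```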

zuo · April 15, 2019, 1:19am

The last two posts show that it is not obvious at first sight whether non-existent keys are ignored or cause an exception. Maybe an intersection operator would be better…

apalala · April 15, 2019, 2:04am

An intersection operator between dict and set is part of the discussion on operators over dict for 3.8:

filtered_dict = d & {'b', 'd'}
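Plain dict does not support &, so the snippet above describes proposed behaviour. A sketch of one possible semantics, using a hypothetical dict subclass that keeps the items whose keys appear in the right-hand operand, ignores absent keys, and preserves the left operand's ordering:

```python
from collections.abc import Iterable


class SelectableDict(dict):
    """Hypothetical dict subclass sketching a proposed `d & keys` operator."""

    def __and__(self, keys):
        if not isinstance(keys, Iterable):
            return NotImplemented  # lets Python raise the usual TypeError
        wanted = set(keys)
        # Iterate over self so the result keeps self's insertion order.
        return SelectableDict({k: v for k, v in self.items() if k in wanted})
```

Iterating over self.items() rather than over a set intersection keeps the result's order deterministic.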

mjpieters · April 15, 2019, 3:06pm

It’s more of a ‘select’ or ‘intersection’ operation. And because dictionary keys are unique, should the function perhaps accept a set of keys instead of an iterable?

Last but not least, if you are going to name this .get_many(), it should at least produce values for all selected keys, with a default if the key is missing from self, because that’s what .get() does. If that’s not the intention, don’t use the name .get_many().

At any rate, the selecting operation (ignoring missing keys) could be simplified using dictionary views:

def select(self, keys):
    # Dict comprehension over the keys present in both self and keys
    return {k: self[k] for k in self.keys() & keys}

The & operator on a keys view accepts any iterable as its right-hand operand, so whether keys is a set or some other iterable is moot.
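A quick check of that point, with made-up data: a keys view intersects with lists and generators, always producing a plain set, while a plain set's & operator rejects a list.

```python
d = {'a': 1, 'b': 2, 'c': 3}

# A keys view intersects with any iterable of hashable items...
from_list = d.keys() & ['a', 'z']
from_gen = d.keys() & (k for k in 'bcx')

# ...whereas set & list is a TypeError.
try:
    {'a', 'b'} & ['a']
    set_and_list_ok = True
except TypeError:
    set_and_list_ok = False
```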

The get_many() operation with a default value would just pass that default to a .get() call:

def get_many(self, keys, default=None):
    get = self.get
    return {k: get(k, default) for k in keys}
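To contrast the two semantics discussed here, a short sketch on hypothetical data: select() drops keys missing from the mapping, while get_many() returns an entry for every requested key, filling in the default.

```python
def select(self, keys):
    """Keep only the items whose keys are in `keys`; absent keys are dropped."""
    return {k: self[k] for k in self.keys() & keys}


def get_many(self, keys, default=None):
    """Return a value for every requested key, using `default` when absent."""
    get = self.get
    return {k: get(k, default) for k in keys}


prices = {'apple': 3, 'pear': 5}
wanted = ['apple', 'kiwi']

selected = select(prices, wanted)
fetched = get_many(prices, wanted, default=0)
```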

Do you have a reference to the discussion on this? Is the idea for dictionaries to support the &, |, ^, + and - operators the way that sets do, but operating on the keys and returning a new dictionary with the resulting keys and their corresponding values?

apalala · April 15, 2019, 5:31pm

The current proposal is PEP 584. The discussions happened on the python-ideas mailing list, so any mirror should carry them.

Regarding self.keys() & keys, it seems it currently has to be keys & self.keys().

mjpieters · April 15, 2019, 6:09pm

Ah, I did read that PEP already, but it currently only covers addition and subtraction. I see that the discussion got lost a bit in operator choices and such. We’ll see if there’s going to be any actual intersection operation!

I don’t quite follow; intersection is commutative: setA & setB produces the same set as setB & setA. The only reason to order the operands in a specific way is when one of them is an arbitrary iterable rather than a set, and self.keys() & keys allows keys to be any iterable.