What do you think of JoinMapping: A class to join mappings with equal keys into one mapping?

NeilGirdhar · May 15, 2024, 7:50am

Background

I find that I sometimes need to iterate over multiple mappings having identical keys. Thus, I do something like:

def f(a: Mapping[Key, Value], b: Mapping[Key, Value], c: Mapping[Key, Value]) -> Any:
    for key, a_value in a.items():
        b_value = b[key]
        c_value = c[key]
        ...

This has the downside of not verifying that a, b, and c have identical keys. The strict flag was added to zip to do a similar verification because it’s extremely useful. It is also slightly convoluted: iteration happens on a instead of on all of them, and the key lookups happen only on b and c.

Proposal

Add an itertool class that reflects the parallel structure and verifies equivalence of keys:

from collections.abc import Iterator, Mapping
from typing import TypeVar, override


K = TypeVar('K')
V = TypeVar('V')


class JoinMapping(Mapping[K, tuple[V, ...]]):
    def __init__(self, x: Mapping[K, V], /, *args: Mapping[K, V]):
        super().__init__()
        self.mappings = x, *args
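        n = len(x)
        s = set(x)
        for y in args:
            # Cheap length test first, then the full key comparison.
            if len(y) != n:
                raise ValueError("mappings must have the same number of keys")
            if set(y) != s:
                raise ValueError("mappings must have identical keys")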

    @override
    def __getitem__(self, key: K) -> tuple[V, ...]:
        return tuple(mapping[key] for mapping in self.mappings)

    @override
    def __iter__(self) -> Iterator[K]:
        return iter(self.mappings[0])

    @override
    def __len__(self) -> int:
        return len(self.mappings[0])

This checks that all mappings have the same keys (comparing lengths first as a cheap test), then iterates over the first mapping while indexing into the others. I initially thought of proposing this for Python, but I think if this were to be added, it would belong in more-itertools first.

This is a bit like ChainMap in the sense that it combines mappings, but instead of delegating lookups to the first mapping that has the key, it creates tuples of values.

Examples

JoinMapping(a, b, c)[x]  # equivalent to (a[x], b[x], c[x])
JoinMapping(a, b, c).items()  # equivalent to ((k, (a[k], b[k], c[k])) for k in a)
JoinMapping(a, b, c).values()  # equivalent to ((a[k], b[k], c[k]) for k in a)
# etc.

Rosuav · May 15, 2024, 8:00am

It creates a weird asymmetry: if b or c has additional keys compared to a, they will be silently ignored, but if a has additional keys, they will cause errors. (Or if you prefer: missing keys in b or c will raise errors, but missing keys in a will not.)
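A minimal sketch of that asymmetry, run against the loop from the background section. The helper name pair_up and the sample dictionaries are made up for illustration:

```python
def pair_up(a, b):
    """Iterate over a and look up each key in b, as in the original loop."""
    return [(key, a_value, b[key]) for key, a_value in a.items()]


# Extra key in b: "y" is silently ignored.
extra_in_b = pair_up({"x": 1}, {"x": 10, "y": 20})

# Extra key in a: the lookup b["y"] raises KeyError.
try:
    pair_up({"x": 1, "y": 2}, {"x": 10})
except KeyError as exc:
    missing = exc.args[0]
```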

Personally, I would just pre-check.

if a.keys() == b.keys() == c.keys():
    ...  # all's well, go ahead
else:
    ...  # nope nope nope

blhsing · May 15, 2024, 8:36am

Like ChainMap, for generalization I think it should allow:

- Initialization with zero mappings, in which case the mappings list can be initialized as either [{}] as ChainMap does, or simply as an empty list, in which case __getitem__ can either:
  - raise KeyError for any key, or,
  - return an empty tuple for any key
- An exposed, user-updatable maps attribute (which you already have as mappings but can probably be renamed for consistency with ChainMap).
- A __setitem__ method that updates a given key for all mappings, where it can take either:
  - a single value to apply to all mappings, or,
  - a tuple of values to apply to respective mappings
  (obviously we have to decide on one of the two or it’d become ambiguous.)

And @Rosuav’s suggestion of a key equivalence check can be generalized with all_equal(map(methodcaller('keys'), maps)).
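For reference, a self-contained sketch of that check. all_equal normally comes from more-itertools; the version here is a local stand-in built on itertools.groupby so the snippet has no third-party dependency, and same_keys is a hypothetical wrapper name:

```python
from itertools import groupby
from operator import methodcaller


def all_equal(iterable):
    """Return True if every element equals the others (or there are none)."""
    groups = groupby(iterable)
    next(groups, None)  # consume the first run of equal elements, if any
    return next(groups, None) is None  # a second run means a mismatch


def same_keys(*maps):
    """Return True if all mappings have the same set of keys."""
    return all_equal(map(methodcaller("keys"), maps))
```

Key views compare like sets, so insertion order does not matter: same_keys({"x": 1, "y": 2}, {"y": 3, "x": 4}) is true.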

MegaIng · May 15, 2024, 9:09am

I think this is a bad idea; these should be two different methods (with the latter being the one that takes the syntax directly). To quote the Zen of Python, “refuse the temptation to guess”: in this case the function needs to guess whether the user wanted to assign a tuple to all maps or unpack the tuple, especially if the tuple has the wrong length. Either choice can lead to unexpected behavior or hide programming errors.

But otherwise I do agree that if this idea gets implemented, it should include this more general behavior (except maybe an exposed, writable maps attribute/property, readonly would be my preference).

But tbh, I am not using this pattern all that much, so I am not sure if it’s worth adding to the stdlib (but it also isn’t big enough for a PyPI package IMO, so suggesting that as a way to gauge interest is not going to be useful).

blhsing · May 15, 2024, 9:32am

Right, I was thinking we should pick one of the two behaviors and document it, but since both make sense in different scenarios it may be worth not implementing __setitem__ at all and instead implementing two separately named set methods.
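Roughly what I have in mind, as a sketch only. The class name WritableJoin and the method names set_all and set_each are placeholders, not part of the proposal:

```python
from collections.abc import MutableMapping, Sequence
from typing import Any


class WritableJoin:
    """Hypothetical sketch: holds mappings and offers two explicit setters."""

    def __init__(self, *maps: MutableMapping[Any, Any]) -> None:
        self.maps = list(maps)

    def set_all(self, key: Any, value: Any) -> None:
        """Store the same value under key in every mapping."""
        for m in self.maps:
            m[key] = value

    def set_each(self, key: Any, values: Sequence[Any]) -> None:
        """Store values[i] under key in the i-th mapping."""
        if len(values) != len(self.maps):
            raise ValueError("need exactly one value per mapping")
        for m, value in zip(self.maps, values):
            m[key] = value
```

With two names there is nothing to guess: set_all("k", (1, 2)) stores the tuple itself in every mapping, while set_each("k", (1, 2)) unpacks it and rejects a tuple of the wrong length.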

I mentioned it because ChainMap allows it to be writeable. It’s sometimes useful to initialize an empty ChainMap and update it with actual mappings later.

I don’t think I’d be using this proposed class much either, and it seems fairly trivial to implement with a comprehension when I do need such a pattern.

kknechtel · May 15, 2024, 10:12pm

Following the ChainMap example, I think the right name for this is ZipMap.

I don’t think I would use it. Having data structured like this suggests to me that the code is missing a dataclass or similar. Or perhaps the values should get grouped eagerly ({k: tuple(m[k] for m in mappings) for k in mappings[0]}).
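Spelled out as a function, with the key check from earlier in the thread added. This is a sketch and group_values is a made-up name:

```python
from collections.abc import Mapping
from typing import Any


def group_values(first: Mapping[Any, Any], /, *rest: Mapping[Any, Any]) -> dict[Any, tuple[Any, ...]]:
    """Eagerly build {key: (value from each mapping, ...)}."""
    for m in rest:
        # Key views compare like sets, so order does not matter here.
        if m.keys() != first.keys():
            raise ValueError("mappings must have identical keys")
    mappings = (first, *rest)
    return {k: tuple(m[k] for m in mappings) for k in first}


grouped = group_values({"x": 1, "y": 2}, {"x": 10, "y": 20})
```

Unlike the proposed class, the result is a snapshot: later changes to the input mappings are not reflected in it.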