User-defined slice-like objects, and slicing mappings

I agree that this would be very useful for Mapping types. I often need these operations:

  • Create a sub-dict.
  • Remove part of a dict.
  • Replace part of a dict.

…but I do not see why we would need the proposed __slice__() method for this. All this is achievable by extending the functionality of the __*item__() methods.
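To make this concrete, here is a minimal sketch of the three operations going through the existing item methods only. The names `MultiKey` and `SliceableDict` are hypothetical and invented for this illustration, and silently skipping missing keys on read and delete is an assumption, not something settled in this thread.

```python
class MultiKey:
    """Hypothetical marker object holding several keys (not a real builtin)."""

    def __init__(self, *keys):
        self.keys = keys


class SliceableDict(dict):
    """dict subclass whose item methods accept a MultiKey (illustration only)."""

    def __getitem__(self, key):
        if isinstance(key, MultiKey):
            # Create a sub-dict; missing keys are skipped (assumed behavior).
            return {k: dict.__getitem__(self, k) for k in key.keys if k in self}
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        if isinstance(key, MultiKey):
            # Replace part of the dict: one value per key.
            values = tuple(value)
            if len(values) != len(key.keys):
                raise ValueError("need exactly one value per key")
            for k, v in zip(key.keys, values):
                super().__setitem__(k, v)
        else:
            super().__setitem__(key, value)

    def __delitem__(self, key):
        if isinstance(key, MultiKey):
            # Remove part of the dict; missing keys are ignored (assumed behavior).
            for k in key.keys:
                self.pop(k, None)
        else:
            super().__delitem__(key)


d = SliceableDict(a=1, b=2, c=3)
sub = d[MultiKey("a", "c", "zzz")]   # create a sub-dict
d[MultiKey("a", "b")] = 10, 20       # replace part of the dict
del d[MultiKey("c", "zzz")]          # remove part of the dict
```

Ordinary single-key access is untouched, which is the point: no separate `__slice__()` hook is needed for this sketch to work.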

The point is to avoid the confusion of having very similar indices yield very different results. A list is a data-storage value like a tuple and should be treated like any other such value; it is disallowed as a dict key only because it is mutable and unhashable.
Slices and slice-like objects (if they exist) have the sole (or main) purpose of being keys, so there’s no confusion about what happens when indexing with them.

Do you mean that users should subclass the base types, or that Python should improve the builtin types?
If the former, that is true, but it incurs a heavy container-conversion cost when these mechanics are needed often, typically for **kwargs or other contexts where you are handed a plain dict. __dict__ is probably another example.
If the latter, then I agree: making the builtin __*item__ methods accept and use these __multi_index__ methods is what we’re discussing. If you have another syntax in mind, or another way of extending the mappings’ behavior, that is fine with me too.

I actually agree with this. While it looks nice to say e.g.

d[[k1, k2, k3]] = v1, v2, v3

it means that if someone accidentally uses, as a key, a variable that happens to hold a list, weird things will happen. E.g.

def func(): return (1, 2, 3)  # Immutable key
d[func()] = val

If someone were to change func to return a list instead, currently that would be an error because a list isn’t a valid key. But if we interpreted lists as multi-keys it would do something totally different (and probably produce a confusing error message).
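For reference, this is what happens today with a plain dict. The name `func_changed` is made up here to stand for the modified version of func; the exact wording of the error message varies between Python versions, so the sketch only records the exception type.

```python
def func():
    return (1, 2, 3)  # immutable, hashable key


def func_changed():
    return [1, 2, 3]  # same values, but now a list


d = {}
d[func()] = "val"  # fine: the tuple is stored as one single key

try:
    d[func_changed()] = "val"
    outcome = "stored"
except TypeError as exc:
    # Current behavior: a list is unhashable, so the mistake is caught here.
    outcome = f"TypeError: {exc}"
```

Under a "lists are multi-keys" rule, the second assignment would instead try to unpack "val" across three keys, so the failure would show up somewhere else entirely.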

So the less error-prone design would be to have a standard class to hold such keys. Let’s stipulate there’s a multikey() builtin that just takes a series of arguments. Then you’d have to write

d[multikey(k1, k2, k3)] = v1, v2, v3

and there would be much less opportunity for confusion (only someone intent on sabotage would change func() to return a multikey value).

Unfortunately this makes the design quite a bit more complicated. AFAIK numpy (and Pandas?) get away with using just plain lists for this purpose. Maybe we can learn from them that in practice the confusing thing of func() returning a list doesn’t happen often enough to be much of a trap? Or maybe the error message can be clear enough that users can figure it out?

It certainly would be simpler to specify if we just said “for all builtin sequences and mappings, if the key is a list, that means a multi-key and has the following behavior”.

I would really welcome it if the builtin types were extended.

Yes, the possible confusion may be a real concern. I do not think it would happen often (and static type checkers could help in some cases). I would be interested in the opinions of more programmers on this.

I am afraid that the verbosity of multikey() could hurt readability in some cases.

It is also possible to allow both lists and a dedicated multikey class, because multikey() can bring additional functionality, for example distinguishing optional and required keys:

d1 = {'k1': 1, 'k2': 2}

d2 = d1[multikey('k2', 'k3')]
# by default keys are optional - similar to slicing
# Missing keys do not raise KeyError.

d3 = d1[multikey('k1', required=('k2', 'k3'))]
# KeyError : 'k3'

d3 = d1[opt_keys('k1'), req_keys('k2', 'k3')]
# KeyError : 'k3'
# Probably a better way to distinguish optional and required keys.
# Allows arbitrary ordering and mixing of the keys.
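The optional/required distinction can be tried out today with a small stand-in. In this sketch `opt_keys` and `req_keys` are hypothetical classes (they do not exist in Python), and the helper `select()` is an invented substitute for the proposed `d1[opt_keys(...), req_keys(...)]` syntax, since builtin dicts do not interpret such keys.

```python
class opt_keys:
    """Hypothetical: keys that are skipped when missing."""

    def __init__(self, *keys):
        self.keys = keys


class req_keys:
    """Hypothetical: keys that must be present."""

    def __init__(self, *keys):
        self.keys = keys


def select(mapping, *groups):
    """Stand-in for mapping[opt_keys(...), req_keys(...)]."""
    result = {}
    for group in groups:
        for k in group.keys:
            if k in mapping:
                result[k] = mapping[k]
            elif isinstance(group, req_keys):
                # A missing required key is an error, like normal indexing.
                raise KeyError(k)
            # A missing optional key is skipped, like slicing.
    return result


d1 = {'k1': 1, 'k2': 2}
d2 = select(d1, opt_keys('k2', 'k3'))  # 'k3' is missing but optional

try:
    select(d1, opt_keys('k1'), req_keys('k2', 'k3'))
    missing = None
except KeyError as exc:
    missing = exc.args[0]  # the first required key that was not found
```

Because each group carries its own policy, the groups can be passed in any order and mixed freely.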

Another problem is that we should be allowed to list the keys directly as arguments to multikey() and also we should be able to create the multikey object from an iterable. Now we are back to the problem of the ambiguity between keys of iterable types and iterables of keys.

I am not sure which solution for this would be best. multikey.from_iter()? Similarly we can have multikey.req() and multikey.req_from_iter() for required keys?

If you want to use an iterable, you could just type multikey(*iterable).
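A small sketch of why unpacking sidesteps the ambiguity, assuming a hypothetical `multikey` class that simply stores its positional arguments:

```python
class multikey:
    """Hypothetical multi-key holder; every positional argument is one key."""

    def __init__(self, *keys):
        self.keys = keys


one_key = multikey((1, 2))    # a single key that happens to be a tuple
two_keys = multikey(*(1, 2))  # two keys, unpacked from an iterable

# Any iterable works with unpacking, including a generator.
squares = multikey(*(n * n for n in range(3)))
```

With this convention the constructor never has to guess whether an argument is a key of an iterable type or an iterable of keys, so a separate from_iter() constructor would be a convenience rather than a necessity.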