User-defined slice-like objects, and slicing mappings

When indexing a list, it’s useful to get a subset of the elements of the list, and that’s what slices are for. But while the indices requested by the slice object follow useful rules matching the most common uses, it could sometimes be useful to program alternative rules for the generation of these indices.
A typical use would be the emulation of passing lists or other arrays as indices to numpy array. When you do arr[[1, 5, 4, 18]] (notice the double square brackets), you’re requesting another array made of the first, fifth, fourth and eighteenth element of the initial array. This irregular sequence of indices would not be doable using a slice.
Enabling this opens another door : slice-like behavior for mappings. If my slice-like object returns a sequence of keys, I get in return a dict made of these keys and of their values in the original dict.

You’re probably thinking “that could and should be done in a user-defined function, or using comprehension syntax such as [arr[k] for k in [1, 5, 4, 18]]”.
That’s true, but it only works for the __getitem__ side of slicing, not for the __setitem__ side ! I would like to be able to do this as well indexable[myslicelike] = some_sequence. With lists, but also with dicts.

I’m proposing to create a new __slice__ method in the datamodel, which would return a sequence or iterable. When indexing a list or a dict (or a tuple, in a getitem situation), if the given index object is not a valid integer (for sequences) or key (for dicts), the __slice__ method is called on the index to obtain a sequence of indices. And then, the usual setitem, getitem or delitem behavior takes place.

The arguments to pass to the method are debatable : the vanilla slice needs to know the size of the sequence, but other information could be useful (for dicts, that would be the set of keys typically). And at the same time, passing the object itself could allow the method to mutate it, which is not ideal. We could always state it’s not meant to be done, like mutating a list that’s being sorted… I have no definitive view on that question, so please target other aspects of the idea for initial critics :sweat_smile:

Traditionally, indexing has been viewed as under the control of the container. This proposal would effectively invert it so it’s under the control of the key passed into e.g. __setitem__. It seems like what you specifically want is more containers to understand iterables as keys. In that case I don’t think you need a new magic method to have objects act differently when used as a key, but instead get containers to accept an arbitrary iterable and know how to work with that. And since containers would have to be updated to use your proposed method anyway, I don’t think adding a new magic method changes things that much.

But if you want to try and pursue this, 3. Data model — Python 3.10.5 documentation is the closest to what you’re talking about. So I would look at the discussion around __index__ and see if that can help you make a case.

I know some of this can be done by subclassing and expanding the *item__ methods of a new container type, but in that case if you want to use the new features you need to convert any and all indexable you have (sequences and mappings) to the new types. And although I admit the liberal indexing of sequences is a bit of a gadget feature, which would be useful but isn’t really required, slicing dicts should become a thing, with or without this particular syntax.
Also, I know that the current way python works is by relying on containers to interpret the indices how they want… but slices do exist, natively. My proposition only expands that feature.
Having containers accept an arbitrary iterable is not possible without shadowing currently supported behavior, which is tuples (and frozensets) as valid dict keys. And having list and tuples behave in completely different ways as indices would be confusing and a bad design idea. That’s why I went in the direction of a new dunder method.

In addition to that, you may have a use for a way of slicing (or extracting indices and values) that’s specific to a program or a part of a program. Having one, potentially lightweight slicer class to implement that behavior allows it to be easily created and passed back and forth between big containers. That way, a part of a program that slices in a particular fashion can exchange containers made in a common and constant type (ideally builtin) with another part of the program that slices them differently.
On the contrary, if the slicing behavior is implemented in the container type(s), the contained data would have to be converted back and forth between several containers, each time creating a new large object and wasting copying time.

I don’t understand what this has to do with __index__ though. Is there a past discussion about it on these forums that’s related with this ?

In other words, you want to use a list as a data selector in subscription calls (get-, set- and del-item dunders).

For the getting side of things, itertools calls this compress.

Any class could define their __getitem__ etc methods so that their getters, setters and deleters support a list argument. It is legal syntax right now:

d[[1, 2, 'k']] = [100, 200, 'ok']

it just fails for builtin types.

Essentially, I think, you are asking for Python to bless the interpretation used by numpy: a single list argument to subscripting is equivalent to subscripting with each of the list elements.

obj[alist]          # like [obj[k] for k in alist]
del obj[alist]      # like for k in alist: del obj[k]
obj[alist] = blist  # like for k,v in zip(alist, blist): obj[k] = v

This seems like a reasonable request. It is useful; it requires no new syntax; it is backwards-compatible for builtins; and a very large and important third-party library (numpy) already does this.

The downsides are:

  • increases the complexity of the builtins;
  • may interfere with user-defined classes that already accept lists but do something else;
  • redundant: we can already do that as a one-liner.

I don’t think that these are overwhelming negative arguments:

  • the complexity is probably managable;
  • this is the obvious interpretation for list subscripts, so it is unlikely that user-defined classes do something different;
  • but if they do, they will continue to work the way they do now (no breakage).

As for the redundancy aspect, yes, this would be syntactic sugar, not new functionality. But the point of syntactic sugar is to make it easier to do common tasks that otherwise are trickier to get right, and this is both common and sometimes tricky. E.g. the observant among you may have realised that del obj[alist] is not actually equivalent to the naive for k in alist: del obj[k].

obj = list(range(10))
for k in [5, 3, 7]:
    del obj[k]

print(obj)  # expect [0, 1, 2, 4, 6, 8, 9]

But the actual result is [0, 1, 2, 4, 6, 7, 8]. So for deletions at least, there is a trap in the obvious one-liner implementation.

I think this deserves a PEP, and I would hope that it has a good chance of succeeding.

I think that, maybe, the only change I would consider is using a tuple instead of a list. That would allow the indicies to be slices:

items[1:2,2:5,3]

passes the tuple (slice(1, 2, None), slice(2, 5, None), 3) to __getitem__.

The downside of that is that supporting nested slices in your subscripts increases the complexity significantly.

1 Like

Oh, further thought comes to mind…

There is an ambiguity when “slicing” dicts. Should it return a new dict or just the requested values?

Given d = {1: 'a', 2: 'b', 3: 'c'}, it is ambiguous whether d[[1, 3]] should return just the values ['a', 'c'] or the subdict {'1': 'a', 3: 'c'}. Unless there is a strong argument to be made in favour of one or the other, that argues against extending slicing to mappings.

2 Likes

I would really welcome this extended item selection. I often use this kind operations on dictionaries from a JSON API or configuration structures. This syntax could make the code more readable.

Yes, that could be useful for Sequence types but tuple is problematic for Mapping types, as @Gouvernathor already pointed out. Before I investigated this, I did not even know that this is possible (and for our use-case unfortunate):

d = {}
d[1,2] = 'x'
print(d)
{(1, 2): 'x'}

If there is an acceptable way how to make the dunder methods differentiate between d[1,2] and d[(1, 2)] it could be a way but I am skeptical.

To be useful it should return a new dict. Returning just values would be too limiting. To get the values we can use d[[1, 3]].values(). It is also logical:

assert type(l) == list and type(d) == dict
l[1]       # returns value (item of the list)
l[1, 3]    # returns list
l[[1, 3]]  # alternative of the above
d[1]       # returns value (similar to an item of a list)
d[[1, 3]]  # returns dict

Similar problem is assignment which should accept both Sequence and Mapping types on the right side:

d[['key1', 'key3']] = (7, 8)                  # replaces the values
d[['key1', 'key3']] = {'key6': 7, 'key8': 8}  # replaces both keys and values
d[['key1', 'key3']] = {'key6': 7}             # replaces two keys by a single key with a value

I think a large number of numerical programmers would get confused by this one, as it’s already used by NumPy etc to mean a multi-dimensional (2 in this example) index. I think l[[1, 2]] is fine.


I think this example demonstrates why returning dicts is not ideal: the behaviour requires some thought as to what is happening, and not immediately obvious. Especially line 2: which keys are replacing which?

Instead, returning and manipulating as lists is simpler:

d = {"a": 1, "b": 2, "c": 3, "d": 4}
a, b = d[["a", "b"]]
del d[["b", "c"]]
d[["d", "e"]] = (6, 7)
# d: {"a": 1, "d": 6, "e": 7}

This also fits into my idea of what arbitrary indexing is in NumPy: the return value has the same shape as the input index.

@steven.daprano

In other words, you want to use a list as a data selector in subscription calls (get-, set- and del-item dunders).

Yes and no. Not necessarily a list, any finite iterable would do the trick.

Any class could define their __getitem__ etc methods so that their getters, setters and deleters support a list argument.

Yes, that’s what I was discussing in my first message, why customizing subclasses wasn’t enough, and indexing lists with the syntax you’re showing is not what I’m proposing. In fact, I specifically said the opposite :

And having list and tuples behave in completely different ways as indices would be confusing and a bad design idea. That’s why I went in the direction of a new dunder method.

So :

Essentially, I think, you are asking for Python to bless the interpretation used by numpy

No.
Your proposition of importing numpy-syntax may be considered, and it’s not an insane or outrageous proposition, but I want to make clear its not mine.

@vbrozik

If there is an acceptable way how to make the dunder methods differentiate between d[1,2] and d[(1, 2)] it could be a way but I am skeptical.

That would be massively backwards-incompatible, so even though it could be cool, I don’t think it has any chance to see the day.

I agree that the multiple getitem should return a dict, for the reasons you stated. The delitem case doesn’t pose any design question - we delete the entries with the given keys, that’s it.
However for the setitem, I disagree that it should take a dict. IMO it should take an iterable whose length is equal to the number of given keys.
In a nutshell, I would have @vbrozik’s getitem with @EpicWink’s setitem.

Any finite iterable won’t do the trick, because some finite iterables are already valid keys as they are:

mydict[(2, 4, 5)]

That already uses the key (2, 4, 5) so it cannot be used as a data selector.

I completely disagree that having tuples and lists behave different is a bad idea. They already behave differently.

mydict[[1, 2]]

mydict[(1, 2)]

are not the same thing.

“Your proposition of importing numpy-syntax may be considered, and it’s not an insane or outrageous proposition, but I want to make clear its not mine.”

Okay, now I’m confused. It sure looks like you want to use the same numpy syntax. Isn’t it your suggestion that builtin types support list arguments for subscripting, just as numpy does?

Given the numpy array:

arr = np.array(range(10))*100

arr[[5, 2, 8, 7]] selects the items in positions 5, 2, 8 and 7, returning the array array([500, 200, 800, 700]). I thought that this was your proposal to support this for sequences and mappings, including for setting and deleting items:

# Given the list

L = [0, 100, 200, 300, 400, 500]



L[[4, 2]]  # should return [400, 200]



del L[[3, 5]]  # should result in L = [0, 100, 200, 400]



L[[0, 3]] = [1, 2]  # should result in L = [1, 100, 200, 2]

And similarly for other sequences, and dicts.

If that is not your proposal, then I don’t understand what you actually want!

Introducing a new dunder method is not workable. We only have one syntax for subscripting

obj[...]

and it will call the same dunder regardless of what is in the subscript.

Back in the ancient days of Python 1.5, subscripting with a slice object would call the dunder __getslice__. The problem was that if you did this:

x = slice(1, 10)

obj[x]

it would call __getitem__ instead. So the __getslice__ method was deprecated and we ended up with the situation we have now.

The same problem would occur with your extra dunder. It is a requirement that the programmer must be able to pull out the data selector list out of the subscript:

indices = [7, 9, 2, 6]

obj[indices]



# must be exactly equivalent to

obj[[7, 9, 2, 6]]

without changing the meaning of the code. Since the interpreter cannot tell what indices is until runtime, both calls must use the same dunder.

We only have one syntax for subscripting […] and it will call the same dunder regardless of what is in the subscript.

Read my first message, basically. The thing about __slice__, which is a method of the index object, called by the __*item__ methods of the indexable object. That’s always been my proposition.
Numpy’s behavior is an example, but as I said I consider it bad design and prone to errors due to the proximity between tuples and lists, and between sets and frozensets in the case of indexing dicts (they even compare as equals). If you want an equivalency with numpy’s syntax, here it is :

sli = object()
sli.__slice__ = (lambda *args:[1, 3, 5, 8])
indexable[sli] <=> nparray[[1, 3, 5, 8]]

The difference is that, as with the slice() object which adapts the set of indices it requests based upon the size of the indexable, the __slice__ can adapt the indices it requests based upon the indexable, passed in some form as an argument to the method (the specifics still need to be ironed out for what arguments would be passed to __slice__).

Okay, so your initial message has this:

You’re proposing changes to the __getitem__, __setitem__ and __delitem__ of list and dict (and presumably other [Mutable]Sequence and [Mutable]Mapping types). Let me take list.__setitem__ as an example.

The current behavior of that is close to the following:

def __setitem__(self, index, value):
    if IsIndex(index):
        index = int(index)
        if index < 0:
            index+= len(self)
        if index < 0 or index >= len(self):
            raise IndexError
        self.__items[index] = value
    elif isinstance(index, slice):
        # Ignore reverse slices for now, they complicate things
        if index.step not in (None, 1):
            raise ValueError
        start = min(max(0, index.start), len(self)
        stop = min(max(0, index.stop), len(self)
        self.__items[start : stop] = value
    else:
        raise TypeError

It looks like you’d like to replace the elif isinstance(index, slice) block with something that checks for a __slice__ method, calls it, and uses the indexes it produces to assign items from values, roughly like this:

    elif hasattr(index, "__slice__"):
        values = iter(value)
        for i in index.__slice__(...):
            try:
                v = next(values)
                self.__items[i] = v
            except StopIteration:
                del self.__items[i]  # Not quite, see below

except that the behavior if there aren’t the same number of values as there are indexes is much more complicated, and that’s where things get interesting. Currently, list slicing for a positive stride behaves roughly like this: a[i:j] = v computes a[:i] + list(v) + a[j:] and assigns the resulting list to the original list object, in place (I’m ignoring negative strides because they complicate the description but don’t add anything extra to the explanation, and similar for i > j).

But what should the behavior be if we write a[[1, 3, 5]] = "abcd"? Or a[[1, 3, 5]] = "ab"? Numpy requires that source and target slice have the same shape, but lists don’t require that. Without a decent answer for that (which doesn’t involve "special-case the slice type) we have a problem.

1 Like

I’m sorry, I don’t want to get bogged down in the discussion of your preferred implementation and whether it uses a new dunder or not until I understand the interface and semantics.

I have read your initial post three times now and I still don’t understand how it differs from the numpy use of a list as data selectors.

Numpy takes a list of indices to select the items at those indices. Isn’t that what you want too?

Ignoring the implementation, which is the least important part of your proposal, what does this code snippet output?

alist = [0, 10, 20, 30, 40, 50, 60]

print(alist[[5, 2, 6]])

alist[[5, 2, 6]] = [555, 222, 666]

print(alist)

del alist[[5, 2, 6]]

print(alist)

I expect this will print:

[500, 200, 600]

[0, 10, 222, 30, 40, 555, 666]

[0, 10, 30, 40]

Is that what you want? If not, what do you want the code to do?

If we can’t agree on what the feature does, there is no point in arguing about how to do it.

In my world (let’s simplyfy the semantics), you can’t give a list as an index. Period. The examples you give are TypeErrors, as they are now.
Imaging this slice equivalent :

class myslice():
    def __slice__(self, ...):
        ln # length of the object being indexed on
        # imagine it's been computed from the parameters
        rv = []
        for i in range(ln):
            if i**2 >= ln:
                break
            rv.append(i**2)
        return rv

a = [i+10 for i in range(16)]
print(a[myslice()])
# myslice.__slice__ returns [0, 1, 4, 9]
# it prints [10, 11, 14, 19]

For your example, it’s something like this:

class constantslice:
    def __init__(self, itble):
        self.__slice__ = (lambda *args:itble)
        # unorthodox, but well

alist = [0, 10, 20, 30, 40, 50, 60]
print(alist[constantslice([5, 2, 6])])
# [50, 20, 60]

alist[constantslice([5, 2, 6])] = [555, 222, 666]
print(alist)
# [0, 10, 222, 30, 40, 555, 666]

del alist[constantslice([5, 2, 6])]
print(alist)
# [0, 10, 30, 40]

I don’t know how I can do more clear than this.

@guido
That’s a very good point. I hadn’t thought about that. I assumed setitem + slice left the indices of the start and end sequence the same (for untouched values, typically a[j:] in your example), but it’s not the case.
I’ll have to think about that, but finding a programattic equivalent would be very hard, especially since I don’t know what it does when you have a step parameter in your too-short or too-long slice. I will explore that.

In the meantime, how about if that new slicing behavior only applied for mappings ? The problem you’re pointing out applies to sequences due to the very nature of a successive sequence of values, but in an unordered thing like a dict, it would still work, right ?
(maybe __slice__ wouldn’t be a good name then, but that’s a detail)

Can you elaborate on this?

Numpy requires that source and target slice have the same shape, but lists don’t require that.

In numpy you can assign a scalar to multiple indices, am I misunderstanding you?

import numpy as np

arr = np.array(["mel", "mel", "mel", "mel"], dtype="U3")
arr
arr[[0,1,2]] = "gvr"
arr

typed from phone apologize for syntax error, but that will assign "gvr" to positions 1, 2, and 3

If I’m not misunderstanding you, I would expect list and dict item setting to be the same. I think I get the point you are making, what if the assignment wasn’t a string (which gets special cases) and it was some other iterable.

Oh. Earlier I suggested “you want to use a list as a data selector in subscription calls (get-, set- and del-item dunders)” and you replied:

We can’t use tuples or strings or bytes, because they already have meaning as dict keys. You no longer want to use lists as numpy does. That limits your options somewhat.

Sorry, my initial enthusiasm for your suggestion is now rapidly waning.

I still think that using lists as numpy-style data selectors is a reasonable idea (although some of the details need to be sorted out), but I don’t think your __slice__ implementation is practical or useful. I think it is over-engineered.

Your example a[myslice()] to select four items requires the caller to write a class (nine lines of code). Or with numpy-style data selectors:

a[[0, 1, 4, 9]]

# Alternatively:
a[[isq for i in range(len(a)) if (isq:=i**2) < len(a)]]

An inplace expression versus a nine-line class. I know which one I would prefer to write.

I’m a little sad that we don’t have a “while” variant of list comps, so we can break out of the comprehension early, but if that is an issue there’s always the itertools solution:

from itertools import takewhile
a[list(takewhile(lambda n: n < len(a), (i**2 for i in range(len(a)))))]

The point is, we shouldn’t need to write an entire class just to give a few indices, when we could just give the indices directly.

@steven.daprano
slice is an entire class built (roughly) around the same concept as range when it comes to generating indices. What I’m proposing, see the title of the thread, is user-defined slice objects (and classes). Makes sense ?
“Any finite iterable”, like I said just after that quote, is to be returned by the __slice__ method, not used as index.

@Melendowski
Yes, there’s the concept of broadcastable arrays, where iirc when scanning the dimensions of the two arrays from right to left you need to have either the same value, 1, or extraneous values.
But the details aren’t important : in our case, the point is you can’t do arr[3:5] = [1, 2, 3, 4, 5, 6] in numpy, while in python you can.

@guido
According to this, three-element slice indexing is documented and restricted for sequences of the same length. So, maybe this could be an equivalent of just the three-argument slice, leaving the two-argument slice as builtin-only ?
I’m less sure about it now, maybe only slicing mappings as a start would be a better idea.

@Gouvernathor

Ah, I did not pick up on the indices being out of bounds, thank you.

That could work, but:

  • It would add more special cases, not generalize anything
  • The name “slice” seems inappropriate, maybe use something like __multi_index__
  • The fact that every sequence type has to implement this anew is definitely a weakness of the proposal (though this is shared with general slice support)

That seems veering very far from your original idea (which started with “When indexing a list, …”). It also would be a shame, since you are so clearly inspired by numpy array multi-indexing.

That

Well, it would generalize some uses of slice. And to be honest, they’re the most common uses. I mean, having 1 as the step is very common, but changing the length of a list with a slice setitem is not.

I’m only thinking about list and tuple, maybe the other containers from collections like deque, are there others I didn’t think about ? As you said, slice support is not generally added whe people create their own __*item__ methods, so they can in the same way keep lacking multi-indexing support (multi-index is a good name btw).

Regardless of this, I hold slicing (or multi-indexing) dicts to be more important than generalizing sequence slicing. I first mentioned lists to give a context, because they’re the main type that supports slices, but my main focus is dicts.

I don’t think you have justified why there is any need for (what seems to me to be) an over-engineered, overly complex solution to the problem of giving multiple indices at once.

The easiest way to specify a sequence of indices (or keys) is to specify a sequence of indices, not to create a class with a dunder method which then supplies a sequence of indices.

What can your “slice” class with its __slice__ dunder do that can’t be done more easily using a list or list comprehension?