Alternative way to create slices and tuples of slices with slice.__class_getitem__

Sometimes we want to apply the same slice to several objects. Today the slice object has to be created explicitly:

s = slice(start, stop)

x[s], y[s]  # equivalent to x[start:stop], y[start:stop]

This becomes less readable for “multidimensional” objects, such as NumPy arrays:

s = (Ellipsis, slice(start, stop), slice(None))

x[s], y[s]  # equivalent to x[..., start:stop, :], y[..., start:stop, :]

My proposal is to construct these objects as:

s = slice[...,start:stop,:]

x[s], y[s]

which would involve adding a __class_getitem__ to slice:

class slice:
    def __class_getitem__(cls, item):
        return item

Additionally, this would be easier to teach than the current approach of explicitly building a tuple of slice objects.
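Since the built-in slice cannot be patched or subclassed from Python code, the behaviour can only be tried out with a stand-in class. The name `Slicer` below is hypothetical; it is a minimal sketch of what the proposed `slice[...]` would return, not an implementation of the proposal itself.

```python
class Slicer:
    """Hypothetical stand-in for the proposed slice.__class_getitem__."""

    def __class_getitem__(cls, item):
        # Python has already turned the subscript into a slice, Ellipsis,
        # or a tuple of those; just hand it back unchanged.
        return item


data = list(range(10))
text = "abcdefghij"

# A single slice can be reused across different sequence types.
s = Slicer[2:5]
assert s == slice(2, 5, None)
assert data[s] == [2, 3, 4]
assert text[s] == "cde"

# A multidimensional key becomes a plain tuple.
key = Slicer[..., 2:5, :]
assert key == (Ellipsis, slice(2, 5, None), slice(None, None, None))
```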

This is explicitly against the purpose of __class_getitem__; see https://docs.python.org/3/reference/datamodel.html#the-purpose-of-class-getitem

The purpose of __class_getitem__() is to allow runtime parameterization of standard-library generic classes in order to more easily apply type hints to these classes.

Using __class_getitem__() on any class for purposes other than type hinting is discouraged.
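For comparison, this is roughly what the documented, typing-oriented use looks like on a standard-library class. A small sketch, assuming Python 3.9 or later, where `list` supports subscription:

```python
import types

alias = list[int]  # calls list.__class_getitem__(int)

# The result is a generic alias describing the parameterization,
# not a new container type and not the subscript itself.
assert isinstance(alias, types.GenericAlias)
assert alias.__origin__ is list
assert alias.__args__ == (int,)

# The alias is still callable and builds an ordinary list.
assert alias("ab") == ["a", "b"]
```

The proposal would instead return the subscript unchanged, which is the departure from the documented purpose being pointed out here.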

I like this idea, and it would in fact remove duplicate effort in at least two libraries.

NumPy has numpy.s_ and pandas has pandas.IndexSlice.

The code for both is very simple; so simple, in fact, that it might be argued the change is not worth it.

I would think there’d be less pushback on this if slice were a callable singleton, but it appears to be a class that you instantiate. If it were the former, this would just be a matter of adding __getitem__ logic. I have a feeling there will be pushback simply because we’re abusing __class_getitem__ for something that is not typing.

Edit: Someone just raised the great point of what slice() would return if it were a singleton… which perfectly explains why it’s a class.

An alternative way to implement this would be to make slice instances subscriptable. An additional change to make it cleaner would be to allow slice to be called with no arguments, i.e. slice() would be equivalent to [::].

So the example above could look like:

s = slice()[...,start:stop,:]  # note parentheses to instantiate a "default" slice

x[s], y[s]

This isn’t really specific to slice, though. Indexing is generically a neat way to pack a tuple.

>>> class Packer:
...     def __getitem__(self, item):
...         return item
... 
>>> pack = Packer()
>>> pack[1]
1
>>> pack[1, 2, 3]
(1, 2, 3)
>>> pack[..., 3:6, ...]
(Ellipsis, slice(3, 6, None), Ellipsis)

The problem is that these libs use nested tuples as keys.

For example, a pandas DataFrame with a multi-level index could take a (row, column) key such as (("a", 1), "column"). Worse, a DataFrame can have multi-level columns too, in which case a single cell is accessed with a tuple containing two tuples.

Something like this

In [18]: df = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index)

In [19]: df
Out[19]: 
first        bar                 baz  ...       foo       qux          
second       one       two       one  ...       two       one       two
A       0.895717  0.805244 -1.206412  ...  1.340309 -1.170299 -0.226169
B       0.410835  0.813850  0.132003  ... -1.187678  1.130127 -1.436737
C      -1.413681  1.607920  1.024180  ... -2.211372  0.974466 -2.006747

[3 rows x 8 columns]

In [20]: pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
Out[20]: 
first              bar                 baz                 foo          
second             one       two       one       two       one       two
first second                                                            
bar   one    -0.410001 -0.078638  0.545952 -1.219217 -1.226825  0.769804
      two    -1.281247 -0.727707 -0.121306 -0.097883  0.695775  0.341734
baz   one     0.959726 -1.110336 -0.619976  0.149748 -0.732339  0.687738
      two     0.176444  0.403310 -0.154951  0.301624 -2.179861 -1.369849
foo   one    -0.954208  1.462696 -1.743161 -0.826591 -0.345352  1.314232
      two     0.690579  0.995761  2.396780  0.014871  3.357427 -0.317441

Which is why pandas and numpy have their own helpers.
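A small illustration of why tuple keys get awkward, reusing the Packer class from earlier in the thread (redefined here so the snippet runs on its own). This is only a sketch of the general subscript behaviour, not of what pandas or NumPy do internally.

```python
class Packer:
    def __getitem__(self, item):
        return item


pack = Packer()

# Nested tuples survive intact, e.g. a (row, column) key whose row is
# itself a two-level index label.
assert pack[("a", 1), "column"] == (("a", 1), "column")

# But a single tuple key is indistinguishable from several separate keys:
# both spellings deliver the same object to __getitem__.
assert pack[("a", 1)] == pack["a", 1] == ("a", 1)
```

So the receiving library has to decide for itself whether ("a", 1) means one two-level label or two separate axes.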

This was all discussed before: Add operator.subscript as a convenience for creating slices · Issue #68567 · python/cpython · GitHub

I’ve thrown around ideas like this before and people seem to think it’s too confusing because of how it interacts with the expected default semantics for negative indices.
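One possible reading of the negative-index concern (my interpretation, not something spelled out above): a standalone slice stores its bounds unresolved, and they only acquire a concrete meaning once applied to a sequence of known length. The built-in `slice.indices` method shows that resolution step:

```python
s = slice(-3, None)

# The slice object just records what was written.
assert (s.start, s.stop, s.step) == (-3, None, None)

# Resolution depends on the length of the sequence it is applied to.
assert s.indices(10) == (7, 10, 1)
assert s.indices(2) == (0, 2, 1)  # clamped to the start, not an error

assert list(range(10))[s] == [7, 8, 9]
assert "ab"[s] == "ab"
```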

I agree that __class_getitem__ isn’t intended for this sort of purpose. On the one hand, slice fundamentally isn’t a parameterizable type, so OP’s suggestion can’t realistically step on any toes. On the other hand, it creates a definite wart - “special cases aren’t special enough to break the rules”.

Here’s another workaround:

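import builtins

class slice:  # sketch only: shadows the built-in within this module
    class _maker:
        def __getitem__(self, s):
            # Accept a single slice such as like[1:5]; reject ints and tuples.
            if not isinstance(s, builtins.slice):
                raise TypeError("invalid slice")
            return s

    # One shared helper instance, so slice.like[1:5] returns slice(1, 5, None).
    like = _maker()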

As a side note for OP: ... is valid syntax on its own; it isn’t necessary to type out the name Ellipsis.

Edit: looks like I more or less reinvented the original idea from the previous issue-tracker discussion. It’s not entirely clear to me why people ended up -1 on the idea, except for a release crunch which to me doesn’t seem like a reason to forget about an idea.

Since both slice(1, 2) and slice('a', 'b') are valid, maybe there are ways to see it as a parameterizable type? It is more similar to a triplet (3-tuple) in terms of types than anything else, just by allowing arbitrary values.

Is slice('a', 'b') really valid? I can construct it, but in what context does it actually work?

Never seen it in action, but in theory, an ISAM database could be sliced that way. You index your way into it at the start position, then return records until you reach/exceed the end position. It’s certainly not something I’d expect to see a lot of, but I wouldn’t be surprised if it’s out there somewhere.

Take a pandas DataFrame with columns a, b, c, d, e: df.loc[:, slice('a', 'c')] gives you the first three columns. This shows how the semantics of a slice are defined by the library using it (here the end is inclusive, unlike slicing a list).

I would also imagine an ordered map with string keys, where 'a':'b' would be a natural way to get a view of the part of the map whose keys lie between ‘a’ and ‘b’ in the key space.
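A minimal sketch of such a map, to show that string-valued slices are easy to give a meaning to. `SortedMap` is a hypothetical class written for this post; it returns a plain dict copy rather than a true view, and it treats the stop key as exclusive (a design choice here, unlike the end-inclusive pandas example).

```python
from bisect import bisect_left


class SortedMap:
    """Hypothetical minimal ordered map; slicing by key is half-open."""

    def __init__(self, items):
        self._keys = sorted(items)  # sorted list of the keys
        self._data = dict(items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            if key.step is not None:
                raise ValueError("step is not supported")
            # Binary-search the key space for both bounds.
            lo = 0 if key.start is None else bisect_left(self._keys, key.start)
            hi = len(self._keys) if key.stop is None else bisect_left(self._keys, key.stop)
            return {k: self._data[k] for k in self._keys[lo:hi]}
        return self._data[key]


m = SortedMap({"apple": 1, "avocado": 2, "banana": 3, "cherry": 4})

assert m["a":"b"] == {"apple": 1, "avocado": 2}
assert m["banana":] == {"banana": 3, "cherry": 4}
assert m["cherry"] == 4
```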
