Alternative to create slice and tuple of slices with slice.__class_getitem__

Sometimes we want to apply the same slice to several objects, which means creating the slice object explicitly:

s = slice(start, stop)

x[s], y[s]  # equivalent to x[start:stop], y[start:stop]

This becomes less readable for “multidimensional” objects, such as NumPy arrays:

s = (Ellipsis, slice(start, stop), slice(None))

x[s], y[s]  # equivalent to x[..., start:stop, :], y[..., start:stop, :]

My proposal is to construct these objects as:

s = slice[...,start:stop,:]

x[s], y[s]

which would involve adding a __class_getitem__ to slice:

class slice:
    def __class_getitem__(cls, item):
        return item

Additionally, it would be easier to teach how to create these standalone slicers than the current way of explicitly creating a tuple of slices.
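
The proposed behavior can be tried today on a stand-in class. This is only a minimal sketch: SliceLiteral is a hypothetical name (not part of Python), and a plain list and string stand in for the objects being sliced.

```python
class SliceLiteral:
    """Hypothetical stand-in for the proposed slice[...] syntax."""

    def __class_getitem__(cls, item):
        # Subscript syntax already builds the slice (or tuple of slices),
        # so the method only has to hand it back.
        return item


start, stop = 1, 4

# One-dimensional case: a single slice object reused on two sequences.
s = SliceLiteral[start:stop]
x = list(range(10))
y = "abcdefghij"
print(x[s], y[s])  # [1, 2, 3] bcd

# "Multidimensional" case: the key arrives as a tuple.
multi = SliceLiteral[..., start:stop, :]
print(multi)  # (Ellipsis, slice(1, 4, None), slice(None, None, None))
```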


See https://docs.python.org/3/reference/datamodel.html#the-purpose-of-class-getitem: this is explicitly against the purpose of __class_getitem__.

The purpose of __class_getitem__() is to allow runtime parameterization of standard-library generic classes in order to more easily apply type hints to these classes.

Using __class_getitem__() on any class for purposes other than type hinting is discouraged.
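
For contrast, here is roughly what the documented use looks like at runtime. A small sketch, assuming Python 3.9 or later, where builtin containers such as list support subscription for type hints:

```python
from types import GenericAlias

# Subscripting a standard-library generic class goes through
# __class_getitem__ and yields an alias object used for type hints.
alias = list[int]

print(type(alias) is GenericAlias)  # True
print(alias.__origin__ is list)     # True
print(alias.__args__)               # (<class 'int'>,)
```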


I like this idea, and it would in fact remove duplicate effort in at least two libraries.

Numpy has numpy.s_ and Pandas has pandas.IndexSlice.

The code for both is very simple, so simple, in fact, that one might argue the change is not worth it.
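
For reference, this is how the NumPy helper is typically used. A short sketch that assumes NumPy is installed; the array shapes are made up for illustration:

```python
import numpy as np

x = np.arange(24).reshape(2, 4, 3)
y = x * 10

# np.s_ returns the key it is subscripted with, unchanged.
s = np.s_[..., 1:3, :]
print(s)  # (Ellipsis, slice(1, 3, None), slice(None, None, None))

# The same key can then be reused on several arrays.
xs, ys = x[s], y[s]
print(xs.shape, ys.shape)  # (2, 2, 3) (2, 2, 3)
```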

I would think there’d be less pushback on this if slice was a singleton that was callable, but it appears to be a class that you instantiate. If it was the former, it would just be a matter of adding __getitem__ logic. I have a feeling there will be pushback just because we’re abusing the use case of __class_getitem__ for something that is not typing.

Edit: Someone just brought up the great point of what slice() would return if it was a singleton… which perfectly explains why it’s a class.


An alternative way to implement this would be to allow the slice instance to be subscriptable. An additional change to make it cleaner would be to allow it to be called with no arguments, i.e. slice() is equivalent to [::].

So the example above could look like:

s = slice()[...,start:stop,:]  # note parentheses to instantiate a "default" slice

x[s], y[s]

This isn’t really specific to slice, though. Indexing is generically a neat way to pack a tuple.

>>> class Packer:
...     def __getitem__(self, item):
...         return item
... 
>>> pack = Packer()
>>> pack[1]
1
>>> pack[1, 2, 3]
(1, 2, 3)
>>> pack[..., 3:6, ...]
(Ellipsis, slice(3, 6, None), Ellipsis)

The problem is that these libs use nested tuples as keys.

For example, a pandas DataFrame with a multi-level index can take a (row, column) key such as (("a", 1), "column"). Worse, a DataFrame can have multi-level columns too, in which case a single cell is addressed by a tuple containing two tuples.

Something like this:

In [18]: df = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index)

In [19]: df
Out[19]: 
first        bar                 baz  ...       foo       qux          
second       one       two       one  ...       two       one       two
A       0.895717  0.805244 -1.206412  ...  1.340309 -1.170299 -0.226169
B       0.410835  0.813850  0.132003  ... -1.187678  1.130127 -1.436737
C      -1.413681  1.607920  1.024180  ... -2.211372  0.974466 -2.006747

[3 rows x 8 columns]

In [20]: pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
Out[20]: 
first              bar                 baz                 foo          
second             one       two       one       two       one       two
first second                                                            
bar   one    -0.410001 -0.078638  0.545952 -1.219217 -1.226825  0.769804
      two    -1.281247 -0.727707 -0.121306 -0.097883  0.695775  0.341734
baz   one     0.959726 -1.110336 -0.619976  0.149748 -0.732339  0.687738
      two     0.176444  0.403310 -0.154951  0.301624 -2.179861 -1.369849
foo   one    -0.954208  1.462696 -1.743161 -0.826591 -0.345352  1.314232
      two     0.690579  0.995761  2.396780  0.014871  3.357427 -0.317441

Which is why pandas and numpy have their own helpers.
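
One way to read the point about nested tuples: by the time a key reaches __getitem__, a tuple written explicitly and a comma-separated subscript look the same, so the library has to decide what a tuple means. A minimal sketch using a hypothetical KeyEcho helper (no pandas involved):

```python
class KeyEcho:
    """Hypothetical helper that returns whatever key indexing passes in."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

# A (row, column) key whose row part is itself a two-level label.
cell_key = echo[("a", 1), "column"]
print(cell_key)  # (('a', 1), 'column')

# A single two-level label and a "row, column" pair are indistinguishable.
label_key = echo[("a", 1)]
pair_key = echo["a", 1]
print(label_key == pair_key)  # True
```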


This has all been discussed before: Add operator.subscript as a convenience for creating slices · Issue #68567 · python/cpython · GitHub

I’ve thrown around ideas like this before and people seem to think it’s too confusing because of how it interacts with the expected default semantics for negative indices.

I agree that __class_getitem__ isn’t intended for this sort of purpose. On the one hand, slice fundamentally isn’t a parameterizable type, so OP’s suggestion can’t realistically step on any toes. On the other hand, it creates a definite wart - “special cases aren’t special enough to break the rules”.

Here’s another workaround:

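import builtins

class slice:  # stand-in that shadows the builtin, only to sketch the idea
    class _maker:
        def __getitem__(self, s):
            # Inside this module `slice` names the stand-in class, so check
            # against the real builtin type that the s[a:b] syntax produces.
            if not isinstance(s, builtins.slice):
                raise TypeError("invalid slice")
            return s

    # A single shared maker, reachable as slice.like[start:stop]
    like = _maker()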

As a side note for OP: ... is valid syntax on its own; it isn’t necessary to type out the name Ellipsis.

Edit: looks like I more or less reinvented the original idea from the previous issue-tracker discussion. It’s not entirely clear to me why people ended up -1 on the idea, except for a release crunch which to me doesn’t seem like a reason to forget about an idea.


Since both slice(1, 2) and slice('a', 'b') are valid, maybe there are ways to see it as a parameterizable type? It is more similar to a triplet (3-tuple) in terms of types than anything else, just by allowing arbitrary values.

Is slice('a', 'b') really valid? I can construct it, but in what context does it actually work?

Never seen it in action, but in theory, an ISAM database could be sliced that way. You index your way into it at the start position, then return records until you reach/exceed the end position. It’s certainly not something I’d expect to see a lot of, but I wouldn’t be surprised if it’s out there somewhere.


Take a pandas DataFrame with columns a, b, c, d, e: df.loc[:, slice('a', 'c')] gives you the first three columns. It demonstrates how slices have their semantics defined by the library using them (here end-inclusive, unlike slices on lists).

I would also imagine an ordered map with string keys, where 'a':'b' would be a natural way to get a view of the part of the map that has keys between 'a' and 'b' in the key space.
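
The ordered-map idea can be sketched with the standard library. Everything here is an assumption made for illustration: the SortedMap class is hypothetical, the slice is treated as half-open (stop excluded, like lists), and a plain dict copy is returned rather than a true view.

```python
from bisect import bisect_left


class SortedMap:
    """Hypothetical ordered map whose string keys can be sliced."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def __getitem__(self, key):
        if isinstance(key, slice):
            if key.step is not None:
                raise ValueError("step is not supported in this sketch")
            # Locate the key range by binary search; stop is excluded.
            lo = 0 if key.start is None else bisect_left(self._keys, key.start)
            hi = len(self._keys) if key.stop is None else bisect_left(self._keys, key.stop)
            return dict(zip(self._keys[lo:hi], self._values[lo:hi]))
        i = bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            raise KeyError(key)
        return self._values[i]


m = SortedMap({"cherry": 4, "apple": 1, "banana": 3, "avocado": 2})
a_words = m["a":"b"]
print(a_words)  # {'apple': 1, 'avocado': 2}
print(m["b":])  # {'banana': 3, 'cherry': 4}
```

A list, by contrast, rejects the same slice object with a TypeError, which is the sense in which the library decides what a slice means.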


The proposed behavior can already be achieved at the instance level, without using __class_getitem__:

class Slicer:
    def __getitem__(self, index):
        return index

slicer = Slicer()
s = slicer[1:4]
print(range(10)[s]) # outputs range(1, 4)

The main issue I’ve had with a “Slicer” class is that it feels like this should belong to slice itself. But I’ve finally landed on a solution I like that lets you create something that looks a whole lot like a slice subclass:

class _SliceMeta(type):
    """
    Normally you cannot inherit from slice, but we can use a metaclass to
    overload the constructor of our Slice class to return a real slice.
    """
    def __call__(cls, *args):
        # Allow for empty constructor
        if len(args) == 0:
            return slice(None)
        # When the class is called (instantiated), return a real slice instead
        return slice(*args)

    def __instancecheck__(self, instance):
        # https://peps.python.org/pep-3119/
        return isinstance(instance, slice)

    def __getitem__(self, index):
        # Defined on the metaclass, so Slice[...] works on the class itself,
        # much like __class_getitem__ (PEP 560, Python 3.7).
        # https://peps.python.org/pep-0560/
        return index


class Slice(metaclass=_SliceMeta):
    """
    A "subclass" of :class:`slice` with convenience features.

    Namely you can create a slice with class getitem syntax.

    The motivation is that creating explicit slice objects is cumbersome, and this
    provides a more concise notation. Consider the example

    .. code:: python

        import kwarray

        # This is the standard way to create multi dimensional index
        sl = (Ellipsis, slice(0, 10), slice(0, 10), slice(None))
        arr = arr[sl]

        # And this is equivalent
        sl = kwarray.Slice[..., 0:10, 0:10, :]
        arr = arr[sl]

    Note:
        This is not a real subclass of slice, but by using metaclasses it
        behaves like one in most circumstances. The instances created by using
        the constructor or convenience methods are real slice instances (or
        tuples of real slice instances).

    A similar idea has been proposed and rejected several times to core Python
    [CPythonIssue68567]_ [PythonDiscuss30316]_.

    References:
        .. [PythonDiscuss30316] https://discuss.python.org/t/alternative-to-create-slice-and-tuple-of-slices-with-slice-class-getitem/30316/6
        .. [CPythonIssue68567] https://github.com/python/cpython/issues/68567

    Example:
        >>> import kwarray
        >>> kwarray.Slice[::3]
        slice(None, None, 3)
        >>> kwarray.Slice[0:10, 3:11]
        (slice(0, 10, None), slice(3, 11, None))

    Example:
        >>> from kwarray.util_slices import Slice
        >>> assert Slice() == slice(None, None, None)
        >>> assert Slice[0:3] == slice(0, 3, None)
        >>> assert Slice[0:3, 0:5] == (slice(0, 3, None), slice(0, 5, None))
        >>> assert Slice[..., 0:3, 0:5] == (Ellipsis, slice(0, 3, None), slice(0, 5, None))
        >>> assert Slice(0, 3) == slice(0, 3, None)
        >>> assert isinstance(Slice(), Slice)
        >>> assert not issubclass(Slice, slice), 'no way to make this work AFAIK'
    """
    def __init__(self, *args):
        raise AssertionError('It should not be possible to construct a real instance of this class')

more_itertools.islice_extended does something similar to what you’re proposing; however, it implements __getitem__ instead of __class_getitem__.