`itemgetter` split into 2 objects

dg-pb · October 24, 2024, 6:31am

Following feedback from Itemgetter `always_tuple` keyword arg - #8 by dg-pb I thought I will propose this alternative.

It would straighten things out eliminating current complexity that comes from the attempt to bundle 2 return types into 1 object.

There is no big complexity issue as of now, but extensions will make things unnecessarily messy. E.g. Allowing missing item or attributes in operator's itemgetter and attrgetter - #15 by dg-pb

Furthermore, current implementation does 1 item well, but is ambiguous when is used programatically with unknown number of items where the user needs to explicitly check how many items are provided to args to know whether it is going to be tuple of values or just 1 value.

Introduction of extra keyword argument is fairly messy and having it all in 1 object further increases complexity.

It would break backwards compatibility, but it might just be worth it.

So it would simple be:

class itemgetter:
    def __init__(self, item, /):
        self._item = item

    def __call__(self, obj, /):
        return obj[self.item]


class itemsgetter:
    def __init__(self, *items):
        self._items = items

    def __call__(self, obj, /):
        return tuple(obj[i] for i in self.items)

Could also leave itemgetter for multiple values and name single item getter singleitemgetter.

Are there any use cases, where current behaviour is desired? I.e. There is unknown number of items and return needs to be single value if n == 1 and tuple(values) if n > 1.

Daverball · October 24, 2024, 6:42am

I like this design, but it needs a transition plan, due to breaking backwards compatibility.

A nice way to transition to this design would be to add a __new__ to itemgetter which returns you an instance of itemsgetter, when you pass it more than one item, thereby emulating the old behavior of itemgetter.

That way you can emit a DeprecationWarning either immediately or in some future version when calling itemgetter with multiple arguments and can postpone making it a breaking change either indefinitely or at least a long time, without making it a huge maintenance burden.

Dutcho · October 24, 2024, 8:50am

Ah, yes, itemgetter is a class acting as a function.

Then likely itemsgetter (plural) needs to subclass itemgetter for backward compatibility, correct?
Not that I know of occurrences of isinstance(obj, itemgetter), but it’s bound to exist somewhere.

Rosuav · October 24, 2024, 9:00am

Why does itemgetter need to have all of this complexity added? If you need something that’s “like itemgetter, except…”, it’s not that hard to write your own, doing precisely what you need it to.

blhsing · October 24, 2024, 9:04am

The design would be fine if it were like that since day 1. But since the ship has sailed, I don’t quite see why we need to touch the current itemgetter at all if we are to add a new function.

dg-pb · October 24, 2024, 9:12am

Otherwise, extensions such as defaults will have unnecessarily complex implementation.

@Daverball’s idea could be a happy middle here.
The need for subclassing is a bit annoying, but still worth it IMO.

blhsing · October 24, 2024, 9:18am

I don’t quite see how the defaults feature can complicate the implementation, as the new itemgetter can simply be reimplemented as a wrapper of itemsgetter that unpacks the returning tuple, or itemsgetter can be implemented as a wrapper of itemgetter with a (maybe undocumented/private) unwrap flag, or maybe both of them can be implemented as a wrapper of a core function with this flag (so that the flag wouldn’t be exposed).

dg-pb · October 24, 2024, 9:47am

All of these are valid options. Each with their pros and cons.

itemgetter as a wrapper of itemsgetter creates unnecessary tuple.
having both in same object results in unnecessary complexity when adding new features

This could work (as per suggestions above):

class itemgetter:
    def __new__(cls, item, /, *items):
        if items:
            # DEPRECATION WARNING HERE
            return itemsgetter(item, *items)
        else:
            self = object.__new__(cls)
            self._item = item
            return self

    def __call__(self, obj, /):
        return obj[self.item]


class itemsgetter(itemgetter):
    def __init__(self, *items):
        self._items = items

    def __call__(self, obj, /):
        return tuple(obj[i] for i in self.items)

oscarbenjamin · October 24, 2024, 11:18am

This is generally true of everything in the operator module:

getter = lambda obj: obj[item]

The purpose of functions like itemgetter is that they are builtins i.e. you can’t easily write them yourself in Python code. If the interpreter was more efficient then the whole operator module would be unnecessary. Perhaps on PyPy it is unnecessary and it would be just as fast to use lambda functions instead.

While I agree that itemgetter was misdesigned, this does not warrant any deprecation or compatibility break. Lots of perfectly fine code uses itemgetter with multiple arguments and forcing that code to change is not worthwhile. If a new itemsgetter is added then new code does not need to use itemgetter with multiple arguments. The docs can suggest using itemsgetter if multiple items are wanted.

Existing code should not be broken or forced to change without good reason.

blhsing · October 25, 2024, 4:21am

dg-pb:

This could work (as per suggestions above):

class itemgetter:
    def __new__(cls, item, /, *items):
        if items:
            # DEPRECATION WARNING HERE
            return itemsgetter(item, *items)
        else:
            self = object.__new__(cls)
            self._item = item
            return self

    def __call__(self, obj, /):
        return obj[self.item]

Looks like a good implementation of itemgetter indeed as a wrapper over itemsgetter, though again I don’t think deprecation is at all necessary.

I don’t see any value in making itemsgetter a subclass of itemgetter, especially when itemgetter is a wrapper over itemsgetter, which actually justifies itemgetter as a subclass of itemsgetter.

dg-pb · October 25, 2024, 6:40am

Don’t think it is about whether it is necessary, but more about what is better in the long run.

I am +1 for slow deprecation to reduce unnecessary complexity.

itemgetter would return itemsgetter instance and would break the code.

blhsing · October 25, 2024, 7:21am

Ah you’re right. I had it backwards.

pf_moore · October 25, 2024, 10:09am

Is it only me who thinks that having two functions called itemgetter and itemsgetter is a recipe for incredible amounts of confusion and typos?

ncoghlan · October 25, 2024, 12:37pm

I agree in principle with the idea of splitting the API, but disagree with the specifics of this approach to doing that.

Firstly, the existing support for multiple values is fine, and it can just throw ValueError (recommending the new API) for new features where the multi value support isn’t fine. No need to deprecate anything or to add a second way of doing the things that the new API would support.

Secondly, I agree with @pf_moore that a single s in the middle of a runtogether word is too subtle a distinction.

My suggested name would be itemtuplegetter to emphasise that it always returns a tuple (even for one lookup).

dg-pb · October 25, 2024, 11:15pm

Agreed. Although I, personally, tend to prefer breaking things in the spirit of beauty, I appreciate when this is not desirable.

So now:

Leave itemgetter as it is and limit its future extensions to single item. E.g. in case of addition of default, it would work on single item, but raise an error for multiple items with suggestion to use multiple-items-version.
Naming is not optimal. I like itemtuplegetter. It can be used if nothing better comes up. Does anyone have any better ideas?

effigies · October 26, 2024, 2:13am

TBH I’m not sure that two classes is less complex than a kwarg:

class itemgetter:
    def __init__(self, *items, unwrap=True):
        self.items = items
        self.unwrap = unwrap and len(items) == 1

    def __call__(self, obj, /):
        ret = tuple(obj[i] for i in self.items)
        return ret[0] if self.unwrap else ret

I’m sure you can make it more complex trying to squeeze every bit of performance out, but the concept is straightforward.

blhsing · October 26, 2024, 2:47am

I have a better idea.

The whole reason why itemgetter is designed in the way it is is to cater to use cases where all the items are manually specified, where the number of items is known beforehand, so it is actually convenient to return the value directly without wrapping it a tuple when only one item is explicitly specified.

But the reason why we are having a discussion of a potential new function is to satisfy the other use cases, where an unknown number of items are specified. And how do we specify an unknown number of items? Currently, by unpacking an iterable into the variable argument of itemgetter. But why use a variable argument at all if the use case is all about using an iterable of items?

How about we just make the new function accept an actual iterable instead, so the old and the new functions serve distinctly different purposes and there’s no need to ever deprecate the old function?

dg-pb · October 26, 2024, 3:07am

Good one.

Also, it would add 2 cents to performance by not needing to unpack *args.

And having them conceptually uncoupled, would also allow extensions to one, but not another - as appropriate.

ncoghlan · October 26, 2024, 3:33am

As illustrated in the last couple of comments, defining an item retrieval API that always returns a tuple of items (avoiding the “one item” vs “multiple items” dynamic behaviour in the existing API that changes the return type from Any to tuple[Any]) allows for certain simplifications in the rest of the API design:

always accept an iterable of lookup keys
always accept an iterable of default values (use itertools.repeat if every item should have the same default)

It’s much less beneficial if the call signatures are identical, but that’s just an argument against keeping the signatures the same, rather than being an argument against a version of the item retrieval API that always returns a tuple.

“itemtuplegetter” existing would make me wish for a plain “getitemtuple” API somewhere (possibly right next to it in the operator module) that was its eagerly evaluated counterpart, though (similar to the attrgetter/getattr pair).

dg-pb · October 26, 2024, 3:42am

In this case, sentinel would be needed for partial defaults. Although, I would prefer iterable to dict, but my best attempt for multiple defaults so far:

class itemtuplegetter:
    def __init__(self, *items,
                 default=_NOT_GIVEN, 
                 defaults=_NOT_GIVEN):
        self._items = items
        if defaults is not _NOT_GIVEN:
            self._defaults = [
                defaults.get(item, default)
                for item in items
            ]
        elif default is _NOT_GIVEN:
            self._defaults = _NOT_GIVEN
        else:
            self._defaults = [default] * len(items)
   ...

Same, I have similar one defined for myself.