`itemgetter` split into 2 objects

Following feedback from Itemgetter `always_tuple` keyword arg - #8 by dg-pb I thought I will propose this alternative.

It would straighten things out eliminating current complexity that comes from the attempt to bundle 2 return types into 1 object.

There is no big complexity issue as of now, but extensions will make things unnecessarily messy. E.g. Allowing missing item or attributes in operator's itemgetter and attrgetter - #15 by dg-pb

Furthermore, current implementation does 1 item well, but is ambiguous when is used programatically with unknown number of items where the user needs to explicitly check how many items are provided to args to know whether it is going to be tuple of values or just 1 value.

Introduction of extra keyword argument is fairly messy and having it all in 1 object further increases complexity.

It would break backwards compatibility, but it might just be worth it.

So it would simple be:

class itemgetter:
    def __init__(self, item, /):
        self._item = item

    def __call__(self, obj, /):
        return obj[self.item]


class itemsgetter:
    def __init__(self, *items):
        self._items = items

    def __call__(self, obj, /):
        return tuple(obj[i] for i in self.items)

Could also leave itemgetter for multiple values and name single item getter singleitemgetter.

Are there any use cases, where current behaviour is desired? I.e. There is unknown number of items and return needs to be single value if n == 1 and tuple(values) if n > 1.

I like this design, but it needs a transition plan, due to breaking backwards compatibility.

A nice way to transition to this design would be to add a __new__ to itemgetter which returns you an instance of itemsgetter, when you pass it more than one item, thereby emulating the old behavior of itemgetter.

That way you can emit a DeprecationWarning either immediately or in some future version when calling itemgetter with multiple arguments and can postpone making it a breaking change either indefinitely or at least a long time, without making it a huge maintenance burden.

2 Likes

Ah, yes, itemgetter is a class acting as a function.

Then likely itemsgetter (plural) needs to subclass itemgetter for backward compatibility, correct?
Not that I know of occurrences of isinstance(obj, itemgetter), but it’s bound to exist somewhere.

Why does itemgetter need to have all of this complexity added? If you need something that’s “like itemgetter, except…”, it’s not that hard to write your own, doing precisely what you need it to.

5 Likes

The design would be fine if it were like that since day 1. But since the ship has sailed, I don’t quite see why we need to touch the current itemgetter at all if we are to add a new function.

Otherwise, extensions such as defaults will have unnecessarily complex implementation.

@Daverball’s idea could be a happy middle here.
The need for subclassing is a bit annoying, but still worth it IMO.

I don’t quite see how the defaults feature can complicate the implementation, as the new itemgetter can simply be reimplemented as a wrapper of itemsgetter that unpacks the returning tuple, or itemsgetter can be implemented as a wrapper of itemgetter with a (maybe undocumented/private) unwrap flag, or maybe both of them can be implemented as a wrapper of a core function with this flag (so that the flag wouldn’t be exposed).

All of these are valid options. Each with their pros and cons.

  • itemgetter as a wrapper of itemsgetter creates unnecessary tuple.
  • having both in same object results in unnecessary complexity when adding new features

This could work (as per suggestions above):

class itemgetter:
    def __new__(cls, item, /, *items):
        if items:
            # DEPRECATION WARNING HERE
            return itemsgetter(item, *items)
        else:
            self = object.__new__(cls)
            self._item = item
            return self

    def __call__(self, obj, /):
        return obj[self.item]


class itemsgetter(itemgetter):
    def __init__(self, *items):
        self._items = items

    def __call__(self, obj, /):
        return tuple(obj[i] for i in self.items)
1 Like

This is generally true of everything in the operator module:

getter = lambda obj: obj[item]

The purpose of functions like itemgetter is that they are builtins i.e. you can’t easily write them yourself in Python code. If the interpreter was more efficient then the whole operator module would be unnecessary. Perhaps on PyPy it is unnecessary and it would be just as fast to use lambda functions instead.

While I agree that itemgetter was misdesigned, this does not warrant any deprecation or compatibility break. Lots of perfectly fine code uses itemgetter with multiple arguments and forcing that code to change is not worthwhile. If a new itemsgetter is added then new code does not need to use itemgetter with multiple arguments. The docs can suggest using itemsgetter if multiple items are wanted.

Existing code should not be broken or forced to change without good reason.

1 Like

Looks like a good implementation of itemgetter indeed as a wrapper over itemsgetter, though again I don’t think deprecation is at all necessary.

I don’t see any value in making itemsgetter a subclass of itemgetter, especially when itemgetter is a wrapper over itemsgetter, which actually justifies itemgetter as a subclass of itemsgetter.

Don’t think it is about whether it is necessary, but more about what is better in the long run.

I am +1 for slow deprecation to reduce unnecessary complexity.

itemgetter would return itemsgetter instance and would break the code.

1 Like

Ah you’re right. I had it backwards.

1 Like

Is it only me who thinks that having two functions called itemgetter and itemsgetter is a recipe for incredible amounts of confusion and typos?

15 Likes

I agree in principle with the idea of splitting the API, but disagree with the specifics of this approach to doing that.

Firstly, the existing support for multiple values is fine, and it can just throw ValueError (recommending the new API) for new features where the multi value support isn’t fine. No need to deprecate anything or to add a second way of doing the things that the new API would support.

Secondly, I agree with @pf_moore that a single s in the middle of a runtogether word is too subtle a distinction.

My suggested name would be itemtuplegetter to emphasise that it always returns a tuple (even for one lookup).

6 Likes

Agreed. Although I, personally, tend to prefer breaking things in the spirit of beauty, I appreciate when this is not desirable.

So now:

  1. Leave itemgetter as it is and limit its future extensions to single item. E.g. in case of addition of default, it would work on single item, but raise an error for multiple items with suggestion to use multiple-items-version.

  2. Naming is not optimal. I like itemtuplegetter. It can be used if nothing better comes up. Does anyone have any better ideas?

1 Like

TBH I’m not sure that two classes is less complex than a kwarg:

class itemgetter:
    def __init__(self, *items, unwrap=True):
        self.items = items
        self.unwrap = unwrap and len(items) == 1

    def __call__(self, obj, /):
        ret = tuple(obj[i] for i in self.items)
        return ret[0] if self.unwrap else ret

I’m sure you can make it more complex trying to squeeze every bit of performance out, but the concept is straightforward.

I have a better idea.

The whole reason why itemgetter is designed in the way it is is to cater to use cases where all the items are manually specified, where the number of items is known beforehand, so it is actually convenient to return the value directly without wrapping it a tuple when only one item is explicitly specified.

But the reason why we are having a discussion of a potential new function is to satisfy the other use cases, where an unknown number of items are specified. And how do we specify an unknown number of items? Currently, by unpacking an iterable into the variable argument of itemgetter. But why use a variable argument at all if the use case is all about using an iterable of items?

How about we just make the new function accept an actual iterable instead, so the old and the new functions serve distinctly different purposes and there’s no need to ever deprecate the old function?

1 Like

Good one.

Also, it would add 2 cents to performance by not needing to unpack *args.

And having them conceptually uncoupled, would also allow extensions to one, but not another - as appropriate.

1 Like

As illustrated in the last couple of comments, defining an item retrieval API that always returns a tuple of items (avoiding the “one item” vs “multiple items” dynamic behaviour in the existing API that changes the return type from Any to tuple[Any]) allows for certain simplifications in the rest of the API design:

  • always accept an iterable of lookup keys
  • always accept an iterable of default values (use itertools.repeat if every item should have the same default)

It’s much less beneficial if the call signatures are identical, but that’s just an argument against keeping the signatures the same, rather than being an argument against a version of the item retrieval API that always returns a tuple.

“itemtuplegetter” existing would make me wish for a plain “getitemtuple” API somewhere (possibly right next to it in the operator module) that was its eagerly evaluated counterpart, though (similar to the attrgetter/getattr pair).

1 Like

In this case, sentinel would be needed for partial defaults. Although, I would prefer iterable to dict, but my best attempt for multiple defaults so far:

class itemtuplegetter:
    def __init__(self, *items,
                 default=_NOT_GIVEN, 
                 defaults=_NOT_GIVEN):
        self._items = items
        if defaults is not _NOT_GIVEN:
            self._defaults = [
                defaults.get(item, default)
                for item in items
            ]
        elif default is _NOT_GIVEN:
            self._defaults = _NOT_GIVEN
        else:
            self._defaults = [default] * len(items)
   ...

Same, I have similar one defined for myself.