Indexable get method. [1,2,3].get(4) # None

alicederyn · May 21, 2024, 7:14pm

I think that might need to be tweaked?

(thing1, thing2), thing3 = items[:2], items.get(3, default)

dg-pb · May 21, 2024, 7:29pm

Correct, my apologies. Had it in 2 lines at first.

Also, just in case, I have nothing to do with beartype, but it had an influence on me.

oscarbenjamin · May 21, 2024, 7:35pm

dg-pb:

oscarbenjamin:
if len(items) == 3:
    thing1, thing2, thing3 = items
else:
    thing1, thing2, thing3 = *items, default_value
I don’t see any examples that look like compelling use cases for list.get().
Here is one. This could be done with 3x less code and more readable:
thing1, thing2, thing3 = items[:2], items.get(3, default)

These are not equivalent because this version silently accepts more than 3 items (or it would if the index was corrected).
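To make the difference concrete, here is a minimal sketch. Since list.get does not exist, list_get below is a hypothetical stand-in that assumes the proposed method would behave like dict.get with an index; unpack_strict and unpack_lenient are names made up for this illustration.

```python
def list_get(seq, index, default=None):
    """Hypothetical stand-in for the proposed list.get (not a real list method)."""
    try:
        return seq[index]
    except IndexError:
        return default


def unpack_strict(items, default):
    # The if/else version: raises ValueError unless items has 2 or 3 elements.
    if len(items) == 3:
        thing1, thing2, thing3 = items
    else:
        thing1, thing2, thing3 = *items, default
    return thing1, thing2, thing3


def unpack_lenient(items, default):
    # The get-based version, with the index corrected from 3 to 2.
    (thing1, thing2), thing3 = items[:2], list_get(items, 2, default)
    return thing1, thing2, thing3


print(unpack_lenient([1, 2, 3, 4], 0))  # (1, 2, 3): the 4th item is silently dropped
try:
    unpack_strict([1, 2, 3, 4], 0)
except ValueError as exc:
    print("strict version rejects 4 items:", exc)
```

Both versions agree for 2 or 3 items; they only diverge once the input is longer than expected.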

Also I don’t consider that to be more readable. This is what I meant when I said:

Many of the examples are expressing some nontrivial conditionality that I would want to show more explicitly in an if/else statement

The complexity of most code comes from conditionality. I always want to separate that conditionality from everything else so that it is clearly seen and handled in one place.

It’s not about being concerned about efficiency: this seems like the sort of thing that just should not happen in a hot loop. If you want to optimise a hot loop then you decide ahead of time whether something is going to be a 2-tuple or a 3-tuple rather than writing a function that doesn’t know what it is going to get.

dg-pb · May 21, 2024, 7:41pm

True.

False. Writing efficient code where one does not know the length in advance is also valid. It can be both flexible and efficient at the same time - doesn’t have to be binary.

elis.byberi · May 21, 2024, 7:46pm

Would you like to elaborate more on how it could be made faster?

dg-pb · May 21, 2024, 7:47pm

Would this work?

[thing1, thing2], thing3 = items[-1:], items.get(-1,  default)

I see your preference, but I like dict.get and I use it instead of wrapping it in if/else. I think I would prefer the same with list.get when it is appropriate.

pf_moore · May 21, 2024, 7:51pm

Presumably you meant items[:-1]?

You’ve made at least 3 mistakes so far implementing this. That seems to suggest that it’s definitely not as clear as the original if-statement based version that @oscarbenjamin presented…

dg-pb · May 21, 2024, 8:10pm

I would prefer not to get into this. If proving this is the determining factor I promise I will.

dg-pb · May 21, 2024, 8:12pm

No, it means that I am tired and I need a break. The syntax is clear. I don’t see anything unclear about items[:-1] - that’s what people learn on their first day of learning Python.

The if-statement is slightly clearer, but to me personally the brevity of the latter adds to clarity, so the readability of the two is the same to me. And if readability is the same, then I tend to choose the less verbose variant.

yoavdw · May 21, 2024, 8:18pm

I feel like this is also true for dict.get. A lot of the time I felt like it was a bit harder to get right than “manually” checking if it’s in the dict using if-else.

In fact, this may also be true in general for expressions as opposed to statements. An inline if-else, for me, can be a bit more tricky than a 4-line block that does the same thing. Especially since expressions let users do a lot of unreadable things, like an if-else inside an if-else, all in one line.

Powerful constructs like if-else expressions are always going to create potential misuse, because they are so compact. Despite that, I’ve always felt if-else expressions and dict.get are very useful, and I think list.get would also fall under that.

Kxnr · May 21, 2024, 9:15pm

I’ve run into a handful of places where I would’ve used [...].get(...) if it existed, usually parsing weird xml or json, and turned to more_itertools.first or more_itertools.only. I haven’t needed to optimize those in any sort of hot loop, but I think the resulting code has ended up more readable than it would have using [...].get(...).

For something more complex with an unknown sequence, match now seems like a much more powerful approach than get:

some_list = [1, 2]

match some_list:
    case [a, b]:
        pass  # do whatever default logic needs to be done
    case [a, b, c]:
        pass  # we have all 3 values, nothing to see here
    case _:
        # don't know what to do with 1 or 4+ items
        raise ValueError(f"expected 2 or 3 items, got {len(some_list)}")

This only really works for short sequences, but I think that would also be true of [...].get. I’d have some questions if I ever saw [...].get(17, None) in a PR. This is the core difference to me between {...}.get and [...].get: to use [...].get, I’d have to reason about previous elements in the sequence.

alicederyn · May 22, 2024, 7:26am

No, I don’t think so: even with Paul’s fix, this no longer works if there are two items. You’re only ever using that default if items is empty.

I think this works:

thing1, thing2, thing3 = (*items[0:2], items.get(2,  default), *items[3:])

But I’m pretty sure it’d be slower, and I don’t think it’s clear any more.
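A quick way to check that claim is to compare the one-liner with the original if/else version across several input lengths. This is only a sketch: list_get is a hypothetical stand-in for the proposed list.get, and the other function names are made up for the comparison.

```python
def list_get(seq, index, default=None):
    """Hypothetical stand-in for the proposed list.get (not a real list method)."""
    try:
        return seq[index]
    except IndexError:
        return default


def unpack_if_else(items, default):
    if len(items) == 3:
        thing1, thing2, thing3 = items
    else:
        thing1, thing2, thing3 = *items, default
    return thing1, thing2, thing3


def unpack_one_line(items, default):
    thing1, thing2, thing3 = (*items[0:2], list_get(items, 2, default), *items[3:])
    return thing1, thing2, thing3


def outcome(func, items, default):
    # Normalise "raised ValueError" into a comparable value.
    try:
        return func(items, default)
    except ValueError:
        return "ValueError"


for n in range(6):
    items = list(range(n))
    print(n, outcome(unpack_if_else, items, "d"), outcome(unpack_one_line, items, "d"))
```

For lengths 0 through 5 the two columns match: both succeed for 2 or 3 items and raise ValueError otherwise.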

blhsing · May 22, 2024, 7:41am

I’m neutral on the idea: I don’t have much of a real-world use case for it, but I don’t see how it could do harm either.

In the meantime, we can make use of the @ operator as a cute alternative to .:

class get:
    def __init__(self, key, default=None):
        self.key = key
        self.default = default

    def __rmatmul__(self, other):
        try:
            return other[self.key]
        except (IndexError, KeyError):
            return self.default

print([1, 2, 3]@get(1)) # outputs 2
print([1, 2, 3]@get(4)) # outputs None
print({1: 2, 3: 4}@get(1)) # outputs 2
print({1: 2, 3: 4}@get(4)) # outputs None

dg-pb · May 22, 2024, 8:31am

With slicing

class get:
    def __init__(self, idx, default=None):
        self.idx = idx
        self.default = default

    def __rmatmul__(self, other):
        idx, default = self.idx, self.default
        if isinstance(idx, slice):
            start = 0 if idx.start is None else idx.start
            stop = len(other) if idx.stop is None else idx.stop
            step = 1 if idx.step is None else idx.step
            result = list()
            for i in range(start, stop, step):
                try:
                    item = other[i]
                except (IndexError, KeyError):
                    item = default
                result.append(item)
            return result
        try:
            return other[idx]
        except (IndexError, KeyError):
            return default


print([1, 2, 3]@get(1)) # outputs 2
print([1, 2, 3]@get(4)) # outputs None
print({1: 2, 3: 4}@get(1)) # outputs 2
print({1: 2, 3: 4}@get(4)) # outputs None
print([1, 2, 3]@get(slice(2, 5))) # outputs [3, None, None]

oscarbenjamin · May 22, 2024, 1:04pm

I agree and I generally do not use if/else expressions for this reason.

Part of the reason I find the linked examples not compelling is because they all look like code that I don’t like and wouldn’t write. If I was to try to improve those examples then it would be to use function defaults or a proper if/else statement somewhere earlier rather than turning the if/else expression into anything that uses list.get.

It only seems to make sense to use list.get in those cases because you have allowed a situation to arise where you don’t know if what should be a fixed-length list has 2 or 3 items. That uncertainty is something that should be rectified as early as possible though rather than being allowed to propagate through the rest of the code.

I don’t see dict.get in the same vein here because it is useful for algorithms that do expensive things with potentially large dicts like the polynomial example I showed above. The same is never needed for lists because you either know what the valid indices are (from len) or you don’t use indexing at all.

Likewise if you have a dict as something like a cache then you want to be able to do single lookups:

def func(obj):
    val = _cache.get(obj)
    if val is None:
        val = _cache[obj] = real_func(obj)
    return val

Here you want maximum performance and dict.get avoids doing the double hash-table lookup in the happy path:

if obj in _cache:
    return _cache[obj]

(You can also catch KeyError but I would rather use .get() and avoid any exception handling.)

It seems that most people here are imagining using dict.get in some different context where you have a small dict of e.g. config settings and you want to handle a missing value or something. That is more like what all of the linked examples are doing with lists like foo[0] if len(foo) else bar. Probably in that sort of context I would not use dict.get either because the conditionality is nontrivial.

yoavdw · May 22, 2024, 3:56pm

Oscar Benjamin:

Here you want maximum performance and dict.get avoids doing the double hash-table lookup in the happy path:
if obj in _cache:
    return _cache[obj]
(You can also catch KeyError but I would rather use .get() and avoid any exception handling.)

This is why dict.get IS actually in the same vein.

You CAN get maximal performance by using exception handling for both list and dict.

And even though using exception handling is more explicit and easier to get right in complex scenarios, you’d still rather use .get in many cases since being that compact makes it that much more readable.

dg-pb · May 22, 2024, 4:08pm

No one uses dict like that. Can you point to anything like this in any library? At least 2 examples.

Also, why dict.get? Why not try-except statement?

DerSchinken · May 23, 2024, 10:12am

I think that inline statements are a Python code smell. Though they are powerful and cool, they aren’t the best for readability, and as the this module states, “Readability counts.” I also find that they can easily get out of control and become “unreadable”.

You have to learn the .get function anyway for dicts, so I don’t think that it makes lists or tuples harder to learn. For dicts, you get the value for a key and for lists, etc. you get the value for an index which makes sense and thus should make it easy to learn.

I’m against this because I would want it to stay in line with dict.get where it’s not possible to use slices (captain obvious I am xD) or multiple keys (as this feels like using multiple keys)

I think that this would improve the speed of these operations. I’m not an expert in this, so please correct me if I’m wrong, but I think I’ve read that try’s are slower than not having them(?), so removing that would make it faster.
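The speed question can be measured rather than guessed. Below is a rough timing sketch using timeit; get_try and get_len are hypothetical pure-Python helpers, not the proposed C implementation, and the numbers will vary with machine and Python version. As far as I know, since CPython 3.11 a try block costs almost nothing when no exception is raised, so the expensive part is raising and catching the IndexError on a miss.

```python
import timeit


def get_try(seq, index, default=None):
    # Ask forgiveness: cheap on a hit, pays for the exception on a miss.
    try:
        return seq[index]
    except IndexError:
        return default


def get_len(seq, index, default=None):
    # Look before you leap: accepts negative indices the same way seq[index] does.
    return seq[index] if -len(seq) <= index < len(seq) else default


items = [1, 2, 3]
for label, index in (("hit", 1), ("miss", 4)):
    for func in (get_try, get_len):
        seconds = timeit.timeit(lambda: func(items, index), number=200_000)
        print(f"{func.__name__:8} {label:4} {seconds:.3f}s")
```

A built-in list.get would avoid the Python-level function call in both helpers, so this only compares the two workarounds with each other.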

I’ve also seen that quite a lot of people do this, so having a simple function that’s being called makes code flatter [1], makes code shorter, and can also make it faster depending on how much this happens [2].

I’ll try to add it to my own cpython repository and test it tomorrow (or in the next few days depending on how much time I have). But I’m set on the idea that this would be a good addition to Python for lists and tuples (and/or other types).

[1] which btw. is called good by this
[2] as others have stated, this shouldn’t be needed directly in loops, but when it is, it’s faster, and it doesn’t remove the other points. Also, a function containing a call to this function could be called in a loop, so I don’t know if it’s worthless to think about its efficiency

dg-pb · May 23, 2024, 10:27am

Well it depends what you aim to be compatible with. I think comparing list.get with dict.get literally is a mistake.

However, there is a sensible comparison.

“list.get implements list.__getitem__ with a default in the same way as dict.get implements dict.__getitem__ with a default.”

These are 2 separate objects and their extensions should relate to their own functionality.

I don’t think building list extension based on how dict functions is a good idea, however drawing parallels for the sake of user experience is beneficial.

Now, I am not suggesting that list.get SHOULD function with slice, but I think this deserves a decent consideration. Furthermore, it could be a decisive factor given that list.get is much less useful than dict.get (if compared only for scalar indices) and its extended functioning with slice might add some points to its usefulness (might not).

That said, I don’t like:

list.get(idx: int | slice, default=None)
print(list.get(slice(2, 4), None))    # [2, None]

too much. Explicit slice is not very attractive or in line with standard lib practices.

blhsing · May 24, 2024, 9:59am

I’ve actually encountered a couple of real-world use cases for list.get today, one where I need to sometimes append an incremental ID to an initially empty list only if the last item of the list isn’t already the ID:

def add_id(id):
    if not ids or ids[-1] != id:
        ids.append(id)

ids = []
id = 1
... # id may increment and add_id(id) may be called at different times

which may be simplified as:

def add_id(id):
    if ids.get(-1) != id:
        ids.append(id)
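The two versions can be compared with a small sketch. list_get is a hypothetical stand-in for the proposed list.get, assumed to accept negative indices the way list.__getitem__ does; the list is passed in explicitly and the parameter is renamed new_id only to keep the example self-contained.

```python
def list_get(seq, index, default=None):
    """Hypothetical stand-in for the proposed list.get (not a real list method)."""
    try:
        return seq[index]
    except IndexError:
        return default


def add_id_explicit(ids, new_id):
    if not ids or ids[-1] != new_id:
        ids.append(new_id)


def add_id_with_get(ids, new_id):
    # On an empty list, list_get returns None, which differs from any real ID.
    if list_get(ids, -1) != new_id:
        ids.append(new_id)


explicit, with_get = [], []
for new_id in (1, 1, 2, 2, 2, 3):
    add_id_explicit(explicit, new_id)
    add_id_with_get(with_get, new_id)

print(explicit, with_get)  # [1, 2, 3] [1, 2, 3]
```

The equivalence relies on None never being a valid ID: with an empty list, the get-based version would refuse to append None, while the explicit version would append it.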

and the other where I need to then perform a binary search for the rightmost ID in the aforementioned list that’s less than or equal to a given ID (following the find_le recipe), and default to 0 if not found (not raising an exception because the ID will then be used for a dict.get call with a meaningful default value):

from bisect import bisect_right

def find_closest_id(id):
    index = bisect_right(ids, id)
    if index:
        return ids[index - 1]
    return 0

which can be simplified as:

def find_closest_id(id):
    return ids.get(bisect_right(ids, id) - 1, 0)

So I think I’m +1 on this idea now.
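One caveat on the find_closest_id simplification: if list.get accepted negative indices the way list.__getitem__ does (which the add_id example relies on), then an index of -1 on a non-empty list would return the last element, not the default. A minimal sketch of the difference follows; list_get is a hypothetical stand-in for the proposed method, and the list is passed in explicitly to keep the example self-contained.

```python
from bisect import bisect_right


def list_get(seq, index, default=None):
    """Hypothetical stand-in for the proposed list.get (not a real list method)."""
    try:
        return seq[index]
    except IndexError:
        return default


def find_closest_id_explicit(ids, target):
    index = bisect_right(ids, target)
    if index:
        return ids[index - 1]
    return 0


def find_closest_id_with_get(ids, target):
    # When bisect_right returns 0, this looks up index -1.
    return list_get(ids, bisect_right(ids, target) - 1, 0)


ids = [5, 10]
print(find_closest_id_explicit(ids, 7), find_closest_id_with_get(ids, 7))  # 5 5
print(find_closest_id_explicit(ids, 3), find_closest_id_with_get(ids, 3))  # 0 10
print(find_closest_id_explicit([], 3), find_closest_id_with_get([], 3))    # 0 0
```

The two versions agree when the list is empty or when some ID is less than or equal to the target, but diverge when every stored ID is greater than the target.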