Indexable get method. [1,2,3].get(4) # None

elis.byberi · May 20, 2024, 10:26pm

If start or stop is out of range, they are ignored. In list.get(slice(start, stop), None), it seems that if the sequence is empty, it will return None instead of filling missing items in the slice with None.

dg-pb · May 20, 2024, 10:30pm

No, they are not ignored. What happens with __setitem__ is different altogether. And I am fully aware that I am comparing a getter with a setter. This was just a response to the statement that going out of indexing bounds is beyond the scope of list functionality.
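For reference, here is how lists behave today when a position falls outside the list. This is standard CPython behaviour, shown as a quick demonstration rather than as part of the proposal: reading a slice clamps its bounds, slice assignment past the end appends, and plain indexing raises IndexError.

```python
lst = [1, 2, 3]

# Reading a slice clamps start/stop to the range [0, len(lst)]; nothing is raised.
print(lst[1:10])   # [2, 3]
print(lst[5:10])   # []

# Plain indexing past the end raises IndexError.
try:
    lst[4]
except IndexError as exc:
    print("IndexError:", exc)

# Slice assignment past the end does not raise either: the items are appended.
lst[10:] = [4]
print(lst)         # [1, 2, 3, 4]
```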

Rosuav · May 20, 2024, 10:40pm

True, my comment wasn’t about the rejection but about the general temperature of the discussions. That feature just wasn’t one people were itching for.

oscarbenjamin · May 20, 2024, 11:10pm

In my experience most uses of try are overly broad. It is usually a bad construct for normal control flow and so I try to avoid it whenever possible. There are cases where try makes sense like top-level error handling. I would generally avoid designing an interface that requires the caller to catch exceptions if there is any obvious exception-free alternative though.

I don’t think I have ever used try to catch IndexError from accessing a list though. I can’t really imagine what situation I would be in where I would pass candidate indices to a list like stuff[index] and having index be out of bounds would be an exception to handle rather than just a bug that should end in traceback.

dg-pb · May 20, 2024, 11:24pm

Ok, so I made an effort:

69.9k files - /(?-i)^class \w+.*[^\w\[]list\W.*\:/ language:Python
25.0k files - /(?-i)^class \w+.*[^\w\[]Sequence\W.*\:/ language:Python
 4.4k files - /(?-i)^class \w+.*[^\w\[]UserList\W.*\:/ language:Python

I don’t think the Sequence argument has any weight. And if this were to be implemented for list, then it should be part of the collections base classes too.

Rosuav · May 20, 2024, 11:25pm

Yes. And I provided a suggestion for narrowing the scope of the try, which led to Alice’s response about it requiring an “extremely tight expression”. However, as the (lengthy!) discussion of PEP 463 showed, people don’t actually want ways to narrow this below a statement. The sentiment wasn’t “wow, I wish we could do that, but what you proposed isn’t good because…”, it was more “meh, why would that be worthwhile?”.

It’s easy to say “that’s too broad”. It’s far less easy to actually narrow it in any useful way without making the code objectively worse. The use of try/else is absolutely able to narrow the scope of try, and yet we don’t often see it used.

alicederyn · May 21, 2024, 7:43am

If you read what I wrote, you’ll see that was my objection in the first place. Having a single expression in the try to minimise the risk of catching the wrong thing makes the overall code less readable.

dg-pb · May 21, 2024, 11:12am

So maybe people weren’t bothered about PEP 463 because exception catching is not used that much? Ok, it is used a lot in some places, but it is completely absent in others.

I use try-except in fairly high-level APIs, async frameworks and iterators, but generally it is a pretty rare occasion and I am fairly happy about it. I like using it for bigger multiline statements, but for 1-line statements try-except is a bit too verbose for what it does (to my taste). Of course there are cases where it just fits the problem perfectly; then I don’t mind a 1-line statement body.

So maybe people aren’t bothered about PEP 463 because try-except is not used that much, and this proposal indicates that people want to use it even less and only where it is absolutely necessary?

oscarbenjamin · May 21, 2024, 11:47am

I think that try/except is used too much and I don’t want to make it easier to use, especially for people who like to code golf cryptic sloppy one-liners. PEP 463 makes it easier to narrow the scope, but even if you narrow the scope to a single expression, that expression is still calling some other code, so somewhere else an entire function body is inside the try along with all functions called transitively. Exceptions like KeyError and IndexError can easily result from things that should rightly be considered bugs, so any time you catch them you run the risk of catching an exception from the wrong place.
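A small sketch of that risk, with hypothetical function names and a deliberately planted typo. The try version treats the KeyError from the bug as "key not found", while the .get version only handles the missing key and lets the bug surface.

```python
def normalise(record):
    # Hypothetical helper containing a bug: "nmae" is a typo for "name".
    return record["nmae"].lower()

def lookup_with_try(table, key):
    # The try also covers normalise(), so its buggy KeyError is swallowed
    # and looks exactly like "key is missing from table".
    try:
        return normalise(table[key])
    except KeyError:
        return None

def lookup_with_get(table, key):
    # Only the membership question is answered without exceptions; a KeyError
    # raised inside normalise() still propagates and exposes the bug.
    record = table.get(key)
    return None if record is None else normalise(record)

table = {"a": {"name": "Alice"}}
print(lookup_with_try(table, "a"))   # None: the bug is hidden
try:
    lookup_with_get(table, "a")
except KeyError as exc:
    print("bug surfaced:", exc)      # bug surfaced: 'nmae'
```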

It is almost always better to use some other kind of control flow rather than exception catching and that is why I would use dict.get rather than catching KeyError. More often though I just use d[foo] because I expect that the key should be found and I want the KeyError to bubble up otherwise to show that there is a bug.

So there are two cases:

1. I “know” that the key should be there, so I use d[foo], and the exception, if raised, should not be caught.
2. I don’t know if the key is in the dict, so I use e.g. d.get(foo) and, if necessary, check the returned value.

This thread proposes list.get but I consider that case different from dict.get because I don’t use dicts and lists in the same way. If I am indexing a list then the code is always written in such a way that I know that the index should be valid so case (2) never arises in my use of lists.

pf_moore · May 21, 2024, 12:18pm

Also, you can check if an index is in range by using len(the_list), which is constant-time, whereas there’s no better way of checking if a key is in a dict than actually trying to access it.

dg-pb · May 21, 2024, 12:39pm

Not quite true

dict.get(key, d) == dict[key] if key in dict else d

So len(the_list) is the equivalent of key in dict

dict.get cannot use try-except in many cases, because the underlying mapping can have __missing__ implemented, so the general specification of get (as in collections.abc.Mapping) is:

class Mapping(Collection):
    def get(self, key, default=None):
        'D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.'
        try:
            return self[key]
        except KeyError:
            return default

But it is implemented with try-except where this is not a problem, and it sometimes needs to be replaced in subclasses.
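The __missing__ interaction can be seen directly. A minimal sketch using the standard collections.defaultdict and a hypothetical dict subclass Tracking: the C implementation of dict.get does not call __missing__, whereas the try/except-based Mapping.get goes through self[key] and therefore does.

```python
from collections import defaultdict
from collections.abc import Mapping

counts = defaultdict(int)

# dict.get does not call __missing__: it returns the default and
# leaves the mapping unchanged.
print(counts.get("x"))   # None
print("x" in counts)     # False

# Subscription does call __missing__, which for defaultdict inserts a value.
print(counts["x"])       # 0
print("x" in counts)     # True

class Tracking(dict):
    # Hypothetical subclass whose __missing__ returns a marker string.
    def __missing__(self, key):
        return f"<missing {key}>"

t = Tracking()
print(dict.get(t, "k"))     # None
print(Mapping.get(t, "k"))  # <missing k>: self[key] went through __missing__
```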

pf_moore · May 21, 2024, 12:53pm

You missed my point completely.

len(lst) takes constant time to compute. In effect, it’s “free” to check if an index can be used to access a list without raising an exception.

Conversely, the only way to check if a dictionary contains a given key is to try to access it. The access check is what costs - getting the value once you’ve found that the key is in the dict is “free” as long as you do it as part of the access check.

Specifically, len(the_list) is not equivalent to key in dict if you consider performance (technically, algorithmic complexity).

But this is a side issue. I’m not trying to construct a compelling argument for why list.get isn’t worth adding. All I was doing was adding an extra supporting point to the comment made by @oscarbenjamin as to why he considers list.get different from dict.get.

dg-pb · May 21, 2024, 1:27pm

Ok, I see the difference.

In the dict case it is the same operation, while for a list there is one more way to check. And all of these operations are:

list[0]      # O(1)
len(list)    # O(1)

dict[key]    # O(1)
key in dict  # O(1)

Also, in practice len(list) is much slower than list[0], so one benefit would be that under the hood list.get could use the most efficient method, which is probably not len.

But no, I agree these two are different objects. I am not trying to argue that they are the same; my position is more like: dict.get is convenient and we like it (do we?), so maybe the list.get idea can be compared to dict.get for design purposes, so that it is intuitive for anyone who knows how to use dict.get. It will not be the same, but I think the reasons for its addition would be very similar:
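The relative cost of lst[0] versus len(lst) is easy to check with the standard timeit module. This is a rough micro-benchmark sketch; results depend on the machine and Python version, so no particular numbers are claimed here.

```python
import timeit

# Setup runs once; "lst" is then a local name inside the timed loop.
setup = "lst = [1, 2, 3]"
n = 200_000

t_index = timeit.timeit("lst[0]", setup=setup, number=n)
t_len = timeit.timeit("len(lst)", setup=setup, number=n)
# The conditional spelling that a hypothetical list.get would replace.
t_cond = timeit.timeit("lst[1] if len(lst) > 1 else None", setup=setup, number=n)

for label, t in [("lst[0]", t_index), ("len(lst)", t_len), ("conditional", t_cond)]:
    print(f"{label:12} {t / n * 1e9:6.1f} ns per loop")
```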
a) Convenience
b) Performance
c) Conciseness
d) Readability

I don’t think there is a real problem to be solved here. And I don’t think there was one for dict.get either.

What @oscarbenjamin said can be summarised as “he doesn’t use it”; I don’t think there is more to it. He just doesn’t have list.get use cases in the same way as for dict.get. And I agree there are many cases where the workflow for lists is just very different. However, there are other cases, e.g. def foo(*args, **kwds), where a sequence is treated in the same way as a mapping: “a collection of items of unknown length and contents”.

oscarbenjamin · May 21, 2024, 3:03pm

Here is an example from the SymPy codebase simplified slightly. This function converts between two different basic representations of a univariate polynomial with one being a list and the other a dict:

def dup_to_dict(f):
    """
    >>> dup_to_dict([1, 0, 5, 0, 7])
    {(0,): 7, (2,): 5, (4,): 1}
    >>> dup_to_dict([])
    {}
    """
    n, result = len(f) - 1, {}

    for k in range(0, n + 1):
        if f[n - k]:
            result[(k,)] = f[n - k]

    return result

I didn’t write this code and if I had I would have written it differently but that doesn’t matter. The point is we call len(f) once before the loop. Then in the loop we always know that n - k will be a valid index for f. That is how indexing with lists always works: the indices are generated by something that is a function of the length of the list.

I don’t understand what situation you would be in where you are indexing into a list and don’t know whether the index is valid: where did the index come from?

With dicts it is different because having or not having keys is often part of the purpose of the data structure. For example the dict returned by dup_to_dict omits keys that would have a corresponding value of zero because they can be represented simply by not being in the dict at all. The ith coefficient of the polynomial is found via result.get((i,), 0) if you happen to want it.
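To make the dict side concrete, here is a small sketch that repeats the dup_to_dict conversion from the example above and reads coefficients with .get. The helper name coeff is hypothetical.

```python
def dup_to_dict(f):
    # Dense list (highest degree first) -> sparse dict keyed by exponent
    # tuples, with zero coefficients omitted.
    n, result = len(f) - 1, {}
    for k in range(0, n + 1):
        if f[n - k]:
            result[(k,)] = f[n - k]
    return result

def coeff(p, i):
    # A missing key means a zero coefficient, so .get with default 0 fits exactly.
    return p.get((i,), 0)

p = dup_to_dict([1, 0, 5, 0, 7])          # x**4 + 5*x**2 + 7
print([coeff(p, i) for i in range(6)])   # [7, 0, 5, 0, 1, 0]
```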

There is in fact a function in SymPy that gets the ith coefficient from the list representation that would almost seem to suit list.get:

def dup_nth(f, n, K):
    """
    Return the ``n``-th coefficient of ``f`` in ``K[x]``.
    """
    if n < 0:
        raise IndexError("'n' must be non-negative, got %i" % n)
    elif n >= len(f):
        return K.zero
    else:
        return f[len(f) - n - 1]

Unfortunately .get would not be quite right here: we want zero for n >= len(f), but then len(f) - n - 1 is a negative list index, which .get would treat as counting from the end of the list rather than as missing.

Here is the real point, though: although someone apparently thought it was useful to add dup_nth, presumably because it seemed like an obvious basic thing to want, this function is not actually used anywhere in the SymPy codebase. The 272 other dup_* functions never need dup_nth because every operation instead loops over the list or over a range of indices, or uses enumerate etc.
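To see the problem concretely, the sketch below emulates a .get-style lookup (valid for any index in range(-len(f), len(f)), default otherwise) and plugs it into dup_nth. The name dup_nth_naive is hypothetical, and the domain argument K is dropped, with plain 0 standing in for K.zero.

```python
def dup_nth_naive(f, n):
    # Emulate f.get(len(f) - n - 1, 0) using the usual list index rules:
    # indices in range(-len(f), len(f)) are valid, anything else gives 0.
    i = len(f) - n - 1
    return f[i] if -len(f) <= i < len(f) else 0

f = [1, 0, 5, 0, 7]          # x**4 + 5*x**2 + 7, highest degree first
print(dup_nth_naive(f, 2))   # 5 (correct)
print(dup_nth_naive(f, 5))   # 7 (wrong: index -1 wraps to the constant term)
print(dup_nth_naive(f, 10))  # 0 (correct only because index -6 is out of range)
```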

dg-pb · May 21, 2024, 3:12pm

So this falls under the category of list iteration, which is the dominant one in scientific libraries.

Which is the one I am not even arguing about. If one needs to iterate, len will be called and get is never needed.

But the same applies to dict.get.

When one needs to iterate over items:
{... in dict.items()}

dict.get is not needed.

oscarbenjamin · May 21, 2024, 3:45pm

No it is not the same. With lists either indexing is not used at all or if it is then the indices are derived from the length. An out of bounds index can only happen if there is a bug somewhere.

This is the code for adding polynomials in the list representation:

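def dup_add(f, g):
    # Function wrapper added for illustration; f and g are coefficient lists,
    # highest degree first.
    df = len(f) - 1
    dg = len(g) - 1

    k = abs(df - dg)

    # Split off the k leading coefficients of the longer polynomial,
    # then add the aligned tails coefficient-wise.
    if df > dg:
        h, f = f[:k], f[k:]
    else:
        h, g = g[:k], g[k:]

    return h + [a + b for a, b in zip(f, g)]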

This is the equivalent for the dict representation:

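def dict_add(p1, p2):
    # Function wrapper added for illustration; p1 and p2 map exponent tuples
    # to nonzero coefficients.
    p = p1.copy()
    for k, v in p2.items():
        v = p.get(k, 0) + v  # a missing key means a zero coefficient
        if v:
            p[k] = v
        else:
            del p[k]  # keep the dict sparse: drop coefficients that cancel
    return p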

Most basic operations on the dict representation need to handle missing keys mostly using .get. The operations with the list representation never need the equivalent because the keys are implicit in the ordered structure: there are no missing keys.
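As a sanity check of the two addition snippets above, the following sketch wraps them in functions (the names dup_add, dict_add and the compact dup_to_dict are illustrative) and confirms that both representations give the same polynomial. It assumes the dict inputs hold no zero coefficients, as in the sparse representation described above.

```python
def dup_add(f, g):
    # Dense addition: split off the extra leading coefficients of the
    # longer list, then add the aligned tails coefficient-wise.
    df, dg = len(f) - 1, len(g) - 1
    k = abs(df - dg)
    if df > dg:
        h, f = f[:k], f[k:]
    else:
        h, g = g[:k], g[k:]
    return h + [a + b for a, b in zip(f, g)]

def dict_add(p1, p2):
    # Sparse addition: missing keys read as 0 via .get; cancelled terms are deleted.
    p = p1.copy()
    for k, v in p2.items():
        v = p.get(k, 0) + v
        if v:
            p[k] = v
        else:
            del p[k]
    return p

def dup_to_dict(f):
    # Compact version of the conversion shown earlier in the thread.
    n = len(f) - 1
    return {(k,): f[n - k] for k in range(n + 1) if f[n - k]}

f = [1, 0, 5, 0, 7]   # x**4 + 5*x**2 + 7
g = [2, -5, 1]        # 2*x**2 - 5*x + 1
print(dup_add(f, g))  # [1, 0, 7, -5, 8]
print(dup_to_dict(dup_add(f, g)) == dict_add(dup_to_dict(f), dup_to_dict(g)))  # True
```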

dg-pb · May 21, 2024, 4:43pm

Yes. I agree with all of what you said.

I just don’t agree with the conclusion that you are aiming to draw from this, which is “list.get doesn’t make sense”.

It does make sense: it is a well-defined construct with no ambiguities, and in functionality it is very similar to that of dict: “get the value of the key/index; otherwise, if the key/index is not present, return the default”.

Now, as I said, you base your argument on scientific library examples, which constitute only a small part of list use cases. And the example you provided is such a corner case that I haven’t seen it anywhere.

dict is almost never used the way you portrayed. And the list example doesn’t represent the full picture either.
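The semantics described above can be sketched as a plain function. The name list_get is hypothetical (lists have no get method), and this sketch assumes the proposal would follow the same index rules as lst[index], including negative indices, with the bounds check done through len().

```python
def list_get(seq, index, default=None):
    """Hypothetical list.get: seq[index] if index is valid, else default.

    Same rules as seq[index], so negative indices count from the end.
    """
    n = len(seq)
    if -n <= index < n:
        return seq[index]
    return default

args = ("a.txt",)
print(list_get(args, 0))              # a.txt
print(list_get(args, 1, "out.txt"))   # out.txt
print(list_get(args, -1))             # a.txt
print(list_get([1, 2, 3], 4))         # None, like the thread title's [1,2,3].get(4)
```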

What better represents full picture:
https://github.com/search?q=%2F.*\[.*\].+if.*len\(.%2B\).*else.*%2F+language%3Apython&type=code

48K files - /.*\[.*\]. if.*len\(.+\).*else.*/ language:python

Just scroll through the first page and you will find many use cases like the ones I indicated previously: treating a list as a container of unknown length. This is always the case with def foo(*args), with argument parsing in general, and in many more cases.

The question is: can list.get be useful enough to be implemented (regardless of whether its rationale differs from that of dict.get)?

dg-pb · May 21, 2024, 4:57pm

Refined regexp a bit. Many of those were false hits:

https://github.com/search?q=%2F.*\[.*\].+if.*len\(.%2B\).*else.*%2F+language%3Apython&type=code

48K files - /.*\[.*\]. if.*len\(.+\).*else.*/ language:python

A bit more realistic now.

oscarbenjamin · May 21, 2024, 6:15pm

Most of those are still false hits i.e. you couldn’t actually use .get(). Most cases have more to them than just getting a default value for an item. Many are of the form:

value = func(stuff[0]) if len(stuff) > 0 else some_value

It isn’t possible to use .get there unless you can pass some_value to func, which you don’t really want to do even if it does work:

value = func(stuff.get(0, some_value))

The *args case looks like people not knowing how to use default arguments:

def func(*args):
    value = args[0] if len(args) > 0 else default

That should be:

def func(value=default):
    ...
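A short sketch contrasting the two spellings; the names func_args, func_default and DEFAULT are hypothetical. Both return the same value for zero or one argument, but the *args version silently ignores extra arguments, while the default-argument version rejects them with TypeError.

```python
DEFAULT = 10  # hypothetical default value

def func_args(*args):
    # Pattern from the search results: an optional positional value via *args.
    value = args[0] if len(args) > 0 else DEFAULT
    return value

def func_default(value=DEFAULT):
    # Same result for 0 or 1 arguments, and the signature documents it.
    return value

print(func_args(), func_default())    # 10 10
print(func_args(3), func_default(3))  # 3 3
print(func_args(3, 4))                # 3: the extra argument is silently ignored
try:
    func_default(3, 4)
except TypeError as exc:
    print("TypeError:", exc)
```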

None of the examples looks even remotely performance sensitive so there is no need to be concerned about efficiency.

Many of the examples are expressing some nontrivial conditionality that I would want to show more explicitly in an if/else statement rather than a ternary expression or anything using .get.

Many of them seem to be using variable length containers to represent a boolean condition which is definitely something that I would avoid doing. If there is a need to handle being passed a list of either 2 or 3 items then you should handle that as early as possible by splitting it into separate variables. I don’t see .get() being right for that compared to e.g.:

if len(items) == 3:
    thing1, thing2, thing3 = items
else:
    thing1, thing2, thing3 = *items, default_value

I don’t see any examples that look like compelling use cases for list.get().

dg-pb · May 21, 2024, 6:50pm

The first page has 20% correct hits.

So that brings it to roughly 10K files.

However, this is only 1 case. It is hard to test for others.

Furthermore, there would be more cases if it existed in the first place.

It is not a concern about efficiency so much as an intent to improve it. E.g. a project like beartype could be made faster. I like writing efficient code, so this does matter to me: one bit here, another there, and the code ends up being 2x faster. E.g. beartype runs in ~700 ns for simple cases. With 3 variables that is 3 len calls, or a 60 ns saving, which is almost 10%. Of course not everyone is so concerned about efficiency, but when writing performant code every little bit helps.

Oscar Benjamin:

if len(items) == 3:
    thing1, thing2, thing3 = items
else:
    thing1, thing2, thing3 = *items, default_value
I don’t see any examples that look like compelling use cases for list.get().

Here is one. This could be done with 3x less code and be more readable:

thing1, thing2, thing3 = *items[:2], items.get(2, default)
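Since list.get does not exist, the closest one-line spelling in current Python needs an explicit length check for the third item. A minimal sketch, assuming items always holds at least two elements as in the quoted example; unpack3 is a hypothetical name.

```python
def unpack3(items, default):
    # First two items are required; the third (index 2) falls back to default.
    thing1, thing2, thing3 = *items[:2], (items[2] if len(items) > 2 else default)
    return thing1, thing2, thing3

print(unpack3([1, 2], 0))     # (1, 2, 0)
print(unpack3([1, 2, 3], 0))  # (1, 2, 3)
```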