Adding the method find() to list

marcospb19 · May 6, 2020, 3:07pm

Before I get to the real topic, I have to ask whether discussion of enhancements should start here, because I already created an issue in the issue tracker.

Is it OK to just create the issue, or should the discussion start here?

Now the real topic:
.
.
.

"""
PROBLEM:

When searching for the position of an element in a list, we should first use the in operator to check whether the element exists, and then use the index method to obtain its index.

in (__contains__) runs a linear search to return the boolean.
index also runs a linear search to return the index.

This makes the code slower, because we need to search for the same item twice.

FEATURE PROPOSAL:

Similar to str.find(), list.find() should be implemented, returning -1 when the element isn’t present.
"""

# Since there's no list.find(), this is my workaround to do only one linear search per query
def find(container: list, element) -> int:
    """str.find() behavior, but for lists."""
    try:
        return container.index(element)
    except ValueError:
        return -1

# Example driver code:
index = find(some_list, possible_element)
if index == -1:
    pass  # Not found
else:
    pass  # Found
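
To make the cost difference concrete, here is a minimal sketch that counts equality comparisons. CountingTarget, find_two_pass, and find_one_pass are hypothetical names invented for this illustration, and the counts assume CPython's element-by-element list comparison.

```python
class CountingTarget:
    """Hypothetical helper: wraps a value and counts how often it is compared."""

    def __init__(self, value):
        self.value = value
        self.comparisons = 0

    def __eq__(self, other):
        self.comparisons += 1
        return self.value == other


def find_two_pass(items, target):
    """Check membership with `in`, then call index(): two scans on a hit."""
    if target in items:
        return items.index(target)
    return -1


def find_one_pass(items, target):
    """Call index() directly and handle ValueError: one scan."""
    try:
        return items.index(target)
    except ValueError:
        return -1


items = [10, 20, 30, 40, 50]

two_pass_target = CountingTarget(40)
find_two_pass(items, two_pass_target)   # 4 comparisons for `in` + 4 for index()

one_pass_target = CountingTarget(40)
find_one_pass(items, one_pass_target)   # 4 comparisons in total
```

Note that the extra scan only happens when the element is present. When it is missing, the in check fails and index is never called, so both versions do one full scan.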

encukou · May 6, 2020, 3:44pm

We should only use index, and handle ValueError. That’s not a workaround, it’s the correct way to do this.

try:
    index = some_list.index(possible_element)
except ValueError:
    "Not found"
else:
    "Found"

marcospb19 · May 6, 2020, 3:52pm

Thanks for your reply.

Why do we have the method str.find() implemented?

Why don’t we type this for strings?

try:
    index = string.index(text)
except ValueError:
    ...

Instead, we type

index = string.find(text)

I think it’s because find() semantics are better; can we discuss that?

ericvsmith · May 6, 2020, 5:40pm

I think str.find() wouldn’t be added today if it didn’t already exist, because -1 is a valid index. Maybe you could argue that it should return None if the value isn’t found to prevent that error. But you’re still going to need some code to catch that, so an exception seems like the better design.
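
A minimal sketch of the pitfall, using made-up variable names: because -1 is a valid index, a forgotten check silently picks the last character, while index raises instead.

```python
text = "hello"

# find() signals "not found" with -1 ...
position = text.find("z")

# ... but -1 is also a valid index, so a forgotten check goes unnoticed.
silently_wrong = text[position]  # the last character, "o"

# index() raises instead, so the missing case cannot be overlooked.
try:
    checked_position = text.index("z")
except ValueError:
    checked_position = None
```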

marcospb19 · May 6, 2020, 6:52pm

I think I get it; this was my main argument.

My view is that things should be consistent: searching for something in a list should not require an exception if searching for it in a string does not (like, why???)

Also, I used to see exceptions as treatable errors:

EOFError when trying to input() and STDIN stops
FileNotFoundError when trying to open invalid file
IndexError when index access is out of bounds
SyntaxError if the syntax is incorrect

So I really don’t see why simply searching for an element in a list should require exception handling; this looks like “not-so-good” design. How does this fit into the category of what I listed above?

This is so inconsistent, maybe it is just like that because it has always been like that.

This is also not beginner-friendly. I bet a large number of people are just using __contains__ followed by .index() because they don’t realize they need to catch the exception to avoid running a linear search twice.

Let’s make it easier for our programmers.

marcospb19 · May 6, 2020, 6:56pm

Maybe if str.find() returned None instead of -1 in a future Python version, this would prevent people from accidentally accessing the last char.

But we can’t have str.find() returning -1 and list.find() returning None (more inconsistency), so I can’t argue that.

That’s actually the point: I’m trying to present the POV of a person who doesn’t have as much experience as you guys have.

An experienced Python programmer may just know that searching in a list requires a try block, but how intuitive is this really for the millions of learners?

encukou · May 7, 2020, 11:25am

Add to the list:

ValueError when e.g. int(...) receives a non-numeric string
for loops work by repeatedly calling a “get the next item” function, which raises StopIteration to signal the end of the loop
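
The “get the next item” protocol can be sketched by hand with the built-in iter() and next(). The manual_for helper below is a hypothetical name for illustration, roughly what a for loop does behind the scenes:

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does behind the scenes."""
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)   # "get the next item"
        except StopIteration:       # raised when the iterator is exhausted
            break
        body(item)


collected = []
manual_for([1, 2, 3], collected.append)
```

Here StopIteration is not an error at all; it is the normal signal that the loop is done.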

When a function can’t return a reasonable value, it should raise an exception instead. This tends to result in safer programs: ones that fail cleanly, rather than calculating with incorrect values.
If a function returns an “error marker value” like None or -1 instead, any code that calls the function needs to check whether it got an “invalid value” marker. If you forget this, the program will behave in unexpected ways.

The find method was added long ago, and was (I presume) inspired by the C language. C doesn’t have exceptions, so most functions have a dedicated “error marker value” to return. If you look at some random module written in C, you’ll most likely find it has a lot of code that does essentially this:

initial_result = do_something()
if initial_result is None:
    clean_up()
    return None
intermediate_result = do_something_else(initial_result)
if intermediate_result is None:
    clean_up()
    return None
final_result = do_yet_another_thing(intermediate_result)
if final_result is None:
    clean_up()
    return None
clean_up()
return final_result

(This is funniest when the operations are basic arithmetic…)
If you forget one of those error handling blocks, you have a bug. (Except if you can prove you can safely omit it, like the last one here. But be careful about errors in your reasoning!)

In Python, you’d need to surround every call to index with try/except, which would be just as bothersome as the C approach. But if you know the error case won’t happen, you can leave out the try/except, and the program will do “the right thing”. Not so in C: if you forget error handling in C, the program will ignore all errors (think MemoryError or a Ctrl+C interrupt).
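
As a small sketch of that point (position_of, label_for, and report are hypothetical names), the intermediate functions below contain no error handling at all, yet a failed search still reaches the one caller that chooses to handle it:

```python
def position_of(items, target):
    # No try/except here: if target is missing, ValueError simply propagates.
    return items.index(target)


def label_for(items, target):
    # No error handling here either; this layer never sees an invalid value.
    return f"item #{position_of(items, target) + 1}"


def report(items, target):
    # The one caller that cares handles the error, once.
    try:
        return label_for(items, target)
    except ValueError as exc:
        return f"not available ({exc})"


found = report(["a", "b"], "b")
missing = report(["a", "b"], "z")
```

If report did not catch the exception either, the program would stop with a traceback rather than continue with a bogus position.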

Python’s idea of “the right thing” is raising exceptions: giving callers a chance to handle the error, and if none do, display a traceback with (hopefully) useful information about what went wrong.
(There are other ideas of “the right thing”: Rust or Go generally force the programmer to think about the possible errors every time something can go wrong, which can be as tedious as the C style, but since you can’t forget, it tends to lead to more robust programs than Python’s exceptions.)
The C way (which you can also have in Python: see find’s -1 return value) is the worst: it’s tedious and makes it easy to write buggy code. So we’re not likely to add new functions that act like this. On the other hand, we won’t remove functions that people already use: after all, it is possible to use find correctly, you just need to be a bit more careful. And it doesn’t really look tedious or bug-prone until you need to write a whole complex program in the C style, so it’s, sadly, not something you can easily explain to beginners :(

Maybe beginners’ courses should spend some time explaining that exceptions are friends, not monsters. Mine does. How can we make it more common?

marcospb19 · May 7, 2020, 8:09pm

Thank you so much, @encukou, for your extensive clarifications on exceptions.

Then, do you both think that str.find() will go through any changes in future versions? Is it worth the change?

ericvsmith · May 8, 2020, 12:56am

If we can’t get rid of bytes.swapcase() or str.swapcase(), which I think are just about useless, then I don’t see us getting rid of str.find(), which at least has some valid uses. It’s just not worth the disruption.

encukou · May 8, 2020, 11:01am

The same holds for changing rather than removing it. Not worth the change.