When you need to find out whether something is in some collection of items, such as a list, how slow (expensive) that is depends on the nature of the collection.

If you have a list in no particular order, you need to examine each item in the list to see if it matches. So the expense, on average, is proportional to the length of the list. We call this cost *O(N)*, where *N* is the length of the list. Big-O notation expresses the order (scale) of the expense as a function of some measure of the problem - here, the measure is the length of the list.

See: Big O notation - Wikipedia for details and further references.
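That linear scan can be sketched like this (the helper name here is just for illustration):

```
def linear_contains(items, target):
    # Examine each item in turn: on average this looks at
    # about half the list, and in the worst case all of it.
    for item in items:
        if item == target:
            return True
    return False

linear_contains([3, 1, 4, 1, 5], 4)  # True
linear_contains([3, 1, 4, 1, 5], 9)  # False
```

This is exactly what Python's `item in some_list` does under the hood.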

If the list is known to be in order, for example ascending order, you have a couple of choices to take advantage of this:

- search until you find a match or the items exceed your target value - at that point you know there’s no point in searching further; that is still *O(N)*, but better - about *O(N/2)* on average
- do a bisect search
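The first option, stopping early once the items pass the target, might look like this (assuming an ascending list; the helper name is hypothetical):

```
def sorted_contains(items, target):
    # items must be in ascending order.
    for item in items:
        if item == target:
            return True
        if item > target:
            # Every later item is larger still: no point searching further.
            return False
    return False
```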

A bisect search starts with two bounds: the start and end of the list. Pick the entry halfway between them and compare. If your item is less than that entry, the midpoint becomes the new upper bound; if it’s more, the midpoint becomes the new lower bound. Then repeat. The boundaries get closer together until you find a match or they come together (no match). Because each iteration halves the size of the range, you need about `O(log2(N))` iterations. This is *much* faster than the linear search. *For larger lists*. For a tiny list like your 3-element examples the mucking about exceeds the cost of just scanning the list. In the real world, the bisect approach is a win for lists of any significant size.
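Python’s standard library provides this in the `bisect` module; a membership test built on it might look like (the wrapper name is just for illustration):

```
import bisect

def bisect_contains(items, target):
    # items must be sorted ascending; about O(log2(N)) comparisons.
    i = bisect.bisect_left(items, target)
    # bisect_left returns where target *would* go; check whether
    # the item at that position actually is the target.
    return i < len(items) and items[i] == target

bisect_contains([1, 3, 5, 7, 9], 7)  # True
bisect_contains([1, 3, 5, 7, 9], 6)  # False
```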

A set is better still. It internally uses a data structure called a hash table: Hash table - Wikipedia

A hash table is essentially a list, each of whose entries is a small list of items. You decide where an item goes in the list with a *hash* function, whose purpose is to (on average) distribute the items *evenly* across the list; the hash function always produces the same value for a given item. You choose the size of the list based on the number of values you’re storing in it. The idea is that *on average* each entry in the list has very few items stored in it, often 0 or 1.

Then, when you go to see if some test item is in the table, you compute

the test item’s hash value, and use that to pick which entry *would*

hold this item if it were in the list. Then you only need to see if the

item occurs in the *very short* sublist stored in the entry, and can

ignore *all the other entries completely*.

By sizing the table to have very few items in any entry, the cost of looking things up becomes the cost of searching a 0- or 1-element list, which is basically a constant small time. We call this *O(1)* in big-O notation.
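A toy version of such a table, just to illustrate the idea (this is not how Python implements it internally):

```
class TinyHashTable:
    def __init__(self, nbuckets=8):
        # A list whose entries are small lists ("buckets").
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, item):
        # hash() always returns the same value for a given item;
        # the modulo spreads items evenly across the buckets.
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        if item not in bucket:
            bucket.append(item)

    def __contains__(self, item):
        # Only the one short bucket is scanned;
        # all the other entries are ignored completely.
        return item in self._bucket(item)
```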

This is amazingly effective. Of course it isn’t free: you need to do

work to create and populate the list using the hash function etc. But

once made, lookups are very fast.

A Python `set` uses a hash table to index its elements, and thus has *O(1)* lookup time (the cost of a single `item in Akeys` test).

A Python `dict`, which maps keys to values, *also* uses a hash table for its lookups.
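For example (the names here are made up):

```
ages = {"alice": 41, "bob": 27}
"alice" in ages  # key membership test goes through the hash table
ages["bob"]      # so does fetching a value by key
```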

When you make a set, e.g.:

```
s = set((1, 2, 3, 4, 2, 6, 7))
```

Python does all this work for you, and then you can test:

```
item in s
```

very cheaply.
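You can see the difference with a rough timing comparison (exact numbers will vary by machine; the point is the relative cost):

```
from timeit import timeit

big_list = list(range(100_000))
big_set = set(big_list)

# Looking for the last element: the list scans all 100,000 items,
# the set jumps straight to the right hash-table entry.
t_list = timeit("99_999 in big_list", globals=globals(), number=100)
t_set = timeit("99_999 in big_set", globals=globals(), number=100)
print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
```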