What is a fast method to check if one list is a sublist of a bigger one? For example, in my case,

`A = [1, 2, 2, 3]`

is a subset of `B = [1, 2, 2, 3, 4]`

, but A is not a subset of `C = [1, 2, 3, 4]`

since one of the 2’s in A is missing in C.

What is a fast method to check if one list is a sublist of a bigger one? For example, in my case,

`A = [1, 2, 2, 3]`

is a subset of `B = [1, 2, 2, 3, 4]`

, but A is not a subset of `C = [1, 2, 3, 4]`

since one of the 2’s in A is missing in C.

edit: premature send

Can you make use of sets? `set(A) <= set(B)`

is extremely efficient. Perhaps, since duplicates seem to matter, `collections.Counter()`

can serve as a multiset.

Can’t use sets because the containers I’m working with may contain duplicate elements. But multisets can contain duplicates?

If all the numbers are non-negative ints, you can use numpy.bincount:

```
import numpy as np
A = [1, 2, 2, 3]
B = [1, 2, 2, 3, 4]
C = [1, 2, 3, 4]
cA = np.bincount(A)
cB = np.bincount(B)
cC = np.bincount(C)
def is_subset(ca, cb):
return len(ca) <= len(cb) and (ca <= cb[:len(ca)]).all()
print(is_subset(cA, cB)) # True
print(is_subset(cA, cC)) # False
```

Are the lists always sorted, as in your examples?

For sorted lists A and B, iterate an index through list B, to find the first occurrence at index i, of A[0] in B, i.e. B[i] == A[0].

So:

```
def is_sorted_sublist_of_sorted(A, B):
# TODO: reimplement using bisect.py
i = B.index(A[0])
for x, y in itertools.zip_longest(A, B[i:], fillvalue=float('nan')):
if x != y:
return False
return True
```

Your title says “sublist”, but your text says “subset”. OTOH, your examples are sets not lists, so I’ll assume list.

If also “sublist” means consecutive and anywhere (so that `[2,2,3]`

is a sublist of `B`

but `[1,3,4]`

is not a sublist of `B`

, then we are just dealing with the same problem as string search (as in `str.find`

), but with integers instead of characters.

For this, I like Boyer-Moore. There is a set-up cost proportional to the target size (squared?), but it is fast if the target sequence is a lot shorter than the searched sequence, or you use the same target on a lot of searched sequences.

By definition, yes. There isn’t a specific multi-set implementation provided, but `collections.Counter`

can function that way. They don’t implement comparison operators with the meaning that sets do, but we can do it more explicitly:

```
from collections import Counter
def is_sublist(x, y):
cx, cy = Counter(x), Counter(y)
cy.subtract(cx)
return min(cy.values()) >= 0
```

1 Like

Counters do support comparison operators in expected manner. So you can simplify the code a and just return Counter(y) >= Counter(x)

Ah, it looks like this was added in 3.10:

```
$ python3.10
Python 3.10.14 (main, May 2 2024, 15:07:09) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import Counter
>>> Counter() >= Counter()
True
>>>
$ python3.9
Python 3.9.19 (main, May 2 2024, 15:03:24)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import Counter
>>> Counter() >= Counter()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '>=' not supported between instances of 'Counter' and 'Counter'
```

If the input lists are always sorted as they are in your example then you can perform the test more cheaply as a subsequence test with an iterator:

```
def is_subsequence(a, b):
seq = iter(b)
return all(i in seq for i in a)
print(is_subsequence([1, 2, 2, 3], [1, 2, 2, 3, 4])) # outputs True
print(is_subsequence([1, 2, 2, 3], [1, 2, 3, 4])) # outputs False
```

1 Like

I have blogged this, but it depends on what you treat as “found”. I am looking for the elements of the sub-list in their given order, in the main list.