# Python sublist check

What is a fast method to check if one list is a sublist of a bigger one? For example, in my case,

`A = [1, 2, 2, 3]` is a subset of `B = [1, 2, 2, 3, 4]`, but A is not a subset of `C = [1, 2, 3, 4]` since one of the 2’s in A is missing in C.

edit: premature send

Can you make use of sets? `set(A) <= set(B)` is extremely efficient. Perhaps, since duplicates seem to matter, `collections.Counter()` can serve as a multiset.

Can’t use sets because the containers I’m working with may contain duplicate elements. But multisets can contain duplicates?

If all the numbers are non-negative ints, you can use numpy.bincount:

``````import numpy as np
A = [1, 2, 2, 3]
B = [1, 2, 2, 3, 4]
C = [1, 2, 3, 4]

cA = np.bincount(A)
cB = np.bincount(B)
cC = np.bincount(C)

def is_subset(ca, cb):
return len(ca) <= len(cb) and (ca <= cb[:len(ca)]).all()

print(is_subset(cA, cB))  # True
print(is_subset(cA, cC))  # False
``````

Are the lists always sorted, as in your examples?

For sorted lists A and B, iterate an index through list B, to find the first occurrence at index i, of A[0] in B, i.e. B[i] == A[0].

So:

``````def is_sorted_sublist_of_sorted(A, B):

# TODO: reimplement using bisect.py
i = B.index(A[0])

for x, y in itertools.zip_longest(A, B[i:], fillvalue=float('nan')):
if x != y:
return False
return True
``````

Your title says “sublist”, but your text says “subset”. OTOH, your examples are sets not lists, so I’ll assume list.

If also “sublist” means consecutive and anywhere (so that `[2,2,3]`is a sublist of `B` but `[1,3,4]` is not a sublist of `B`, then we are just dealing with the same problem as string search (as in `str.find`), but with integers instead of characters.

For this, I like Boyer-Moore. There is a set-up cost proportional to the target size (squared?), but it is fast if the target sequence is a lot shorter than the searched sequence, or you use the same target on a lot of searched sequences.

By definition, yes. There isn’t a specific multi-set implementation provided, but `collections.Counter` can function that way. They don’t implement comparison operators with the meaning that sets do, but we can do it more explicitly:

``````from collections import Counter

def is_sublist(x, y):
cx, cy = Counter(x), Counter(y)
cy.subtract(cx)
return min(cy.values()) >= 0
``````
1 Like

Counters do support comparison operators in expected manner. So you can simplify the code a and just return Counter(y) >= Counter(x)

Ah, it looks like this was added in 3.10:

``````\$ python3.10
Python 3.10.14 (main, May  2 2024, 15:07:09) [GCC 9.4.0] on linux
>>> from collections import Counter
>>> Counter() >= Counter()
True
>>>
\$ python3.9
Python 3.9.19 (main, May  2 2024, 15:03:24)
[GCC 9.4.0] on linux
>>> from collections import Counter
>>> Counter() >= Counter()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '>=' not supported between instances of 'Counter' and 'Counter'
``````

If the input lists are always sorted as they are in your example then you can perform the test more cheaply as a subsequence test with an iterator:

``````def is_subsequence(a, b):
seq = iter(b)
return all(i in seq for i in a)

print(is_subsequence([1, 2, 2, 3], [1, 2, 2, 3, 4])) # outputs True
print(is_subsequence([1, 2, 2, 3], [1, 2, 3, 4])) # outputs False
``````
1 Like

I have blogged this, but it depends on what you treat as “found”. I am looking for the elements of the sub-list in their given order, in the main list.