# Unique subarray to delete subarray with same elements

Hi all,

I have an array containing many subarrays, like:

a = [[1,2],[3,4],[2,1]]

I would like to obtain a = [[1,2],[3,4]]

because [1,2] and [2,1] contain the same elements.

Can someone suggest a function to obtain that?

You will have to write your own function to do this. Actually you will need at least two functions:

• one function to test whether two lists have the same elements in any order;
• a second function to go through all of the sublists and extract the ones which don't match using the first function.

The first function is the hard one. But once you have that, the second is easy:

```
def match(a, b):
    """Return True if a and b have the same elements in any order."""
    ...

def extract_unique(alist):
    new = []
    for sublist in alist:
        if not any(match(sublist, L) for L in new):
            new.append(sublist)
    return new
```
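One simple way to fill in `match` (a sketch, assuming the sublist elements are hashable, as the numbers in your example are) is to compare `collections.Counter` objects, which count each element and ignore order:

```python
from collections import Counter

def match(a, b):
    """Return True if a and b have the same elements in any order."""
    # Counter ignores order but respects multiplicity,
    # so [1, 2, 1] does not match [1, 2].
    return Counter(a) == Counter(b)

print(match([1, 2], [2, 1]))      # True
print(match([1, 2], [1, 2, 1]))   # False
```

If the elements are orderable rather than hashable, `sorted(a) == sorted(b)` works just as well.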

Normally when you want to remove duplicates from a list you use `list(set(mylist))`. However, in this case the list elements are themselves lists, and thus unhashable, which means the list cannot be turned into a set.
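A quick way to see the problem for yourself:

```python
a = [[1, 2], [3, 4], [2, 1]]
try:
    set(a)  # lists are mutable, so they define no hash
except TypeError as e:
    print(e)  # unhashable type: 'list'
```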

However, you can use pickle to serialize the elements, and then use `set` to remove duplicates:

```
import pickle
a = [[1, 2], [3, 4], [2, 1]]
a_sets = (set(sublist) for sublist in a)  # Sets are unordered, so set([1, 2]) == set([2, 1])
a_serialized = {pickle.dumps(s) for s in a_sets}  # Serialize and remove duplicates via set comprehension
a_uniques = [list(pickle.loads(p)) for p in a_serialized]  # Deserialize and turn elements back into lists
print(a_uniques)
```
```
[[1, 2], [3, 4]]
```
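As an aside, if set-style comparison is what you want, pickle isn't strictly needed here: `frozenset` is hashable, so it can serve directly as the deduplication key (a sketch; like the `set` version, it ignores how many times an element appears within a sublist):

```python
a = [[1, 2], [3, 4], [2, 1]]
seen = set()
a_uniques = []
for sublist in a:
    key = frozenset(sublist)  # hashable and order-insensitive
    if key not in seen:
        seen.add(key)
        a_uniques.append(sublist)  # first occurrence wins, original lists kept
print(a_uniques)  # [[1, 2], [3, 4]]
```

This also preserves the original order of the surviving sublists, which the set-of-pickles version does not guarantee.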

For purposes of clarity: are `[1, 2]` and `[1, 2, 1]` considered to “contain same elements”?


Good point! If the number of elements in the sublists does matter (but their order does not), we can use `collections.Counter` instead of `set` in the second step:

```
import pickle
from collections import Counter
a = [[1, 2], [3, 4], [2, 1], [1, 2, 1]]
a_sorted = (sorted(sublist) for sublist in a)  # Sort first: equal Counters built in different orders can pickle to different bytes
a_counts = [Counter(sublist) for sublist in a_sorted]
a_serialized = {pickle.dumps(s) for s in a_counts}  # Serialize and remove duplicates via set comprehension
a_uniques = [list(pickle.loads(p).elements()) for p in a_serialized]  # Deserialize and turn elements back into lists
print(a_uniques)
```

You are assuming that the elements inside the lists can be pickled.

We're also assuming that we don't have to preserve the order of the remaining elements.

Pickling is a very clever way of hashing (some) unhashable objects, but even when it works, what's the performance hit of pickling these elements?
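A pickle-free variant that sidesteps both concerns, assuming the elements are orderable: use a sorted tuple as the deduplication key. Tuples of numbers are hashable, sorting discards order while keeping multiplicity, and the surviving sublists stay in their original form and order:

```python
a = [[1, 2], [3, 4], [2, 1], [1, 2, 1]]
seen = set()
a_uniques = []
for sublist in a:
    key = tuple(sorted(sublist))  # hashable; ignores order, respects counts
    if key not in seen:
        seen.add(key)
        a_uniques.append(sublist)
print(a_uniques)  # [[1, 2], [3, 4], [1, 2, 1]]
```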

Way back in the Python 2.2 days, the Python Cookbook came up with seven different approaches for this sort of problem: