Unique subarray to delete subarray with same elements

hi all,

i Have an array with many subarray as

a = [[1,2],[3,4][2,1]]

I would obtain a = [[1,2],[3,4]]

because [1,2] and [2,1] contain same elements

can someone suggest me a function to obtain that?
many thanks in advance

You will have to write your own function to do this. Actually you will need at least two functions:

  • one function to test whether two lists have the same elements in any order;
  • a second function to go through all of the sub lists and extract the ones which don’t match using the first function.

The first function is the hard one. But once you have that, the second is easy:

def match(a, b):
    """Return True if a and b have the same elements in any order."""
    ...

def extract_unique(alist):
    new = []
    for sublist in alist:
        if not any(match(sublist, L) for L in new):
            new.append(sublist)
    return new

Normally when you want to remove duplicates from a list you use list(set(mylist)). However, in this case the list elements are themselves lists, and thus unhashable, which means the list cannot be turned into a set.

However, you can use pickle to serialize the elements, and then use set to remove duplicates:

import pickle
a = [[1, 2], [3, 4], [2, 1]]
a_sets = (set(sublist) for sublist in a)  # Sets are unordered, so set([1, 2]) == set([2, 1])
a_serialized = {pickle.dumps(s) for s in a_sets}  # Serialize and remove duplicates via set comprehension
a_uniques = [list(pickle.loads(p)) for p in a_serialized]  # Deserialize and turn elements back into lists
print(a_uniques)
[[1, 2], [3, 4]]

For purpose of clarity: are [1, 2] and [1, 2, 1] considered to “contain same elements”?

1 Like

Good point! If the number of elements in the sublists does matter (but their order does not), we can use collections.Counter instead of set in the second step:

import pickle
from collections import Counter
a = [[1, 2], [3, 4], [2, 1], [1, 2, 1]]
a_sorted = (sorted(sublist) for sublist in a)  # Need to sort first because Counter([1, 2]) is not equivalent to Counter([2, 1])
a_counts = [Counter(sublist) for sublist in a_sorted]
a_serialized = {pickle.dumps(s) for s in a_counts}  # Serialize and remove duplicates via set comprehension
a_uniques = [list(pickle.loads(p).elements()) for p in a_serialized]  # Deserialize and turn elements back into lists
print(a_uniques)

You are assuming that the elements inside the lists can be pickled.

We’re also assuming that we don’t have to preserve the order of the remaining elements.

Pickling is a very clever way of hashing (some) unhashble objects, but even when it works, what’s the performance hit of pickling these elements?

Way back in the Python 2.2 days, the Python Cookbook came up with seven different approaches for this sort of problem:

Worth reading.

Many thanks to all
I just add that 1,2 is equal to 2, 1 but different to 1,2,1