Unique subarray to delete subarray with same elements

dset · March 3, 2023, 7:29am

Hi all,

I have an array with many subarrays, such as

a = [[1,2],[3,4],[2,1]]

I would like to obtain a = [[1,2],[3,4]]

because [1,2] and [2,1] contain the same elements.

Can someone suggest a function to do that?
Many thanks in advance.

steven.daprano · March 3, 2023, 8:29am

You will have to write your own function to do this. Actually you will need at least two functions:

one function to test whether two lists have the same elements in any order;
a second function to go through all of the sublists and keep only the ones that don’t match a sublist already kept, using the first function.

The first function is the hard one. But once you have that, the second is easy:

def match(a, b):
    """Return True if a and b have the same elements in any order."""
    ...

def extract_unique(alist):
    new = []
    for sublist in alist:
        if not any(match(sublist, L) for L in new):
            new.append(sublist)
    return new
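
One possible way to fill in match, shown only as a sketch: it assumes every element is hashable and that the number of occurrences matters, so [1, 2] matches [2, 1] but not [1, 2, 1]. The use of collections.Counter here is an assumption for illustration, not something the skeleton above prescribes.

```python
from collections import Counter

def match(a, b):
    """Return True if a and b have the same elements in any order."""
    # Counter compares element counts, so order is ignored but
    # repeated elements still matter. Assumes hashable elements.
    return Counter(a) == Counter(b)

def extract_unique(alist):
    new = []
    for sublist in alist:
        # Keep a sublist only if nothing already kept matches it.
        if not any(match(sublist, L) for L in new):
            new.append(sublist)
    return new

print(extract_unique([[1, 2], [3, 4], [2, 1]]))  # [[1, 2], [3, 4]]
```

Because every sublist is compared against every sublist kept so far, this takes quadratic time in the worst case, which is fine for short lists.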

abessman · March 3, 2023, 9:07am

Normally when you want to remove duplicates from a list you use list(set(mylist)). However, in this case the list elements are themselves lists, and thus unhashable, which means the list cannot be turned into a set.

However, you can use pickle to serialize the elements, and then use set to remove duplicates:

import pickle
a = [[1, 2], [3, 4], [2, 1]]
a_sets = (set(sublist) for sublist in a)  # Sets are unordered, so set([1, 2]) == set([2, 1])
a_serialized = {pickle.dumps(s) for s in a_sets}  # Serialize and remove duplicates via set comprehension
a_uniques = [list(pickle.loads(p)) for p in a_serialized]  # Deserialize and turn elements back into lists
print(a_uniques)

[[1, 2], [3, 4]]
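
If the elements themselves are hashable (an assumption; it holds for the integers in this example), a frozenset can probably serve as the hashable stand-in directly, without pickle. This is a different technique from the one above, sketched here with a dict so that the first sublist seen for each group is the one kept:

```python
a = [[1, 2], [3, 4], [2, 1]]

first_seen = {}
for sublist in a:
    # frozenset is hashable and ignores order, so [1, 2] and [2, 1]
    # share a key. setdefault keeps the first sublist stored per key.
    first_seen.setdefault(frozenset(sublist), sublist)

# Dicts preserve insertion order (Python 3.7+), so the result
# follows the order in which each group first appeared.
a_uniques = list(first_seen.values())
print(a_uniques)  # [[1, 2], [3, 4]]
```

Like the set-based version, this treats [1, 2] and [1, 2, 1] as the same, because a frozenset discards repeated elements.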

aivarpaalberg · March 3, 2023, 12:50pm

For clarity: are [1, 2] and [1, 2, 1] considered to “contain same elements”?

abessman · March 3, 2023, 1:15pm

Good point! If the number of elements in the sublists does matter (but their order does not), we can use collections.Counter instead of set in the second step:

import pickle
from collections import Counter
a = [[1, 2], [3, 4], [2, 1], [1, 2, 1]]
a_sorted = (sorted(sublist) for sublist in a)  # Sort first: Counter([1, 2]) == Counter([2, 1]), but they pickle differently because a Counter remembers insertion order
a_counts = [Counter(sublist) for sublist in a_sorted]
a_serialized = {pickle.dumps(s) for s in a_counts}  # Serialize and remove duplicates via set comprehension
a_uniques = [list(pickle.loads(p).elements()) for p in a_serialized]  # Deserialize and turn elements back into lists
print(a_uniques)

steven.daprano · March 3, 2023, 3:29pm

You are assuming that the elements inside the lists can be pickled.

We’re also assuming that we don’t have to preserve the order of the remaining elements.

Pickling is a very clever way of hashing (some) unhashable objects, but even when it works, what’s the performance hit of pickling these elements?
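
For comparison, here is a sketch that avoids pickling and preserves the order of the remaining sublists. It assumes the elements are both hashable and mutually orderable (so that sorted works), and the name unique_sublists is made up for this example:

```python
def unique_sublists(lists):
    """Drop sublists whose elements (with multiplicity) were already seen."""
    seen = set()
    result = []
    for sublist in lists:
        # A sorted tuple is hashable and identical for any ordering
        # of the same elements, but keeps repeated elements.
        key = tuple(sorted(sublist))
        if key not in seen:
            seen.add(key)
            result.append(sublist)
    return result

print(unique_sublists([[1, 2], [3, 4], [2, 1], [1, 2, 1]]))
# [[1, 2], [3, 4], [1, 2, 1]]
```

Each sublist is sorted once and looked up in a set, so the whole thing is a single pass over the outer list.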

Way back in the Python 2.2 days, the Python Cookbook came up with seven different approaches for this sort of problem:

Worth reading.

dset · March 3, 2023, 8:04pm

Many thanks to all.
I should add that [1, 2] is equal to [2, 1] but different from [1, 2, 1].
