Rationale:
Currently there’s no straightforward way to remove duplicates from a list while preserving order. The common solutions are either non-intuitive (list(dict.fromkeys(lst))) or don’t preserve order (passing through a set).
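For reference, here is how those two existing idioms behave (this is current Python, nothing hypothetical):

```python
data = [3, 1, 3, 2, 1]

# dict preserves insertion order (guaranteed since Python 3.7),
# so this keeps the first occurrence of each element:
print(list(dict.fromkeys(data)))   # [3, 1, 2]

# a set has no defined order, so the original order is lost:
print(list(set(data)))             # e.g. [1, 2, 3]
```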
It probably ought to be called .deduplicated() (or something else ending in -ed) to make it clear that it returns a modified copy rather than operating in place.
There’s the question of whether it would use a set/dict(-like) structure under the hood (forcing the contents to be hashable), or a slower sorting-based or brute-force implementation to find the items it has already seen.
I think it’d be reasonable to require hashable objects but allow a key function to make other objects hashable for the purpose of detecting duplicates; if that isn’t possible, I don’t think “duplicate” is sufficiently well-defined.
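A minimal sketch of what I have in mind; the function name and the key parameter are hypothetical, mirroring sorted()’s key:

```python
def deduplicated(items, key=None):
    # key maps each item to a hashable proxy used only for duplicate
    # detection; the original items are what end up in the result.
    seen = set()
    result = []
    for item in items:
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result

# Dicts aren't hashable, but a key function still lets us deduplicate them:
rows = [{"id": 1}, {"id": 2}, {"id": 1}]
print(deduplicated(rows, key=lambda r: r["id"]))  # [{'id': 1}, {'id': 2}]
```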
That said, there are already enough ways to do this.
If you want to preserve order, you should replace duplicates with a null sentinel such as None; removing an item shifts the positions of all the items after it.
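A minimal sketch of that sentinel idea (the function name is invented for illustration):

```python
def mask_duplicates(lst, sentinel=None):
    # Positions stay stable: repeat occurrences are replaced with the
    # sentinel instead of being removed.
    seen = set()
    out = []
    for item in lst:
        out.append(sentinel if item in seen else item)
        seen.add(item)
    return out

print(mask_duplicates([3, 1, 3, 2, 1]))  # [3, 1, None, 2, None]
```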
Intuitively, I’d have said that removing duplicates just means that there are no two elements left that compare equal. Is there a nice way of writing a hashing key that achieves that for unhashable types with custom __eq__ methods?
I don’t think there’s a generic way, hence the need for a key param. I’d say it depends on how the custom __eq__ is implemented: what about the objects has to be equal?
Maybe I’m lacking in imagination, but it feels like if you have some definition of “equal” for two objects, you can write a function to turn those characteristics into a hashable value.
I’d say that for most realistic use cases it’s possible, but potentially annoying. If we have dataclass-like objects, we can recursively hash them by hashing the tuple of their attributes. That doesn’t work in general, since it breaks if the objects are mutated, but for this purpose we can assume they won’t be. Implementing a generic function for this is somewhat difficult, since the hash should only include the fields that are relevant to the equality comparison. So you’d probably end up writing a custom key function whenever you use the deduplication method.
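For example (the class and key function are invented for illustration), an object whose __eq__ only looks at some of its fields needs a key that hashes exactly those fields:

```python
class Point:
    def __init__(self, x, y, label):
        self.x, self.y, self.label = x, y, label

    # Equality only looks at x and y; label is deliberately ignored.
    # Defining __eq__ without __hash__ also makes instances unhashable.
    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

# The key must mirror __eq__: hash exactly the fields that matter.
def point_key(p):
    return (p.x, p.y)

points = [Point(0, 0, "a"), Point(0, 0, "b"), Point(1, 2, "c")]
seen, unique = set(), []
for p in points:
    if point_key(p) not in seen:
        seen.add(point_key(p))
        unique.append(p)
print([(p.x, p.y, p.label) for p in unique])  # [(0, 0, 'a'), (1, 2, 'c')]
```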
In full generality, I don’t think using a key function for this is possible. An arbitrary __eq__ can behave strangely, so we’d need O(n²) comparisons to make sure no equal pair remains. But if there is a key function, you can just use the normal dict-based method, which runs in (amortized) linear time. I’m not sure those pathological cases are common enough to be worth worrying about, though: most objects will either be hashable or work with a key function.
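For completeness, the quadratic fallback for arbitrary __eq__ would look something like this (a sketch, name invented):

```python
def dedup_unhashable(items):
    # Worst case O(n^2): each item is compared via == against
    # everything kept so far, so no hashability is required.
    result = []
    for item in items:
        if not any(item == kept for kept in result):
            result.append(item)
    return result

# Lists are unhashable but compare by value:
print(dedup_unhashable([[1, 2], [3], [1, 2]]))  # [[1, 2], [3]]
```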
The index of each item represents its order in the list: the item at index 0 comes before all other items, and the indices maintain that order even when an item is removed. However, the index is also crucial for a list because it’s what differentiates a list from a mere ordered sequence. Removing an item from the middle of a list effectively creates a new list.
The OP implicitly shows that, scanning from index 0, the proposed method should keep the first occurrence of each element and remove the later duplicates. Why?
I thought it was clear from the OP that this is what they are asking for: a new list, without duplicates, where each unique element appears in the order of its first occurrence.
You seem to be making some very specific distinction based on definitions that I’m not following. We can just drop it, though.