Finding the most common element in a list

FelixLeg · April 30, 2024, 9:16am

Probably a super easy task, but I can’t get the solution for my case

You see, I have got a list of non-hashable elements, and I want to find an element that appears the most.

At first I thought about the dict() trick:

el_count = dict()
for elem in elements:
    el_count[elem] = 1 + el_count.get(elem, 0)

but as I said the elements are not hashable, so they can’t be set as keys for the el_count dict.

Then I thought about str()-izing them first:

el_count = dict()
for elem in elements:
    el_count[str(elem)] = 1 + el_count.get(str(elem), 0)

but then I got two problems: how to get the result from el_count and what if an element hasn’t got __str__ or __repr__ methods.

Rosuav · April 30, 2024, 9:23am

How do you define “the element”? By identity or equality? Non-hashable elements don’t always have a well-defined concept of equality, but if they do (eg with lists, where equality can vary as the object gets mutated, but it is well-defined at any given point in time), you can use a proxy for the value (in this case, a tuple would serve as a hashable equivalent). Alternatively, if you want to use identity, the id() of the object will be an integer, which is hashable.

Stefan2 · April 30, 2024, 11:32am

If you can’t even rely on __str__, what can you rely on?

Hard if not impossible to help when the data is such a mystery.

c-rob · April 30, 2024, 2:02pm

I’m new to this part. Why can’t the non-hashable elements be used as a dict key? Is each element super long like 5000 bytes or something?

jamestwebber · April 30, 2024, 2:10pm

The keys to a dictionary must be hashable. If they aren’t, the data structure (a hash table) doesn’t work.

FelixLeg · April 30, 2024, 2:35pm

Thank you all for help. With the advice from @Rosuav I found a way to make my data hashable. When I wrote the post I was super tried and haven’t spot an obvious solution.

c-rob · April 30, 2024, 2:38pm

But what does “hashable” mean in terms of a dict in Python?
I’ve never heard that term though I know what a dict data structure is. It’s like a Perl hash.

FelixLeg · April 30, 2024, 2:55pm

When you use a list and want to get a value at an index, then it is super easy for the list to find the value: you just start with the first value location in a computer memory and move n times forward and you got the value.

On the other hand, a structure like dict can’t do that, because a key is not an integer. So, you have to read the memory and test with each element if its key matches the key you want to read. This can be very slow, so one workaround for this is first to group keys into “buckets”. Each “bucket” is labeled with a special value called “hash”. When you do a_dict[a_key] then the dict first calculates the hash of a_key and search for bucket with this hash, and then looks for the value in the bucket with the key.

jamestwebber · April 30, 2024, 3:42pm

Following up on @FelixLeg’s answer, see hashable in the Python glossary.

The key bit is that the hash “never changes during its lifetime”. A list cannot be hashable because it could be modified, and the hash value wouldn’t remain the same^[1].

and there would be no way to compute the original hash once the data had changed ↩︎

kknechtel · May 1, 2024, 1:25am

Yes, and it works broadly the same way. The Perl name “hash” is really an abbreviation of “hash table”, which is the most common implementation for a “dictionary” (where the Python name dict comes from). But the actual action, “to hash”, is the process of turning a key (an arbitrary object) into a hash code that can be looked up in the hash table, via a hash function. This is a many-to-one mapping that transforms arbitrary objects into a simple integer value - in Python the hash code is a 64-bit integer, assuming a 64-bit build; and the hash function is the hash builtin, which delegates to the object’s __hash__ method.

From there, clever algorithmic tricks are used in order to retrieve the object from the table in O(1) (amortized) time:

(As I recall it, Python’s implementation uses “open addressing” and “linear probing” per the terminology on Wikipedia.)

Python refuses to hash certain types of object because it would cause problems to use them as dict keys. In particular, keys should be immutable objects, because of the consequences of mutating them. For example, imagine if we could make a dict like x = []; y = {x: 1}, and then we modified the x list. How should we expect to look up the 1 value in y - using another empty list? Using a list that matches the current x value? Using only the x object itself? Or just what?

Further, if changes to the list impacted the result of __hash__, that would potentially corrupt the dictionary structure. Now the key-value pair is in a different place in the table, versus where it would hash if it were re-inserted. That’s especially a problem when the table gets resized - which happens as needed, not under the user’s direct control.

c-rob · May 1, 2024, 9:19am

Ah, now I understand. I didn’t know multiple key values could be hashed to a single hash value, and thus overlap in a hash table. OP had a list with repeating values thus a dict would not work for them. Is my understanding correct?

I had made the assumption that a hash was the mangled value of the initial value, if the value was unique the hash was unique also.

When we hash a value is that a one-way hash only? Or are there 2-way hashes? I’m comparing this to encryption which is 2-way. With encryption you mangle (encrypt) a value with a key, and unmangle (decrypt) it with a key.

Sorry for the thread drift! But I understand dictionaries better.

kknechtel · May 1, 2024, 9:24am

Even if they had different hash values, they could still map to the same slot. Think about it: there are countless possible objects, but only a few slots available in the table. Every slot costs memory, after all, whether it’s used or not. (Modern Python optimizes this by putting small integer indices into the table, and using those to index a separate array of pointers. But it still costs.)

It is. But just like a cryptographic hash, a hash-table hash can’t avoid collisions perfectly.

There cannot ever be a 2-way hash, because there are more “things you can hash” than there are hash codes. Otherwise, you could compress any file to a few bytes by hashing it, and uncompress it by reversing the hash. This is perhaps the most basic result of information theory.

c-rob · May 1, 2024, 9:32am

OP did you ever find a solution to this? Is this what you want?


import inspect
import os
import re
import sys

# VARIABLES
#####################################################
# Functions
#####################################################
def countit(mydict,mykey):
    r'''If a key exists in a dict, add 1 to the value. Else just add a new key
    with a value of 1.
    '''
    procname = str(inspect.stack()[0][3]) + ':'
    if mykey in mydict: 
        val = mydict[mykey] # For existing value in dict.
        val += 1
        mydict[mykey] = val
    else: 
        mydict[mykey] = 1 # Brand new value in dict.

#####################################################
# Main program

r'''Count how many times an item in a list is in that list.
'''
mylist = ['a', 'b', 'd', 'aa', 'a', 'b', 'd', 'b', 'f', 'b', 'ab', 'b']
mydict = {}
for i in mylist: 
    countit(mydict, i)

# Now show key with greatest count.
topkey = ''
topval = 0
key = ''
val = ''
for key, val in mydict.items():
    if val > topval: 
        topval = val
        topkey = key

print(f"Most common key is {topkey} with a count of {topval}")

Topic		Replies	Views
Coding problem with if and else statement in the code. cannot understand Python Help	5	447	June 2, 2023
How to find common values in a list Python Help	13	348	June 10, 2024
Adding the method find() to list Ideas	9	9521	May 8, 2020
Lists / Sorting a List Python Help help	13	578	June 5, 2022
How to count the number of occurrences of dictionary keys Python Help help	7	3234	January 29, 2022

Finding the most common element in a list

Related Topics