Does "set" use hash table?

Problem: a matrix is presented as a list of lists. Set the whole row/column zero if an element is zero. The problem is not difficult. It can be done by a list or a set. My timer indicates that the list is faster than a set. I remember that the mechanism of searching an element in a set is, like a hashtable, faster than in a list (Correct me if I am wrong). Is my matrix too small?

class Solution:
    def setZeroes(self, matrix: List[List[int]]) -> None:
        """
        Do not return anything, modify matrix in-place instead.
        """
        m = len(matrix)  # number of rows
        n = len(matrix[0]) # number of columns
        row_z = []  # row/column numbers with a zero in the original matrix
        col_z = []
        #row_z = set()
        #col_z = set()
        for i in range(m):
            for j in range(n):
                if matrix[i][j] == 0:
                    row_z += [i]
                    col_z += [j]
                    #row_z.add(i)
                    #col_z.add(j)
        for i in row_z:
            for j in range(n):
                matrix[i][j] = 0
        for i in range(m):
            for j in col_z:
                matrix[i][j] = 0

Yes, sets in CPython use hash tables.

Your code is not searching the set, it is just iterating over the elements of the set one element at a time.

Internally, the list will be an array of values, with no gaps:

row_z = [2, 5, 7, 11, 15]

# like an array of cells [ 2 | 5 | 7 | 11 | 15 | blank | blank | blank ]

The blank cells at the end are so that append() is faster. The interpreter knows that there are only five values in the list, so it stops before hitting the blank cells.

But the set, being a hash table, has potentially many gaps:

row_z = {2, 5, 7, 11, 15}

# like an array of cells
# [ blank | blank | 2     | blank | blank | 11 | blank | blank | 
    blank | 5     | blank | blank | blank | 15 | blank | 7     | 
    blank | blank | blank ]

The exact layout of the array (the hash table) is unpredictable, and when Python tries to iterate over the elements it may have to visit each cell in turn, skipping the blanks.

The details as I have explained it may differ from version to version, or from one interpreter (CPython) to another (PyPy, IronPython, Jython, Stackless, etc). You would need to read the source code of the set class (written in C) to see exactly how it works.

But the essential detail is correct: to iterate over a set potentially takes more work than iterating over a list.

1 Like