May I implement the hashtable by a simple list?

john316 · February 25, 2022, 11:59am

This is the famous two sum question, LeetCode question #1
I rewrite one of the official solutions without “class Solution” and it works well

def twoSum2(nums, target):
    hashmap = {}
    for i in range(len(nums)):
        complement = target - nums[i]
        if complement in hashmap:
            return [i, hashmap[complement]]
        hashmap[nums[i]] = i

nums1 = [2,7,11,15]
target1 = 9
sol2 = twoSum2(nums1, target1)
print(sol2)

Output from PyCharm

[1,0]

I was able to implement the same idea via list rather than a dictionary (hashmap). It seems to me that the idea is the same and only the data structures are different. Will I be able to reach the same time/space complexity with a list rather than a dictionary (hashmap)?

def twoSum(nums, target):
    compl = []  # complement
    n = len(nums)
    for i in range(n):
        if nums[i] in compl:
            return i, compl.index(nums[i])
        compl += [target - nums[i]]  
    return False

nums = [2,7,11,15]
target = 9
print(twoSum(nums, target))

Output from Jupyter Notebook

(1,0)

cameron · February 25, 2022, 10:02pm

This is the famous two sum question, LeetCode question
#1
I rewrite one of the official solutions without “class Solution” and it works well

Good. I think it was remarked in another discussions that the “class
Solution” form is a Javaesque code structure which you do not need here.

Skipping to your list based implementation:

I was able to implement the same idea via list rather than a dictionary
(hashmap). It seems to me that the idea is the same and only the data
structures are different. Will I be able to reach the same time/space
complexity with a list rather than a dictionary (hashmap)?

In short, not simply. in your code:

def twoSum(nums, target):
    compl = []  # complement
    n = len(nums)
    for i in range(n):
        if nums[i] in compl:
            return i, compl.index(nums[i])
        compl += [target - nums[i]]
    return False

This test:

nums[i] in compl

has linear time (O(n)) with the length of the list compl, typically
requiring len(compl)/2 comparisons. When compl is a dictionary it
has constant time (O(1)).

This expression:

compl.index(nums[i])

is also linear time with the length of compl.

Because a dict uses a hash table to store values, lookup will be
constant time regardless of the size of the dict.

Starting point on the big O notation: Big O notation - Wikipedia

For small sizes of your nums list your two programmes will be similar
in performance, but the list based version will rapidly slow down if you
make nums quite long. Try each with 10 values, then again with 100
then 1000 and see the behaviour.

Cheers,
Cameron Simpson cs@cskk.id.au

john316 · February 26, 2022, 1:29am

Thank you Cameron. That’s exactly what I need. However, it’s a mystery and counterintuitive to me how a dictionary, with a hashmap structure could realize an O(1) search. Could you elaborate the mechanism or point me to any easy-to-understand posts?

If one can realize a O(1) search, do we need a O(log n) binary search? After all, any sequence can be converted into a dictionary and a dictionary search requires only O(1).

BowlOfRed · February 26, 2022, 1:59am

Creating the dictionary means visiting every element of the sequence and is therefore O(n). Assuming you have the list, it’s not worth it to look up one element. It might be worth if it you will be doing multiple searches.

cameron · February 26, 2022, 9:10am

He’s accessing it in a loop, so the accesses dominate over the
construction. - Cameron

cameron · February 26, 2022, 9:26am

Thank you Cameron. That’s exactly what I need. However, it’s a mystery
and counterintuitive to me how a dictionary, with a hashmap structure
could realize an O(1) search. Could you elaborate the mechanism or
point me to any easy-to-understand posts?

Well Wikipedia has an article to start with:

If one can realize a O(1) search, do we need a O(log n) binary search? After all, any sequence can be converted into a dictionary and a dictionary search requires only O(1).

As BowlOfRed has pointed out, we haven’t considered the cost to
construct a dictionary. A list you can just append to. A dictionary
requires computing a hash value for each entry and managing those
entries.

So, hash tables.

Effectively a hash table can be implemented as a list of lists. The main
list is indexed by the hash value, and the internal lists contain the
values themselves. The hash value is an index computed from a value; all
values with some hash value h are stored in the list at index h. To
effect a O(1) search, you size the main list so that evenly distributed
hash values typically put only one value in each internal list:

[ [values-with-hash-0,...],
  [values-with-hash-1,...],
  [values-with-hash-2,...],
  .....
]

So, hash functions. They can be any function which produces a number
from a value. A good hash function produces values over a wide range
pretty evenly, and is also pretty cheap. If we’ve got some basic
function hfn(value) producing these values, the index into the main
list would be hfn(value) % len(main_list).

To get the O(1) property you need to size the main list large enough to
hold few values in any internal list. That way you go directly to the
interbal list of values at index h, then do a linear O(n) search of
that list. But because that list is short (ideally 0 or 1 elements) that
is an O(1) search.

As such, when you build a dictionary (whose keys-value pairs are stored
in the hash table, hashed on the key) the main list starts small, but
gets resized every so often as you add more values (otherwise the
internal lists will start to get too long). That has a cost of course.

Normally you’ll be accessing a dictionary quite a lot compared to the
build phase.

Cheers,
Cameron Simpson cs@cskk.id.au

john316 · February 28, 2022, 8:07am

Thank you. It seems to me that the original “two sum” question could not be solved in a significantly lower computational complexity (e.g., O(log n) or O(1)). Since we need to build a dictionary (hash map) first, and the building takes O(n) time, which is the same as my algorithm with a list in order terms. Even if I build a hash map when reading the list and check back, the average case would be O(n).

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        hashmap = {}
        for i in range(len(nums)):
            complement = target - nums[i]
            if complement in hashmap:
                return [i, hashmap[complement]]
            hashmap[nums[i]] = i