Dot product of two sparse vectors

Hi, I am trying to calculate dot product of two sparse vector

This is my code, without fancy data structure

def dotproduct(nums1, nums2):
    hm1= {} # hashmap for nums1
    n = len(nums1)
    if len(nums2) != n:
        raise NameError('sizes of the two vectors must be equal')

    for i in range(n):
        if nums1[i]!=0:
            hm1[i] = nums1[i]

    res = 0
#    for i in range(n):
#        if nums2[i]!= 0 and i in hm1:
#            res += nums2[i]*hm1[i]
    for i in hm1:
        if nums2[i]!= 0:
            res += nums2[i]*hm1[i]
    return res

and my code works well for a few test cases. I think it meet the algorithmic requirement – I store one of the vectors in a hashmap.

#nums1 = [1,0,0,2,3]
#nums2 = [0,3,0,4,0]
#nums1 = [0,1,0,0,0]
#nums2 = [0,0,0,0,2]
nums1 = [0,1,0,0,2,0,0]
nums2 = [1,0,0,0,3,0,4]
#nums1 = [0,1,0,0,2,0,0]
#nums2 = [1,0,0,0,3,0,4,2]
dotproduct(nums1, nums2)

Am I doing ok if I am asked this question at an interview?
Does the fancy data structure do me any good?
Thanks!

By John M via Discussions on Python.org at 21Aug2022 04:31:

Hi, I am trying to calculate dot product of two sparse
vector

This is my code, without fancy data structure

def dotproduct(nums1, nums2):
   hm1= {} # hashmap for nums1
   n = len(nums1)
   if len(nums2) != n:
       raise NameError('sizes of the two vectors must be equal')

   for i in range(n):
       if nums1[i]!=0:
           hm1[i] = nums1[i]

   res = 0
#    for i in range(n):
#        if nums2[i]!= 0 and i in hm1:
#            res += nums2[i]*hm1[i]
   for i in hm1:
       if nums2[i]!= 0:
           res += nums2[i]*hm1[i]
   return res

and my code works well for a few test cases. I think it meet the algorithmic requirement – I store one of the vectors in a hashmap.

#nums1 = [1,0,0,2,3]
#nums2 = [0,3,0,4,0]
#nums1 = [0,1,0,0,0]
#nums2 = [0,0,0,0,2]
nums1 = [0,1,0,0,2,0,0]
nums2 = [1,0,0,0,3,0,4]
#nums1 = [0,1,0,0,2,0,0]
#nums2 = [1,0,0,0,3,0,4,2]
dotproduct(nums1, nums2)

Am I doing ok if I am asked this question at an interview?
Does the fancy data structure do me any good?

Looks ok. But since the dot paroduct is the sum of the pairwise products
you can do this in one pass with the zip() builtin function (with
sum and map). That would obviate any need to store the nonzero
values from nums1.

Using a hash map has 2 costs:

  • memory, of course
  • runtime: it will be roughly O(n*log(n)) to construct, for all that it
    has O(1) cost to look up after construction

A zip based approach will be much shorter and have O(n) total runtime.
And no memory cost.

Um, where n is the length of the vectors of course.

Write the “on paper” form of the dot product, and see how that maps to
using zip. Come back with what you get.

Cheers,
Cameron Simpson cs@cskk.id.au

1 Like

By Cameron Simpson via Discussions on Python.org at 21Aug2022 21:47:

By John M via Discussions on Python.org at 21Aug2022 04:31:

Hi, I am trying to calculate dot product of two sparse
vector

I was thinking about this again while making coffee.

Hmm. That question requires a premium subscription to see the
description. Care to copy/paste the text?

This is my code, without fancy data structure

def dotproduct(nums1, nums2):
   hm1= {} # hashmap for nums1
   n = len(nums1)

[…]

nums1 = [0,1,0,0,2,0,0]
nums2 = [1,0,0,0,3,0,4]

It struck me that a sparse vector usually is not stored as a list.
Typically it would be a mapping such as a dict. Otherwise the vector
takes O(n) storage. A sparse vector would be expected to have many fewer
nonzero elements than n, so a mapping should be more compact.

Given that, the target cost of the computation should be aiming for
O(min(n)) for each n being the length of each vector.

Your function expects 2 lists, and converts one to a mapping for
comparison.

Think about how you might approach the problem if the vectors were
mappings, eg:

nums1 = {1: 9, 13: 12, 101: 33, 1019: 99}
nums1 = {3: 9, 23: 12, 101: 33, 1029: 99}

Ignore the values and think about the keys. How would you compute the
product now?

There are several things you could try; I’m leaning to a set
intersection myself.

But consider: in an interview:

  • you’ll be told if these are lists or mappings, or if you need to
    decide that yourself
  • you can get some feedback from the interviewers as you go if you’re
    really in an interview versus being stuck in a room doing some kind of
    quiz
  • in an interview, time is constrained - you do not need to get the
    optimum solution, just one that works (or would work aside from
    trivial mistakes)

Personally, to my mind to point of the coding exercise (usually on a
whiteboard) is for the interviewers to see you think. And what questions
you ask about the ambiguities in the question. And also to a degree,
your actual language familiarity is, if the language it very important
(often it is not - I’ve been to interviews where they say “use whatever
language you’re comfortable in”).

Cheers,
Cameron Simpson cs@cskk.id.au

1 Like