Suppose we have very large sparse vectors, which contains a lot of zeros and double .
- find a data structure to store them
- get the dot product of them
@nullpointerP Welcome to the new Discuss!
Although your description mentions it's dot product, matrix multiplication is just a series of dot products. I think it's very similar problems.
Could you please confirm?
@1337c0d3r hello, I think they are slightly different.
In this case, we first have to store the sparse vector using hash map.
for example [3,0,0,5,6] -> (0,3) (3,5) (4,6) The key is each element's position and the value is the number.
Then we have two hash tables, and we have to iterate through them to calculate the dot product
@1337c0d3r If you think my thought is wrong, please feel free to point out.
@nullpointerP I think that your toughts are in the right direction. Can you implement them? Thank you
@nullpointerP Why not store them in just a list of tuples for each vector? And then iterate through both list using two pointers.
@nullpointerP Ohh I see. Updating the data structure will be problematic with a list. Hash table provides O(1) update.
@nullpointerP Are you required to modify the sparse vector frequently? Or just given two sparse vectors, compute their dot product once and return the answer?
@1337c0d3r Yes, in terms of updating frequently, list is not a good idea. But the interviewer didn't mention that. So I think list with tuples is ok.
We represent the vectors as Hash Tables. We sort the them based on the tables sizes (o(nlogn) - where n is the number of tables).
we pick the smallest table, and traverse it. for each of its elements we look for it in all the tables. If it doesn't exist in one of them, we skip and move to the next element. Otherwise we multiply them and sum them up. run time: o(E*N) where E is the number of elements in the smallest table.
@nullpointerP If read/load array is costly, we can do this:
- read the first array and keep a linked list (or hash map) for non-zero entries. Each linked list node stores the index of the element, the value of the element, and a pointer to next node.
- Then read the second array, now when we do not need to read those entries, whose index are not recorded in previous operation. Instead, we only read indexes that have appeared before. This can be achieved by calculating the offset of the entry in an array, because array should be stored in continuous addresses in the disk. Therefore, we only care about the intersection of two arrays. For the intersecting entries, we do multiplication: node->val *= arr[node->index]. For those nodes with arr[node->index] == 0, we just delete the node in O(1) time.
- By doing this for all arrays, the linked list nodes are intersection of all arrays, and the values are dot-product of all arrays.
- Finally, we traverse the list again and sum the the values up.
- The time complexity should be O(E + K*N), where E is the size of an array, K is the number of non-zero elements in an array, and N is the number of arrays.
Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.