Solution to 3rd follow-up question


  • 98
    V

    What if elements of nums2 are stored on disk, and the memory is
    limited such that you cannot load all elements into the memory at
    once?

    • If only nums2 cannot fit in memory, put all elements of nums1 into a HashMap, read chunks of array that fit into the memory, and record the intersections.

    • If both nums1 and nums2 are so huge that neither fit into the memory, sort them individually (external sort), then read 2 elements from each array at a time in memory, record intersections.


  • 0
    X

    Nice thinking, especially the second point.


  • 1
    X

    nice solution!

    external sort is naive but good enough to solve the problem!


  • 52
    L

    Thanks for the solution. I think the second part of the solution is impractical, if you read 2 elements at a time, this procedure will take forever. In principle, we want minimize the number of disk access during the run-time.

    An improvement can be sort them using external sort, read (let's say) 2G of each into memory and then using the 2 pointer technique, then read 2G more from the array that has been exhausted. Repeat this until no more data to read from disk.

    But I am not sure this solution is good enough for an interview setting. Maybe the interviewer is expecting some solution using Map-Reduce paradigm.


  • 4

    @1337c0d3r is the answer provided by @vishalhemnani good enough?


  • 0
    A

    @vishalhemnani Could you please give some idea on how the external sorting works? Are there any specific algorithms that can sort such huge files on hard disk?


  • 7

  • 0

    Can someone explain what does "you cannot load all elements into the memory at once" mean? What am I trying to do if I want to load all elements into the memory at once? Thanks.


  • 1
    Y

    @传说选手 means that the input data is too large to fit in memory.


  • 0

    @yangjin_null Thanks for your explanation.


  • 0
    W

    If you can find a way to sort them, then you could do a modified binary search afterwards for each chunk of memory that each sorted piece resides on.


  • 0
    W

    This wikipedia article outlines the idea behind external merge sort: https://en.wikipedia.org/wiki/External_sorting#External_merge_sort


  • 0
    D

    I think in both situation, using HashMap is OK.
    if nums1 and nums2 are so huge that no one can load into memory, create HashMap for nums1, read chunks of nums1 until exhausted, record the number(key) and its frequency(value). Then read chunks of nums2, deal with hashmap of nums1 to find the intersection.


  • 1
    S

    @dong-fantastic That won't work if elements in array1 are different from each other. Then if you use HashMap, it will exceed memory.


  • 22
    I

    I think the goal of this question is to test whether the interview understands some of the data engineering techniques. From a data engineer's perspective, basically there are three ideas to solve the question:

    1. Store the two strings in distributed system(whether self designed or not), then using MapReduce technique to solve the problem;

    2. Processing the Strings by chunk, which fits the memory, then deal with each chunk of data at a time;

    3. Processing the Strings by streaming, then check.


  • 1
    J

    @ico-Meng The two arrays needs to be sorted (using external sort for example) first and then processed by chunk or by streaming.


  • 0

    @ico-Meng @jack76 Thank you for the info!


  • 0
    F

    Use ArrayList to dynamic increase size

    class Solution {
        public int[] intersect(int[] nums1, int[] nums2) {
            Arrays.sort(nums1);
            Arrays.sort(nums2);
            int i = 0;
            int j = 0;
            List<Integer> res = new ArrayList<>();
            
            while (i < nums1.length && j < nums2.length) {
                if (nums1[i] < nums2[j]) {
                    i++;
                } else if (nums1[i] > nums2[j]) {
                    j++;
                } else {
                    res.add(nums1[i]);
                    i++;
                    j++;
                }
            }
            
            int index = 0;
            int[] ans = new int[res.size()];
            
            for (int num : res) {
                ans[index++] = num;
            }
            
            return ans;
                
        }
    }
    
    

  • 0

    Given the range of the test case is not that massive, I used counting sort instead of HashMap. Beats 96%
    https://discuss.leetcode.com/topic/107558/optimal-2ms-java-solution-beats-96

    public int[] intersect(int[] nums1, int[] nums2) {
        //skip a stupid corner case:)
        if(nums1.length>0&&nums1[0]==Integer.MIN_VALUE) return new int[]{1,Integer.MIN_VALUE};
        
        int max = Integer.MIN_VALUE, min = Integer.MAX_VALUE;
        for (int i = 0; i < nums1.length; i++) {          
            min = Math.min(min, nums1[i]);
            max = Math.max(max, nums1[i]);
        }
        for (int i = 0; i < nums2.length; i++) {
            min = Math.min(min, nums2[i]);
            max = Math.max(max, nums2[i]);
        }
        int[] count = new int[max-min+1];
        for (int i = 0; i < nums1.length; i++) {
            count[nums1[i]-min]++;
        }
         List<Integer> intersect = new ArrayList<Integer>();
         for (int i = 0; i < nums2.length; i++) {
             if(count[nums2[i]-min]>0) {
                 count[nums2[i]-min]--;
                 intersect.add(nums2[i]);
             }
         }     
         return result(intersect);
    }
    public int[] result(List<Integer> intersect){
         int i = 0;
         int[] result = new int[intersect.size()];
         for (int num : intersect) {
             result[i++] = num;
         }
         return result;
    }

  • 0

    @vishalhemnani Its similar to Merge K sorted lists


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.