[bucket sort] JAVA solution with explanation, O(N) time and space


  • 208
    Z

    Suppose there are N elements in the array, the min value is min and the max value is max. Then the maximum gap will be no smaller than ceiling[(max - min ) / (N - 1)].

    Let gap = ceiling[(max - min ) / (N - 1)]. We divide all numbers in the array into n-1 buckets, where k-th bucket contains all numbers in [min + (k-1)gap, min + k*gap). Since there are n-2 numbers that are not equal min or max and there are n-1 buckets, at least one of the buckets are empty. We only need to store the largest number and the smallest number in each bucket.

    After we put all the numbers into the buckets. We can scan the buckets sequentially and get the max gap.
    my blog for this problem

    public class Solution {
    public int maximumGap(int[] num) {
        if (num == null || num.length < 2)
            return 0;
        // get the max and min value of the array
        int min = num[0];
        int max = num[0];
        for (int i:num) {
            min = Math.min(min, i);
            max = Math.max(max, i);
        }
        // the minimum possibale gap, ceiling of the integer division
        int gap = (int)Math.ceil((double)(max - min)/(num.length - 1));
        int[] bucketsMIN = new int[num.length - 1]; // store the min value in that bucket
        int[] bucketsMAX = new int[num.length - 1]; // store the max value in that bucket
        Arrays.fill(bucketsMIN, Integer.MAX_VALUE);
        Arrays.fill(bucketsMAX, Integer.MIN_VALUE);
        // put numbers into buckets
        for (int i:num) {
            if (i == min || i == max)
                continue;
            int idx = (i - min) / gap; // index of the right position in the buckets
            bucketsMIN[idx] = Math.min(i, bucketsMIN[idx]);
            bucketsMAX[idx] = Math.max(i, bucketsMAX[idx]);
        }
        // scan the buckets for the max gap
        int maxGap = Integer.MIN_VALUE;
        int previous = min;
        for (int i = 0; i < num.length - 1; i++) {
            if (bucketsMIN[i] == Integer.MAX_VALUE && bucketsMAX[i] == Integer.MIN_VALUE)
                // empty bucket
                continue;
            // min value minus the previous value is the current gap
            maxGap = Math.max(maxGap, bucketsMIN[i] - previous);
            // update previous bucket value
            previous = bucketsMAX[i];
        }
        maxGap = Math.max(maxGap, max - previous); // updata the final max value gap
        return maxGap;
    }
    

    }


  • 0
    G

    Please correct me if I'm wrong. But why "at least one of the buckets are empty"? Suppose the array is {1,2,3}. The 1st bucket is [1,2) and contains {1}. The 2nd bucket is [2,3] and contains {2,3}. Thus none of the bucket is empty.


  • 4
    Z

    Thanks for asking this question.

    In this case, 1 and 3 are the min and max. When we put number into the buckets, we'll skip the min and max.

    Please see the code blow:

    if (i == min || i == max)
    continue;


  • 0
    G

    Oh I missed that, thanks for your explanation. I should have read your code more carefully.


  • 4
    B
    class Solution:
    # @param num, a list of integer
    # @return an integer
    def maximumGap(self, num):
    	if len(num) < 2:
    		return 0
    	imin = imax = num[0]
    	for i in num:
    		imin = min(imin,i)
    		imax = max(imax,i)
    	gap = int( math.ceil( float(imax - imin)/(len(num)-1) ) )
    	# actually needed bucket numbers, reduce useless bucket
    	bucketNum = int( math.ceil(float(imax - imin)/gap ) )
    	bucketMin = [2147483647]* bucketNum
    	bucketMax = [0]* bucketNum
    	
    	for i in num:
    		if i == imin or i == imax:
    			continue
    		idx = (i - imin)/ gap
    		bucketMin[idx] = min(bucketMin[idx], i)
    		bucketMax[idx] = max(bucketMax[idx], i)
    	maxgap = 0
    	# consider min
    	previous = imin
    	for i in range(bucketNum):
    		if bucketMin[i] == 2147483647 and bucketMax[i] == 0:
    			#empty bucket
    			continue
    		maxgap = max(maxgap, bucketMin[i] - previous)
    		previous = bucketMax[i]
    	#consider max
    	maxgap = max(maxgap, imax - previous)
    	return maxgap
    

    Thanks for your inspiration. I rewrite it in python and make a minor change about the actually needed bucket number
    bucketNum = int( math.ceil(float(imax - imin)/gap ) )

    instead of num.length - 1 to save space


  • 1
    Z

    Thanks for your code!
    My buckets size of (num.length - 1) is really a waste of space.


  • 0
    G

    Nice improvement


  • 3
    G

    Thanks for your inspiration too and I have rewritten your code into C++ version.
    Just as @bowoshibo said, the bucket number is actually (max-min)/gap, and I take that point too.
    And thanks to @poker2008 too, I remove the ceiling function.

    class Solution {
    public:
        int maximumGap(vector<int> &num) {
            if(num.empty() || num.size() < 2)
                return 0;
            int maxNum = *max_element(num.begin(), num.end());
            int minNum = *min_element(num.begin(), num.end());
            //average gap from minNum to maxNum.
            int gap = (maxNum - minNum - 1) / (num.size() - 1) + 1;
            //number of buckets
            int bucketNum = (maxNum-minNum)/gap;
            vector<int> bucketsMin(bucketNum, INT_MAX);
            vector<int> bucketsMax(bucketNum, INT_MIN);
            //put into buckets
            for(int i = 0; i < num.size(); i ++)
            {
                if(num[i] != maxNum && num[i] != minNum)
                {
                    int buckInd = (num[i]-minNum)/gap;
                    bucketsMin[buckInd] = min(bucketsMin[buckInd], num[i]);
                    bucketsMax[buckInd] = max(bucketsMax[buckInd], num[i]);
                }
            }
            int maxGap = INT_MIN;
            int previous = minNum;
            for(int i = 0; i < bucketNum; i ++)
            {
                if(bucketsMin[i] == INT_MAX && bucketsMax[i] == INT_MIN)
                    continue;   //empty
                //i_th gap is minvalue in i+1_th bucket minus maxvalue in i_th bucket 
                maxGap = max(maxGap, bucketsMin[i]-previous);
                previous = bucketsMax[i];
            }
            maxGap = max(maxGap, maxNum-previous);
            return maxGap;
        }
    };

  • 0

    Why do you exclude min and max ? Can't you just use n+1 buckets ?


  • 1
    Z

    yes, you can


  • 0
    P

    do not need the ceiling function, try gap = (max - min - 1) / (num.length - 1) + 1


  • 0
    N

    Your solution will get a Runtime Error


  • 2
    M

    You should apply the trick to both of two statements, so the second statement should be:

    int bucketNum =(maxNum-minNum-1)/gap+1;


  • 38
    T

    Thanks for the nice solution. for people reading this, maybe this would help understanding how it works: basically we argue that the largest gap can not be smaller than (max-min)/(N-1), so if we make the buckets smaller than this number, any gaps within the same bucket is not the amount we are looking for, so we are safe to look only for the inter-bucket gaps.

    so making the buckets smaller doesn't affect the correctness. for safety we might just as well use (max-min)/2N


  • 1
    J

    why the gap will not exist in the same bucket? just compare bucketsMIN[i] and bucketsMAX[i]?


  • 0

    Hi, @jason25. The gap may also exist in the same bucket. However, the above code has handled it by excluding min and max.


  • 0

    Your code has a runtime error. Have you got it accepted?


  • 8

    Hi, @zkfairytale. I implement the suggested solution in C++, which has basically the same idea as yours. The major difference in implementation is that I do not leave min (l in my code) and max (u in my code) alone.

    class Solution {
    public:
        int maximumGap(vector<int>& nums) {
            int n = nums.size();
            if (n < 2) return 0;
            auto lu = minmax_element(nums.begin(), nums.end());
            int l = *lu.first, u = *lu.second;
            int gap = max((u - l) / (n - 1), 1);
            int m = (u - l) / gap + 1;
            vector<vector<int>> buckets(m, {INT_MAX, INT_MIN});
            for (int num : nums) {
                int k = (num - l) / gap;
                if (num < buckets[k][0]) buckets[k][0] = num;
                if (num > buckets[k][1]) buckets[k][1] = num;
            }
            int i = 0, j;
            gap = buckets[0][1] - buckets[0][0];
            while (i < m) {
                j = i + 1;
                while (j < m && buckets[j][0] == INT_MAX && buckets[j][1] == INT_MIN)
                    j++;
                if (j == m) break;
                gap = max(gap, buckets[j][0] - buckets[i][1]);
                i = j;
            }
            return gap;
        } 
    };

  • 0
    P

    It would be better if you indicated what "gap" was in "(max-min)/gap" because some people would read all posts and replies above so they wouldn't your assumptions.


  • 0
    H
    This post is deleted!

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.