An explanation to the bucket approach


  • 1
    B

    I've read about the bucket sort solution in the discuss. However, I find that the explanation to "the maximum gap absolutely exists between buckets" is not so clear to me. Thus I'm trying to give some thoughts from what I understand.

    First of all, let's assume that there are at least 2 number in the array. The smallest is min, biggest is max, and we use range to represent the difference between max and min (range = max - min). We use gapNum to denote the number of gaps in the array. Needless to say that gapNum = num.size() - 1. Thus we've got at least 1 gap. According to whether range could be divided by gapNum without remainder, this problem could be divided into 2 cases.

    (1). range could be completely divided. That is, range % gapNum == 0. In this case, bucketLen = range / gapNum, and bucketNum = gapNum + 1. For example, the input is [1, 101, 201, 301], thus range = 300, gapNum = 3, bucketLen = 100, bucketNum = 4. The 4 buckets will be: [1, 101), [101, 201), [201, 301), [301, 401). You might have noticed in this perfect situation, each number falls into a different bucket, and no numbers will share a bucket. So we could say, the max gap exists between the buckets.

    This is also true when the gap between the numbers become uneven. As min and max are fixed, range and gapNum are both fixed. The current 3 gaps between the numbers are 100, 100, 100. When the middle numbers change, the 3 gaps between will change accordingly. But what's for sure is when some gaps become smaller than 100, some gaps will definitely become larger than 100, since they are bounded by the fixed range sum. For example, 50, 75, 175. or 101, 99, 100. Under this condition, the biggest gap will be bigger than 100. Thus the two numbers constituting this gap cannot fall into the same bucket, since the biggest gap allowed in a single bucket is (bucketLen - 1) = 99. So when the gaps are uneven, the max gap exists between the buckets.

    (2). range could not be completely divided. That is, range % gapNum != 0. For example, 1, 4, 7, 11, 15. In this case, the most even gap arrangement is, some gaps are range / gapNum, while the other gaps are
    range / gapNum + 1. For our case, range = (15 - 1) = 14, gapNum = 4. Thus the most even arrangement is 3 3 4 4, or 3 4 3 4, or 4 3 3 4, and et al. Actually, for the sole sake of finding the max gap, we could choose either range / gapNum or range / gapNum + 1 as the buckeLen.

    Since bucketNum = range / bucketLen + 1, we will have one more bucket when choosing the smaller bucketLen than choosing the bigger bucketLen. For our example, when bucketLen = 3, bucketNum = 14 / 3 + 1 = 5. while when bucketLen = 4, 14 / 4 + 1 = 4. Their buckets are:

    3: [1, 4), [4, 7), [7, 10), [10, 13), [13, 16)

    4:[1, 5), [5, 9), [9, 13), [13, 17)

    It can be seen that in either cases, the max gap will be found between buckets for the most even gap arrangements. The only difference is that the smaller bucketLen will guarantee each number fall in to a different bucket (3 > 3 - 1), while for the bigger bucketLen, those numbers with the small gap might fall into the same bucket (4 - 1 >= 3). But no matter how, we could always find the max gap between the buckets, since 4 > 3 - 1, and 4 > 4 - 1;

    Fixing min and max, when the gaps become uneven, some gaps will be smaller, and some will correspondingly become larger, since their sum is fixed to range. For example, 2 4 4 4, or 1 3 5 5 for our case. When the max gap becomes larger than 4, it will definitely exist between buckets, since maxGap > 4 > 4 - 1;

    But why don't we choose range / gapNum as the bucketLen in our algorithms? The answer is that range / gapNum might be zero for some cases. For example, assume the input array is 1 1 1 1 4. Here range / gapNum = 3 / 4 = 0. But bucketLen should at least be 1 for the solution to work. To deal with this issue, we typically choose bucketLen = range / gapNum + 1 to guarantee the validity of bucketLen. For this exmaple, bucketLen = 1, bucketNum = 4, and the buckets are: [1, 2), [2, 3), [3, 4), [4, 5), two of them will be empty. But for those cases where the smaller one of the two even gaps is not zero, setting bucketLen to range / gapNum also works in the algorithm!


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.