Alternate to Boyer-Moore (similar to radix-sort)


  • 0
    D

    The Boyer-Moore is certainly better than this, but here is an alternate solution that satisfies the requirements of linear time and constant space, and is what I came up with before looking at the answers.

    (Terminology: we'll call 16-bits a word, such that a 32-bit int has an upper word and a lower word).

    The high-level is you can keep a histogram of the upper words and a histogram of the lower words . There are 2^16=65536 values a single word can have so you can keep this histogram in a pair of uint[65536] arrays; constant (granted, big) space.

    You fill these histograms in a single iteration though the input array: linear time.

    You then iterate through the histograms and retain the buckets with count > len/3: constant (granted, long) time.

    • You can do this because if some number is found N times, then its upper half will be found at least N times, and its lower half will be found at least N times; here n==len/3.
    • You know there will be at most 2 buckets retained from each histogram, as it's impossible to have 3 buckets where count > len/3.

    You then go through the various combinations of lower and upper words (at most 2*2=4 remaining combinations will exist), recombine them into an int, find the count of each of those ints in the original array: (at most 4 trips through the array, so linear time).

    Corollary: You could simplify it by simply keeping a single histogram of all 2^32 options and doing one iteration, though that will overflow memory. Or you could reduce memory further by keeping four histograms, one for each byte.

    public IList<int> MajorityElement(int[] nums1) {
        var nums = nums1.Select(n => (uint)n).ToArray(); //CLR bit shift fails on -1
        var minVal = (uint)nums.Length / 3;
    
        // iterate through nums, filling out histograms
        var histUpper = new uint[65536];
        var histLower = new uint[65536];
        foreach (var num in nums) {
            var lower = num & 0xFFFF;
            var upper = (num >> 16) & 0xFFFF;
            ++histLower[lower];
            ++histUpper[upper];
        }
    
        // retain only the words that appear at least len/3 times (there will be at most two left)
        var candidateUppers = GetCandidates(histUpper, minVal);
        var candidateLowers = GetCandidates(histLower, minVal);
    
        // reconstruct the full uints again and count each of them in the original list
        // this will be at most 4 iterations through the list.
        var output = new List<int>();
        foreach (var upper in candidateUppers) {
            foreach (var lower in candidateLowers) {
                var i = (upper << 16) | lower;
                var count = nums.Count(num => num == i);
                if (count > minVal) output.Add((int)i);
            }
        }
        return output;
    }
    
    public IList<uint> GetCandidates(uint[] histogram, uint minVal) {
        var output = new List<uint>();
        for (uint i = 0; i < histogram.Length; ++i) {
            if (histogram[i] > minVal) {
                output.Add(i);
            }
        }
        return output;
    
    

    Note technically this is more correct than Boyer-Moore, as Boyer-Moore's counter requires log(n) bits. In reality though a 128-bit counter will be sufficient for anything.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.