Java 12 ms true O(n) time / O(log n) space using BFPRT Median of Medians


  • 0

    First of all, I must admit that I couldn't solve it completely myself. I quickly figured out I need the median, I even did the BFPRT thing right, but I got completely confused by this wiggle indexing. I tried to put elements in positions 0, 2, 4, ... in ascending order instead of 1, 3, 5... in descending order, and this broke completely even for the simplest cases such as {4, 5, 5, 6}. So I thought the idea was completely wrong and had to look up solutions. I was really surprised to see that this exact idea is the right one except for the indexing part.

    public void wiggleSort(int[] nums) {
        if (nums.length <= 1) {
            return;
        }
        int p = bfprtSelect(nums, 0, nums.length, (nums.length - 1) / 2 + 1);
        // Reverse Dutch National Flag with Wiggle Indexing (StefanPochmann's Virtual Indexing).
        // Thanks to apolloydy for reversing this thing.
        final int n = nums.length;
        int m = 0, r = nums.length - 1;
        int lw = 1, mw = 1, rw = (1 + 2 * (nums.length - 1)) % (n | 1);
        while (m <= r) {
            if (nums[mw] > p) {
                int tmp = nums[mw];
                nums[mw] = nums[lw];
                nums[lw] = tmp;
                mw = (mw + 2) % (n | 1);
                ++m;
                lw = (lw + 2) % (n | 1);
            } else if (nums[mw] < p) {
                int tmp = nums[mw];
                nums[mw] = nums[rw];
                nums[rw] = tmp;
                rw = (rw - 2 + (n | 1)) % (n | 1);
                --r;
            } else {
                mw = (mw + 2) % (n | 1);
                ++m;
            }
        }
    }
    
    /**
     * Finds the kth order statistic in linear time. Thanks to Blum, Floyd, Pratt, Rivest and Tarjan,
     * hence this funny name BFPRT. Or just the Median of Medians, but really it can find more
     * than just medians. And anyway, "bfprtSelect" is shorter than "medianOfMediansSelect".
     * 
     * @param nums the input array
     * @param start the start of the input array
     * @param length the length of the input array
     * @param k the number of order statistic to find, {@code 1 <= k <= length}
     * @return the kth order statistic
     */
    private static int bfprtSelect(int[] nums, int start, int length, int k) {
        if (length <= 0) {
            throw new IllegalArgumentException("length=" + length);
        }
        if (k < 1 || k > length) {
            throw new IllegalArgumentException("length=" + length + " k=" + k);
        }
        if (length == 1) {
            return nums[start];
        }
        int gc = findGroupMedians(nums, start, length);
        int p = bfprtSelect(nums, start, gc, (gc - 1) / 2 + 1);
        int[] dnf = dnfPartition(nums, start, length, p);
        if (start + k - 1 >= dnf[0] && start + k - 1 < dnf[1]) {
            return p; // Got lucky, the pivot is the kth order statistic.
        } else if (start + k - 1 < dnf[0]) {
            return bfprtSelect(nums, start, dnf[0] - start, k);
        } else {
            return bfprtSelect(nums, dnf[1], length - (dnf[1] - start), k - (dnf[1] - start));
        }
    }
    
    private static int findGroupMedians(int[] nums, int start, int length) {
        final int gc = (length + 4) / 5; // group count (5 elements in each group)
        for (int g = 0; g < gc; ++g) {
            final int gs = start + 5 * g; // group start
            final int ge = start + Math.min(5 * (g + 1), length); // group end
            // Let's do some insertion sort to find the median!
            for (int i = gs, j = gs; i < ge - 1; j = ++i) {
                int v = nums[i + 1];
                while (j >= gs && nums[j] > v) {
                    nums[j + 1] = nums[j--]; // LHS evaluated first, so j-- is safe
                }
                nums[j + 1] = v;
            }
            // Move the medians to the beginning of the array to recurse on them later.
            final int mid = (gs + ge) / 2;
            int tmp = nums[start + g];
            nums[start + g] = nums[mid];
            nums[mid] = tmp;
        }
        return gc;
    }
    
    private static int[] dnfPartition(int[] nums, int start, int length, int p) {
        int l = start, m = start, r = start + length - 1;
        while (m <= r) {
            if (nums[m] < p) {
                int tmp = nums[m];
                nums[m++] = nums[l];
                nums[l++] = tmp;
            } else if (nums[m] > p) {
                int tmp = nums[m];
                nums[m] = nums[r];
                nums[r--] = tmp;
            } else {
                ++m;
            }
        }
        return new int[] {l, m};
    }
    

    As you can see, the algorithm is pretty straightforward, but it runs in 12 ms, while O(n log n) ones run in 7 ms. Indeed, I tried to compare it with a simple sorting solution and the sorting one runs approximately 3-4 times faster even on arrays of 500 million elements (although I had to increase the heap size for the sorting one). I guess that's why quicksort never actually uses Median of Medians even though it's linear.

    Update

    Further testing shows that the Arrays.sort approach can be much slower depending on how the data in the input array is sorted (guess that's branch prediction plays its part). In particular, for 500 million elements the sorting approach yields about 3 seconds time on an already-sorted array and about 50 seconds on a previously wiggle-sorted array (guess that's the worst case for branch prediction). The median of medians approach runs in 22 and 12 seconds for the same inputs, so it is actually faster for the already wiggle-sorted array.

    At first I thought it's O(1) space, but as was pointed out to me in the comment, it's really O(log n) because recursion uses stack space. I have no idea how to achieve true O(1) space if it is possible at all. Well, at least it's in-place.


  • 0

    Sadly it's only O(log n) space, not O(1), due to the recursion data.

    (Originally I also thought MoM can be implemented with O(1) space because Wikipedia said so, but that turned out to probably be a mistake).


  • 0

    Oh, I didn't think of that. Does it mean that there is no O(n)/O(1) solution at all?


  • 0

    Looks like it's possible, though I didn't take the time to really look into it yet. Have a look at the references posted by Yu-Han Lyu on Stack Overflow.


  • 0

    What I did see looked pretty complicated, though, that much I can tell.


  • 0

    Two of those cost money to buy, but the remaining one looks promising. Thanks.


  • 0
    T
    This post is deleted!

  • 0
    T

    BFPRT is a tail recursion. There is no need to store data in stack. Change it to a while loop and then it's O(1).


  • 0

    No, it's not. It uses an additional recursive call to find the median of medians itself. Look closely.


  • 0
    T

    Sorry, I didn't notice. Then it's only possible to write averagely O(n)/O(1) algorithm.


  • 0

    Using BFPRT/MoM? How would you get average O(1) space then?


  • 0
    T

    It's BFPRT/MoM with random pivot selection (i.e. use a random number instead of MoM). Since BFPRT is a variant of qsort, and it only cares at most one side of the pivot in its recursion, its estimated time complexity is O(n) while space complexity is O(1).

    Its time complexity can degrade to O(n^2), so it's not as good as BFPRT.

    This is introduced in Introduction to Algorithms.


  • 0

    Would it be any better than simple quickselect, though?


  • 0

    "It's BFPRT/MoM with random pivot selection"

    In other words, it's not BFPRT/MoM at all.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.