Solution explained


  • 293
    J

    This problem is well known and quite often can be found in various text books.

    You can take a couple of approaches to actually solve it:

    • O(N lg N) running time + O(1) memory

    The simplest approach is to sort the entire input array and then access the element by it's index (which is O(1)) operation:


    public int findKthLargest(int[] nums, int k) {
            final int N = nums.length;
            Arrays.sort(nums);
            return nums[N - k];
    }
    

    • O(N lg K) running time + O(K) memory

    Other possibility is to use a min oriented priority queue that will store the K-th largest values. The algorithm iterates over the whole input and maintains the size of priority queue.


    public int findKthLargest(int[] nums, int k) {
    
        final PriorityQueue<Integer> pq = new PriorityQueue<>();
        for(int val : nums) {
            pq.offer(val);
    
            if(pq.size() > k) {
                pq.poll();
            }
        }
        return pq.peek();
    }
    

    • O(N) best case / O(N^2) worst case running time + O(1) memory

    The smart approach for this problem is to use the selection algorithm (based on the partion method - the same one as used in quicksort).


    public int findKthLargest(int[] nums, int k) {
    
            k = nums.length - k;
            int lo = 0;
            int hi = nums.length - 1;
            while (lo < hi) {
                final int j = partition(nums, lo, hi);
                if(j < k) {
                    lo = j + 1;
                } else if (j > k) {
                    hi = j - 1;
                } else {
                    break;
                }
            }
            return nums[k];
        }
    
        private int partition(int[] a, int lo, int hi) {
    
            int i = lo;
            int j = hi + 1;
            while(true) {
                while(i < hi && less(a[++i], a[lo]));
                while(j > lo && less(a[lo], a[--j]));
                if(i >= j) {
                    break;
                }
                exch(a, i, j);
            }
            exch(a, lo, j);
            return j;
        }
    
        private void exch(int[] a, int i, int j) {
            final int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    
        private boolean less(int v, int w) {
            return v < w;
        }
    

    O(N) guaranteed running time + O(1) space

    So how can we improve the above solution and make it O(N) guaranteed? The answer is quite simple, we can randomize the input, so that even when the worst case input would be provided the algorithm wouldn't be affected. So all what it is needed to be done is to shuffle the input.


    public int findKthLargest(int[] nums, int k) {
    
            shuffle(nums);
            k = nums.length - k;
            int lo = 0;
            int hi = nums.length - 1;
            while (lo < hi) {
                final int j = partition(nums, lo, hi);
                if(j < k) {
                    lo = j + 1;
                } else if (j > k) {
                    hi = j - 1;
                } else {
                    break;
                }
            }
            return nums[k];
        }
    
    private void shuffle(int a[]) {
    
            final Random random = new Random();
            for(int ind = 1; ind < a.length; ind++) {
                final int r = random.nextInt(ind + 1);
                exch(a, ind, r);
            }
        }
    

    There is also worth mentioning the Blum-Floyd-Pratt-Rivest-Tarjan algorithm that has a guaranteed O(N) running time.


  • 61
    O

    The last ALG has average O(N) time, but worst case is O(N^2).


  • 1
    D

    why do you need "exch(a, lo, j);" here? thanks!


  • 2
    J

    Yes you are completly correct, but you have in the same time a probabilistic guarantees that this is very unlikely to happen.


  • 0
    J

    This operation switches the pivot element with the last element from <lo, i - 1> that is lower then pivot.

    You could also do:
    exch(a, lo, i - 1)

    So in other words: it puts it in place in the array.


  • 1
    A

    Nice explanation! I think the 2nd solution is to store the 1st to K-th "largest" values, not the smallest ones


  • 0
    J

    Right, thanks!


  • 0
    D

    Hi ,Could you please tell me that why the time complexity for min-heap solution, is O(NlgN), I think it is O(N)


  • 4
    J

    This is due the fact that the PriorityQueue is implemented as a Binary Heap, which in fact is nothing more then complete binary tree. So both inserting and removing the values through offer() and poll() methods have O(lg K) complexity and altogether since you doing this operation N times the total complexity is O(N lg K)


  • 0
    E

    why "final" is used in "final int r = random.nextInt(ind + 1);" I don't understand


  • 0
    S

    how about in the scenario where there are duplicated numbers in the array? there is not one-to-one relationship between the array index and the kth largest?


  • 0
    J

    That's a bit different problem, but shouldn't be much more complicated in general what you want to do is to preprocess the input and find all unique values: if your input has some reasonable upper bound you can use array and counting, otherwise a HashSet would be reasonable alternative. After finding all unique values you can use one of the above solutions.


  • 0
    S

    Yes, I initially think using a variable to keep each pass in a bubble sorting algorithm. The Kth pass actually is moving the Kth largest number to the end of the array. Retrieving the number after Kth pass, we get the answer. But that will be O(N^2)


  • 8
    S

    I have a question for the O(n) solution. From my understanding, this algorithm is based on the quick sort. If there's a lot of duplication in the array, it will lower the performance. For instance, if there's only one number in the array, we only can eliminate one number in a recursion. Even randomize the partition can not solve that problem.

    So my method to improve that algorithm is to split the array into three parts, which are <, = , > here's my code. Please tell me your opinion.

    public class Solution {
    	    public int findKthLargest(int[] nums, int k) {
    	        return findK(nums,nums.length - k,0,nums.length-1);
    	    }
    	    
    	    private int findK(int[] nums,int k, int start, int end){
    	        int parti = nums[start],i=start,m=start;
    	        for(int j=start+1;j<=end;j++){
    	            if(nums[j]>parti)
    	                continue;
    	            if(nums[j]<=parti){
    	                swap(nums,++i,j);
    	                if(nums[j] != parti)
    	                    swap(nums,m++,i);
    	            }
    	        }
    	        if(k>=m && k<=i)
    	            return nums[k];
    	        else if(k < m)
    	            return findK(nums,k,start,m-1);
    	        else 
    	            return findK(nums,k,i+1,end);
    	    }
    	    
    	    private void swap(int[] nums, int a, int b){
    	        int temp = nums[a];
    	        nums[a] = nums[b];
    	        nums[b] = temp;
    	    }
    	}

  • 0
    C

  • 1
    C

    Hey guys, I try to proof O(N) algorithm in worst time.

    First see the introduction of algorithm

    Below are the steps for implementing algorithm which runs in O(n).

    1.If n is small, for example n<6, just sort and return the k the smallest number.( Bound time- 7)

    2.If n>5, then partition the numbers into groups of 5.(Bound time n/5)

    3.Sort the numbers within each group. Select the middle elements (the medians). (Bound time- 7n/5)

    4.Call your "Selection" routine recursively to find the median of n/5 medians and call it m. (Bound time-Tn/5)

    SELECTION ROUTINE

    If k=r, then return m

    If k is less than r, then return k th smallest of the set L .(Bound
    time T7n/10)

    If k is greater than r, then return k-r th smallest of the set R.

    5.Compare all n-1 elements with the median of medians m and determine the sets L and R, where L contains all elements m. Clearly, the rank
    of m is r=|L|+1 (|L| is the size or cardinality of L). (Bound time- n)

    I have also shown how the order is O(n).

    T(n)=O(n)+T(n/5)+T(7n/10) We will solve this equation in order to get
    the complexity.

    We assume that T (n)< C*n

    T (n)= a*n + T (n/5) + T (7n/10)

    Cn>=T(n/5) +T(7n/10) + an

    C>= Cn/5+ C7n/5 + an

    C>= 9*C/10 +a

    C/10>= a

    C>=10*a

    There is such a constant that exists….so T (n) = O (n)

    Why choose the magic number 5 as the partition? Here we go.

    Assume the partition number is p. The chosen pivot is not the exact median. We can only know at least 1/2 * n/p * p/2 numbers are less than the pivot. So the worst case is to find in 1 - 1/2 * n/p * p/2 = 3n/4.

    Then T (n)= a*n + T (n/p) + T (3n/4)

    Follow the proof above, we can see that

    n/p + 3n/4 < n

    1/p + 3/4 < 1

    p > 4

    so we can choose any partition number > 4, for example 5


  • 0

    Hi, jmnarloch. Thank you for your excellent solutions. I have reached the partition solution, but just don't know how to guarantee it to be O(n). Thank you for the random shuffle idea you share.


  • 1

    Hi, I run your code on some examples. But it is still a bit hard to understand the logic of findK, especially the usage of i and m. Could you give some explanations?


  • 1

    Well, a notoriously complicated algorithm taken from CLRS...


  • 0
    3

    i is the mark for the first number which is bigger than parti.
    m is the mark for the position of nums[start] at the very beginning. m means "middle" in fact.
    Hence, the two flags ensure that the left part is smaller than nums[start] when the right part is bigger than nums[start]. Be careful. It is nums[start], not nums[0].


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.