Python, Advanced O(n) solution (Convex Hull Window)


  • 29

    Let d(x, y) be the density of segment [x, y], ie. d(x, y) = (A[x]+...+A[y]) / (y-x+1). It can be computed quickly with prefix sums.

    Now we refer to section 3 of Kai-min Chung, Hsueh-I Lu - An Optimal Algorithm for the Maximum-Density Segment Problem. 2008.

    For each ending index j, the current interval for i under consideration, [0, j-K+1] (minus parts on the left we have already discarded), has been decomposed into minimum density segments of longest length [hull[i], hull[i+1]-1], and we discard these segments as appropriate. That is, for each i in increasing order, hull[i+1] is the largest index in [hull[i], j-K+1] so that [hull[i], hull[i+1]-1] has minimum density.

    This is simply a lower hull of candidate points i, in a geometric interpretation where d(a, b) is the slope of the line segment (a, P[a]) to (b+1, P[b+1]). Then, we can prove that discarding components with lower density than our current candidate d(hull[0], j) must leave us with the highest density option remaining.

    def findMaxAverage(self, A, K):
        N = len(A)
        P = [0]
        for x in A:
            P.append(P[-1] + x)
    
        def d(x, y):
            return (P[y+1] - P[x]) / float(y+1-x)
    
        hull = collections.deque()
        ans = float('-inf')
    
        for j in xrange(K-1, N):
            while len(hull) >= 2 and d(hull[-2], hull[-1]-1) >= d(hull[-2], j-K):
                hull.pop()
            hull.append(j-K + 1)
            while len(hull) >= 2 and d(hull[0], hull[1]-1) <= d(hull[0], j):
                hull.popleft()
            ans = max(ans, d(hull[0], j))
    
        return ans
    

  • 1
    S

    @awice Very elegant solution and amazing O(n) time complexity.

    cpp version 89ms

    class Solution {
    public:
        double findMaxAverage(vector<int>& nums, int k) {
            int n = nums.size();
            vector<double> preS(n, 0);
            preS[0] = nums[0];
            for (int i = 1; i < n; ++i)
            {
                preS[i] = preS[i-1] + nums[i];
            }
            
            deque<int> q;
            double ret = preS[n-1] / n;
            for (int j = k-1; j < n; ++j)
            {
                while(q.size() >= 2 && density(preS, q[q.size()-2], q.back()-1) >= density(preS, q.back(), j-k))
                {
                    q.pop_back();
                }
                q.push_back(j-k+1);
                while(q.size() >= 2 && density(preS, q[0], j) <= density(preS, q[1], j))
                {
                    q.pop_front();
                }
                ret = max(ret, density(preS, q.front(), j));
            }
            return ret;
        }
    private:
        double density(vector<double>& preS, int l, int r)
        {
            if (l == 0)
            {
                return preS[r] / (r + 1);
            }else
            {
                return (preS[r] - preS[l-1]) / (r - l + 1);
            } 
        }
    };
    

  • 0

    @awice This is a great solution which I like a lot because this provides a mathematically precise answer, unlike the binary search one.

    Maybe I am too pedantic: in the main for loop it doesn't always compute the i (i <= j-k+1) that maximizes d(i, j) for all j. However, the algorithm does compute the largest legal average.

    Here is a very sketchy proof if anyone needs it:

    Suppose the segment from i* to j* is the shortest sub-array among all legal sub-arrays that admit the largest average. Will show that d(i*, j*) will be considered in the for loop when j = j*. This is done in three steps:

    1. when j <= j* and i* is already in hull, i* won't be popped out in the first while loop:
    while len(hull) >= 2 and d(hull[-2], hull[-1]-1) >= d(hull[-2], j-K):
        hull.pop()
    
    1. when j <= j* and i* is already in hull, i* won't be poped out in the second while loop:
    while len(hull) >= 2 and d(hull[0], hull[1]-1) <= d(hull[0], j):
        hull.popleft()
    
    1. when j = j*, after the second while loop, i* = hull [0], in other words, all integers in front of i* will be popped out. QED.

    I didn't read the reference @awice cited, so maybe there is more geometric intuition that leads to a simpler proof. Please let me know if this is so.


  • 7

    @flamesofmoon Thanks for the correction and the proof. This is my favorite answer that I have written here. Your proof and notation is similar to the one in the paper.

    For a geometric argument, first consider the equivalent problem: we have a set of points {P_x} = {(x, P[x]) | x = 0...n} (abusing notation P[x] for the array, P_x for the point), and we want to find the largest slope between two points that have x coordinates with difference K or more. Here, d(a, b-1) is m(P_a, P_b), the slope between P_a and P_b.


    while len(hull) >= 2 and d(hull[-2], hull[-1]-1) >= d(hull[-2], j-K):
        hull.pop()
    hull.append(j-K + 1)
    

    After the above code, hull is exactly the lower convex hull of these points: the slopes d(hull[i], hull[i+1] -1) are increasing, and choosing in order any three of those points A, B, C, they are counterclockwise (B lies below AC.)

    Imagine we tried all points P_i (i < j-K) and see which one maximizes the slope of the line m(P_i, P_{j-K}). If we have two points P_0, P_1, then m(P_1, P_{j-K}) > m(P_0, P_{j-K}) iff P_1 is below line P_0 P_{j-K}, namely (P_0, P_1, P_{j-K}) is counterclockwise. This is a transitive property: we can perform this imaginary "tournament" in any order and find the winning point, and any points P_i with (P_h, P_i, P_{j-K}) clockwise (h < i < j-K) can be eliminated.

    Suppose (A, B, C) is clockwise, with A.x < B.x < C.x < T.x. The crux of the argument is that max(m(A, T), m(C, T)) > m(B, T). Refer to the picture below for a proof of this. That means that any B's that were eliminated when creating the convex hull could not have the largest slope m(B, T), where T = P_{j-K}.

    Proof


  • 0

    Why am I bad at Maths...Great solution BTW!


  • 2
    Z

    I really like your solution. I have read the paper, then I could understand your code.
    I have write a java version with some commons:

    public class Solution {
        int[] preSum;
        public double findMaxAverage(int[] nums, int k) {
            int len = nums.length;
            preSum = new int[len];
            // get preSum array
            preSum[0] = nums[0];
            for(int i=1; i<len; i++){
                preSum[i] =preSum[i-1]+nums[i];
            }
            // create the list such that for the range (i,j), element in this list would satisfy the following conditon
            // list[m]  list[m+1]-1 would be the smallest density from list[m] j.
            LinkedList<Integer> list = new LinkedList<Integer>();
            double res = -10000.0;
            for(int j =k-1; j<len; j++){
                // if the last point does not satisfy the condition, we need to remove it
                while(list.size()>=2 && 
                     getDensity(list.get(list.size()-2), list.get(list.size()-1)-1) >= getDensity(list.get(list.size()-2),j-k )
                     ){
                    list.pollLast();
                }
                list.add(j-k+1);
                // all points are OK. If the first two point satisfy the condition, we need to remove the first point. we need to find the max point
                while(list.size()>=2 && 
                     getDensity(list.get(0), list.get(1)-1) <= getDensity(list.get(0),j)
                     ){
                    list.pollFirst();
                }
                res = Math.max(res, getDensity(list.get(0),j));
            }
            
            return res ;
        }
        
        public double getDensity(int l, int r){
            double res = 0.0;
            if(l==0){
                res = ((double) preSum[r])/(1.0+r) ;
            }
            else{
                res = ((double) preSum[r]-preSum[l-1]) /(1.0+r-l);
            }
            return res;
        }
    }
    

  • 0
    S

    Nice work, here is the java version

    public class Solution {
        //convex hull window
        public double findMaxAverage(int[] nums, int k) {
           int n = nums.length;
           long[] sum = new long[n+1];
            
           for(int i = 0; i < n; i++){
               sum[i+1] = sum[i] + nums[i];
           }
            
           LinkedList<Integer> hull = new LinkedList<>();
           double ans = Integer.MIN_VALUE;
            
           for(int j = k - 1; j < n; j++){
                while(hull.size() >= 2 && getAvg(sum, hull.get(hull.size() - 2), hull.get(hull.size() - 1) - 1) >= getAvg(sum, hull.get(hull.size() - 2), j-k)){
                   hull.remove(hull.size() - 1);
                }
                     
                hull.add(j - k + 1);
                while(hull.size() >= 2 && getAvg(sum, hull.get(0), hull.get(1) - 1) <= getAvg(sum, hull.get(0), j)){
                    hull.removeFirst();
                }
                      
                ans = Math.max(ans, getAvg(sum, hull.getFirst(), j));
           } 
            
            return ans;
        }
        
        
        private double getAvg(long[] sum, int i, int j){
            return (sum[j+1] - sum[i])/(double) (j+1-i);
        }
    }
    

  • 0
    R

    What does "hull[-1]-1" mean?? why minus 1??


  • 0
    L

    Why can't you use j instead of j-K in the first while loop?

     while len(hull) >= 2 and d(hull[-2], hull[-1]-1) >= d(hull[-2], j-K):

  • 0
    E

    This is elegant! Though would be hard to come with similar solution if encountering 1st time in interview..


  • 0
    X

    My 43 ms Java solution using dynamic programming which beats 95%. But I do not know its time complexity. Could anyone help me with that? Thanks!

    class Solution {
        public double findMaxAverage(int[] nums, int k) {
            int n = nums.length;
            int[] prefix = new int[n]; // prefix[i] = sum from 0 to i, both inclusive
            int[] state = new int[n]; // state[i] = length of max average subarray ends at index i;
            double res = Integer.MIN_VALUE;
            for (int i = 0; i < n; i++) {
                prefix[i] = i == 0 ? nums[i] : nums[i] + prefix[i - 1];
                state[i] = findLen(i, 1, 10000, prefix, state);
                
                // update max average subarray with length >= k
                if (i < k - 1) {
                    continue;
                }
                int len = state[i] >= k ? state[i] : findLen(i, k, 2 * k, prefix, state);
                int sum = i == len - 1 ? prefix[i] : prefix[i] - prefix[i - len];
                res = Math.max(res, (double) sum / len);
            }
            return res;
        }
        /*/ 
            this method finds the length of max average subarray ends at index i under given length constraints:
            minLen <= length <= maxLen;
        /*/
        private int findLen(int i, int minLen, int maxLen, int[] prefix, int[] state) {
            int j = i - minLen;
            while (j >= 0 && i - j <= maxLen) {
                int sum = j - state[j] >= 0 ? prefix[j] - prefix[j - state[j]] : prefix[j];
                if ((double) sum / state[j] <= (double) (prefix[i] - prefix[j]) / (i - j)) {
                    break;
                }
                j -= state[j];
            }
            return i - j;
        }
    }
    

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.