Python, Advanced O(n) solution (Convex Hull Window)


  • 22

    Let d(x, y) be the density of segment [x, y], ie. d(x, y) = (A[x]+...+A[y]) / (y-x+1). It can be computed quickly with prefix sums.

    Now we refer to section 3 of Kai-min Chung, Hsueh-I Lu - An Optimal Algorithm for the Maximum-Density Segment Problem. 2008.

    For each ending index j, the current interval for i under consideration, [0, j-K+1] (minus parts on the left we have already discarded), has been decomposed into minimum density segments of longest length [hull[i], hull[i+1]-1], and we discard these segments as appropriate. That is, for each i in increasing order, hull[i+1] is the largest index in [hull[i], j-K+1] so that [hull[i], hull[i+1]-1] has minimum density.

    This is simply a lower hull of candidate points i, in a geometric interpretation where d(a, b) is the slope of the line segment (a, P[a]) to (b+1, P[b+1]). Then, we can prove that discarding components with lower density than our current candidate d(hull[0], j) must leave us with the highest density option remaining.

    def findMaxAverage(self, A, K):
        N = len(A)
        P = [0]
        for x in A:
            P.append(P[-1] + x)
    
        def d(x, y):
            return (P[y+1] - P[x]) / float(y+1-x)
    
        hull = collections.deque()
        ans = float('-inf')
    
        for j in xrange(K-1, N):
            while len(hull) >= 2 and d(hull[-2], hull[-1]-1) >= d(hull[-2], j-K):
                hull.pop()
            hull.append(j-K + 1)
            while len(hull) >= 2 and d(hull[0], hull[1]-1) <= d(hull[0], j):
                hull.popleft()
            ans = max(ans, d(hull[0], j))
    
        return ans
    

  • 0
    S

    @awice Very elegant solution and amazing O(n) time complexity.

    cpp version 89ms

    class Solution {
    public:
        double findMaxAverage(vector<int>& nums, int k) {
            int n = nums.size();
            vector<double> preS(n, 0);
            preS[0] = nums[0];
            for (int i = 1; i < n; ++i)
            {
                preS[i] = preS[i-1] + nums[i];
            }
            
            deque<int> q;
            double ret = preS[n-1] / n;
            for (int j = k-1; j < n; ++j)
            {
                while(q.size() >= 2 && density(preS, q[q.size()-2], q.back()-1) >= density(preS, q.back(), j-k))
                {
                    q.pop_back();
                }
                q.push_back(j-k+1);
                while(q.size() >= 2 && density(preS, q[0], j) <= density(preS, q[1], j))
                {
                    q.pop_front();
                }
                ret = max(ret, density(preS, q.front(), j));
            }
            return ret;
        }
    private:
        double density(vector<double>& preS, int l, int r)
        {
            if (l == 0)
            {
                return preS[r] / (r + 1);
            }else
            {
                return (preS[r] - preS[l-1]) / (r - l + 1);
            } 
        }
    };
    

  • 0
    F

    @awice This is a great solution which I like a lot because this provides a mathematically precise answer, unlike the binary search one.

    Maybe I am too pedantic: in the main for loop it doesn't always compute the i (i <= j-k+1) that maximizes d(i, j) for all j. However, the algorithm does compute the largest legal average.

    Here is a very sketchy proof if anyone needs it:

    Suppose the segment from i* to j* is the shortest sub-array among all legal sub-arrays that admit the largest average. Will show that d(i*, j*) will be considered in the for loop when j = j*. This is done in three steps:

    1. when j <= j* and i* is already in hull, i* won't be popped out in the first while loop:
    while len(hull) >= 2 and d(hull[-2], hull[-1]-1) >= d(hull[-2], j-K):
        hull.pop()
    
    1. when j <= j* and i* is already in hull, i* won't be poped out in the second while loop:
    while len(hull) >= 2 and d(hull[0], hull[1]-1) <= d(hull[0], j):
        hull.popleft()
    
    1. when j = j*, after the second while loop, i* = hull [0], in other words, all integers in front of i* will be popped out. QED.

    I didn't read the reference @awice cited, so maybe there is more geometric intuition that leads to a simpler proof. Please let me know if this is so.


  • 6

    @flamesofmoon Thanks for the correction and the proof. This is my favorite answer that I have written here. Your proof and notation is similar to the one in the paper.

    For a geometric argument, first consider the equivalent problem: we have a set of points {P_x} = {(x, P[x]) | x = 0...n} (abusing notation P[x] for the array, P_x for the point), and we want to find the largest slope between two points that have x coordinates with difference K or more. Here, d(a, b-1) is m(P_a, P_b), the slope between P_a and P_b.


    while len(hull) >= 2 and d(hull[-2], hull[-1]-1) >= d(hull[-2], j-K):
        hull.pop()
    hull.append(j-K + 1)
    

    After the above code, hull is exactly the lower convex hull of these points: the slopes d(hull[i], hull[i+1] -1) are increasing, and choosing in order any three of those points A, B, C, they are counterclockwise (B lies below AC.)

    Imagine we tried all points P_i (i < j-K) and see which one maximizes the slope of the line m(P_i, P_{j-K}). If we have two points P_0, P_1, then m(P_1, P_{j-K}) > m(P_0, P_{j-K}) iff P_1 is below line P_0 P_{j-K}, namely (P_0, P_1, P_{j-K}) is counterclockwise. This is a transitive property: we can perform this imaginary "tournament" in any order and find the winning point, and any points P_i with (P_h, P_i, P_{j-K}) clockwise (h < i < j-K) can be eliminated.

    Suppose (A, B, C) is clockwise, with A.x < B.x < C.x < T.x. The crux of the argument is that max(m(A, T), m(C, T)) > m(B, T). Refer to the picture below for a proof of this. That means that any B's that were eliminated when creating the convex hull could not have the largest slope m(B, T), where T = P_{j-K}.

    Proof


  • 0

    Why am I bad at Maths...Great solution BTW!


  • 2
    Z

    I really like your solution. I have read the paper, then I could understand your code.
    I have write a java version with some commons:

    public class Solution {
        int[] preSum;
        public double findMaxAverage(int[] nums, int k) {
            int len = nums.length;
            preSum = new int[len];
            // get preSum array
            preSum[0] = nums[0];
            for(int i=1; i<len; i++){
                preSum[i] =preSum[i-1]+nums[i];
            }
            // create the list such that for the range (i,j), element in this list would satisfy the following conditon
            // list[m]  list[m+1]-1 would be the smallest density from list[m] j.
            LinkedList<Integer> list = new LinkedList<Integer>();
            double res = -10000.0;
            for(int j =k-1; j<len; j++){
                // if the last point does not satisfy the condition, we need to remove it
                while(list.size()>=2 && 
                     getDensity(list.get(list.size()-2), list.get(list.size()-1)-1) >= getDensity(list.get(list.size()-2),j-k )
                     ){
                    list.pollLast();
                }
                list.add(j-k+1);
                // all points are OK. If the first two point satisfy the condition, we need to remove the first point. we need to find the max point
                while(list.size()>=2 && 
                     getDensity(list.get(0), list.get(1)-1) <= getDensity(list.get(0),j)
                     ){
                    list.pollFirst();
                }
                res = Math.max(res, getDensity(list.get(0),j));
            }
            
            return res ;
        }
        
        public double getDensity(int l, int r){
            double res = 0.0;
            if(l==0){
                res = ((double) preSum[r])/(1.0+r) ;
            }
            else{
                res = ((double) preSum[r]-preSum[l-1]) /(1.0+r-l);
            }
            return res;
        }
    }
    

  • 0
    S

    Nice work, here is the java version

    public class Solution {
        //convex hull window
        public double findMaxAverage(int[] nums, int k) {
           int n = nums.length;
           long[] sum = new long[n+1];
            
           for(int i = 0; i < n; i++){
               sum[i+1] = sum[i] + nums[i];
           }
            
           LinkedList<Integer> hull = new LinkedList<>();
           double ans = Integer.MIN_VALUE;
            
           for(int j = k - 1; j < n; j++){
                while(hull.size() >= 2 && getAvg(sum, hull.get(hull.size() - 2), hull.get(hull.size() - 1) - 1) >= getAvg(sum, hull.get(hull.size() - 2), j-k)){
                   hull.remove(hull.size() - 1);
                }
                     
                hull.add(j - k + 1);
                while(hull.size() >= 2 && getAvg(sum, hull.get(0), hull.get(1) - 1) <= getAvg(sum, hull.get(0), j)){
                    hull.removeFirst();
                }
                      
                ans = Math.max(ans, getAvg(sum, hull.getFirst(), j));
           } 
            
            return ans;
        }
        
        
        private double getAvg(long[] sum, int i, int j){
            return (sum[j+1] - sum[i])/(double) (j+1-i);
        }
    }
    

  • 0
    R

    What does "hull[-1]-1" mean?? why minus 1??


  • 0
    L

    Why can't you use j instead of j-K in the first while loop?

     while len(hull) >= 2 and d(hull[-2], hull[-1]-1) >= d(hull[-2], j-K):

  • 1
    E

    This is elegant! Though would be hard to come with similar solution if encountering 1st time in interview..


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.