O(n*log(n)) Time C++ Solution Using Two Heaps and a Hash Table


  • 10
    I

    There are a few solutions using BST with worst case time complexity O(n*k), but we know k can be become large. I wanted to come up with a solution that is guaranteed to run in O(n*log(n)) time. This is in my opinion the best solution so far.

    The idea is inspired by solutions to Find Median from Data Stream: use two heaps to store numbers in the sliding window. However there is the issue of numbers moving out of the window, and it turns out that a hash table that records these numbers will just work (and is surprisingly neat). The recorded numbers will only be deleted when they come to the top of the heaps.

    class Solution {
    public:
        vector<double> medianSlidingWindow(vector<int>& nums, int k) {
            vector<double> medians;
            unordered_map<int, int> hash;                          // count numbers to be deleted
            priority_queue<int, vector<int>> bheap;                // heap on the bottom
            priority_queue<int, vector<int>, greater<int>> theap;  // heap on the top
            
            int i = 0;
            
            // Initialize the heaps
            while (i < k)  { bheap.push(nums[i++]); }
            for (int count = k/2; count > 0; --count) {
                theap.push(bheap.top()); bheap.pop();
            }
            
            while (true) {
                // Get median
                if (k % 2) medians.push_back(bheap.top());
                else medians.push_back( ((double)bheap.top() + theap.top()) / 2 );
                
                if (i == nums.size()) break;
                int m = nums[i-k], n = nums[i++], balance = 0;
                
                // What happens to the number m that is moving out of the window
                if (m <= bheap.top())  { --balance;  if (m == bheap.top()) bheap.pop(); else ++hash[m]; }
                else                   { ++balance;  if (m == theap.top()) theap.pop(); else ++hash[m]; }
                
                // Insert the new number n that enters the window
                if (!bheap.empty() && n <= bheap.top())  { ++balance; bheap.push(n); }
                else                                     { --balance; theap.push(n); }
                
                // Rebalance the bottom and top heaps
                if      (balance < 0)  { bheap.push(theap.top()); theap.pop(); }
                else if (balance > 0)  { theap.push(bheap.top()); bheap.pop(); }
                
                // Remove numbers that should be discarded at the top of the two heaps
                while (!bheap.empty() && hash[bheap.top()])  { --hash[bheap.top()]; bheap.pop(); }
                while (!theap.empty() && hash[theap.top()])  { --hash[theap.top()]; theap.pop(); }
            }
            
            return medians;
        }
    };
    

    Since both heaps will never have a size greater than n, the time complexity is O(n*log(n)) in the worst case.


  • -3

    Just want to say the coding style is ... beautiful.


  • 1
    I

    @Undo Thanks! I have to admit that this was initially messier.


  • 4
    H

    My Python version of the above code

    import collections
    from heapq import heappush, heappop, heapify
    class Solution(object):
        def medianSlidingWindow(self, nums, k):
            '''Similar to the median stream problem, we maintain 2 heaps which represent
            the top and bottom halves of the window.
            Since deletion from a heap is an O(1) operation, we perform it lazily. 
    
            At any time, if a number leaves a window, we delete it if it is at the top
            of the heap. Else, we stage it for deletion, but alter the count of this 
            half of the array.
    
            When this element eventually comes to the top of the heap at a later instance, 
            we perform the staged deletions.
            '''
            to_be_deleted, res = collections.defaultdict(int), []
            top_half, bottom_half = nums[:k], []
            # We first begin by heapifying the first k-window
            heapify(top_half)
    
            # Balancing the top and bottom halves of the k-window
            while len(top_half) - len(bottom_half) > 1:
                heappush(bottom_half, -heappop(top_half))
    
            for i in xrange(k, len(nums)+1):
                median = top_half[0] if k%2 else 0.5*(top_half[0]-bottom_half[0])
                res.append(median)
                if i<len(nums):
                    num, num_to_be_deleted = nums[i], nums[i-k]
                    top_bottom_balance = 0 #top_bottom_balance = len(top_half) - len(bottom_half)
    
                    # If number to be deleted is in the top half, we decrement the top_bottom_balance
                    if num_to_be_deleted >= top_half[0]:
                        top_bottom_balance-=1
                        # If the number to be deleted is at the top of the heap, we remove the entry
                        if num_to_be_deleted == top_half[0]:
                            heappop(top_half)
                        # Else, we keep track of this number for later deletion
                        else:
                            to_be_deleted[num_to_be_deleted]+=1
                    else:
                        top_bottom_balance+=1
                        if num_to_be_deleted == -bottom_half[0]:
                            heappop(bottom_half)
                        else:
                            to_be_deleted[num_to_be_deleted]+=1
    
                    # If the new number to be inserted falls into the top half, we insert it there and update the top_bottom_balance
                    if top_half and num >= top_half[0]:
                        top_bottom_balance+=1
                        heappush(top_half, num)
                    else:
                        top_bottom_balance-=1
                        heappush(bottom_half, -num)
    
                    # top_bottom_balance can only be -2, 0 or +2
                    # If top_bottom_balance is -2, then we deleted num_to_be_deleted from the top half AND added the new number to the bottom half
                    # We hence add the head of the bottom half to the top half to balance both trees
                    if top_bottom_balance>0:
                        heappush(bottom_half, -heappop(top_half))
                    elif top_bottom_balance<0:
                        heappush(top_half, -heappop(bottom_half))
    
                    # While the head of the top_half has been staged for deletion
                    # previously, remove it from the heap
                    while top_half and to_be_deleted[top_half[0]]:
                        to_be_deleted[top_half[0]]-=1
                        heappop(top_half)
                    while bottom_half and to_be_deleted[-bottom_half[0]]:
                        to_be_deleted[-bottom_half[0]]-=1
                        heappop(bottom_half)
            return map(float, res)
    
    

  • 1
    B

    You solution is Great!!!
    But....

      if      (balance < 0)  { bheap.push(theap.top()); theap.pop(); }
      else if (balance > 0)  { theap.push(bheap.top()); bheap.pop(); }
    

    What if theap.top() or bheap.top() is not available? I cannot figure it out...


  • 0
    I

    @BURIBURI balance is changed twice in the code, each time it is either ++balance or --balance. If say, balance < 0 after the two changes, then it must be that they are both --balance, so something has been pushed to theap on the line:

    else { --balance; theap.push(n); }

    This took me a while to figure out!


  • 0
    B

    @ipt Got it! If we have --balance; and m == bheap.top(), bheap.pop(); and finally balance < 0 , actually we will not visit bheap.top()at all , no need to worry about if it is available! Thank you very much!


  • 0
    V

    Do you need to rebalance the two heaps after removing the numbers that should be discarded at the top of the two heaps? I use a while loop to rebalance and remove top numbers until there is no top numbers that need to be removed after rebalancing, but it does not give the correct result..


  • 0
    T

    I have a question, why you remove numbers that should be discarded after rebalance? how do we guarantee it is still balanced when doing medians.push_back?


  • 0
    I

    @VincentSatou Balancing here only cares about the numbers that are in the window but now those that are discarded.


  • 0
    I

    @t3cmax I guess the answer is the same with the one above, I used balance to count only the effective numbers (those in the window).


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.