Sliding Window Median


@harleyquinn Using two multisets is not very different from using two heaps. Both maintain the sorted property.
IIRC Java
PriorityQueue
class has a greatremove()
method. So you don't need lazy removal there. C++ doesn't have such a method (Hence the lazy removal). The two multisets approach is thus just a workaround to make life easy for those who don't have access to a great Priority Queue or Heap class. If you are using Java, you are good to go withPriorityQueue
.Regarding the last approach: Using a
multiset
helps with two things: Maintaining the sorted property even with duplicate elements and insertions/deletions.
 Efficient logarithmic order insertion and deletion of elements.
You can get both with a modified AVL tree. Otherwise you require a continually sorted vector/linked list with random iterator support (so that you could run binary search on it).
Note that in the entire duration of the program, the window size is either
k
ork+1
. So repeated insertions and deletions can be reduced to amortizedO(1)
operations.

@babhishek21 My concern was not just regarding this question. In c++ you have a dynamically sorted set which helps in cases when you need a sorted set. Multiset also allows duplicates. Java doesnt have that ease. TreeMap or TreeSet are sorted data structures with no duplicates allowed. You have to implement a AVL tree for these cases. Let me know if you know of a better approach to keep an input stream sorted and do binary search on it and get index of, ceiling floor for any number.

@babhishek21 said in Sliding Window Median:
IIRC Java PriorityQueue class has a great remove() method.
You mean the one that takes linear time?

@StefanPochmann I did not know that. Thanks for pointing that out. In which case, lazy removals from heaps continue to be handy. ;)

@StefanPochmann Hi, I remember in Java's PriorityQueue, remove(Object) will take O(k) time, so if we use this remove(Object) function, the total time complexity will be O(nk + nlogk), right?

For the two heaps approach, could we also use a "LinkedTwoHeaps" data structure (compare with LinkedHashMap in Java) in which we have two heaps, each of whose elements are linked list nodes that point to other nodes based on insertion order? As we move our sliding window, we would delete the first node, add a new node, and then rebalance the two heaps.

For approach #3 with two heaps,
is it possible that all invalid numbers never occur at the top of the heap, thus are never actually removed from the heap? And since the actual heap size may be O(n), the time complexity may be O(n * log(n)). Consider 102, 101, 1, 2, 3, 100, 4, 99, 5, 98, 6, 97... with k =4.
Correct me if I'm wrong.

@ColinBin
Yes, I have the same concern. If we do not remove the element, the heap size could be larger than k, which means the time complexity is not logk.