Find Median From Data Stream


  • 0

    Click here to see the full article post


  • 1

    Just shortening the last solution a bit further...

    class MedianFinder {
        multiset<int> data;
        multiset<int>::iterator mid;
    
    public:
        void addNum(int num) {
            data.insert(num);
            mid = data.size() == 1 ? data.begin()
                                   : next(mid, data.size() % 2 - (num < *mid));
        }
    
        double findMedian() {
            return (*mid + *next(mid, 1 - data.size() % 2)) * 0.5;
        }
    };
    

  • 0

    @StefanPochmann Thanks for this! Implicit type conversion FTW.
    I am not going to change the solution in the article though. It is better the reader discovers it in the comments.


  • 1

    @babhishek21 Thanks for the great write-up. Minor nit: I didn't quite follow your section on buckets. My understanding is that you were advocating a model where you maintain a certain number of buckets that could be determined by knowing the statistical distribution of the input (say, values between 1-30). You could create 3 buckets (1-10, 11-20, 21-30) and send the stream values in said buckets. Finally when you're asked to find the median, knowing the size of each bucket will allow you surgically sort 1 specific bucket and leave the rest alone.

    If this understanding is correct, then this has nothing to do with reservoir sampling (which is a nifty way to keep k out of N elements handy in a statistically unbiased way).


  • 0

    @schumpeter You are right. It can be misleading. My bad. But did you read the StackOverflow answer? I think i'll clarify a bit.

    EDIT: Fixed.

    My understanding is that you were advocating a model where you maintain a certain number of buckets ... will allow you surgically sort 1 specific bucket and leave the rest alone.

    Yep. Pretty much. But what if you want to maintain just one magical bucket that solves all your problems?


  • 1

    @babhishek21 Thanks for your edit. I did read the SO link. While it is implicit, I think in an interview it would be work calling the need for precision explicitly. Ultimately, reservoir sampling is a quick approximation to finding things like median, quantiles etc. but it doesn't guarantee complete accuracy (and often times, that's OK).


  • 0
    L

    Amazing algorithm! I have never think about use two pointer to solve problem like this.


  • 0
    A

    @StefanPochmann by not keeping track of 2 iterators findMedian becomes O(logn) right not O(1) since next on a bst is logn if i'm not mistaken. I wanted to go with the 1 pointer approach until i thought about this :(


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.