Find Median From Data Stream


Just shortening the last solution a bit further...
class MedianFinder { multiset<int> data; multiset<int>::iterator mid; public: void addNum(int num) { data.insert(num); mid = data.size() == 1 ? data.begin() : next(mid, data.size() % 2  (num < *mid)); } double findMedian() { return (*mid + *next(mid, 1  data.size() % 2)) * 0.5; } };

@StefanPochmann Thanks for this! Implicit type conversion FTW.
I am not going to change the solution in the article though. It is better the reader discovers it in the comments.

@babhishek21 Thanks for the great writeup. Minor nit: I didn't quite follow your section on buckets. My understanding is that you were advocating a model where you maintain a certain number of buckets that could be determined by knowing the statistical distribution of the input (say, values between 130). You could create 3 buckets (110, 1120, 2130) and send the stream values in said buckets. Finally when you're asked to find the median, knowing the size of each bucket will allow you surgically sort 1 specific bucket and leave the rest alone.
If this understanding is correct, then this has nothing to do with reservoir sampling (which is a nifty way to keep k out of N elements handy in a statistically unbiased way).

@schumpeter You are right. It can be misleading. My bad. But did you read the StackOverflow answer? I think i'll clarify a bit.
EDIT: Fixed.
My understanding is that you were advocating a model where you maintain a certain number of buckets ... will allow you surgically sort 1 specific bucket and leave the rest alone.
Yep. Pretty much. But what if you want to maintain just one magical bucket that solves all your problems?

@babhishek21 Thanks for your edit. I did read the SO link. While it is implicit, I think in an interview it would be work calling the need for precision explicitly. Ultimately, reservoir sampling is a quick approximation to finding things like median, quantiles etc. but it doesn't guarantee complete accuracy (and often times, that's OK).

@StefanPochmann by not keeping track of 2 iterators findMedian becomes O(logn) right not O(1) since next on a bst is logn if i'm not mistaken. I wanted to go with the 1 pointer approach until i thought about this :(