Simple Reservoir Sampling solution


  • 67
    public class Solution {
    
        int[] nums;
        Random rnd;
    
        public Solution(int[] nums) {
            this.nums = nums;
            this.rnd = new Random();
        }
        
        public int pick(int target) {
            int result = -1;
            int count = 0;
            for (int i = 0; i < nums.length; i++) {
                if (nums[i] != target)
                    continue;
                if (rnd.nextInt(++count) == 0)
                    result = i;
            }
            
            return result;
        }
    }```

  • 0
    Y
    This post is deleted!

  • 37

    @yidongwang Actually it does.

    public int pick(int target) {
    int result = -1;
    int count = 0; // to record how many targets in the array
    for (int i = 0; i < nums.length; i++) {
    if (nums[i] != target)
    continue;
    /*
    For the nth target, ++count is n. Then the probability that rnd.nextInt(++count)==0 is 1/n. Thus, the probability that return nth target is 1/n.
    For (n-1)th target, the probability of returning it is (n-1)/n * 1/(n-1)= 1/n.
    */
    if (rnd.nextInt(++count) == 0)
    result = i;
    }
    return result;
    }


  • 0
    Y

    @Chuqi
    thank you very much,

    I have read the proof of "Reservoir Sampling".

    yes, It is equal probability of returning, I'm wrong.

    thanks


  • 1
    I

    @dettier So O(n) pick is acceptable, O(1) extra space is a must?


  • 0

    @iambright Take a look at this discussion https://discuss.leetcode.com/topic/58322/what-on-earth-is-meant-by-too-much-memory
    It seems like some O(n) memory solutions are accepted and some are not.
    So I think originally problem was expected to be solved with O(1) memory solution.


  • 3

    So basically you may return -1 at the end? Correct me if I am wrong, I think the question requires a result which has been guaranteed.

    Oops, I got it. Actually random.nextInt(++counter) this will guarantee that at least the first one will be picked up. Thanks.


  • 0
    J
    public class Solution {
        // sort + binary search? You can't really sort it, as it is 
        int[] nums;
        public Solution(int[] nums) {
            this.nums = nums;
        }
        
        public int pick(int target) {
            int result = 0;
            int cnt = 0;
            Random rand = new Random();
            
            for (int i = 0; i < nums.length; ++i) {
                if (nums[i] == target) {
                    cnt ++;
                    if (rand.nextLong() % cnt == 0) {
                        result = i;
                    }
                }
            }
            return result;
        }
    }

  • 0

    @dettier said in Simple Reservoir Sampling solution:

    Your Java code doesn't get accepted in C++:

    class Solution {
        vector<int> v; vector<int>& vect = v;
        int n;
    public:
        Solution(vector<int> nums){
            vect = nums;
            n = nums.size();
        }
        int pick(int t) {
            int count = 0, res = -1;
            for(int i = 0; i < n; i++) {
                if (vect[i] == t) {
                    count += 1;
                    if (not (rand() % count)) 
                        res = i;
                }
            }
            return res;
        }
    };
    
    

  • 1
    L

    Hey, I don't quite understand, why it is "if (rnd.nextInt(++count) == 0)
    result = i;" not "if (rnd.nextInt(count++) == 0)
    result = i;" we don't select the current number, right? We get a random one from the previous, and decide if swap,right?


  • 0

    @liu971 Hi, this is because we need to save current element with probability 1 / (total # of encountered candidates). So it's 100% for first target element, 50% for the second one, and so on...
    Check out wikipedia article, it explains it pretty well (sample size 1 example).
    https://en.wikipedia.org/wiki/Reservoir_sampling


  • 32
    L

    To those who don't understand why it works. Consider the example in the OJ
    {1,2,3,3,3} with target 3, you want to select 2,3,4 with a probability of 1/3 each.

    2 : It's probability of selection is 1 * (1/2) * (2/3) = 1/3
    3 : It's probability of selection is (1/2) * (2/3) = 1/3
    4 : It's probability of selection is just 1/3

    So they are each randomly selected.


  • 1
    X

    Is there anyone who knows how to test the results have equal probability ? What are the test cases for this question?


  • 0
    L

    @xinzh1010163.com I literally just showed why it works above. You can't test randomness.


  • 0
    X

    @lekzeey Thank you. I know how to prove it. Just want to know how the design the test case for randomness. Some one asked me about it in an interview.


  • 0
    L

    @xinzh1010163.com I think you can test edges cases and reasonable values in randomness like Leetcode does. E.g. if it's just one index then only that index would be returned or it should be in a certain range 2,3,4 etc. But testing randomness in general is pretty hard, even if you take numerous samples because they wouldn't exactly fit the pmf of the random variable.


  • 2
    G

    hello!
    I have the same question about the line : if (rnd.nextInt(++count) == 0),
    I know it is the right answer, but I still don't understand why it is not count++?
    This is what I think:
    The first time we execute this if statement will be : (rnd.nextInt(1) == 0), am I right?
    So we will just have 1/2 probability to make result = i,, not 1.
    Could anyone tell me why am wrong?
    Thanks a lot.


  • 0

    @Grain_In_Ear

    It is because that rnd.nextInt(n) will return an Integer in [0, n).
    Thus rnd.nextInt(1) won't return 1, but will only return 0.


  • 0
    G

    @Chuqi
    I got it.
    Thank you so much!


  • 0
    G

    @agave
    What's the result, wrong answer or time limit exceed?


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.