Why median is better than average?


  • 3
    M

    I'm confused for this problem that we choose median to be the number everyone tries to move to, but not the average. Some post gives the reason that if we choose a number in the array, we save operations to change itself. But I'm thinking this reason doesn't stand because if we choose the average, other numbers may take less moves than choosing median.

    Can anyone help me understand the thinking process of choosing median instead of average?

    Thanks!


  • 2
    S

    Consider an exmaple:

    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1000000


  • 1
    Y

    @starmdrog yes, it is indeed a good example. But it does not prove that the median will be always the best choice.


  • 2

    Actually, when n is even, any number between the median two numbers can give a minimum number of moves. Here is a simple example, assume the array is sorted, x is the number chosen:

    a0 a1 a2 a3 x a4 a5 a6 a7; // moves needed: (a4 - a3) + (a5 - a2) + (a6 - a1) + (a7 - a0)

    a0 a1 a2 (a3 a4 x) a5 a6 a7; // moves needed: (2x - a3 - a4) + (a5 - a2) + (a6 - a1) + (a7 - a0).

    a0 a1 (a2 a3 a4 a5 x) a6 a7; // moves needed: (2x - a3 - a4) + (2x - a5 - a2) + (a6 - a1) + (a7 - a0)

    As you can see, the number of moves needed for the numbers outside the parentheses remains the same, and inside the parentheses increases. So moving x leftward or rightward will increase the moves needed. This is similar when n is odd, and because there is only one median, so the median should be chosen.


  • 2
    S

    There is mathematical proof here
    https://math.stackexchange.com/questions/113270/the-median-minimizes-the-sum-of-absolute-deviations

    But in higher terms, the reason is because
    mean is where
    sum of distances to smaller elements = sum of distances to larger elements.
    So net number of moves
    = 2 * (either sum of distances to lower or larger elements)

    But this value might not be the lowest value for net distance.


  • 5
    J

    I have a simpler proof. The meet point must be somewhere between current min and max. No matter which point you pick, the total running length for min and max is the same, i.e. abs(min_point-meet_point)+abs(max_point-meet_point) = SOME_CONSTANT.

    So, we can effectively reduce the problem size from n to n-2 by discarding min and max points. Do you see it? That is the definition of median, isn't it?

    Let's have an example. suppose we have an array, [1,2,3,4,5,6,7]. The meet point must lie between 1 and 7. For any value in this range, the total running length for 1 and 7 is the same. So, array => [2,3,4,5,6]...

    I hope it can help people who don't want to read some long proofs.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.