O(N) time , O(1)space solution


  • 10
    N

    in order to reach O(N) time ,we need to sort array in O(N) time.
    In this problem, we can do that using counting sort.
    However, counting sort requires extra O(N) space to do the trick, and as a result ,we will get the "count" for every element.
    But in this problem, we only need the "count" information for element smaller than n
    because h-index will be in range [0, n]

    So we can get rid of elements bigger than n, then consider the array as a linkedlist and do the counting sort right on itself

    Here is my implementation.

    public int hIndex(int[] citations) {
        int c=0, num=0;
        
        for(int i=0;i<citations.length;i++){
            c = citations[i];
            if(c<0)continue;
            //-1 means that the count of number is 0
            //-2 means that the count of number is 1, and so forth
            citations[i] = -1;
            if(c<citations.length){
                //loop like a linkedlist
                while((num=citations[c])>-1){
                    citations[c] = -2;  
                    if(num>citations.length-1){
                        break;
                    }
                    c = num;
                }
                if(num<citations.length){
                    citations[c]--;
                }
            }
        }
        c = 0;
        for(int i=0;i<citations.length;i++){
            num = -citations[i]-1;  //occurrences of i
            if(num>0){
                if(i>=citations.length-(c+num-1)){
                    return i>=citations.length-c ? citations.length-c : i;
                }
                c += num;
            }
        }
        return citations.length-c;
    }

  • 5

    A brilliant solution. Took me a while to figure it out. Here is my code based on understanding of this solution with comments:

    public int hIndex(int[] citations) {
        // Here, we fill the input array with counts, where
        // (-citations[i] - 1) is exactly the number
        // of papers having i publications.
        // Negative because we need to distinguish it from
        // the citation counts that we haven't processed yet.
        // Note that we'll just throw away any counts >= citations.length,
        // but we'll never need those.
        for (int i = 0; i < citations.length; ++i) {
            int count = citations[i];
            if (count < 0)
                continue; // already processed
            citations[i] = -1; // the count starts with 0
            for (int nextCount; count < citations.length &&
                                (nextCount = citations[count]) >= 0; ) {
                // We haven't got enough space to count those
                // >= citations.length, but neither we need them.
                citations[count] = -2; // we've just seen one
                count = nextCount;
            }
            // The loop above could have terminated either
            // 1) because count >= citations.length (we don't count those) or
            // 2) because we hit an element that already stores a count.
            // In the second case we need to increment that count since
            // we've just encountered another element with the same value.
            if (count < citations.length) {
                --citations[count];
            }
        }
        for (int h = 0, less = 0; h < citations.length; ++h) {
            int count = -citations[h] - 1;
            // Logically, the loop below must have this condition:
            // citations.length - less >= h && less + count >= citations.length - h,
            // but the first of these is really redundant. Indeed, it is obviously
            // true on the first iteration, and it follows that if it was true for
            // some "h", then it would be true for "h + 1". Indeed, the "less" variable
            // on this iteration is what "less + count" was on the previous one, so 
            // the (citations.length - less >= h) condition, in terms of
            // the previous-iteration values, is nothing but really
            // (citations.length - (less + count) >= h + 1),
            // which is exactly the same as (citations.length - (h + 1) >= (less + count))
            // or (citations.length - h > (less + count)), but if that was false, then
            // (less + count >= citations.length - h) would be true on the previous
            // iteration, and the whole thing would have terminated earlier.
            if (less + count >= citations.length - h)
                return h;
            less += count;
        }
        return citations.length;
    }
    

    In case anyone wonders why this loop is different,

    for(int i=0;i<citations.length;i++){
        num = -citations[i]-1;  //occurrences of i
        if(num>0){
            if(i>=citations.length-(c+num-1)){
                return i>=citations.length-c ? citations.length-c : i;
            }
            c += num;
        }
    }
    

    this is because it skips those elements for which num == 0, and therefore it has to sort of "backtrack" to the correct value because it may have missed that earlier, and hence the i>=citations.length-c ? citations.length-c : i line. In my solution, I do redundant less += count instead, even if count == 0, but that helps me to avoid unnecessary branching and make the whole thing a bit shorter.


  • 0
    N

    You've got it man. Very detailed explanation by the way.


  • 0
    R

    Hi, I'm still confused, could you explain more of why we should return h when (less + count >= citations.length - h) ?


  • 0

    By definition, citations.length - h publications must have <= h citations. count is the number of publications having exactly h citations. Therefore, count + less is the number of publications having no more than h citations. Therefore, if count + less >= citations.length - h, then we satisfy half of the definition. We can't use == here because some publications may have exactly h citations and therefore fall into both "no more than" and "at least" categories.

    Now when we satisfied one half, what about the other? Do we have at least h publications that have at least h citations? Yes, because of the reasons explained in the comment (recursively: it's true for h == 0 and it must be true for h + 1). Alternatively, you can look at it this way: h must have some valid value, right? We terminate the loop when we have just enough publications with at least h citations. There is no point in continuing the loop because we'll just get more publications with at least h citations than we really need. On the other hand, h can't possibly be smaller because the loop would have terminated earlier. Therefore, h on the current iteration must be the right one.


  • 4

    @newdive Cool idea. Here's my Python solution based on it:

    def hIndex(self, citations):
        n = len(citations)
        for i, c in enumerate(citations):
            if c >= 0:
                citations[i] = ~0
                while c < n and citations[c] >= 0:
                    citations[c], c = ~1, citations[c]
                if c < n:
                    citations[c] -= 1
        count = 0
        for c, papers in enumerate(citations):
            count += ~papers
            if count >= n - c:
                return c
        return n

  • 0

    @StefanPochmann I'm just curious: what's the reason for writing -2 as ~1?


  • 1

    @SergeyTachenov Partly because I want to say "1 paper", and I think "~1" looks nicer for that than "-2", because it's the same digit. You don't need to remember that there's a shift by 1 and in which direction that shift is. And note that I use ~ consistently, also for ~0 and ~papers, the latter being shorter and simpler than -papers - 1 (again I don't need to think about whether I need to add or subtract 1, I just slap ~ onto it and I'm done).


  • 0

    It is clever to use negative sign to differentiate a count from a citation. Remind me of Problem 448.
    And to make room for a count in a particular index, we need to continuously do the same process on the following indexes until we see a negative number. The rest is simple, we decrease this negative number by one, which actually means increasing its count by 1.

    In the processed citation array:
    -1 means 0
    -2 means 1
    -3 means 2
    and so forth.

    Take citations = [3, 0, 6, 1, 5] for example. After the first for loop, the citation array will be something like as follows. The index of the array is regarded as number of citations. The value is the number of papers having the corresponding cites.
    index 0 1 2 3 4
    value-2 -2 -1 -2 -1
    True value 1 1 0 1 0

    Next, we use the processed citations array to find H-Index. I will explain based on @StefanPochmann 's code above. I make a slight change of the if condition from if count >= n - c to if c >= n - count because I think it's easier to understand.

    for c, papers in enumerate(citations):
            count += ~papers
            if c >= n - count:
                return c
        return n
    

    Note that n-count actually holds the number of papers with citations >= c+1. So if c < n-count, then we know c cannot be H-Index, because c+1 <= l-count, which basically says that there are more than c+1 papers with citation >= c+1, meaning that H-Index must be at least c+1.
    When we reach some c that makes c >= n-count, we know this c must be H-Index. The reason is that currently c >= n-count, so c+1>n-count. Recall that n-count actually holds the number of papers with citations >= c+1. Basically c+1>n-count says there are less than c+1 papers with citation >= c+1.


  • 0

    @StefanPochmann said in O(N) time , O(1)space solution:

    I'm

    I love the use of ~ for the conversion. It is self-consistent. Two ~ will get you the original value. Simple and neat.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.