Simple O(n) with Explanation - Just walk each streak


  • 258

    First turn the input into a set of numbers. That takes O(n) and then we can ask in O(1) whether we have a certain number.

    Then go through the numbers. If the number x is the start of a streak (i.e., x-1 is not in the set), then test y = x+1, x+2, x+3, ... and stop at the first number y not in the set. The length of the streak is then simply y-x and we update our global best with that. Since we check each streak only once, this is overall O(n). This ran in 44 ms on the OJ, one of the fastest Python submissions.

    def longestConsecutive(self, nums):
        nums = set(nums)
        best = 0
        for x in nums:
            if x - 1 not in nums:
                y = x + 1
                while y in nums:
                    y += 1
                best = max(best, y - x)
        return best

  • 3
    A

    excellent method


  • 1
    K

    Shouldn't the insertion into set be O(nlogn) for inserting n elements, since set maintains them into sorted order ?


  • 2

    No, Python's set doesn't maintain them into sorted order. Are you thinking about a different language, C++ maybe?


  • 1
    K

    Yes, actually I assumed python set is same as C++ set. Thanks for correcting me.


  • 3
    H

    This algo can be O(n^2) in the worst case, I think. Suppose nums are 1, 2, 3, 5, … n, then the complexity is n^2, right?


  • 1

    No. How would that cost n^2?


  • 1
    H

    I was wrong. I missed the line "if n - 1 not in nums:". Sorry to bother.


  • 6
    C

    Nice idea for checking "if n - 1 not in nums".


  • 0
    W

    if the nums in assending order, like 1, 2, 3...this should return best = 0, and if the nums in descending order, like n, n-1, n-2,... the running time would be O(n^2). Is this correct? I guess the correctness of this algorithm is based on the random order when convert the nums in to set? I am not sure if I am right, please correct if I was wrong.


  • 1

    No, you just don't understand the algorithm.


  • 28
    Y

    Just translated your answer to Java, your solution is awesome.

       public int longestConsecutive(int[] nums) {
                Set<Integer> set = new HashSet<>();
                for(int n : nums) {
                    set.add(n);
                }
                int best = 0;
                for(int n : set) {
                    if(!set.contains(n - 1)) {  // only check for one direction
                        int m = n + 1;
                        while(set.contains(m)) {
                            m++;
                        }
                        best = Math.max(best, m - n);
                    }
                }
                return best;
            }

  • 0
    S

    This is the cleanest solution I ever seen for this Longest consecutive sequence problem!


  • 0

    your algorithm is pretty good. However, it needs two run. if it is a datastream. it does not work.


  • 0

    @xuyirui It does work with a datastream as well, as my first line reads it all into my set.


  • 1

    @StefanPochmann According to https://wiki.python.org/moin/TimeComplexity, the operation x in s for an element x with respect to the set s is O(n) worst case. You are doing this for every element in the set nums. Assume the input list contained all distinct elements. Would then your algorithm not run in O(n**2) worst case? Or am I seriously missing something?

    Take the case:

    [1,2, 4,5, 7,8, 10,11]
    

    Half of the time, your loop breaks at while m in nums: and the other half of the time, it breaks one level higher at if n - 1 not in nums:. Would you say each of these lookups are essentially O(1)?

    For a language like C++, I found that simply sorting the list and running a linear scan through it was much faster (12 ms) compared to building an ordered set and going through it like in your solution (20 ms).1 Both run in O(nlogn).

    Honestly I find it hard to imagine that one can manage to extract any kind of answer going though this list without incurring anything short of a O(nlogn) cost (unless we I used hashed data structures).


    1 I accidentally discovered this when my friend did the exact same thing as you, albeit in C++.


  • 0

    @babhishek21 Quick question first: What do you mean with "unless we used hashed data structures"?


  • 0

    @StefanPochmann Something like unordered_set in C++.

    Aahhh. I see. My best solution is O(n) in average case and reduces to O(n**2) in worst case. Is that what you've been trying to say in the title?

    Additionally, what I'm trying to say is that maybe we cannot do any better than O(nlogn) worst case. In which case is it right to say that a solution is O(n)?


  • 1

    @babhishek21 Sorry, I didn't mean what data structure, I meant the statement. Your "anything short of a O(nlogn) cost (unless we used hashed" sounded like you're saying that hashed data structures would lead to beating O(nlogn), contradicting the only O(n) per operation that you said right before.

    Anyway... as far as I see it, "normal" hash tables aren't really O(1) per operation, maybe not even amortized O(1), but everybody acts like they are. I think there's a common understanding that we're not being strict, and hash tables are the only exception I know where it seems common and acceptable. I guess it's because in practice we do get at least amortized O(1) and because they're not the topic at hand but merely used as a basic building block in the "actual" algorithm. At least that's my theory. I doubt that when the hash table itself is the topic, e.g., when someone writes a paper about a hash table, they'll just say "O(1)". Also, maybe there actually are hash tables that do give us real worst case or amortized O(1) for our needs, there's some stuff about that on Wikipedia's hash table article.


  • 0

    @StefanPochmann said in Simple O(n) with Explanation - Just walk each streak:

    @babhishek21 Sorry, I didn't mean what data structure, I meant the statement. Your "anything short of a O(nlogn) cost (unless we used hashed" sounded like you're saying that hashed data structures would lead to beating O(nlogn), contradicting the only O(n) per operation that you said right before.

    Sorry about that. I meant to say I used an unordered_set in my solution.

    Anyway... as far as I see it, "normal" hash tables aren't really O(1) per operation, maybe not even amortized O(1), but everybody acts like they are. I think there's a common understanding that we're not being strict, and hash tables are the only exception I know where it seems common and acceptable. I guess it's because in practice we do get at least amortized O(1) and because they're not the topic at hand but merely used as a basic building block in the "actual" algorithm. At least that's my theory. I doubt that when the hash table itself is the topic, e.g., when someone writes a paper about a hash table, they'll just say "O(1)". Also, maybe there actually are hash tables that do give us real worst case or amortized O(1) for our needs, there's some stuff about that on Wikipedia's hash table article.

    Thanks for clarifying. As I said earlier all of this started when I couldn't explain to my friend why a basic O(nlogn) sorting based solution ran faster (~8ms) over my supposedly O(n) unordered_set based solution.

    Later, all I could do was tell him that the test cases might be so small, the gains from using a unordered_set were lost due the copy constructor overhead that it incurred.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.