Python, no recursion, O(n)


  • 1
    C
    class Solution(object):
    def removeDuplicateLetters(self, s):
    
        indexes={}        
        for i in xrange(len(s)-1,-1,-1):
            if s[i] in indexes:
                indexes[s[i]].append(i)
            else:
                indexes[s[i]]=[i]
    
        cList=sorted(indexes.keys())
        result=''
        while len(result)<len(indexes):
            bottom=min([indexes[c][0] for c in cList])
            candidate=0
            while indexes[cList[candidate]][-1]>bottom:
                candidate+=1
            result+=cList[candidate]
            cList=cList[:candidate]+cList[candidate+1:]
    
            for c in cList:
                while indexes[c][-1]<indexes[result[-1]][-1]:
                    indexes[c].pop()
        return result
    

    Here is the version with annotation:

    class Solution(object):
    def removeDuplicateLetters(self, s):
        """
        :type s: str
        :rtype: str
        """
        indexes={}
        
        #construct a dict, to store a list of indexes for each letter.  
        for i in xrange(len(s)-1,-1,-1):
            if s[i] in indexes:
                indexes[s[i]].append(i)
            else:
                indexes[s[i]]=[i]
    
        # make a list of letters in this string, sorted.
        cList=sorted(indexes.keys())
    
        result=''
        # construct the result string until all letters are included
        while len(result)<len(indexes):
            # bottom is the boundary of the range to pick a letter.  The definition of bottom
            # is the last point beyound (and including itself) which there is at least one instance of 
            # each letter.  Obviously, if we take the indexes of the last instance of each letter, then
            # bottom is the min of these indexes.
            # it is trivial to get the last instances since they are just the first element of each list
            # in the indexes dict
            bottom=min([indexes[c][0] for c in cList])
    
            # obviously clearly, for the first element, we will have to pick before or at bottom
            # otherwise, we will not be able to include all letters.  And, in this [0, bottom] range, 
            # we will prefer to pick the smallest letter that's available.
    
            # we use candidate to reach into cList, and since cList is sorted, we just need to find 
            # the first element in cList that has an instance before bottom.  Conveniently, we 
            # just need to check the [-1] instance for each letter in cList, since that's the smallest 
            # index.  
            candidate=0
            while indexes[cList[candidate]][-1]>bottom:
                candidate+=1
    
            # once candidate is found, append it to result
            result+=cList[candidate]
    
            # cut the candidate out of cList, we are not going to search this again in the furture.
            cList=cList[:candidate]+cList[candidate+1:]
    
            # all future candidates will be generated only after the current candidate index, 
            # so, pop all the earlier instances of all unused letters from indexes lists.
            for c in cList:
                while indexes[c][-1]<indexes[result[-1]][-1]:
                    indexes[c].pop()
    
            #start again in the new range, and repeat the process for finding the first letter
    
        return result

  • 1

    That's only O(n^2), not O(n).

    Edit: Not anymore now.


  • 0
    C

    I think this should be O(n). Both indexes and cList are at most 26 in length. For the loop part:
    for c in cList:
    while indexes[c][-1]<indexes[result[-1]][-1]:
    indexes[c].pop()
    Since every element can only be popped at most once during the whole execution, so the total time spent at this loop is O(n). Thus, the outer while loop is O(n) excluding this for-loop, and this for-loop itself is O(n) in total, adding together is still O(n). Right? Am I missing something else?


  • 1

    Yes, it's indeed something else that you're missing :-)

    Just to show that it really is quadratic:

    from time import time
    for e in range(10, 18):
        n = 2**e
        s = 'a' * n
        t0 = time()
        Solution().removeDuplicateLetters(s)
        t = time() - t0
        print '%7.3f seconds for n = %d' % (t, n)
    

    That prints something like this:

      0.006 seconds for n = 1024
      0.020 seconds for n = 2048
      0.090 seconds for n = 4096
      0.320 seconds for n = 8192
      1.168 seconds for n = 16384
      5.928 seconds for n = 32768
     22.374 seconds for n = 65536
    111.239 seconds for n = 131072
    

    Doubling n should roughly double the time if the solution were O(n). But the time roughly quadruples, indicating O(n^2).


  • 0
    C

    Thank you for testing, Stefan. I copied your testing code, and here is the result I got:

    0.001 seconds for n = 1024
      0.002 seconds for n = 2048
      0.002 seconds for n = 4096
      0.004 seconds for n = 8192
      0.009 seconds for n = 16384
      0.016 seconds for n = 32768
      0.033 seconds for n = 65536
      0.081 seconds for n = 131072
    

    This looks to me is doubling time with doubling input. I'm not sure why we got different results...could anyone else test it?


  • 0
    C

    My bad. I got it. It is in the first for-loop. On my own computer, it is:

        for i in xrange(len(s)-1,-1,-1):
            if s[i] in indexes:
                indexes[s[i]].append(i)
            else:
                indexes[s[i]]=[i]
    

    I will change the posted version. Thank you so much!


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.