Two solutions - bit manipulation and python substring (second is a lot faster)


  • 0
    O
     def findRepeatedDnaSequences(self, s):
            res,codeset = set(), set()           
            charmap = {"A":0,"C":1,"G":2,"T":3}
            for i in xrange(0,len(s)-9):
                code = 0
                for j in xrange(i,i+10):
                    code <<= 2
                    code |= charmap[s[j]]
                
                if code in codeset:
                    res.add(s[i:i+10])
                else:
                    codeset.add(code)                
            return list(res)
    

    I thought this bitwise operations would be faster but I was wrong. The next solution beats 82%.

    def findRepeatedDnaSequences(self, s):
    
            res,codeset = set(), set()
            for i in xrange(0,len(s)-9):
                if s[i:i+10] in codeset:
                    res.add(s[i:i+10])
                else:
                    codeset.add(s[i:i+10])
            print list(res)
            return list(res)

  • 0
    Z

    Your 1st solution is slow is because it's not optimized. in your for loop, every time you loop through all 10 characters and do encoding on all of them. 9 out of 10 operations are redundant as you only need to encode the new incoming char in each loop. Please check my solution which beats around 80% (there is run-to-run variance):

    def findRepeatedDnaSequences(self, s, nucleMap = {'A': 0, 'C': 1, 'G': 2, 'T': 3}):
        """
        :type s: str;; rtype: List[str]
        :dual sets
        """
        if len(s) < 11: return []        
        enc = 0
        #prime the encoded 10-char-long nucleotides
        for i in xrange(10): enc = ((enc << 2) & 0xFFFFF) | nucleMap[s[i]]
        found = {enc,}
        foundAgain = set([])
        for i in xrange(10, len(s)):
            #pop head and add tail
            enc = ((enc << 2) & 0xFFFFF) | nucleMap[s[i]]
            if enc in found: foundAgain.add(s[i-9:i+1])
            else:            found.add(enc)
        return list(foundAgain)

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.