Awesome solution using bit manipulation


  • 0
    • to reduce time cost, we use array instead of map
    • to further reduce time cost, we use count == 1 instead of set to avoid duplicates

    For further detailed explanation, please check my C version.

    class Solution {
    public:
        vector<string> findRepeatedDnaSequences(string s) 
        {
            int sLen = s.length();
            vector<string> v;
            if(sLen < 11) return v;
            char keyMap[1<<21]{0};
            int hashKey = 0;
            for(int i = 0; i < 9; ++i) hashKey = (hashKey<<2) | (s[i]-'A'+1)%5;
            for(int i = 9; i < sLen; ++i)
            {
                if(keyMap[hashKey = ((hashKey<<2)|(s[i]-'A'+1)%5)&0xfffff]++ == 1)
                    v.push_back(s.substr(i-9, 10));
            }
            return v;
        }
    };
    

  • 1
    P

    I think this code probably has a bug. Since you're using char[] as the counter array, if a particular 10-nucleotide sequence appeared more than 256 times and the counter overflows, you will include that sequence in your result more than once.


  • 0

    @pmdiano You are right about this, so here is a post I created handling this case.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.