Awesome solution using bit manipulation

  • 0
    • to reduce time cost, we use array instead of map
    • to further reduce time cost, we use count == 1 instead of set to avoid duplicates

    For further detailed explanation, please check my C version.

    class Solution {
        vector<string> findRepeatedDnaSequences(string s) 
            int sLen = s.length();
            vector<string> v;
            if(sLen < 11) return v;
            char keyMap[1<<21]{0};
            int hashKey = 0;
            for(int i = 0; i < 9; ++i) hashKey = (hashKey<<2) | (s[i]-'A'+1)%5;
            for(int i = 9; i < sLen; ++i)
                if(keyMap[hashKey = ((hashKey<<2)|(s[i]-'A'+1)%5)&0xfffff]++ == 1)
                    v.push_back(s.substr(i-9, 10));
            return v;

  • 1

    I think this code probably has a bug. Since you're using char[] as the counter array, if a particular 10-nucleotide sequence appeared more than 256 times and the counter overflows, you will include that sequence in your result more than once.

  • 0

    @pmdiano You are right about this, so here is a post I created handling this case.

Log in to reply

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.