C++ Solution Based on Pointer


  • 2
    T

    This is a solution which may apply to all kinds of strings (not only strings with ATCG charactors). It is a little slow and also not very short. I just want to show another way to solve this problem.

    The idea is simple... First, it is easy to write out this codes:

    class Solution {
    public:
        vector<string> findRepeatedDnaSequences(string s) {
            set<string> c, r;
            for (int i = 0, l = s.length(); i < l - 9; i++) {
                string sub = s.substr(i, 10);
                if (!c.insert(sub).second) {
                    r.insert(sub);
                }
            }
            return vector<string>(r.begin(), r.end());
        }
    };
    

    But it got MLE instead of AC. So, what we should do is reducing the memory usage. One way to do this is storing only the pointer for starter of string, so the original string could be reused. After some changes the code looks like this:

    class Solution {
        template <int N>
        class NChar {
        public:
            const char* val;
            NChar(const char* val) : val(val) {}
            bool operator<(const NChar<N>& that) const {
                return memcmp(val, that.val, N) < 0;
            }
            operator string() const {
                return string(val, val + N);
            }
        };
        typedef NChar<10> TenChar;
    public:
        vector<string> findRepeatedDnaSequences(string s) {
            set<TenChar> c, r;
            const char* cs = s.c_str();
            for (int i = 0, l = s.length(); i < l - 9; i++) {
                TenChar sub(cs + i);
                if (!c.insert(sub).second) {
                    r.insert(sub);
                }
            }
            return vector<string>(r.begin(), r.end());
        }
    };
    

    This code could get AC. Maybe it is much slower than the version which use integer to compare. But I think the performance should be acceptable.


  • 1
    X

    I encounter the same problem with you, but I use bitset instead to reduce memory, the code is clean and efficient. It only takes 19ms.

    class Solution {
    public:
    vector<string> findRepeatedDnaSequences(string s) {
        if (s.size()<10) return vector<string>();
        bitset<1048576> cou;
        set<string> temp;
        int index = 0;
        for (int i=0; i<10; ++i){
            index = index * 4 + decode(s[i]); 
        }
        cou.set(index);
        for(int i=10; i<s.size();++i){
            index = index % 262144;
            index = index * 4 + decode(s[i]);
            if (cou.test(index))
                temp.insert(s.substr(i-9, 10));
            else
                cou.set(index);
        }
        return vector<string>(temp.begin(), temp.end());
    }
    
    int decode(char c){
        switch (c){
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
        }
    }
    

    };


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.