# Evolve from straightforward solution to optimal, a review of all solutions

1. It is straightforward to use one hashtable to store all sequences and another hashtable to store repeated sequences
``````    vector<string> findRepeatedDnaSequences(string s) {
unordered_set<string> seq, res;
for(int i=0;i<(int)s.size()-9;i++) {
string sub = s.substr(i,10);
if(!seq.insert(sub).second) res.insert(sub);
}
return vector<string>(res.begin(),res.end());
}
``````
1. The above solution does not use the information that the string contains only ACGT. Since each substr of length 10 contains only 4 distinct characters, we can encode each char with 2 bits and a substr with 20 bits to save memory.
``````    vector<string> findRepeatedDnaSequences(string s) {
char ht[85] = {0};
ht['C'] = 1;
ht['G'] = 2;
ht['T'] = 3;
vector<string> res;
int sub=0;
for(int i=0;i<s.size();i++) {
sub = (sub<<2)+ht[s[i]] & 0xfffff;
if(i>8 && !seq.insert(sub).second && add.insert(sub).second) res.push_back(s.substr(i-9,10));
}
return res;
}
``````
1. Two things can still be improved from #2. First, the ascii code of ACGT are differnt in the last 3 digits. So we can just use it to encode without the additonal coding table. Second, we can count the occrences of each substr and add to the result when it reaches 2. So we only need 1 hashtable. Idea is from @JayXon
``````     vector<string> findRepeatedDnaSequences(string s) {
unordered_map<int,int> ct;
vector<string> res;
int sub=0;
for(int i=0;i<s.size();i++) {
sub = (sub<<3)+ (s[i] & 7) & 0x3fffffff;
if(ct[sub]++==1) res.push_back(s.substr(i-9,10));
}
return res;
}
``````
1. We actually do not need to count the occurences of a substr. We just need to know if it is a repeated and if it has been added to the result. So we can change the value of the hashtable to bool to further save memory.
``````    vector<string> findRepeatedDnaSequences(string s) {
vector<string> res;
int sub=0;
for(int i=0;i<s.size();i++) {
sub = (sub<<3)+ (s[i] & 7) & 0x3fffffff;