Easy to understand java solution with well commented code.


  • 13
    S

    This solution is inspired. I worked to understand it myself and commented the code.

    public List<String> findRepeatedDnaSequences(String s) {
            Set<Integer> firstTime = new HashSet<Integer>();
            Set<Integer> secondTime = new HashSet<Integer>();
            List<String> list = new ArrayList<String>();
            
            char[] map = new char[26];
            int len = s.length();
            
            // Hashing function. We have only 4 letters which we can represent by 2 bits.
            map['A' - 'A'] = 0; // A = 00
            map['C' - 'A'] = 1; // B = 01
            map['G' - 'A'] = 2; // C = 10
            map['T' - 'A'] = 3; // D = 11
            
            for(int i=0; i<= len - 10; i++)
            {
                int sequence = 0;
                for(int j=i; j< i+10; j++)
                {
                    // Shift existing sequence by two to make space for the new character coming
                    sequence = sequence << 2;
                    
                    // Copy the character from the map and paste those two bits in the newly created space. Read bit wise OR.
                    sequence = sequence | map[s.charAt(j) - 'A'];
                }
                
                // For this number to be added in the list, this should be the second time this number is appearing
                // For this if condition to be true, firstTime.add() should be false.
                // firstTime.add() will be false when there is already the same number present.
                // How it will behave?
                // First time - firstTime.add(sequence) will return T
                // !firstTime.add(sequence) will become F
                // secondTime.add(sequence) will NOT be executed
                
                // Second time addition: 
                // First time - firstTime.add(sequence) will return F
                // !firstTime.add(sequence) will become T
                // secondTime.add(sequence) will be executed
                if(!firstTime.add(sequence) && secondTime.add(sequence))
                {
                    list.add(s.substring(i, i+10));
                }
            }
            
            return list;
        }

  • 2
    J

    for easier reading:

    public class Solution {
       public List<String> findRepeatedDnaSequences(String s) {
            Set<Integer> firstTime = new HashSet<Integer>();
            Set<Integer> secondTime = new HashSet<Integer>();
            List<String> list = new ArrayList<String>();
    
            for(int i=0; i<= s.length() - 10; i++)
            {
                int sequence = encoding(s.substring(i, i+10));
    
                if(!firstTime.add(sequence) && secondTime.add(sequence))
                {
                    list.add(s.substring(i, i+10));
                }
            }
    
            return list;
        }
    
        public int encoding(String s){
            char[] map = new char[26];
            int r = 0; // 32 bit int can represent 16 encoded chars
            if(s.length() != 10) return -1;
            map['A' - 'A'] = 0; // A = 00
            map['C' - 'A'] = 1; // C = 01
            map['G' - 'A'] = 2; // G = 10
            map['T' - 'A'] = 3; // T = 11
            for (int i=0; i<s.length(); i++){
                int temp = map[s.charAt(i) - 'A'];
                r = r<<2;
                r = r|temp;
            }
            return r;
        }
    }

  • 0
    S

    Building on top of san89kalp's solution, but without using a nested for:

    public List<String> findRepeatedDnaSequences(String s) {
        Set<Integer> firstTime = new HashSet<Integer>();
        Set<Integer> secondTime = new HashSet<Integer>();
        List<String> list = new ArrayList<String>();
    
        char[] map = new char['Z'];
        int len = s.length();
        if (len < 10)
            return list;
    
        // Hashing function. We have only 4 letters which we can represent by 2 bits.
        map['A'] = 0; // A = 00
        map['C'] = 1; // B = 01
        map['G'] = 2; // C = 10
        map['T'] = 3; // D = 11
        
        int sequence = 0;
    
        // Let's compute the hash for the first 10
        for(int j=0; j< 10; j++)
        {
            // Shift existing sequence by two to make space for the new character coming
            sequence = sequence << 2;
    
            // Copy the character from the map and paste those two bits in the newly created space. Read bit wise OR.
            sequence = sequence | map[s.charAt(j)];
        }
        
        firstTime.add(sequence);
    
        for(int i=10; i< len; i++)
        {
            sequence = sequence & ((1 << 18) - 1); // nillify the 2 MSBs
            sequence = sequence << 2; // shift to make room
            sequence = sequence | map[s.charAt(i)]; // set the 2 LSBs
    
            String target = s.substring(i-10 + 1, i + 1);
            if(!firstTime.add(sequence) && secondTime.add(sequence))
            {
                list.add(target);
            }
        }
    
        return list;
    }

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.