Clear solution Traverse each character once (JAVA)


  • 4

    This algorithm use hashmap and bit operation.

    we use '00' -> 'A', '01' -> 'C', '10' -> 'G', '11' -> 'T' to save memory.

    Here is the code:

    public class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            int len = s.length();
            if(len <= 10){
                return new ArrayList<String>();
            }
            
            List<String> result = new ArrayList<String>();
            //to store 10-letter-long sequence information
            int sequence = 0;
            //creat a mask to clear character move out side
            //where clearMask = 1111 1111 1100 1111 1111 1111 1111 1111
            int clearMask = ~(3 << 20);
            Map<Integer,Byte> recordMap = new HashMap<Integer,Byte>();
            
            //initial sequence
            for(int i = 0; i < 10; i++){
                sequence = readChar(s.charAt(i), sequence, clearMask);
            }
            
            //put initial pattern into hash map
            recordMap.put(sequence, (byte) 0);
            //find all pattern in the DNA string
            for(int i = 10; i < len; i++){
                sequence = readChar(s.charAt(i), sequence, clearMask);
                if(recordMap.containsKey(sequence)){
                    if(recordMap.get(sequence) == 0){
                        recordMap.put(sequence, (byte) 1);
                        result.add(s.substring(i - 9, i + 1));
                    }
                }
                else{
                    recordMap.put(sequence, (byte) 0);
                }
            }
            return result;
        }
        
        public int readChar(char c, int seq, int mask){
            //move record sequence left 2 position make room for this character
            seq <<= 2;
            //clear character which may out of bound
            seq &= mask;
            switch(c){
                    /* It is no need to do the 'or' operater
                       because we use "00" to present character 'A'
                    case 'A':
                        seq |= 0;
                        break;
                    */
                    case 'C':
                        seq |= 1;
                        break;
                    case 'G':
                        seq |= 2;
                        break;
                    case 'T':
                        seq |= 3;
                        break;
            }
            return seq;
        }
    }

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.