8ms of java solution


  • 3
    N

    In my solution,I use 2 bit to represent A(0), C(1), G(2), T(3), So a DNA sequence of 10 letters would take up 20 bits , which could be represented by an integer in range [0, 0xFFFFF]

    According to this , i pre-allocate an array of (0xFFFFF/64 = 16384 ) longs , and use bitset method to mark and search recurrence of DNA sequence.

    Here is my implementation

    (If anyone has better idea or implementation,please don't hesitate to share it !)

    public class Solution {
        private static final byte[]TIDE_VALUE = new byte[26];
        static{
            TIDE_VALUE['A'-'A'] = 0;
            TIDE_VALUE['C'-'A'] = 1;
            TIDE_VALUE['G'-'A'] = 2;
            TIDE_VALUE['T'-'A'] = 3;
        }
         //for fast cleanup of other mark array
        private static final long[]dummyMrk = new long[16384]; 
        // mark recurrence of DNA sequence in molecule
        private static final long[]mrk = new long[16384];    
        // mark if DNA sequence has been added before
        private static final long[]addMrk = new long[16384];  
    
        public List<String> findRepeatedDnaSequences(String s) {
            ArrayList<String>dupSeqs = new ArrayList<String>();
         
            if(s.length()>=10){
                //String.charAt will be slower than char array access 
                char[]sChars = s.toCharArray();  
                int i=0, dnaSeqRep=0, pos=0;
                System.arraycopy(dummyMrk,0,mrk,0,dummyMrk.length);  // clear up mark
                System.arraycopy(dummyMrk,0,addMrk,0,dummyMrk.length); // clear up mark
                for(;i<10;i++){
                    pos = i<<1;
                    dnaSeqRep |= TIDE_VALUE[sChars[i]-'A']<<pos;
                }
                mrk[dnaSeqRep>>6] |= 1L<<(dnaSeqRep&0x3f);
                for(;i<sChars.length;i++){
                    dnaSeqRep >>= 2;
                    dnaSeqRep |= TIDE_VALUE[sChars[i]-'A']<<pos;
                    int idx = (dnaSeqRep>>6);
                    long bitRep = 1L<<(dnaSeqRep&0x3f);
                    //if the sequence has a duplicate and haven't been added before
                    if((mrk[idx]&bitRep)!=0 ){
                        if((addMrk[idx]&bitRep)==0 ){
                            addMrk[idx] |= bitRep;
                            dupSeqs.add(s.substring(i-9,i+1)); 
                        }
                    }else{
                        mrk[idx] |= bitRep;
                    }
                }
            }
            return dupSeqs;
        }
    }

  • 2
    O

    my implementation is more straightforward. 12ms. Don't know why its slower than you.

    public class Solution {
        private static final byte[] t = new byte[128];
        static {
            t['A'] = 0; t['C'] = 1; t['G'] = 2; t['T'] = 3;
        }
    
        public List<String> findRepeatedDnaSequences(String s) { //12ms 99.77%
            boolean[] has = new boolean[1048576];
            boolean[] written = new boolean[1048576];
            List<String> ans = new ArrayList();
            char[] c = s.toCharArray();
            int n = c.length, cur = 0;
            if (n < 10) return ans;
            for (int i = 0; i < 9; i++)
                cur = (cur << 2) | t[c[i]];
            for (int i = 9; i < n; i++) {
                cur = ((cur << 2) | t[c[i]]) & 0xFFFFF;
                if (has[cur]) {
                    if (!written[cur]) {
                        ans.add(s.substring(i - 9, i + 1));
                        written[cur] = true;
                    }
                } else
                    has[cur] = true;
            }
            return ans;
        }
    
    /* 44ms 46%
        public List<String> findRepeatedDnaSequences(String s) {
            HashMap<Integer, Boolean> h = new HashMap();
            List<String> ans = new ArrayList();
            char[] c = s.toCharArray();
            int n = c.length;
            if (n < 10) return ans;
            for (int i = 0, cur = 0; i < n; i++) {
                cur = ((cur << 3) | (c[i] & 7)) & 0x3FFFFFFF;
                if (h.containsKey(cur)) {
                    if (!h.get(cur)) {
                        ans.add(s.substring(i - 9, i + 1));
                        h.put(cur, true);
                    }
                } else
                    h.put(cur, false);
            }
            return ans;
        }
        */
    }

  • 0
    N

    @newdive SMART


  • 0
    N

    @ofLucas
    boolean[] has = new boolean[1048576];// 1048576 = 1>>20;
    but 8ms' final long[] mrk = new long[16384];//16384 = 1<<14
    1048576>16384
    I think your code wastes addressing arrays.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.