7 lines simple Java, O(n)


  • 154
    public List<String> findRepeatedDnaSequences(String s) {
        Set seen = new HashSet(), repeated = new HashSet();
        for (int i = 0; i + 9 < s.length(); i++) {
            String ten = s.substring(i, i + 10);
            if (!seen.add(ten))
                repeated.add(ten);
        }
        return new ArrayList(repeated);
    }

  • 0
    O

    why don't you directly let repeated be a arraylist


  • 0
    O

    oh I got it. Nice work. Thank you


  • 2
    A

    Why we can't use an array list to store the results? I got the error when using the arraylist to store the results when in put is "AAAAAAAAAAAA"


  • 0
    O

    the first set is only for checking string is repeated or not, but one string can have several same repeated strings like the ex. you given, list can store repeated strings, but set can not, so use another set instead of array list


  • 1
    Z

    If one string repeats more than twice, then the arraylist will contain duplicates. So, a hashset is needed to avoid duplicates.


  • 0
    M

    same as my code. I really don't understand why they use bit manipulation???


  • 0
    N

    In this approach you end up storing, (n-10) times 10 character length strings in the memory . If you use bitwise you can represent this 10 character string to a integer and save in the Set . Which reduces the space complexity .


  • 0

    nice! it's cleaner than 'rolling hash' method


  • 0

    Thats very smart! I'm impressed. I used indexOf() to check if their occurrence are more than once. But it costs O(n) time. So my solution TLE. Yours is just use another set to see if it is repeated or not.


  • 1
    D

    How is it O(n) ? There is a String#substring call also which will be O(m) where m is the length of pattern. Shouldn't it be O(mn).
    Are are you considering m to be a very small constant which doesn't affect the complexity ?
    Or, are you taking String#substring as O(1) operation ?
    Kindly suggest. Thanks.


  • 4

    @dmr Yes, the substring length is the constant 10 here.


  • 0
    D

    @StefanPochmann Alright. Thanks.


  • 4
    P

    This will not work.
    If I have AAAAAAAAAAA (11 A's)
    When i = 0, ten = AAAAAAAAAA (first 10 A's). It is added to seen.
    when i = 1, ten = AAAAAAAAAA (last 10 A's). Since tem is seen, it is added to repeated.

    But in theory these are overlapping sequence and hence are not repeated sequence. But your solution will give it as repeated sequence.


  • 1

    @zizizi this answers "aaabbbjj" question precisely.


  • 2

    @peddinti If you enter your "AAAAAAAAAAA" as custom test and click "Run Code", you'll see that the expected answer is ["AAAAAAAAAA"]. So my solution does what's expected here.


  • 0

    Just curious about this solution. This is the "brute force" approach, correct? When I tried this I get TLE and went ahead with the bit representation solution which is simple in concept as well but definitely trickier to code. Can you explain why your solution did not get TLE. Am I missing something? Thanks!


  • 0
    I

    TLE, check string in hashset seems costs more than O(1)


  • 0
    Y

    I have a similar solution, but it meets run time exceeds, could anyone tell me why? Thanks!

    public class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            Set<String> set = new HashSet<String>(), repeat = new HashSet<String>();
            for(int i = 0; i<s.length()-10+1; i++){
                String temp = s.substring(i, i+10);
                if(!set.add(temp) && !ret.contains(temp)){
                    ret.add(temp);
                } else{
                    set.add(temp);
                }
            }
            return ret;
        }
    }

  • 0

    @ynteng That gets Compile Error.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.