Why not just simply use a HashSet to track duplicate? (JAVA solution)


  • 0
    S
    public List<String> findRepeatedDnaSequences(String s) {
    	
    	Set<String> set = new HashSet<String>();
    	
    	List<String> out = new ArrayList<String>();
    	
    	if (s.length() <= 10) return out;
    	
    	int n = s.length();
    	
    	for (int i = 0; i <= n - 10; ++i) {
    		String cur = s.substring(i, i + 10);
    		if (!set.add(cur)) {
    			if (!out.contains(cur))
    				out.add(cur);
    			continue;
    		}
    	}
    	
    	return out;
        
    }
    

    I've seen a lot of answers which write methods or functions to convert the 10-letter-long strings to integers and then compare whether they are equal or not. They used HashMap to track duplicates. Why not just use a simple set to track duplicates and once the sequence has appeared, store it into the output list?
    The above answer was accepted as well. Could someone help? Thank you.


  • 0
    A

    Yes, your idea is simple and effective. I write your idea in Python and only 124ms, which is faster than converting string to number

    class Solution:
        # @param {string} s
        # @return {string[]}
    	def findRepeatedDnaSequences(self, s):
    		result = []
    		hash = set()
    		inResult = set()
    		bound = len(s) - 9
    		if bound < 2:	return result
    		for i in xrange(0,bound):
    			if s[i:i+10] in hash and s[i:i+10] not in inResult:
    				result.append(s[i:i+10])
    				inResult.add(s[i:i+10])
    			else:
    				hash.add(s[i:i+10])
    		return result

  • 0
    P

    I noticed that too. First I thought may be it is because of substring() is very expansive, and sometimes you get the MLE. But I also noticed that substring() was used in their codes too. Can someone explain why they use HashMap please.


  • 0

    Um... at least the top-voted Java solutions do use a set for that, not a map. What are you talking about?

    Also, your solution is mighty inefficient, thanks to the out.contains(cur) test. To speed that up, you could use a second set, containing the duplicates. Or... you could just use one map. In C++ in particular, ctr[...]++ == 1 is a very neat way to combine it all.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.