I found an interesting OJ bug causing huge performance drop (about hashmap)


  • 0
    J

    My idea is like this: https://leetcode.com/discuss/20151/an-o-n-solution-with-detailed-explanation

    Here are two solutions with different usage of hashmap.

    The two solutions have the same steps but with little difference about hashmap usage. However the first solution costs 51ms and the second one costs 1300ms on OJ.

    Note: Both cost approximately same time on my VS2010.

    The first solution has the following core steps:

    1. Put every word in L into a hashmap called hashmap.

    2. Traverse S with two pointers, i and j. Put substring starting from j with the same length of word into a hashmap called temphash, and move j forward. for every substring already in temphash, add the string's value with 1. If the temphash has a string with value more than hashmap, move pointer i forward.

    The second solution is slightly different in step 2. Instead of inserting strings to temphash, second solution initialize temphash with hashmap, and for every substring, subtract the string's value with 1 in temphash. And when the substring's value in temphash is -1, move i forward.

    See? For every substring already in temphash, the first solution will set value plus 1 and the second solution minus 1.

    The consumed time? The first solution 51ms, and the second solution 1300ms.

    C++ AC code like this:

    First solution:

    class Solution {
    public:
    	vector<int> findSubstring(string S, vector<string> &L) {
    		vector<int> res;
    		unordered_map<string, int> hashmap, temphash;
    		int lenl = L.size();
    		if(lenl==0) return res;
    		int len_word = L[0].size();
    		for(int i=0; i<lenl; i++) {
    			hashmap[L[i]]++;
    		}
    		
    		int lens = S.size();
    		
    		for(int start=0; start<len_word; start++) {
    			int i=start, j=start; 
    			temphash.clear();
    			while(j<=lens-len_word){
    				string newword = S.substr(j, len_word);
    				if(hashmap.count(newword) == 0) {
    					i = j + len_word;
    					j = i;
    					temphash.clear(); // re-init temphash to original
    					continue;
    				}
    				
    				temphash[newword]++;
    				j += len_word;
    				// newword is in dict, but might not be enough
    				while(temphash[newword] > hashmap[newword]) {
    					temphash[S.substr(i, len_word)]--;
    					i += len_word;
    				}
    
    				if(j-i==lenl*len_word) { // found a substring
    					res.push_back(i);
    					temphash[S.substr(i, len_word)]--;
    					i += len_word;
    				}
    			}
    		}
    		return res;
    	}
    };
    

    Second solution:

    class Solution {
    public:
    	vector<int> findSubstring(string S, vector<string> &L) {
    		vector<int> res;
    		unordered_map<string, int> hashmap, temphash;
    		int lenl = L.size();
    		if(lenl==0) return res;
    		int len_word = L[0].size();
    		for(int i=0; i<lenl; i++) {
    			hashmap[L[i]]++;
    		}
    		temphash = hashmap;
    		int lens = S.size();
    		
    		for(int start=0; start<len_word; start++) {
    			int i=start, j=start; 
    			temphash = hashmap;
    			while(j<=lens-len_word){
    				string newword = S.substr(j, len_word);
    				if(temphash.count(newword) == 0) {
    					i = j + len_word;
    					j = i;
    					temphash = hashmap; // re-init temphash to original
    					continue;
    				}
    				
    				temphash[newword]--;
    				j += len_word;
    				// newword is in dict, but might not be enough
    				while(temphash[newword] == -1) {
    					temphash[S.substr(i, len_word)]++;
    					i += len_word;
    				}
    
    				if(j-i==lenl*len_word) { // found a substring
    					res.push_back(i);
    					temphash[S.substr(i, len_word)]++;
    					i += len_word;
    				}
    			}
    		}
    		return res;
    	}
    };

  • 0
    D

    The reason that second solution is slow could be that too much time is spent on copying the hashmap to temphash.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.