Accepted Java solution 12ms with explanation


  • 68
    S

    It's not too hard to find some resemblance between this problem and minimum-window-substring. The main difference is that here we are interested in intervals of a fixed length K * M, where K is the number of strings in the "words" array and M is the length of each target string. To apply the same idea we used for that problem, all we need to do is map each string from the "words" array to something we can index (I prefer hashing for this). To speed up the algorithm, we can also find all occurrences of those strings in S up front (doing it on demand is equivalent, but we would potentially perform the same matching twice). Notice that we can record these occurrences exactly as they appear, because we are guaranteed that no word is contained in another. Finally, we use all this information to process each possibility. Since all target strings have the same length, there are only M possible starting offsets, so we end up performing M linear scans over an array of length O(N/M) (where N is the length of S), which makes the scanning stage of the algorithm linear in the length of S.

    //requires java.util: List, ArrayList, Map, HashMap, Arrays
    public List<Integer> findSubstring(String s, String[] words) {
    	int N = s.length();
    	List<Integer> indexes = new ArrayList<Integer>(s.length());
    	if (words.length == 0) {
    		return indexes;
    	}
    	int M = words[0].length();
    	if (N < M * words.length) {
    		return indexes;
    	}
    	int last = N - M + 1;
    	
    	//map each string in words array to some index and compute target counters
    	Map<String, Integer> mapping = new HashMap<String, Integer>(words.length);
    	int [][] table = new int[2][words.length];
    	int failures = 0, index = 0;
    	for (int i = 0; i < words.length; ++i) {
    		Integer mapped = mapping.get(words[i]);
    		if (mapped == null) {
    			++failures;
    			mapping.put(words[i], index);
    			mapped = index++;
    		}
    		++table[0][mapped];
    	}
    	
    	//find all occurrences at string S and map them to their current integer, -1 means no such string is in words array
    	int [] smapping = new int[last];
    	for (int i = 0; i < last; ++i) {
    		String section = s.substring(i, i + M);
    		Integer mapped = mapping.get(section);
    		if (mapped == null) {
    			smapping[i] = -1;
    		} else {
    			smapping[i] = mapped;
    		}
    	}
    	
    	//fix the number of linear scans
    	for (int i = 0; i < M; ++i) {
    		//reset scan variables
    		int currentFailures = failures; //number of current mismatches
    		int left = i, right = i;
    		Arrays.fill(table[1], 0);
    		//here, simple solve the minimum-window-substring problem
    		while (right < last) {
    			while (currentFailures > 0 && right < last) {
    				int target = smapping[right];
    				if (target != -1 && ++table[1][target] == table[0][target]) {
    					--currentFailures;
    				}
    				right += M;
    			}
    			while (currentFailures == 0 && left < right) {
    				int target = smapping[left];
    				if (target != -1 && --table[1][target] == table[0][target] - 1) {
    					int length = right - left;
    					//instead of checking every window, we know exactly the length we want
    					if ((length / M) ==  words.length) {
    						indexes.add(left);
    					}
    					++currentFailures;
    				}
    				left += M;
    			}
    		}
    		
    	}
    	return indexes;
    }
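    The same per-offset sliding-window idea can be sketched in a compact, self-contained form. The version below is not the original author's code: it uses a HashMap of word counts directly instead of the integer mapping and precomputed `smapping` table (simpler, somewhat slower), but the windowing logic per starting offset is the same.

```java
import java.util.*;

public class ConcatDemo {
    // Per-offset sliding window over word-sized steps. For each of the M
    // starting offsets, grow the window on the right one word at a time,
    // shrinking from the left whenever some word becomes over-represented.
    static List<Integer> findSubstring(String s, String[] words) {
        List<Integer> res = new ArrayList<>();
        if (words.length == 0 || s.length() < words.length * words[0].length()) {
            return res;
        }
        int M = words[0].length(), K = words.length, N = s.length();
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) need.merge(w, 1, Integer::sum);
        for (int i = 0; i < M; ++i) {
            Map<String, Integer> seen = new HashMap<>();
            int count = 0; // number of matched words currently in the window
            int left = i;
            for (int right = i; right + M <= N; right += M) {
                String w = s.substring(right, right + M);
                if (!need.containsKey(w)) {
                    // a non-target word breaks every window crossing it
                    seen.clear();
                    count = 0;
                    left = right + M;
                    continue;
                }
                seen.merge(w, 1, Integer::sum);
                ++count;
                // shrink from the left while w appears too many times
                while (seen.get(w) > need.get(w)) {
                    String lw = s.substring(left, left + M);
                    seen.merge(lw, -1, Integer::sum);
                    --count;
                    left += M;
                }
                if (count == K) res.add(left);
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(findSubstring("barfoothefoobarman",
                new String[]{"foo", "bar"})); // prints [0, 9]
    }
}
```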
    

  • 2
    J

    This solution is amazingly fast. It definitely deserves more upvotes.


  • 2
    J

    I think the main reason it doesn't have many upvotes is that the code is too long.


  • 3
    Z

    I doubt you would have enough time to do this much coding during an interview.


  • 14
    S

    @zhidu you're absolutely right. Usually in a real interview you don't have enough time to write a lot of code, but the good news is that you are rarely required to. In my experience (I took an Amazon SDE in-person interview round just for preparation, took a Google SRE interview seriously, and got a job offer), I was never asked to write this much code; instead, I was asked to write the crucial part of a potentially large solution and simply explain (very clearly) the remaining, unwritten code. The thing is, here you cannot explain something to the compiler: you have to write it, and that's why some very efficient solutions have relatively long code here.


  • 0
    A

    Hey @shaka-shadows. Thank you for the solution. Great explanation! Please correct me if I'm wrong, but I don't think this solution is O(n).

    Take a look at this section:

    //find all occurrences at string S and map them to their current integer, -1 means no such string is in words array
    	int [] smapping = new int[last];
    	for (int i = 0; i < last; ++i) {
    		String section = s.substring(i, i + M);
    		Integer mapped = mapping.get(section);
    		if (mapped == null) {
    			smapping[i] = -1;
    		} else {
    			smapping[i] = mapped;
    		}
    	}
    

    You're looping (N-M) times here. The substring call is linear in M in Java 7+, and so is the call to mapping.get(), since calculating the hash code of a string is linear in the string's length. So overall you're doing (N-M) * M work; for the case M = N/2, for example, this comes to (N^2)/4.


  • 0
    S

    @autoconf1g you've misread my analysis: it's the scanning section of the algorithm that is linear. I agree with you about the non-linearity of the whole algorithm whenever substring's running time isn't O(1), which is the case in JDK 7 and later. However, if you want to, you can implement the particular section you've pointed out in linear time too, using hashing as well, but that would certainly complicate the implementation even more and, judging from the execution time, I didn't think it was worth the effort.


  • 0
    A

    @shaka.shadows Ah I see. Thanks for the clarification! Would you mind clarifying your suggestion for reducing that section to linear time? You could eliminate the substring call by using pointers into a char array if this were C, but given that the hashmap uses strings as keys, that wouldn't work here as far as I can see. Also, even if that could be optimized away, I can't see any way around the implicit hash-code calculation that occurs when doing mapping.get(section), which is O(section.length()).


  • 0
    S

    @autoconf1g, the basic idea is to be able to compute a hash for every length-M substring of S in O(1) (a rolling hash gives you a way to do so), and then use a hashmap (or something similar) with those hashes as keys; you basically need to consider all substrings of the form [i, i + M - 1].
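    A minimal sketch of that rolling-hash idea, not from the original post: BASE and MOD below are arbitrary choices of mine, and a production version would still verify candidate matches to rule out hash collisions. After O(N) preprocessing, the hash of every length-M window of s is available in O(1), and equal windows always get equal hashes.

```java
public class RollingHashDemo {
    // Arbitrary polynomial-hash parameters (an assumption, not prescribed
    // by the discussion above): a large prime modulus and a small base.
    static final long MOD = 1_000_000_007L;
    static final long BASE = 131L;

    // Returns the hash of every window s[i .. i + M - 1] in a single O(N) pass.
    static long[] windowHashes(String s, int M) {
        int N = s.length();
        long[] h = new long[N - M + 1];
        long pow = 1; // BASE^(M-1) mod MOD, used to drop the leftmost char
        for (int i = 0; i < M - 1; ++i) pow = pow * BASE % MOD;
        long cur = 0;
        for (int i = 0; i < M; ++i) cur = (cur * BASE + s.charAt(i)) % MOD;
        h[0] = cur;
        for (int i = M; i < N; ++i) {
            cur = (cur - s.charAt(i - M) * pow % MOD + MOD) % MOD; // remove left char
            cur = (cur * BASE + s.charAt(i)) % MOD;                // append new char
            h[i - M + 1] = cur;
        }
        return h;
    }

    public static void main(String[] args) {
        long[] h = windowHashes("abcabcab", 3);
        System.out.println(h[0] == h[3]); // both windows are "abc" -> true
        System.out.println(h[1] == h[4]); // both windows are "bca" -> true
    }
}
```

    The hashes can then serve as map keys in place of the substrings themselves, which removes both the substring copy and the per-lookup string hashing that the comment above points out.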


  • 0
    W

    It took me a while to understand what's going on in this code. I have to say it's brilliant.
    I think there are several reasons why it's fast compared to other solutions (certainly including mine):

    1. The sliding window in this code is rather "elastic": the right pointer can go all the way to the right, in search of a full "failures" count.
      In other solutions, the window is fixed to the combined length of all the words and slides one word at a time, doing a check at each position.
    2. The table[1] is rather clever. I thought of using a HashMap<String, Integer> to keep the counts, but this code converts each word to an index first and
      puts it in a table. It also doesn't compare the whole frequency distribution of words, HashMap against HashMap; it only keeps track of the count
      of "failures", and again this works well. Basically it's like popping a unique word out on the left and eating up another unique word on the right.
      I think this algorithm is probably the best you can get.
      The Java substring implementation is a good catch, but not the point of the brilliance.

  • 0
    S

    @wang0109 good thing you liked it :)

