Visualized explanation and Python solution

  • 0

    Obviously, we will travel (1-step jumps) string s and check whether the substring exists in the list words or occurs not over the time of appearance in the list words. But it is not good since running time is so long.

    • Tip 1: we use dictionary to store the time of appearance in the list words since it's faster to find a word in words.
    • Tip 2: As we can see in the Figure 1, the greens are the satisfied substrings, the pinks are the substrings being over the time of occurrence, the reds are the substrings not matching. Because we discovered the second substring (from 4th-cell to 6th-cell) is the satisfied substring in the first step and we have to discover again the second substring, it's waste job. So we will use Dynamic Programming.
      0_1508632472841_brutal force.png
      Figure 1. 1-step jumps

    We will travel m-step jumps (m is the length of word). If substring is over the time of occurrence, we will continue to travel at the position that the substring is not over the time of occurence. Then, we continue travel until the end of string even we found the satisfied index.
    Figure 2. 3-step jumps - Move left to 4th-cell as the light green is over once => reduce the time of occurrence of the light green word.

    If substring does not match, we will continue to travel at the position of the next substring.
    0_1508633068702_brutal force.png
    Figure 3. 3-step jumps - Move left to 10th-cell as the red cells do not match the list words => skip the violated cell.


    class Solution(object):
        def findSubstring(self, s, words):
            :type s: str
            :type words: List[str]
            :rtype: List[int]
            ans,main_dict = [],{}
            len_s,length,n_words = len(s),len(words[0]),len(words)
            # generate main dictionary
            for word in words:
                main_dict[word] = main_dict[word] + 1 if word in main_dict else 1
            _dict = {}
            _count = 0
            def reset():
                _dict = {}
                _count = 0
            # Each pass, jump with length of word
            for i in xrange(min(len_s - length*n_words + 1,length)):
                chunk = s[i:i+length]
                left_window,base = i,i
                # Jump until the end of string
                while left_window + length * n_words <= len_s:
                    base += length
                    # Skip the substring having the intervening characters
                    if not(chunk in main_dict):
                        left_window = base
                        chunk = s[base:base+length]
                    _count += 1
                    _dict[chunk] = _dict[chunk] + 1 if chunk in _dict else 1
                    # Shorten the substring => skip the chunks one of which occur more than once
                    while _dict[chunk] > main_dict[chunk]:
                        skipped_chunk = s[left_window:left_window+length]
                        _dict[skipped_chunk] -= 1
                        _count -= 1
                        left_window += length
                    # Add satisfied substring
                    if _count >= n_words:
                        left_window += length
                        base = left_window
                    chunk = s[base:base+length]
            return ans

Log in to reply

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.