O(N * M) Python solution hits Time-Limit Exceeded


  • 0
    S

    Below, I have written a solution in python that runs in O(N * M) where N is the length of s and M is the length of words -- i.e. the number of words, not the length of each word, L.

    In addition to the main findSubstring method, I have two data structures. I have described fairly well with comments, but they really have simple purposes.

    CharTree performs a single initial pass on words in O(M * L) time, calculates the necessary word counts, and structures the words so that nebulous "string-slicing" operations are unnecessary.

    The Match class is used to keep track of a traversal through a CharTree. You feed it one letter at a time and it tells you whether a solution has been found yet, or if the sequence is invalid and can be eliminated. The important thing to note is that Match.consume(char) runs in O(1).

    Now for the question: Is this solution truly O(N * M) and, if so, is this a reasonable runtime for the problem?

    I have seen a number of comments about people claiming different runtimes for their solutions, but many hand-wave their string-slicing operations.

    All operations in my main loops are O(1). This makes it easy to calculate the final runtime. The outer loop goes through every character of s, so N steps. The inner loop goes through every "active match," and there are at most M of those. This is true because any match that reaches M words long must either be complete or invalid, and so it gets filtered out before the next step.

    This solution passes all the tests except for the last which uses a words list that is 200 items long, resulting a runtime of just under (200 * 10,000). Until someone can show clearly what the optimal runtime is, I just have to think the time limit is too low for the python interpreter to finish.

    class Solution(object):
        def findSubstring(self, s, words):
            # Initial checks for corner cases.
            s_len = len(s)
            word_count = len(words)
            if word_count == 0:
                return []
            word_len = len(words[0])
            if word_len == 0:
                return [i for i in range(len(s) + 1)]
            # This is how long a fully-formed substring would be.
            concat_len = word_count * word_len
            if concat_len > s_len:
                return []
            # Parsing the words list into a CharTree is O(concat_len)
            char_tree = CharTree(words)
            result_indices = []
            # This maps a start-index to the Match object that has
            # consumed all characters since that index.
            active_matches = {}
            # Debugging the loop.
            debug_total_steps = 0
            for index, char in enumerate(s):
                # Only create new matches if there are enough characters
                # left in the string.
                if concat_len < (s_len - index + 1):
                    active_matches[index] = Match(char_tree)
                # Loop through all matches and filter out the ones
                # that are complete or invalid.
                filtered_matches = {}
                for match_index, match in active_matches.iteritems():
                    debug_total_steps += 1
                    match.consume(char)
                    if match.is_complete:
                        result_indices.append(match_index)
                    elif match.is_valid:
                        # Only matches that are incomplete
                        # but still valid get carried on to
                        # the next step of the main loop.
                        filtered_matches[match_index] = match
                active_matches = filtered_matches
            print '(Debug) Total steps:', debug_total_steps
            print '(Debug) word_count * s_len = ', word_count * s_len
            return result_indices
    
    
    class CharTree(object):
        """
        This is a "trie"-like structure that uses dicts as nodes.
        The leaf nodes contain special keys that mark the actual word
        that the leaf finishes and how many times that word appears in
        the original list. This structure makes it easy to keep track of
        potential sub-strings without having to do
        expensive string-slicing operations.
    
        Example: words = [cow, cot, cow]
        CharTree.total_word_count = 3
        CharTree.root = {
          'c': {
            'o': {
              'w' {
                'LEAF_WORD': 'cow',
                'LEAF_COUNT': 2,
              },
              't': {
                'LEAF_WORD', 'cot',
                'LEAF_COUNT': 1,
              },
            }
          }
        }
        """
        LEAF_WORD = 'LEAF_WORD'
        LEAF_COUNT = 'LEAF_COUNT'
    
        def __init__(self, words):
            self.root = {}
            self.total_word_count = len(words)
            for word in words:
                node = self.root
                for char in word:
                    child_node = node.get(char, {})
                    node[char] = child_node
                    node = child_node
                node[CharTree.LEAF_WORD] = word
                node[CharTree.LEAF_COUNT] = node.get(CharTree.LEAF_COUNT, 0) + 1
    
    
    class Match(object):
        """
        This data structure handles traversing a CharTree one character
        at a time. Once it consumes a sequence of characters that uses
        every word the correct number of times, the flag `is_complete`
        is set to `True`. If it ever consumes a character that does not
        fit any substring or that would repeat a word too many times,
        the flag `is_valid` is set to `False`.
        """
    
        def __init__(self, char_tree):
            self.char_tree = char_tree
            self.current_node = char_tree.root
            self.word_counts = {}
            self.total_word_count = 0
            self.is_valid = True
            self.is_complete = False
    
        def consume(self, char):
            if self.is_complete or not self.is_valid:
                pass
            elif char not in self.current_node:
                self.is_valid = False
            else:
                child_node = self.current_node[char]
                leaf_word = child_node.get(CharTree.LEAF_WORD, None)
                if leaf_word is None:
                    self.current_node = child_node
                else:
                    leaf_count = child_node[CharTree.LEAF_COUNT]
                    self.word_counts[leaf_word] = self.word_counts.get(
                        leaf_word, 0) + 1
                    self.total_word_count += 1
                    if self.word_counts[leaf_word] <= leaf_count:
                        self.is_complete = (
                            self.total_word_count == self.char_tree.total_word_count)
                        self.current_node = self.char_tree.root
                    else:
                        self.is_valid = False
            return self.is_valid
    

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.