O(N * M) Python solution hits Time-Limit Exceeded

• Below, I have written a solution in python that runs in `O(N * M)` where N is the length of `s` and M is the length of `words` -- i.e. the number of words, not the length of each word, `L`.

In addition to the main `findSubstring` method, I have two data structures. I have described fairly well with comments, but they really have simple purposes.

`CharTree` performs a single initial pass on `words` in `O(M * L)` time, calculates the necessary word counts, and structures the words so that nebulous "string-slicing" operations are unnecessary.

The `Match` class is used to keep track of a traversal through a `CharTree`. You feed it one letter at a time and it tells you whether a solution has been found yet, or if the sequence is invalid and can be eliminated. The important thing to note is that `Match.consume(char)` runs in `O(1)`.

Now for the question: Is this solution truly `O(N * M)` and, if so, is this a reasonable runtime for the problem?

I have seen a number of comments about people claiming different runtimes for their solutions, but many hand-wave their string-slicing operations.

All operations in my main loops are `O(1)`. This makes it easy to calculate the final runtime. The outer loop goes through every character of `s`, so `N` steps. The inner loop goes through every "active match," and there are at most `M` of those. This is true because any match that reaches `M` words long must either be complete or invalid, and so it gets filtered out before the next step.

This solution passes all the tests except for the last which uses a `words` list that is 200 items long, resulting a runtime of just under (200 * 10,000). Until someone can show clearly what the optimal runtime is, I just have to think the time limit is too low for the python interpreter to finish.

``````class Solution(object):
def findSubstring(self, s, words):
# Initial checks for corner cases.
s_len = len(s)
word_count = len(words)
if word_count == 0:
return []
word_len = len(words[0])
if word_len == 0:
return [i for i in range(len(s) + 1)]
# This is how long a fully-formed substring would be.
concat_len = word_count * word_len
if concat_len > s_len:
return []
# Parsing the words list into a CharTree is O(concat_len)
char_tree = CharTree(words)
result_indices = []
# This maps a start-index to the Match object that has
# consumed all characters since that index.
active_matches = {}
# Debugging the loop.
debug_total_steps = 0
for index, char in enumerate(s):
# Only create new matches if there are enough characters
# left in the string.
if concat_len < (s_len - index + 1):
active_matches[index] = Match(char_tree)
# Loop through all matches and filter out the ones
# that are complete or invalid.
filtered_matches = {}
for match_index, match in active_matches.iteritems():
debug_total_steps += 1
match.consume(char)
if match.is_complete:
result_indices.append(match_index)
elif match.is_valid:
# Only matches that are incomplete
# but still valid get carried on to
# the next step of the main loop.
filtered_matches[match_index] = match
active_matches = filtered_matches
print '(Debug) Total steps:', debug_total_steps
print '(Debug) word_count * s_len = ', word_count * s_len
return result_indices

class CharTree(object):
"""
This is a "trie"-like structure that uses dicts as nodes.
The leaf nodes contain special keys that mark the actual word
that the leaf finishes and how many times that word appears in
the original list. This structure makes it easy to keep track of
potential sub-strings without having to do
expensive string-slicing operations.

Example: words = [cow, cot, cow]
CharTree.total_word_count = 3
CharTree.root = {
'c': {
'o': {
'w' {
'LEAF_WORD': 'cow',
'LEAF_COUNT': 2,
},
't': {
'LEAF_WORD', 'cot',
'LEAF_COUNT': 1,
},
}
}
}
"""
LEAF_WORD = 'LEAF_WORD'
LEAF_COUNT = 'LEAF_COUNT'

def __init__(self, words):
self.root = {}
self.total_word_count = len(words)
for word in words:
node = self.root
for char in word:
child_node = node.get(char, {})
node[char] = child_node
node = child_node
node[CharTree.LEAF_WORD] = word
node[CharTree.LEAF_COUNT] = node.get(CharTree.LEAF_COUNT, 0) + 1

class Match(object):
"""
This data structure handles traversing a CharTree one character
at a time. Once it consumes a sequence of characters that uses
every word the correct number of times, the flag `is_complete`
is set to `True`. If it ever consumes a character that does not
fit any substring or that would repeat a word too many times,
the flag `is_valid` is set to `False`.
"""

def __init__(self, char_tree):
self.char_tree = char_tree
self.current_node = char_tree.root
self.word_counts = {}
self.total_word_count = 0
self.is_valid = True
self.is_complete = False

def consume(self, char):
if self.is_complete or not self.is_valid:
pass
elif char not in self.current_node:
self.is_valid = False
else:
child_node = self.current_node[char]
leaf_word = child_node.get(CharTree.LEAF_WORD, None)
if leaf_word is None:
self.current_node = child_node
else:
leaf_count = child_node[CharTree.LEAF_COUNT]
self.word_counts[leaf_word] = self.word_counts.get(
leaf_word, 0) + 1
self.total_word_count += 1
if self.word_counts[leaf_word] <= leaf_count:
self.is_complete = (
self.total_word_count == self.char_tree.total_word_count)
self.current_node = self.char_tree.root
else:
self.is_valid = False
return self.is_valid
``````

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.