# Count the occurrence of a set of words in a character stream.

• Given a stream of characters and a list of words find and display count of each and every word once the stream ends.
Input: stream = "acacabcatghhellomvnsdb", words = ["aca","cat","hello","world"]
Output : ["aca" : 2 , "cat" : 1 , "hello" : 1 , "world" : 0].

• A straightforward n^2 solution is iterate over all substrings and check if it is available in the map (add words given to an unordered map) and increment its count.

• @gvishal using a Trie is better.

• I think that we could use Aho - Corasick algorithm, which integrates successfully Trie with Knuth–Morris–Pratt failure function

• @agave I was thinking about that. You are going to add words to the trie right? And then how are you going to update the counts? You'll need to check whether l..r, l+1....r exist in the trie, ryt? I don't see how that decreases the runtime.

• @gvishal never said it decreases the worst-case runtime. I've said it's better.

• Better in what manner?

• @gvishal what do you think I mean by "better"?

• I would like to provide my implementation of Aho–Corasick.To achieve further optimization, merging of output sets could be implemented by using bitmaps.

``````class Node:
def __init__(self, root):
self.map = defaultdict()
self.output = []
self.failure = root

class Trie:
def __init__(self):
self.root = Node(None)

def insert(self, word):
current = self.root
for letter in word:
current = current.map.setdefault(letter, Node(self.root))
current.output.append(word)

def makeAutomaton(self, words):
self.keywords = words
for w in words:
self.insert(w)
current = self.root
queue = []
for key, value in current.map.items():
value.failure = self.root;
queue.append(value)
while len(queue) > 0:
current = queue.pop(0)
for key, value in current.map.items():
queue.append(value)
tmp = current.failure
while tmp != self.root and key not in tmp.map:
tmp = tmp.failure
if key in tmp.map:
value.failure = tmp.map[key]
value.output =  value.output + tmp.map[key].output;
else:
value.failure = self.root

def search(self, word):
f = {}
for w in self.keywords:
f[w] = 0
current = self.root
for letter in word:
while current != self.root and letter not in current.map:
current = current.failure
if letter in current.map:
current = current.map[letter]
for w in current.output:
f[w] += 1
return f

if __name__ == '__main__':
trie = Trie()
trie.makeAutomaton(["aca", "cat", "hello", "world"])
stat = trie.search('acacabcatghhellomvnsdb')
for u, v in stat.items():
print u,v``````

• @elmirap said

Knut Moris Prat
Acho - Corasic

I notice you removed or added exactly one character from/to each of those names. Is this some kind of insider joke? I don't get it :-)

• @StefanPochmann It was a typo error sorry.

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.