Count the occurrence of a set of words in a character stream.


  • 0
    M

    Given a stream of characters and a list of words find and display count of each and every word once the stream ends.
    Input: stream = "acacabcatghhellomvnsdb", words = ["aca","cat","hello","world"]
    Output : ["aca" : 2 , "cat" : 1 , "hello" : 1 , "world" : 0].


  • 0
    C

    A straightforward n^2 solution is iterate over all substrings and check if it is available in the map (add words given to an unordered map) and increment its count.


  • 1

    @gvishal using a Trie is better.


  • 0

    I think that we could use Aho - Corasick algorithm, which integrates successfully Trie with Knuth–Morris–Pratt failure function


  • 0
    C

    @agave I was thinking about that. You are going to add words to the trie right? And then how are you going to update the counts? You'll need to check whether l..r, l+1....r exist in the trie, ryt? I don't see how that decreases the runtime.


  • 0

    @gvishal never said it decreases the worst-case runtime. I've said it's better.


  • 0
    C

    Better in what manner?


  • 0

    @gvishal what do you think I mean by "better"?


  • 0

    I would like to provide my implementation of Aho–Corasick.To achieve further optimization, merging of output sets could be implemented by using bitmaps.

    class Node:
        def __init__(self, root):
            self.map = defaultdict()
            self.output = []  
            self.failure = root
            
    class Trie:  
        def __init__(self):
            self.root = Node(None)
            
       
        def insert(self, word):
            current = self.root
            for letter in word:           
                current = current.map.setdefault(letter, Node(self.root))                     
            current.output.append(word)
            
        def makeAutomaton(self, words):
            self.keywords = words
            for w in words:
                self.insert(w)                  
            current = self.root
            queue = []
            for key, value in current.map.items():
                value.failure = self.root;
                queue.append(value)            
            while len(queue) > 0:
                current = queue.pop(0)
                for key, value in current.map.items():                
                    queue.append(value)                 
                    tmp = current.failure               
                    while tmp != self.root and key not in tmp.map:
                        tmp = tmp.failure                  
                    if key in tmp.map:
                        value.failure = tmp.map[key]
                        value.output =  value.output + tmp.map[key].output;
                    else:
                        value.failure = self.root      
        
        def search(self, word):  
            f = {}
            for w in self.keywords:                  
                f[w] = 0            
            current = self.root
            for letter in word: 
                while current != self.root and letter not in current.map: 
                    current = current.failure                        
                if letter in current.map:
                    current = current.map[letter]            
                    for w in current.output:
                        f[w] += 1                            
            return f    
    
    if __name__ == '__main__':
        trie = Trie()
        trie.makeAutomaton(["aca", "cat", "hello", "world"])  
        stat = trie.search('acacabcatghhellomvnsdb')    
        for u, v in stat.items():
           print u,v

  • 0

    @elmirap said

    Knut Moris Prat
    Acho - Corasic

    I notice you removed or added exactly one character from/to each of those names. Is this some kind of insider joke? I don't get it :-)


  • 0

    @StefanPochmann It was a typo error sorry.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.