C, Trie, 6mS


  • 0
    M

    Whenever we need to scan for some sequence, use of a trie is typically a good approach. Implementing this in C might not be all that difficult. The data structure of the node is given below. Note that we can use the word count variable to track more than one instance of a sequence.

    struct TrieNode
    {
        struct TrieNode *cptr[CCOUNT];
        int word; // word count (> 0 for a valid word)
    };
    

    The basic loop is given below:

        /* Look for sequences */
        for (i = 0; i <= len - SEQUENCE_LEN; ++i)
        {
            /* Add the word to the trie, and get the terminating node. */
            if (!(node = AddToTrie(root, &s[i], SEQUENCE_LEN, map, salloc)))
                goto __exit;
    
            /* If there was a previous instance of the same sequence,
            then add this string to the solution. */
            if (node->word == 1)  {
                /* Validate the allocation */
                if (!(sarr = malloc(sizeof(char) * (SEQUENCE_LEN + 1))))
                    goto __exit;
                /* Copy the sequence */
                strncpy(sarr, &s[i], SEQUENCE_LEN);
                sarr[SEQUENCE_LEN] = 0; // NULL terminate;
                str[*returnSize] = sarr; // add this to the solution
                (*returnSize) += 1; // increment the sequence count
            }
            node->word++; // increment the word instance count
        }
    

    Trie traversal function AddToTrie is given below:

    struct TrieNode *AddToTrie(struct TrieNode *root, char *str, int count,
                               int *map, void *salloc)
    {
        int off;
    
        /* If we are done, then return */
        if (!count)
            return root;
    
        /* Get the offset */
        off = map[*str - 'A'];
    
        /* If there is not child at that location, then allocate */
        if (!root->cptr[off] &&
            !(root->cptr[off] = GetSAlloc(salloc, sizeof(struct TrieNode))))
            return NULL;
    
        /* Continue the traversal */
        return AddToTrie(root->cptr[off], str + 1, count - 1, map, salloc);
    }
    
    

    As you can see, the traversal is not that complex. Definitely simpler than implementing a HashTable. Note that we return the last node, and if the word count associated with that is one, then we know there was a previous instance of the same sequence.

    The main problem with this approach is the generation of the tree. This requires several malloc calls, dynamic allocation calls are costly. Above code uses an optimized allocator call "GetSAlloc", this allocates memory in larger chunks and reduces the malloc library call overhead. After this optimization, the total time taken was reduced from 122mS to 6mS.

    Full source code at GitHub


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.