16 ms C solution using trie


  • 1
    M

    Below extract from Blog Link

    For a string of length n and a word list of size m containing words of length k, a brute force approach will have an in O(n x m x m x k) complexity. With the use of a trie the time complexity would be O(n x m x k). But the obvious trade-off is the overhead of creating this tree.

    Structure of a trie node used in this algorithm is illustrated below.

    /********************************/
    /* Trie Node */
    /********************************/
    struct tnode
    {
     struct tnode *tc[TRIE_SZ]; // child nodes
     int tcount;                // total number of instances of a word
     int count;                 // decremented during searching
     struct tnode *next;        // linked list node
    };
    

    Each trie node can have up to 26 children, please note that here we assume the comparison is not case sensitive, otherwise we would need 52. Variable ‘tcount’ records the total number of instances of a word on the list. And the variable ‘count’ is used for keeping track of the number of times a word is encountered on the main string. It is initialized with the same value as ‘tcount’ but decremented during the scan whenever the associated word is found on the main string.

    Consider the string “wordgoodgoodgoodbestword”, and a given set of words list [“word”,”good”,”best”,”good”]: ‘count’ for the word “good” would be decremented to 1 as soon as it’s encountered for the first time at the offset 4. So when ‘count’ reaches zero, it actually means that a valid sub-string cannot have any more instances of that word. Here as soon as the algorithms encounters the third instance of word “good” it would realize that the sub-string starting at 0 is invalid, because even though “good” is present in the list its ‘count’ is now zero. So now we want to restart the scanning from the next offset 1.

    Before restarting this scan we need to also revert the variable ‘count’ to ‘tcount’. This would mean that we have to keep track of the nodes decremented during the last scan. Here we can utilize the ‘next’ pointer field.

    During the previously failed scan which started at offset 0 our algorithm would have used the next pointer to generate a linked list of encountered words. The above example would generate a link from “word” to “good”. Please note that during the scan the word “good” was encountered after “word”, hence the link in that direction. Now we can run through this linked list and quickly reset the counters which were decremented. Obviously this linked list would be dismantled as soon as the reset is done. It’s merely an optimization to quickly revert the trie to its original pristine state.

    int* findSubstring(char* s, char** words, int wordsSize, int *returnSize)
    {
      struct tnode *root = NULL, *head = NULL, *node = NULL;;
      int wlen, slen, i, j, tw = 0, wsize, s_end, arr_off = 0;
      int *ret_arr = NULL;
      struct mstack m = {NULL};
      struct tnode **m_arr = NULL;
      struct tnode dummy;
     
      /* Maintain Sanity */
      *returnSize = 0;
      if (!s || !words || !wordsSize || !returnSize)
        return NULL;
     
      /* Calculate the input array size */
      wlen = strlen(words[0]);
      slen = strlen(s);
     
      /* Initialize variables */
      wsize = wlen * wordsSize;
      s_end = slen - wsize;
     
      /* Initialize dummy to zero */
      dummy.count = dummy.tcount = 0;
     
      /* Allocate a trie for the array */
      if (s_end >= 0)
      {
        /* Allocate memory for simple allocator */
        if (!m_stack_alloc(&m, SEGMENT_COUNT, SEGMENT_SIZE))
          goto subs_exit;
     
        /* Memoization Array */
        m_arr = calloc(slen, sizeof(struct tnode *));
        if (!m_arr)
          goto subs_exit;
     
        /* Create trie */
        root = create_trie(words, wordsSize, &m);
        if (!root)
          goto subs_exit;
      }
     
      /* Loop as long as there is a possibility for a match */
      for (i = 0; i <= s_end; ++i)
      {
        /* Loop checking whether the substring at this location is a
          concatenation of the words */
        for (j = i; j <= slen - wlen; j += wlen)
        {
          char c = s[j + wlen];
          struct tnode *tn = m_arr[j];
     
          /* If there is no hit, then search the trie */
          if (!tn)
          {
            /* Create a NULL terminating condition */
            s[j + wlen] = '\0';
     
            /* If the word is not found then, set the value to
               dummy */
            if ((tn = search(root, &s[j])) == NULL)
              tn = &dummy;
     
            /* Replace the character */
            s[j + wlen] = c;
     
            /* Assign the pointer to the memoization array */
            m_arr[j] = tn;
          }
     
          /* If it's not found, then break */
          if (!tn->count)
            break;
     
          /* Decrement the count */
          --tn->count;
     
          /* Initiate the linked list head */
          if (!head)
            node = head = tn;
     
          /* Add the new node only if it's not already a part of the
            list */
          else if ((!tn->next) && (node != tn))
          {
            node->next = tn;
            node = tn;
          }
     
          /* Increment the hit count */
          tw++;
     
          /* If all the words were found, then break*/
          if (tw == wordsSize)
          {
            /* Make the necessary dynamic allocation */
            if ((!ret_arr) && ((ret_arr = malloc(sizeof(int) * slen)) == NULL))
              goto subs_exit;
     
            /* Save the offset, increment the index and break */
            ret_arr[arr_off] = i;
            arr_off++;
            break;
          }
        }
     
        /* Reset the list */
        if (head)
        {
          reset_list(head);
          head = NULL;
          tw = 0;
        }
      }
     
     /* Free the trie, memoization array, assign the return size */
    subs_exit:
      m_stack_free(&m);
      if (m_arr)
      free(m_arr);
      *returnSize = arr_off;
      return ret_arr;
    }
    

    More details on the Blog Link . As you can see the algorithm also implements memoization related optimization. Other dynamic memory allocation related changes and rest of the code can be accessed from the Bitbucket repo

    NOTE: Corrected the time complexity from O(n x k) to O(n x m x k)


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.