C++ DP 250 ms, C++ Trie MLE


  • 0
    V

    The first solution below is C++ DP that was accepted with 250 ms run-time.
    The second is C++ Trie that generates MLE. It sorts strings by length, and only adds unique words to the trie to conserve memory. There are few people here who also complained about MLE on C++ trie-based solution. In addition to storing only unique words in the trie, I tried to limit the recursion depth, use array[26] or unordered_map with a custom bucked size - but still MLE.

    In my environment, C++ Trie performs much faster than C++ DP, so, I really want the MLE issue sorted out.

    C++ DP (250 ms)

    class Solution
    {
    private:
    
        bool isConcatenated(const string& word, const unordered_set<string>& dict, int minL, bool top, unordered_set<string>& dp)
        {
            auto l = word.size();
        
            if (l == 0) return true;
            if (dp.count(word) > 0) return true;
        
            for (int x = minL; x <= l; ++x) // "x" is the size of the word.
            {
                if (top && x > l - minL) break; // we need to leave the room for the second word.
        
                auto ww = word.substr(0, x);
                if (dict.count(ww) > 0)
                {
                    auto rest = word.substr(x);
                    if (rest.empty()) return true; // last word.
                    if (isConcatenated(word.substr(x), dict, minL, false, dp))
                    {
                        dp.insert(word);
                        return true;
                    }
                }
            }
        
            return false;
        }
    
    public:
    
        vector<string> findAllConcatenatedWordsInADict(vector<string>& words)
        {
            auto minWordL = INT_MAX;
            vector<string> res;
            unordered_set<string> dict;
            unordered_set<string> dp;
        
            for (auto w : words)
            {
                int l = w.size();
                if (l > 0) // failsafe against empty strings.
                {
                    dict.insert(w);
                    minWordL = min(minWordL, l);
                }
            }
        
            for (auto w : words)
            {
                if (w.size() < minWordL * 2) continue;
                if (isConcatenated(w, dict, minWordL, true, dp))
                {
                    res.push_back(w);
                }
            }
        
            return res;
        }
    };
    

    C++ Trie (MLE)

    class Solution
    {
    private:
    
        struct trie
        {
            trie* children[26] = {};
            bool eow = false;
            ~trie()
            {
                for (auto t : children) delete t;
            }
        };
    
        bool addToTrie(trie* root, const string& w)
        {
            auto l = w.size();
            stack<short> stack;
    
            auto el = root;
            for (auto i = 0; i < l; ++i)
            {
                auto idx = w[i] - 'a';
                if (el->children[idx] == NULL)
                {
                    el->children[idx] = new trie();
                }
    
                el = el->children[idx];
    
                if (el->eow) // check if w can be represented by existing words in the trie.
                {
                    auto start = i + 1;
                    auto el2 = el;
                    stack.push(i + 1);
    
                    while (true)
                    {
                        for (auto j = start; j < l; ++j)
                        {
                            el2 = el2->children[w[j] - 'a'];
                            if (el2 == NULL) break;
                            if (el2->eow && stack.size() < 5) stack.push(j + 1); // limiting the recursion depth.
                        }
    
                        if (el2 != NULL && el2->eow) return true; // this word can be comprised entirely.
                        if (stack.empty()) break;
    
                        start = stack.top();
                        el2 = root;
                        stack.pop();
                    }
                }
            }
    
            el->eow = true;
            return false; // brand new word was added to the tree (cannot be comprised).
        }
    
    public:
    
        vector<string> findAllConcatenatedWordsInADict(vector<string>& words)
        {
            sort(words.begin(), words.end(), [](string& a, string& b)
            {
                auto as = a.size();
                auto bs = b.size();
                return (as != bs ? as < bs : a < b); // the stable sort seems to improve the runtime.
            });
    
            trie root;
            vector<string> res;
    
            for (auto w : words)
            {
                if (addToTrie(&root, w))
                {
                    res.push_back(w);
                }
            }
    
            return res;
        }
    };
    

  • 1

    The set dp in DP solution records all words which are concatenatable for memorization. Would it be useful to also record those who are not concatenatable to further speed up the dfs?


  • 0
    V

    @zzg_zzm I originally tried to use map<string, bool> in DP solution to memoize both concat and non-concat words. The run-time became ~150 ms worse. Also, if you mark a word as non-concat, you need to watch if this non-concat words is a part of concat word later in the input. I think that map is not necessary unless we have a lot of duplicated non-concat words (which is not the case for this solution).


  • 0

    @votrubac said in C++ DP 250 ms, MLE on C++ Trie:

    @zzg_zzm I originally tried to use map<string, bool> in DP solution to memoize both concat and non-concat words. The run-time became ~150 ms worse. Also, if you mark a word as non-concat, you need to watch if this non-concat words is a part of concat word later in the input. I think that map is not necessary unless we have a lot of duplicated non-concat words (which is not the case for this solution).

    Indeed, I also tried to do some similar checking on duplicates but got longer running time probably because of overhead costs. For high complexity problem like this one, the characteristics of testing data really have big impact on total running time.

    @administrators In the problem's description, I see that a word can be used multiple times for concatenation. However, it does not clearly specify

    • whether the given word list vector<string> words contains duplicated words;
    • if a duplicated word is concatenatable, would it need to be also output multiple times in the result vector<string> same times as it is duplicated in the given list?

    I think these checks are not the major points to challenge in this problem, but they indeed increase the complexity of the code to slow down running time if there are a large number of test cases which do not favor the checking in the code. So maybe more clarification in the problem could erase those non-essential complexity in coding (or should purposely include test cases for duplication verification).


  • 0

    @zzg_zzm I have fixed the description to mention the list does not contain duplicate words.


  • 0
    V

    @administrators is it possible to remedy the MLE issue for C++ Trie? The same solution is accepted for Java, and many people here reported they are getting MLE for C++.


  • 0
    H

    @votrubac check out this post


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.