10-line DFS solution with prefix->word indices look-up (What is the time complexity?)


  • 0

    Many have posted a similar DFS approach that builds each word square by trying every word with the expected prefix, row by row.

    It is interesting to work out the time complexity of this algorithm. Let N be the number of words, and L be the length of each word.

    • Build look-up dictionary: for each word and each prefix length, we invoke substr to get a prefix copy, so it is O(NL^2). Any other way to achieve a prefix-to-word-list look-up can be substituted here.
    • DFS recursion: the size of the loop for (int j : dict[pfx]) depends on the given words, and it will likely be larger if the given words share more common letters (I think this is why the problem assumes no duplicates in the given words). Naively using N as an upper bound for the loop size would be too conservative and lead to a huge complexity O(N^L).

    Note that the real brute-force solution would look at all possible LxL word squares, of which there are exactly N^L (assuming we could use a word multiple times in a word square, even though the given word list has no duplicates), and check each one for validity in O(L^2) time, so the total time for brute force would be O(N^L * L^2).
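    To see how large that brute-force bound gets at the stated limits, here is a quick sanity check (the helper function and its name are mine, purely for illustration):

```cpp
#include <cstdint>

// Brute-force work estimate: N^L candidate squares, each validated in O(L^2).
// Illustrative helper only, not part of the solution below.
std::uint64_t bruteForceChecks(std::uint64_t N, std::uint64_t L) {
    std::uint64_t candidates = 1;
    for (std::uint64_t i = 0; i < L; ++i) candidates *= N; // N^L candidate squares
    return candidates * L * L;                             // O(L^2) check for each
}
```

    At N = 1000 and L = 5 this gives 10^15 candidate squares and 2.5 * 10^16 character checks, which is why the prefix pruning in the DFS matters so much.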

    Initially, I thought even the popular method below is a sort of brute force, so I gave up without giving it a try. However, the given constraints N ≤ 1000, L ≤ 5 are indeed very critical.

        vector<vector<string>> wordSquares(vector<string>& words) {
          sq.resize(words[0].size());
          for (int i = 0; i < words.size(); ++i) // build prefix->word indices look-up
            for (int j=0; j<words[0].size(); dict[words[i].substr(0,j++)].push_back(i));
          return dfs(0, words), sqs;
        }
        
        // dfs routine to fill ith row of word square
        void dfs(int i, const vector<string>& words) {
          if (i == words[0].size()) return sqs.push_back(sq);
          string pfx; for (int j=0; j<i; pfx += sq[j++][i]); // prefix for next look-up
          for (int j : dict[pfx]) sq[i] = words[j], dfs(i+1, words);
        }
        
        unordered_map<string, vector<int>> dict; // prefix -> word indices
        vector<vector<string>> sqs; // collection of word squares
        vector<string> sq; // a built word square
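    For anyone who wants to compile this locally: here is the same snippet wrapped in the usual LeetCode Solution class, with the one-liner loops expanded; the logic is unchanged.

```cpp
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// The snippet above wrapped in a stand-alone Solution class; only the
// compact one-liners are expanded, the algorithm is identical.
class Solution {
public:
    vector<vector<string>> wordSquares(vector<string>& words) {
        int L = words[0].size();
        sq.resize(L);
        for (int i = 0; i < (int)words.size(); ++i)        // build prefix -> word indices
            for (int j = 0; j < L; ++j)
                dict[words[i].substr(0, j)].push_back(i);  // prefixes of length 0..L-1
        dfs(0, words);
        return sqs;
    }
private:
    void dfs(int i, const vector<string>& words) {
        int L = words[0].size();
        if (i == L) { sqs.push_back(sq); return; }
        string pfx;                        // column i of the rows filled so far
        for (int j = 0; j < i; ++j) pfx += sq[j][i];
        for (int j : dict[pfx]) {          // only words matching the forced prefix
            sq[i] = words[j];
            dfs(i + 1, words);
        }
    }
    unordered_map<string, vector<int>> dict; // prefix -> word indices
    vector<vector<string>> sqs;              // collected word squares
    vector<string> sq;                       // the square under construction
};
```

    With words = {"area", "lead", "wall", "lady", "ball"}, it produces the two squares starting with "wall" and "ball".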
    

  • 0

    I like your analysis. But for the first part, if we use a trie to deal with prefixes, the time reduces to O(N*L). Of course, it increases the lookup time from O(1) for a lookup table to O(L) for a trie. Could you please explain why N<=1000 and L<=5 are very CRITICAL? Thanks.


  • 0

    @apepkuss said in 10-line DFS solution with prefix->word indices look-up (What is the time complexity?):

    I like your analysis. But for the first part, if we use a trie to deal with prefixes, the time reduces to O(N*L). Of course, it increases the lookup time from O(1) for a lookup table to O(L) for a trie. Could you please explain why N<=1000 and L<=5 are very CRITICAL? Thanks.

    I just mean the constraints are critical in terms of their small magnitude (rather than the exact upper bounds). I think the nature of the algorithm is still brute force, which yields a huge time complexity even if N or L is only modestly large.

    Actually, whenever LeetCode problems place such small-magnitude constraints on the input parameters, I take it as a signal to be prepared for a possible brute-force solution, which typically has at least O(N^2) complexity and beyond.


  • 0

    @zzg_zzm Thanks for your reply. In my view, just for this question, L is small, meaning the time complexity of a search in a trie can be seen as O(1). If so, then a trie could be better than a hash table. That could be the point being tested in an interview. Anyway, thank you so much!


  • 0

    @apepkuss I agree. For L=5, I think a trie is better than a prefix hash table because a trie doesn't duplicate letters at any given depth.
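    Since the thread settled on the trie alternative, here is a minimal sketch of such a prefix trie (the class name and layout are my own; lowercase a-z is assumed). Each node stores the indices of all words passing through it, so the DFS loop for (int j : dict[pfx]) could instead iterate over a trie query.

```cpp
#include <array>
#include <memory>
#include <string>
#include <vector>

// Prefix trie over lowercase a-z: every node keeps the indices of all
// words sharing the prefix that ends at that node, so a prefix query
// returns the candidate rows directly. Illustrative sketch only.
struct PrefixTrie {
    struct Node {
        std::array<std::unique_ptr<Node>, 26> next;
        std::vector<int> wordIdx; // indices of words with this prefix
    };
    Node root;

    void insert(const std::string& w, int idx) {
        Node* cur = &root;
        cur->wordIdx.push_back(idx);              // empty prefix matches all words
        for (char c : w) {
            auto& child = cur->next[c - 'a'];
            if (!child) child = std::make_unique<Node>();
            cur = child.get();
            cur->wordIdx.push_back(idx);          // one push per (word, prefix) pair
        }
    }

    // Indices of words starting with pfx; empty vector if none.
    const std::vector<int>& query(const std::string& pfx) const {
        static const std::vector<int> kEmpty;
        const Node* cur = &root;
        for (char c : pfx) {
            const auto& child = cur->next[c - 'a'];
            if (!child) return kEmpty;
            cur = child.get();
        }
        return cur->wordIdx;
    }
};
```

    Building it is one multiply-free step per character, O(N*L) total, and a query walks at most L nodes without ever hashing a string.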


  • 0

    A hash table look-up is always slower than a trie here.
    You can't just say hash look-up is O(1).
    If you count the trie-building time as O(N*L), where L is the length of a word, you should count a hash look-up as O(L) as well, since computing the hash code of a string is O(L), where L is the length of the word.


  • 0

    @jobhunting88 Yes, I totally agree. If L is considered a variable of the input, hashing a data type consisting of O(L) data should be considered O(L) instead of O(1). The O(1) is only the (average) look-up time, assuming the hash value is already computed.
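    To make the O(L) hashing cost concrete, here is a toy polynomial rolling hash (this is not what std::hash necessarily does; it is just an illustration): one multiply-add per character, so hashing a length-L key costs O(L) before the O(1) bucket look-up even starts.

```cpp
#include <cstdint>
#include <string>

// Toy polynomial rolling hash: touches every one of the L characters,
// which is why hashing a string key is O(L), not O(1).
std::uint64_t rollingHash(const std::string& s) {
    std::uint64_t h = 0;
    for (unsigned char c : s) h = h * 131 + c; // one multiply-add per character
    return h;
}
```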

