Easy to understand C++ O(n^3) solution


  • 13
    Y

    3 for loop, so complexity O(n^3), bottom up DP with step goes from 1 to n, and for each step calculate all start and end locations. Use the collapse idea from another solution.

    class Solution {
    private:
    	vector<vector<string>> dp;
    public:
    	string collapse(string& s, int i, int j) {
    	    string temp = s.substr(i, j - i + 1);
    		auto pos = (temp+temp).find(temp, 1);
    		if (pos >= temp.size()) {
    		    return temp;
    		}
    		return to_string(temp.size()/pos) + '['+ dp[i][i+pos-1]+']';
    	}
    
    	string encode(string s) {
    		int n = s.size();
    		dp = vector<vector<string>>(n, vector<string>(n, ""));
    		for (int step = 1; step <= n; step++) {
    			for (int i = 0; i + step - 1 < n; i++) {
    				int j = i + step - 1;
    				dp[i][j] = s.substr(i, step);
    				for (int k = i; k < j; k++) {
    					auto left = dp[i][k];
    					auto right = dp[k + 1][j];
    					if (left.size() + right.size() < dp[i][j].size()) {
    						dp[i][j] = left + right;
    					}
    				}
    				string replace = collapse(s, i, j);
    				if (replace.size() < dp[i][j].size()) {
    					dp[i][j] = replace;
    				}
    			}
    		}
    		return dp[0][n - 1];
    	}
    };
    

  • 0
    C

    Can you explain more on the index such as i, j, k? What do they stand? thanks


  • 0
    Y

    @coder2 Definitely.

    i is the start index(includsive), j is the ending index(inclusive), k is scan through from i to j,for example

    "abab"
    step = 1, (i, j) = (0,0), (i,j) = (1,1), (i.j) = (2,2), (i, j) = (3,3). So it calculates all substrings with length=1
    step = 2, (i,j) = (0, 1), (i,j) = (1,2), (i,j) = (2,3). So it calculates all substring with length = 2
    step = 3, (i, j) = (0,2), (i,j) = (1,3)
    step = 4, (i, j) = (0,3), so this is the final result
    The reason to have 3 loops is that each substring's optimal answer depends on right half and left half. That is the functionality of k, to find each break point within i, j.

    If this is still not clear, refer to the following solution for another problem on leetcode, you will find it pretty much the same looping technique. solution for burst ballons


  • 0
    C

    Thank you a lot.

    I still don't quite understand the trick behind the collapse function.
    I worked it on 'abab' and vaguely understood that it's trying to find whether one part should be encoded or not. But why do you use find that way? What's the reasoning behind?

    string collapse(string& s, int i, int j) {
        string temp = s.substr(i, j - i + 1);
    	auto pos = (temp+temp).find(temp, 1);
    	if (pos >= temp.size()) {
    	    return temp;
    	}
    	return to_string(temp.size()/pos) + '['+ dp[i][i+pos-1]+']';
    }

  • 0
    Y

    @coder2 Aha, originally I used a more complicated method, I learned this easy method from this post.

    The collapse function is trying to find whether the string has repeated substring pattern, the example "abab" can be encoded as "2[ab]" but may be not a good example since the encoded lenght is longer, but you get the idea. If "aabbaabb" then "2[aabb]" is shorter and.

    You can refer to that post, my understanding is, if the string has repeated substring, then the repeated substring is at least appear 2 times, like "aabbaabb" (2times), "aabbaabbaabb"(3 times), if you concatenate the string with iteself ("aabbaabbaabbaabb"), and get rid of the first character and last character("abbaabbaabbaab"), then if you can still find the original string ("aabbaabb"), that means part of the string matching is in the first part, and part is in the second part.

    This is a little tricky, honestly I am not fully understand this method, but it works. The first thought I have for the repeated substring problem is to use KMP method, to get the failure function, then try to find pattern, which use O(n) time and space, just more lines of code.


  • 0

    @coder2 @yanzhan2
    I had a post to prove why condition (L = (s + s).find(s, 1)) < N is equivalent to that the shortest repetitive substring of s has length L. (where N = s.size())

    Basically, if you consider function f(i) = s[i%N] from int to char, then the key is that actually L is the minimum positive period of f. Note that we always have i = L*(i/L) + i%L for any i, so

    1. f(i) = f(i%L) since L*(i/L) is an integral multiple of L.

    Also, N = L*(N/L)+N%L, so f(i) = f(i+N) = f(i+N%L), which means N%L (<L) is also a period of f. But L is the minimum by string::find(s,1) definition (i.e., first occurrence of match after 0), so we must have

    1. N%L = 0.

    Combining equations 1 and 2 proves that s.subtr(0,L) must be the shortest repetitive substring of s.


  • 0
    N

    @yanzhan2

    Re: Easy to understand C++ O(n^3) solution

    string collapse(string& s, int i, int j) {
    string temp = s.substr(i, j - i + 1);
    auto pos = (temp+temp).find(temp, 1);
    if (pos >= temp.size()) {
    return temp;
    }
    return to_string(temp.size()/pos) + '['+ dp[i][i+pos-1]+']'; // ???
    }

    Why it constructs the encoded string as

         to_string(temp.size()/pos) + '['+ dp[i][i+pos-1]+']'; 
    

    Why not constructs it as

         to_string(temp.size()/pos) + '[' + temp.substr(0, pos) + ']' ?
    

    Isn't checking if temp itself can be encoded with repeated pattern? Why does it use to the dp[i][i + pos -1] instead of temp.substr(0, pos) ?

    I know it is working. Just don't get it.


  • 0
    Y

    @ncybmh99 You can try change the line to what you suggested and you would find the problem.

    Input:
    "abbbabbbcabbbabbbc"
    Output:
    "2[abbbabbbc]"
    Expected:
    "2[2[abbb]c]"

    The thing is, temp itself can be encoded to be shorter than temp, in the case temp is "abbbabbbc" but its can be encoded as "2[abb]c"


  • 0

    @yanzhan2 Nice!

    Translated to Java:

    public String encode(String s) {
        String[][] dp = new String[s.length()][s.length()];
    
        for (int len = 1; len <= s.length(); len++) {
            for (int i = 0; i + len - 1 < s.length(); i++) {
                int j = i + len - 1;
                dp[i][j] = s.substring(i, i + len);
    
                // try left and right
                for (int k = i; k < j; k++) {
                    String l = dp[i][k], r = dp[k + 1][j];
                    dp[i][j] = isLeftPlusRightShorter(l, r, dp[i][j]) ? l + r : dp[i][j];
                }
    
                // try collapsing
                String collapsed = collapse(dp, s.substring(i, i + len), i);
                dp[i][j] = isCollapsedShorter(collapsed, dp[i][j]) ? collapsed : dp[i][j];
            }
        }
    
        return dp[0][s.length() - 1];
    }
    
    private boolean isLeftPlusRightShorter(String left, String right, String curr) {
        return left.length() + right.length() < curr.length();
    }
    
    private boolean isCollapsedShorter(String collapsed, String curr) {
        return collapsed.length() < curr.length();
    }
    
    private String collapse(String[][] dp, String s, int i) {
        int pos = (s + s).indexOf(s, 1);
        if (pos >= s.length()) return s;
        return (s.length() / pos) + "[" + dp[i][i + pos - 1] + "]";
    }
    

  • 0
    Z

    Thanks for the solution!
    The time complexity is O(n^4) instead of O(n^3), because string.find() is O(n^2).
    http://www.cplusplus.com/reference/string/basic_string/find/
    @yanzhan2 said in Easy to understand C++ O(n^3) solution:

    this post


  • 0
    Y

    @zestypanda Yes, you are correct, sorry for the wrong info. Using a KMP style string find function would make the algorithm true O(n^3). I will update if I have time.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.