ACC O(n^3) dynamic programming solution


  • 2
    A

    Started the contest late so wasn't able to submit during contest time limit, but here's what I was able to submit (with comments) about 15 minutes after the end. Maybe there's a faster way but haven't thought of one yet.

    https://github.com/andreimaximov/algorithms/blob/master/leetcode/algorithms/encode-string-with-shortest-length/solution.py

    Solution #3 Inspired by @StefanPochmann 's solution here. The main challenge with solution #2 and #3 is efficiently finding a repeating prefix.

    class Solution(object):
        def encode(self, s):
            """Encodes a string with the shortest possible length in O(n^3) time.
    
            See problem description for encoding rules. Credit to Stefan Pochmann
            for finding optimal repeating prefix substring. See his original
            solution at https://discuss.leetcode.com/topic/71442/short-python.
            """
            n = len(s)
    
            # Each dp[x] represents the shortest encoding for a substring x of s.
            dp = dict()
            return self.encodeUtil(s, dp)
    
        def encodeUtil(self, s, dp):
            """Calculates the shortest encoding for s.
    
            There are O(n^2) subproblems but each subproblem considers O(n) smaller
            subproblems so the overall runtime is O(n^3)."""
            n = len(s)
            if n < 5:
                # Cannot compress anything shorter than 5 characters.
                return s
            elif s in dp:
                return dp[s]
    
            # Encode the string raw by default.
            encoding = s
    
            # Encode the string as the shortest repeating substring if possible.
            i = (s + s).find(s, 1)
    
            # Check if a repeating substring exists and it is not s.
            if i != -1 and i < n:
                repeat = s[:i]
                count = n / len(repeat)
                encoding = '%d[%s]' % (count, self.encodeUtil(repeat, dp))
    
            # Now consider all possible splits between chars 0 through n - 1.
            for i in range(1, n):
                left = self.encodeUtil(s[:i], dp)
                right = self.encodeUtil(s[i:], dp)
                if len(left) + len(right) < len(encoding):
                    encoding = left + right
    
            dp[s] = encoding
            return encoding
    

    Solution #2 (Test case abaababaab is bugged)

    class Solution(object):
        def encode(self, s):
            """Encodes a string with the shortest possible length in O(n^3) time.
    
            See problem description for encoding rules.
            """
            n = len(s)
    
            # Each dp[i][j] reperesents the shortest encoding for s[i:j + 1].
            dp = [[None for _ in range(n)] for _ in range(n)]
            return self.encodeUtil(s, 0, n - 1, dp)
    
        def encodeUtil(self, s, i, j, dp):
            """Calculates the shortest encoding for s[i:j + 1].
    
            There are O(n^2) subproblems but each subproblem considers O(n) smaller
            subproblems so the overall runtime is O(n^3)."""
            n = j - i + 1
            if n < 5:
                # Cannot compress anything shorter than 5 characters.
                return s[i:j + 1]
            elif dp[i][j] is not None:
                return dp[i][j]
    
            # Default encoding is just the raw string.
            encoding = s[i:j + 1]
    
            # Consider all repeating prefix substrings.
            k = i
            while k < j:
                # Count how many times x occurs back to back in s[i:j + 1].
                x = s[i:k + 1]
                y = self.encodeUtil(s, i, k, dp)
                count = 0
                start = i
    
                while s[start:start + len(x)] == x:
                    count += 1
                    start += len(x)
    
                    # Compress the prefix.
                    prefix = y if count == 1 else '%d[%s]' % (count, y)
    
                    # Compress the suffix.
                    suffix = self.encodeUtil(s, start, j, dp)
    
                    # Apply if encoding with x as prefix is shorter.
                    if len(prefix) + len(suffix) < len(encoding):
                        encoding = prefix + suffix
    
                # NOTE: The following comments are wrong but I have left them to
                # demonstrate  my thought process at the time. This can be fixed with
                # k+= 1 but will result in TLE. Main challenge is efficiently finding a
                # repeating prefix.
                # 
                # This is critical for speedup! Essentially if we have a prefix x
                # that repeats p times, we do not need to consider any prefixes
                # formed by repeating x p times. Why? Because without this we will
                # be performing unecessary (repeat) calls to get optimal suffix.
                k = i + len(x) * count 
    
            dp[i][j] = encoding
            return encoding
    

    Solution #1 (Test case aaaaaaaaaabbbaaaaabbb is bugged)

    class Solution(object):
        def encode(self, s):
            """Encodes a string with the shortest possible length in O(n^3) time.
            See problem description for encoding rules.
            """
            n = len(s)
    
            # Create an index where all O(n^2) possible substrings begin.
            index = dict()
            for i in range(n):
                for j in range(i, n):
                    x = s[i:j + 1]
                    if x not in index:
                        index[x] = set()
                    index[x].add(i)
    
            # Each dp[i][j] reperesents the shortest encoding for s[i:j + 1].
            dp = [[None for _ in range(n)] for _ in range(n)]
            return self.encodeUtil(s, 0, n - 1, index, dp)
    
        def encodeUtil(self, s, i, j, index, dp):
            """Calculates the shortest encoding for s[i:j + 1].
            There are O(n^2) subproblems but each subproblem considers O(n) smaller
            subproblems so the overall runtime is O(n^3)."""
            n = j - i + 1
            if n < 5:
                # Cannot compress anything shorter than 5 characters.
                return s[i:j + 1]
            elif dp[i][j] is not None:
                return dp[i][j]
    
            # Default encoding is just the raw string.
            encoding = s[i:j + 1]
    
            # Consider all repeating prefix substrings.
            for k in range(i, j):
                # Count how many times x occurs back to back in s[i:j + 1].
                x = s[i:k + 1]
                count = 1
                start = i + len(x)
                while start in index[x]:
                    count += 1
                    start += len(x)
    
                # Compress the prefix.
                x = self.encodeUtil(s, i, k, index, dp)
                if count > 1:
                    prefix = '%d[%s]' % (count, x)
                else:
                    prefix = x
    
                # Compress the suffix.
                suffix = self.encodeUtil(s, start, j, index, dp)
    
                # Apply prefix and suffix if encoding with x as prefix is shorter.
                if len(prefix) + len(suffix) < len(encoding):
                    encoding = prefix + suffix
    
            dp[i][j] = encoding
            return encoding
    

  • 1
    7

    @amaximov said in ACC O(n^3) dynamic programming solution:
    It gives a false answer for
    "aaaaaaaaaabbbaaaaabbb".

    I believe it's because a total optimal do not necessarily have an optimal prefix


  • 0
    A

    @70664914 I get an output of "10[a]bbb5[a]bbb" for that input. Perhaps you were expecting output of 5[a]2[[5[a]bbb]] which is 16 characters long and thus sub-optimal? Another valid encoding would be 5[a]2[aaaaabbb] which is also 15 characters and thus no better.

    I can't see any better way to encode it with less than 15 characters.


  • 0
    7

    @amaximov the answer given has a square non necessary, optimal is 14, I believe


  • 0
    A

    @70664914 said in ACC O(n^3) dynamic programming solution:

    has a square non necessary

    I'm not sure what you mean by this. Please give an example of what you expect a valid 14 character encoding to look like.


  • 0
    7

    @amaximov "5[a]2[5[a]bbb]", as is given by the machine of leetcode


  • 0
    A

    @70664914 You're right! Sorry should have checked expected output.


  • 0
    A

    @70664914 fixed and updated original post.


  • 0

    @70664914 Thanks, just added your test case.


  • 0
    T

    Thanks! good solution and clear explanation.


  • 0
    R

    try test case "abaababaab"


  • 0
    A

    @roc571 Good catch. The issue is my failure at optimizing the k = ? increment step in the while loop. You can replace this with k += 1 for correctness but then it will TLE.

    The issue stems from trying to find the optimal repeating prefix substring. However @StefanPochmann has a great solution here that you can check out.


  • 0

    @roc571 Thanks, just added your test case.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.