Python DP solution beats 100% with explanation.

  • 0

    0_1481880099942_result distinct subsequences.png

    For this solution, I will use an example to explain.
    Some observations can be found as follows:

    1. For s="baabacba" & t="aba", in order to have a solution for the first 'a' in t, there must exists a 'a'.
      As for the second 'a' in t, there must exists a 'a' prior to the second 'a' and a 'b' after the first 'a'.
    2. The first 'b' in s is redundant event though t contains 'b'.
    3. Suppose that s="aab" & t="ab", it is obvious that there are only 2 solutions, and that solution is based on the solutions of s="aa" & t="a".
      If s="aabb" & t="ab", the result is twice the result of s="aa" & t="a".
      Therefore, the result of a matching character is the current result adds up with the result of the element before.

    Combining the observations above, here is the solution.

    class Solution(object):
        def numDistinct(self, s, t):
            if not t:
                return 1
            nLen = len(t)
            table = [0 for i in range(nLen + 1)]
            idxes = dict()
            table[-1] = 1
            cur = 0
            for i in range(len(s)):
                if cur < nLen:
                    if t[cur] == s[i]:
                        if t[cur] not in idxes:
                            idxes[t[cur]] = [cur]
                        cur += 1
                if s[i] in idxes:
                    for j in idxes[s[i]][::-1]:
                        table[j] += table[j-1]
            return table[nLen-1]

    table is the DP which stores the result of each iteration.
    The last element is set to 1 for dealing with cur==0 condition.
    Each element stores the result of corresponding element in t.
    cur denotes the index of the largest possible matching element (+1).
    idxes, a dict(), stores all indices (<= i) in s where char s[i] locates.

    Here is a step by step result of how this solution goes.

    s="baabacba", t="aba"
    initial, table =[0001]
    i=0,            [0001] (cur==1)
    i=1,            [1001] (cur==1) (add table[-1] to table[0])
    i=2,            [2001] (cur==1) (add table[-1] to table[0])
    i=3,            [2201] (cur==2) (add table[0] to table[1])
    i=4,            [3221] (cur==3) (add table[-1] to table[0], add table[1] to table[2])
    i=5,            [3221] (cur==3)
    i=6,            [3521] (cur==3) (add table[0] to table[1])
    i=7,            [4571] (cur==3) (add table[-1] to table[0], add table[1] to table[2]
    return table[2] which is 7

    Like other O(n) space & O(mn) time solutions, I use table for DP.
    It stores the results as we gradually check each element in s.
    However, other solutions loop through the whole table, even when they are checking the first few elements of s. On the other hand, in this solution the cur and the idxes only check necessary elements.
    In other words, cur is used for the current matching index, and we don't have to check elements later then cur.
    Besides of that, a dictionary is used for recording the indices of the elements in t that we checked before, and it can further speed up the linear traversal and only check the target indices.
    Please feel free to give me suggestions if there is something can be fixed.

Log in to reply

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.