Easy to understand DP in Java


  • 218
    B

    The idea is the following:

    • we will build an array mem where mem[i+1][j+1] means that S[0..j] contains T[0..i] that many times as distinct subsequences. Therefor the result will be mem[T.length()][S.length()].
    • we can build this array rows-by-rows:
    • the first row must be filled with 1. That's because the empty string is a subsequence of any string but only 1 time. So mem[0][j] = 1 for every j. So with this we not only make our lives easier, but we also return correct value if T is an empty string.
    • the first column of every rows except the first must be 0. This is because an empty string cannot contain a non-empty string as a substring -- the very first item of the array: mem[0][0] = 1, because an empty string contains the empty string 1 time.

    So the matrix looks like this:

      S 0123....j
    T +----------+
      |1111111111|
    0 |0         |
    1 |0         |
    2 |0         |
    . |0         |
    . |0         |
    i |0         |
    

    From here we can easily fill the whole grid: for each (x, y), we check if S[x] == T[y] we add the previous item and the previous item in the previous row, otherwise we copy the previous item in the same row. The reason is simple:

    • if the current character in S doesn't equal to current character T, then we have the same number of distinct subsequences as we had without the new character.
    • if the current character in S equal to the current character T, then the distinct number of subsequences: the number we had before plus the distinct number of subsequences we had with less longer T and less longer S.

    An example:
    S: [acdabefbc] and T: [ab]

    first we check with a:

               *  *
          S = [acdabefbc]
    mem[1] = [0111222222]
    

    then we check with ab:

                   *  * ]
          S = [acdabefbc]
    mem[1] = [0111222222]
    mem[2] = [0000022244]
    

    And the result is 4, as the distinct subsequences are:

          S = [a   b    ]
          S = [a      b ]
          S = [   ab    ]
          S = [   a   b ]
    

    See the code in Java:

    public int numDistinct(String S, String T) {
        // array creation
        int[][] mem = new int[T.length()+1][S.length()+1];
    
        // filling the first row: with 1s
        for(int j=0; j<=S.length(); j++) {
            mem[0][j] = 1;
        }
        
        // the first column is 0 by default in every other rows but the first, which we need.
        
        for(int i=0; i<T.length(); i++) {
            for(int j=0; j<S.length(); j++) {
                if(T.charAt(i) == S.charAt(j)) {
                    mem[i+1][j+1] = mem[i][j] + mem[i+1][j];
                } else {
                    mem[i+1][j+1] = mem[i+1][j];
                }
            }
        }
        
        return mem[T.length()][S.length()];
    }

  • 1
    W

    In the sentence, "we check if S[j] == T[j]" should be "we check if S[i] == T[j]", right?


  • 0
    B

    yes, thanks for pointing it out, I've edited it ;)


  • 0
    S

    Couldn't be better! Awesome explanation!


  • 0
    P

    Thanks for the awesome explanation!


  • 0
    A

    Pretty good~ Simple and detailed.Thanks for sharing your code.


  • 7
    L

    would you plz explain the intuition of 'mem[i+1][j+1] = mem[i][j] + mem[i+1][j];'
    I just can't make sense of it.


  • 23
    B

    Sure let's consider the same example as above: S = [acdabefbc], T = [ab]

                  *  * 
          S = [acdabefbc]
    mem[1] = [0111222222]
    mem[2] = [00000222_ ]
    

    Imagine that we are filling the gap at _. That means i=1, so T[i] = b and j=7, so S[j] = b.

    We're looking for mem[i+1][j+1], which is the place for _. Currently we know that at this position we have 2 as before, because mem[1][7] = 2, which is the position ABOVE and LEFT to _. Also we know that sofar we had 2 subsequences before (namely AcdaBef and acdABef -- highlighted with uppercase) because mem[2][7] = 2, which is LEFT to _. So having this new b would increase the number of subsequences (currently 2) with a number of 2, because it can be matched with the 2 as we saw before. That's why if T[i] == S[j] then mem[i+1][j+1] := mem[i][j] + mem[i+1][j]. So _ will be 4.

    I hope this helped.


  • 1
    L

    thanks for your help. I understand it now:-)


  • 1
    W

    really appreciate it,you know DP is kinda abstract and difficult.


  • 1
    T

    Brilliant!Thanks for sharing!


  • 0
    H

    So having this new b would increase the number of sub sequences (currently 2) with a number of 2, because it can be matched with the 2 as we saw before.

    Still not quite clear - I think it might be better to explain this in term of combination with new added elements.


  • 9
    S

    Imagine we have "ab" in hand as T. The index in the matrix is first going down and then going right, which means, the length of S is increasing when we are having the same length of T. When we are about to go to the next index (increase one character of S), the value should at least equal to the number of disctinct strings that previous S formed in the format of T. However, if the increased character of S is equal to T's current Character, the number of distinct substrings should also include the different combinations of T's previous character and S's new character. The number of combinations is dominated by the number of distinct strings that previous string of S can form in the format of previous T.

     For example:
     S = "aba"
           *
     T = "aa"
           *
    

    We are now at the position of *. We are about to increase the length of S by moving S's index to the next character. At this moment, we should already know "ab" has 0 distinct strings that can form in the format of T "aa". However, we now know the character we are about to use for S is equal to the last character of T : "a" == "a". In order to calculate the nunber of distinct strings that new S "aba" can form "aa", we should know the relationship between S ="ab" and T = "aa"(result is 0 in this case), before we adding the new character "a". Plus, we should also know the relationship between S = "ab" and T = "a" (result is 1 in this case).

    So 0 + 1 = 1. This means "aba" can only form one string of "aa" in the format of T.


  • 0
    X

    So helpful!! Thanks soooo much!!!!


  • 0
    J

    Thanks for sharing! Your explanation suddenly makes things brighter.


  • 0
    C

    Why mem[i+1][j+1] = mem[i+1][j];, not mem[i+1][j+1] = mem[i+1][j]+mem[i+1][j-1];? because mem[i+1][j] = mem[i+1][j-1]+other terms;?


  • 0
    B

    @Chengcheng.Pei because if the current character in S doesn't equal to current character T, then we have the same number of distinct subsequences as we had without the new character, which is mem[i+1][j]


  • 11
    Z

    Explanation to the state transition function

    dp[i][j] = dp[i-1][j] when s[i-1] != t[j-1] or dp[i][j] = dp[i-1][j] + dp[i-1][j-1] when s[i-1] == t[j-1],
    dp[i][j] represents the number of distinct subsequences for s[0, i-1] and t[0, t-1];

    We first keep in mind that s is the longer string [0, i-1], t is the shorter substring [0, j-1]. We can assume t is fixed, and s is increasing in character one by one (this is the key point).

    For example:
    t : ab--> ab --> ab --> ab
    s: a --> ac --> acb --> acbb

    The first case is easy to catch. When the new character in s, s[i-1], is not equal with the head char in t, t[j-1], we can no longer increment the number of distinct subsequences, it is the same as the situation before incrementing the s, so dp[i][j] = dp[i-1][j].

    However, when the new incrementing character in s, s[i-1] is equal with t[j-1], which contains two case:

    1. We don't match those two characters, which means that it still has original number of distinct subsequences, so dp[i][j] = dp[i-1][j].
    2. We match those two characters, in this way. dp[i][j] = dp[i-1][j-1];

    Thus, including both two case, dp[i][j] = dp[i-1][j] + dp[i-1][j-1]

    TALKING IS CHEAP:

    '''

    public int numDistinct(String s, String t) {
        int n = s.length();
        int m = t.length();
        
        int[][] dp = new int[n+1][m+1];
        
        for (int i = 0; i < n+1; i++) {
            dp[i][0] = 1;
        }
        
        for (int j = 1; j < m+1; j++) {
            dp[0][j] = 0;
        }
        
        for (int j = 1; j < m+1; j++) {
            for (int i = 1; i < n+1; i++) {
                dp[i][j] = dp[i-1][j];
                if (s.charAt(i-1) == t.charAt(j-1)) {
                    dp[i][j] += dp[i-1][j-1];
                }
            }
        }
        
        return dp[n][m];
    }
    

    '''


  • 3
    C

    @balint

    Maybe we could think in this way.

    if t[i] == s[j] then dp[i][j] == dp[i - 1][j - 1] + dp[i][j - 1] 
                    dp[i - 1][j - 1]( use s[j] . s[j] is the last character in current subsequences )
                    dp[i][j - 1] (don't use s[j]. )
    else dp[i][j] == dp[i][j - 1] 
    

  • 3

    Thanks for the nice solution! To make it clearer and simpler, I rephrase the code into:(With detailed comments inside)

    public int numDistinct(String s, String t) {
            int m = s.length(), n = t.length();
            int[][] dp = new int[m + 1][n + 1];
            // initialize the dp value when t is an empty string, number of subsequence of an empty string should be 1
            for(int i = 0; i < m; i ++){
                dp[i][0] = 1;
            }
            for(int i = 1; i <= m; i ++){
                for(int j = 1; j <= n; j ++){
                    //in both cases, the subsequence in String t should be ending with character t.charAt(j - 1)
                    if(s.charAt(i - 1) == t.charAt(j - 1)){
                        // when two pointers pointing to same character
                        // if we take these two characters simultaneously, we should have dp[i-1][j-1] subsequences
                        // otherwise if we overlook current i (moving back for one step) and keeping the current j we have another dp[i -1][j]
                        dp[i][j] = dp[i - 1][j - 1] + dp[i - 1][j];
                    }else{
                        // when two pointers pointing to difference characters
                        //we cannot take these two characters but we still should make j ending with pointing to current position
                        // then we should move i backward
                        dp[i][j] = dp[i - 1][j];
                    }
                }
            }
            return dp[m][n];
        }
    

    Hope it helps.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.