m-pointer technique -- Java O(S.length() * T.length()) solution


  • 2
    F

    Let n = S.length(), m = T.length(). We maintain an integer array p of length m serving as pointers which link each character in T to its counterpart in S. If p satisfies the following conditions:

    1. p[j] is a valid index into string S: 0 <= p[j] < n for any j with 0 <= j < m;

    2. There is a one-to-one character mapping from T to S: T[j] == S[p[j]] for any j with 0 <= j < m;

    3. p is sorted in strictly ascending order: p[j-1] < p[j] for any j with 1 <= j < m;

    then the string T will be a subsequence of the substring S[p[0], p[m-1]].

    For your convenience, here is an visualization of the p array. I'm only showing the indices of the characters in the T and S strings. The link, for example, from T[0] to S[1] means the character at T[0] is the same as the character at S[1], and we denote this link as p[0]=1. Similar for other links. All these links together (that is, the p array) form a mapping from the T string to a substring of the S string such that the former is a subsequence of the latter.

    T:     0,    1,    2,  ...,  m-1
           |     |     |   ...    |
    S:  0, 1, 2, 3, 4, 5,  ...,  n-2, n-1
    
    p: p[0]=1, p[1]=3, p[2]=5, ..., p[m-1]=n-2
    

    Potentially there could exist many mappings from T to S, so we cannot afford to check each of them one by one. Fortunately we can divide all these mappings into different groups according to the value of p[0], i.e., the link for the first character T[0]. For all the mappings within each group (that is, all these mappings have the same value of p[0]), we only need to examine one particular mapping obtained by the following greedy algorithm: for j = 1, 2, ..., m-1, we always choose the smallest p[j] that meets condition 2 yet does not violate conditions 1 and 3. The proof is as follows.

    Suppose the mapping produced by the greedy algorithm is p, and we have another mapping p' in the same group as p, that is, p[0] == p'[0]. Immediately we can conclude p[1] <= p'[1]. This is because both the characters in S at indices p[1] and p'[1] are equal to T[1], but our greedy algorithm will always pick the one with smallest index. This in turn will imply p[2] <= p'[2] for the same reason. If we continue in this fashion, we arrive at p[m-1] <= p'[m-1]. Note that the length of the substrings corresponding to the two mappings are given by p[m-1] - p[0] + 1 and p'[m-1] - p'[0] + 1, respectively, then the length of the former will always be no greater than the latter. Therefore the mapping produced by the greedy algorithm will always yield the minimum substring within that particular group.

    Again, for your reference, here is an visualization for the above greedy algorithm using the example given in the problem description:

    T:   b  d  e
    idx: 0  1  2
    
    S:   a  b  c  d  e  b  d  d  e
    idx: 0  1  2  3  4  5  6  7  8
    
    
    Group 1: p[0] = 1, corresponding to link T[0] --> S[1]
    
    Possible mappings: [1,3,4], [1,3,8], [1,6,8], [1,7,8]
    
    Greedy mapping: [1,3,4]
    
    
    Group 2: p[0] = 5, corresponding to link T[0] --> S[5]
    
    Possible mappings: [5,6,8], [5,7,8]
    
    Greedy mapping: [5,6,8]
    

    The above analyses suggest we only need to compute the greedy mapping for each group, and select the one that yields the minimum substring length out of all groups. Using the above example, we only need to compute the greedy mapping [1,3,4] for group 1 and the greedy mapping [5,6,8] for group 2 and ignore all other mappings, then select the one with minimum substring. In this case, the two substrings have the same length so the tie will be broken by choosing the one with smaller starting index, i.e., [1,3,4].

    So here is how the algorithm goes. We loop through the S string to identify each group of mappings. Once we find one, the aforementioned greedy algorithm is applied to find the greedy mapping for the remaining m-1 characters in T, starting from the second character T[1] towards the last character T[m-1]. If any link in the greedy mapping does not exist, then we conclude no mappings will be found for the current group and any subsequent groups, so we return the current minimum substring if it exists or empty string otherwise. Else we update the start and end indices (denoted as l and r) of the result substring if the above greedy mapping yields a smaller substring.

    Theoretically we don't even need to explicitly record the links for each characters in the T string (i.e., no need of the p array), like the following solution:

    public String minWindow(String S, String T) {
        int n = S.length(), m = T.length();
        int l = 0, r = n;
            
        for (int i = 0; i < n; i++) {
            if (S.charAt(i) != T.charAt(0)) continue;  // No group of mapping is found, so continue
            
            int k = i + 1;
            	
            for (int j = 1; j < m; j++, k++) {  // Greedy algorithm to find the greedy mapping
                while (k < n && S.charAt(k) != T.charAt(j)) k++;
                if (k == n) return (r == n ? "" : S.substring(l, r + 1));  // Link for T[j] does not exist, so return
            }
            
            if (k - 1 - i < r - l) {  // Update the result substring if one with a smaller length is found
                l = i; r = k - 1;
            }
        }
        
        return (r == n ? "" : S.substring(l, r + 1));
    }
    

    However, with the p array, early termination is possible, because links in the greedy mapping for previous groups may overlap with those for later groups, so we don't have to recompute them, which improves the efficiency of the algorithm by a lot. Here is a concrete example: T = "ab", S = "aaab". We have three groups with p[0] given by 0, 1, 2, respectively. The greedy mapping for these three groups are [0,3], [1,3] and [2,3], respectively. The above solution recomputes the link for T[1] = 'b' over and over again for each group, which leads to degraded time performance. Here is the more efficient version of the above solution:

    public String minWindow(String S, String T) {
        int n = S.length(), m = T.length();
        int l = 0, r = n;
        
        int[] p = new int[m];
            
        for (int i = 0; i < n; i++) {
            if (S.charAt(i) != T.charAt(0)) continue;  // No group of mapping is found, so continue
            
            p[0] = i;  // Group of mapping is found, so update the first link
            
            for (int j = 1, k = i + 1; j < m; j++, k++) {  // Greedy algorithm to find the greedy mapping
                if (k <= p[j]) break;    // Early termination, since the remaining links have been computed in previous groups
                while (k < n && S.charAt(k) != T.charAt(j)) k++;
                if (k == n) return (r == n ? "" : S.substring(l, r + 1));  // Link for T[j] does not exist, so return
                p[j] = k;  // Else update the link for T[j]
            }
                
            if (p[m - 1] - p[0] < r - l) {  // Update the result substring if one with a smaller length is found
                l = p[0]; r = p[m - 1];
            }
        }
            
        return (r == n ? "" : S.substring(l, r + 1));
    }
    

    The space complexity for this optimized solution is O(m) but the time complexity is a little bit tricky. We have three-nested loops, so the nominal time complexity is O(m * n^2). However, due to the early termination, the actual time complexity is O(m * n), where the worst scenario occurs when both S and T are repeating characters (the outermost loop will iterate n times while the middle loop will iterate m times, with the innermost loop degenerating to single operation). There is an alternative perspective to derive the time complexity, that is, for each character in the S string, it will be matched at most m times to those in the T string, therefore the time complexity will be O(m * n) = O(S.length() * T.length()). In contrast, the previous solution without the p array uses O(1) space but the worst case time complexity is O(m * n^2) as it will recompute the links for trailing characters in the T string over and over again.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.