Java solution with O(s1.length^2 * s2.length)


  • 1
    F

    To determine if a string str2 is contained in another string str1, there is a naive O(n) solution with n = str1.length: from start to end, simply match each character in str2 greedily with those in str1. If there is a such match for all characters in str2, we conclude str2 is contained in str1, otherwise it is not.

    To find the number of str2 contained in str1, we can proceed similarly except now we need to keep track of the number of matches found for str2. Still this can be done in O(n) time.

    So for our problem, "to find the maximum number of S2=[s2,n2] contained in S1=[s1,n1]", we have the following naive solution:

    public int getMaxRepetitions(String s1, int n1, String s2, int n2) {
        int M = 0, p = 0, q = 0;
        	
        for (int i = 0; i < n1; i++) {
        	for (int j = 0; j < s1.length(); j++) {
        	    if (s1.charAt(j) == s2.charAt(q)) {
        		q++;
        				
        		if (q == s2.length()) {
        		    p++;
        		    q = 0;
        		}
        			
        		if (p == n2) {
        		    M++;
        		    p = 0;
        		}
        	    }
        	}
        }
        	
        return M;
    }
    

    And as you would expect, this failed the extreme case when s1.length = 100 and n1 = 10^6 (i.e., S1.length = 10^8). So let's see how we can do better.

    The key idea here is to take advantage of the fact that S1 consists of repeating strings (i.e. s1). The naive solution ignored this information and did a blind scan for the whole string, which led to the poor time performance.

    First we will assume s1 contains at least once of each distinct character in s2, otherwise no S2 can be contained in S1. Second, instead of finding the number of S2 contained in S1, we will find the number of s2 contained in S1. The former can be obtained by dividing the latter by n2.

    To implement the above idea, here are some observations:

    1. Each character with index i (0 <= i < N1) in string S1 will correspond to some index j in string s1, with j = i % s1.length and N1 = S1.length.

    2. Each match of the whole string s2 in S1 will end at some index in S1. Starting from the first match, let's label the end index for each match in S1 as i1, i2, i3 ..., and the corresponding indices in s1 as j1, j2, j3 ....

    3. Let ip and iq be the end indices in S1 of the first two matches whose corresponding indices in s1, jp and jq, are equal. Let S1(ip, iq] denote the substring between them, l = iq - ip as its length and m as the total number of s2 contained in it. Then the substring S1(ip, N1) can be constructed by concatenating t times the substring S1(ip, iq] plus possibly some residual part S1(r, N1), where t = (N1 - ip - 1) / l (integer part) and r = ip + t * l + 1. And the total number of s2 contained in the substring S1(ip, r) will simply be the product m * l. Also the total length of the residual substring is less than l.
      (Note: for the substring notations, ( or ) means exclusive, [ or ] inclusive)

    4. It can be shown that if the two indices ip and iq in part 3 exist, they can be found in O(s1.length^2 * s2.length) time. The reasoning is as follows: first since s1 contains at least once of each distinct character in s2, starting from any index in S1, it will take at most s1.length * (s2.length + 1) characters to find a match of s2 in S1 (consider the worst case scenario in which it takes a whole s1 string to match each single character in s2). Second there are at most s1.length matches of s2 in S1 whose corresponding indices in s1 are all different, as each match must take one index in s1 and we have only up to s1.length different positions available. Therefore the index in s1 of the (s1.length + 1)-th match must coincide with some previous match. In total ip and iq will be found within s1.length^2 * (s2.length + 1) characters. This also sets the upper limit on l where l = iq - ip, which in turn bounds the total length of the residual substring in part 3.

    5. The total number of s2 contained in S1 can be obtained from three parts: the number contained in substring S1[0, ip], the number contained in substring S1(ip, r) and lastly the number contained in the residual substring S1[r, N1). The number in the second part can be obtained in constant time (simple mathematical computation) while those in the first and third parts can both be obtained in O(s1.length^2 * s2.length) time. Therefore our total time complexity will be at most O(s1.length^2 * s2.length).

    Here is the java program for the improved solution with brief explanation as follows:

    1. First make sure s1 contains at least once of each distinct character in s2.

    2. For each match of s2 in S1, we will record two piece of information: its end index in S1 and the total number of matches up to this end index. Since we will have at most s1.length "distinct" matches, we will use an array of the same length as s1 to hold matches found so far. The element in the array will be another array of length 2, whose first element denotes the end index and second element the total matches.

    3. We will proceed in the way described in the naive solution to identify possible matches. If we do not make it all the way to the end of S1, then a repeating match is found. We can obtain l, m and t to compute the number of s2 contained in the second part as well as scanning the residual substring to get that in the third part. The final answer will be the total number in all three parts divided by n2.

    public int getMaxRepetitions(String s1, int n1, String s2, int n2) {
        Set<Character> set = new HashSet<>(s1.length());
    
        for (char ch : s1.toCharArray()) set.add(ch);
    
        for (char ch : s2.toCharArray()) {
        	if (!set.contains(ch)) return 0;
        }
        	
        int[][] arr = new int[s1.length()][2];
    
        for (int i = 0; i < s1.length(); i++) {
            arr[i][0] = -1;
        }
        	
        int i = 0, j = 0, k = 0, c = 0, N1 = n1 * s1.length();
        	
        for (; i < N1; i++, j = i % s1.length(), k %= s2.length()) {
        	if (s1.charAt(j) == s2.charAt(k) && ++k == s2.length()) {
        	    c++;
    	    if (arr[j][0] >= 0) break;
    	    arr[j][0] = i;
    	    arr[j][1] = c;
    	}
        }
        	
        if (i < N1) {
            int l = i - arr[j][0];
            int m = c - arr[j][1];
            int t = (N1 - arr[j][0] - 1) / l;
            c = arr[j][1] + m * t;
            i = arr[j][0] + l * t + 1;
            	
            for (j = i % s1.length(), k = 0; i < N1; i++, j = i % s1.length(), k %= s2.length()) {
                if (s1.charAt(j) == s2.charAt(k) && ++k == s2.length()) {
        		c++;
        	    }
            }
        }
        	
        return c / n2;
    }
    

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.