To determine if a string str2
is contained in another string str1
, there is a naive O(n)
solution with n = str1.length
: from start to end, simply match each character in str2
greedily with those in str1
. If there is a such match for all characters in str2
, we conclude str2
is contained in str1
, otherwise it is not.
To find the number of str2
contained in str1
, we can proceed similarly except now we need to keep track of the number of matches found for str2
. Still this can be done in O(n)
time.
So for our problem, "to find the maximum number of S2=[s2,n2]
contained in S1=[s1,n1]
", we have the following naive solution:
public int getMaxRepetitions(String s1, int n1, String s2, int n2) {
int M = 0, p = 0, q = 0;
for (int i = 0; i < n1; i++) {
for (int j = 0; j < s1.length(); j++) {
if (s1.charAt(j) == s2.charAt(q)) {
q++;
if (q == s2.length()) {
p++;
q = 0;
}
if (p == n2) {
M++;
p = 0;
}
}
}
}
return M;
}
And as you would expect, this failed the extreme case when s1.length = 100
and n1 = 10^6
(i.e., S1.length = 10^8
). So let's see how we can do better.
The key idea here is to take advantage of the fact that S1
consists of repeating strings (i.e. s1
). The naive solution ignored this information and did a blind scan for the whole string, which led to the poor time performance.
First we will assume s1
contains at least once of each distinct character in s2
, otherwise no S2
can be contained in S1
. Second, instead of finding the number of S2
contained in S1
, we will find the number of s2
contained in S1
. The former can be obtained by dividing the latter by n2
.
To implement the above idea, here are some observations:

Each character with index
i
(0 <= i < N1
) in stringS1
will correspond to some indexj
in strings1
, withj = i % s1.length
andN1 = S1.length
. 
Each match of the whole string
s2
inS1
will end at some index inS1
. Starting from the first match, let's label the end index for each match inS1
asi1, i2, i3 ...
, and the corresponding indices ins1
asj1, j2, j3 ...
. 
Let
ip
andiq
be the end indices inS1
of the first two matches whose corresponding indices ins1
,jp
andjq
, are equal. LetS1(ip, iq]
denote the substring between them,l = iq  ip
as its length andm
as the total number ofs2
contained in it. Then the substringS1(ip, N1)
can be constructed by concatenatingt
times the substringS1(ip, iq]
plus possibly some residual partS1(r, N1)
, wheret = (N1  ip  1) / l
(integer part) andr = ip + t * l + 1
. And the total number ofs2
contained in the substringS1(ip, r)
will simply be the productm * l
. Also the total length of the residual substring is less thanl
.
(Note: for the substring notations,(
or)
means exclusive,[
or]
inclusive) 
It can be shown that if the two indices
ip
andiq
in part 3 exist, they can be found inO(s1.length^2 * s2.length)
time. The reasoning is as follows: first sinces1
contains at least once of each distinct character ins2
, starting from any index inS1
, it will take at mosts1.length * (s2.length + 1)
characters to find a match ofs2
inS1
(consider the worst case scenario in which it takes a wholes1
string to match each single character ins2
). Second there are at mosts1.length
matches ofs2
inS1
whose corresponding indices ins1
are all different, as each match must take one index ins1
and we have only up tos1.length
different positions available. Therefore the index ins1
of the(s1.length + 1)
th match must coincide with some previous match. In totalip
andiq
will be found withins1.length^2 * (s2.length + 1)
characters. This also sets the upper limit onl
wherel = iq  ip
, which in turn bounds the total length of the residual substring in part 3. 
The total number of
s2
contained inS1
can be obtained from three parts: the number contained in substringS1[0, ip]
, the number contained in substringS1(ip, r)
and lastly the number contained in the residual substringS1[r, N1)
. The number in the second part can be obtained in constant time (simple mathematical computation) while those in the first and third parts can both be obtained inO(s1.length^2 * s2.length)
time. Therefore our total time complexity will be at mostO(s1.length^2 * s2.length)
.
Here is the java program for the improved solution with brief explanation as follows:

First make sure
s1
contains at least once of each distinct character ins2
. 
For each match of
s2
inS1
, we will record two piece of information: its end index inS1
and the total number of matches up to this end index. Since we will have at mosts1.length
"distinct" matches, we will use an array of the same length ass1
to hold matches found so far. The element in the array will be another array of length 2, whose first element denotes the end index and second element the total matches. 
We will proceed in the way described in the naive solution to identify possible matches. If we do not make it all the way to the end of
S1
, then a repeating match is found. We can obtainl
,m
andt
to compute the number ofs2
contained in the second part as well as scanning the residual substring to get that in the third part. The final answer will be the total number in all three parts divided byn2
.
public int getMaxRepetitions(String s1, int n1, String s2, int n2) {
Set<Character> set = new HashSet<>(s1.length());
for (char ch : s1.toCharArray()) set.add(ch);
for (char ch : s2.toCharArray()) {
if (!set.contains(ch)) return 0;
}
int[][] arr = new int[s1.length()][2];
for (int i = 0; i < s1.length(); i++) {
arr[i][0] = 1;
}
int i = 0, j = 0, k = 0, c = 0, N1 = n1 * s1.length();
for (; i < N1; i++, j = i % s1.length(), k %= s2.length()) {
if (s1.charAt(j) == s2.charAt(k) && ++k == s2.length()) {
c++;
if (arr[j][0] >= 0) break;
arr[j][0] = i;
arr[j][1] = c;
}
}
if (i < N1) {
int l = i  arr[j][0];
int m = c  arr[j][1];
int t = (N1  arr[j][0]  1) / l;
c = arr[j][1] + m * t;
i = arr[j][0] + l * t + 1;
for (j = i % s1.length(), k = 0; i < N1; i++, j = i % s1.length(), k %= s2.length()) {
if (s1.charAt(j) == s2.charAt(k) && ++k == s2.length()) {
c++;
}
}
}
return c / n2;
}