C++ solution inspired by @70664914 with organized explanation


  • 29

    It's easy to come up with a brute force solution and to find that there will be a repetitive pattern when matching S2 through S1. The only problem is how to use the repetitive pattern to save computation.

    Fact:
    If s2 repeats in S1 R times, then S2 must repeats in S1 R / n2 times.
    Conclusion:
    We can simply count the repetition of s2 and then divide the count by n2.

    How to denote repetition:
    We need to scan s1 n1 times. Denote each scanning of s1 as a s1 segment.
    After each scanning of i-th s1 segment, we will have

    1. The accumulative count of s2 repeated in this s1 segment.
    2. A nextIndex that s2[nextIndex] is the first letter you'll be looking for in the next s1 segment.

    Suppose s1="abc", s2="bac", nextIndex will be 1; s1="abca", s2="bac", nextIndex will be 2

    It is the nextIndex that is the denotation of the repetitive pattern.

    Example:

    Input:
    s1="abacb", n1=6
    s2="bcaa", n2=1
    
    Return:
    3
    
                        0  1   2 3 0      1    2 3 0      1    2 3 0  
    S1 --------------> abacb | abacb | abacb | abacb | abacb | abacb 
    repeatCount ----->    0  |   1   |   1   |   2   |   2   |   3
    Increment of 
    repeatCount     ->    0  |   1   |   0   |   1   |   0   |   1
    nextIndex ------->    2  |   1   |   2   |   1   |   2   |   1
    

    The nextIndex has s2.size() possible values, ranging from 0 to s2.size() - 1. Due to PigeonHole principle, you must find two same nextIndex after scanning s2.size() + 1 s1 segments.

    Once you meet a nextIndex you've met before, you'll know that the following nextIndexs and increments of repeatCount will repeat a pattern.

    So let's separate the s1 segments into 3 parts:

    1. the prefix part before repetitive pattern
    2. the repetitive part
    3. the suffix part after repetitive pattern (incomplete repetitive pattern remnant)

    All you have to do is add up the repeat counts of the 3 parts.

    class Solution {
    public:
        int getMaxRepetitions(string s1, int n1, string s2, int n2) {
            vector<int> repeatCount(s2.size() + 1, 0);
            vector<int> nextIndex(s2.size() + 1, 0);
            int j = 0, cnt = 0;
            for (int k = 1; k <= n1; ++k) {
                for (int i = 0; i < s1.size(); ++i) {
                    if (s1[i] == s2[j]) {
                        ++j;
                        if (j == s2.size()) {
                            j = 0;
                            ++cnt;
                        }
                    }
                }
                repeatCount[k] = cnt;
                nextIndex[k] = j;
                for (int start = 0; start < k; ++start) {
                    if (nextIndex[start] == j) {
                        int prefixCount = repeatCount[start];
                        int patternCount = (repeatCount[k] - repeatCount[start]) * (n1 - start) / (k - start);
                        int suffixCount = repeatCount[start + (n1 - start) % (k - start)] - repeatCount[start];
                        return (prefixCount + patternCount + suffixCount) / n2;
                    }
                }
            }
            return repeatCount[n1] / n2;
        }
    };
    

  • 2

    Due to PigeonHole principle, you must find two same nextIndex after scanning s2.size() + 1 s1 segments.

    Awesome! It's the key to take a good advantage of the repetitive pattern!


  • 0
    Y

    Thanks for your answer, and I found it also sovlable after erasing the "suffixCount".


  • 0
    C

    Can you further explain this part?

            repeatCount[k] = cnt;
            nextIndex[k] = j;
            for (int start = 0; start < k; ++start) {
                if (nextIndex[start] == j) {
                    int prefixCount = repeatCount[start];
                    int patternCount = (repeatCount[k] - repeatCount[start]) * (n1 - start) / (k - start);
                    int suffixCount = repeatCount[start + (n1 - start) % (k - start)] - repeatCount[start];
                    return (prefixCount + patternCount + suffixCount) / n2;
                }
            }

  • 3

    @coder2

    repeatCount[k] = cnt; // record the k-th repeatCount and nextIndex
    nextIndex[k] = j;
    for (int start = 0; start < k; ++start) {
        if (nextIndex[start] == j) { // see if you have met this nextIndex before
            // if found, you can calculate the 3 parts
            int prefixCount = repeatCount[start]; // prefixCount is the start-th repeatCount
            // (repeatCount[k] - repeatCount[start]) is the repeatCount of one occurrance of the pattern
            // There are (n1 - start) / (k - start) occurrances of the pattern
            // So (repeatCount[k] - repeatCount[start]) * (n1 - start) / (k - start) is the repeatCount of the repetitive part
            int patternCount = (repeatCount[k] - repeatCount[start]) * (n1 - start) / (k - start);
            // The suffix contains the incomplete repetitive remnant (if any)
            // Its length is (n1 - start) % (k - start)
            // So the suffix repeatCount should be repeatCount[start + (n1 - start) % (k - start)] - repeatCount[start]
            int suffixCount = repeatCount[start + (n1 - start) % (k - start)] - repeatCount[start];
            return (prefixCount + patternCount + suffixCount) / n2;
        }
    }
    

  • 0
    C

    Thank you!

    I still don't understand why (prefixCount + patternCount + suffixCount) are added up.

    How are those variables correspond to the S1 --------------> abacb | abacb | abacb | abacb | abacb | abacb


  • 0
    C

    I stepped into your code and printed out the k as follows. And realize that you didn't even finish scan all the s1 n1 times, then the pattern can be found.
    But I'm still confused! Sigh

    class Solution {
    public:
    int getMaxRepetitions(string s1, int n1, string s2, int n2) {
    vector<int> repeatCount(s2.size() + 1, 0);
    vector<int> nextIndex(s2.size() + 1, 0);
    int j = 0, cnt = 0;
    for (int k = 1; k <= n1; ++k) {
    for (int i = 0; i < s1.size(); ++i) {
    if (s1[i] == s2[j]) {
    ++j;
    if (j == s2.size()) {
    j = 0;
    ++cnt;
    }
    }
    }
    repeatCount[k] = cnt;
    cout<< k <<endl;
    nextIndex[k] = j;
    for (int start = 0; start < k; ++start) {
    if (nextIndex[start] == j) {
    cout<< "before return" <<endl;
    int prefixCount = repeatCount[start];
    int patternCount = (repeatCount[k] - repeatCount[start]) * (n1 - start) / (k - start);
    int suffixCount = repeatCount[start + (n1 - start) % (k - start)] - repeatCount[start];
    //cout<< prefixCount <<endl;
    //cout<< patternCount<< endl;
    return (prefixCount + patternCount + suffixCount) / n2;
    }
    }
    }

        //cout<< "before repeatCount[n1]"<<endl;
        return repeatCount[n1] / n2;      
    }
    

    };


  • 1

    @lzl124631x That's a elegant solution! Actually, you can avoid the following loop by making nextIndex a map with (j, k) pair so that you can access the previous j directly. Remember to put(0, 0) in the beginning.

               for (int start = 0; start < k; ++start) {
                    if (nextIndex[start] == j) {
                        int prefixCount = repeatCount[start];
                        int patternCount = (repeatCount[k] - repeatCount[start]) * (n1 - start) / (k - start);
                        int suffixCount = repeatCount[start + (n1 - start) % (k - start)] - repeatCount[start];
                        return (prefixCount + patternCount + suffixCount) / n2;
                    }
                }
    

  • 0
    M

    Very elegant solution. Here is the Java version:

        public int getMaxRepetitions(String s1, int n1, String s2, int n2) {
            int len1 = s1.length(), len2 = s2.length();
            int[] repeatCount = new int[len2 + 1];
            int[] nextIndex   = new int[len2 + 1];
            int j = 0, cnt = 0;
            for (int k = 1; k <= n1; k++){
                for (int i = 0; i < len1; i++){
                    if (s1.charAt(i) == s2.charAt(j)){
                        j++;
                        if (j == len2){
                            j = 0; 
                            cnt++;
                        }
                    }
                }
                repeatCount[k] = cnt;
                nextIndex[k] = j;
                for (int start = 0; start < k; start++){
                    if (nextIndex[start] == j) {
                        int prefixCount = repeatCount[start];
                        int patternCount = (repeatCount[k] - repeatCount[start]) * (n1 - start)/(k - start);
                        int suffixCount = repeatCount[start + (n1 - start) % (k-start)]- repeatCount[start];
                        return (prefixCount + patternCount + suffixCount) / n2;
                    }
                }
            }
            return repeatCount[n1] / n2;
        }
    

  • 0
    N

    @lzl124631x
    How about replacing your nextIndex int array with a HashMap? Therefore linear scanning becomes constant time querying.

    For example:

    Map<Integer, Integer> nextIndex = new HashMap<>();
    nextIndex.put(0,0);
    ...
        if (nextIndex.containsKey(j)) {
            int start = nextIndex.get(j);
            ...
        }
        nextIndex.put(j, k);
    

    Oops. After I post this reply. I found my idea is the same as @lufangjianle.


  • 0

    I'm not sure if I fully understand your solution but I have a similar solution. My solution is still trying to find the occurrence of s2 in n1 times s1. The difference is instead of finding the first letter of s2 I'm trying to find the last letter of s2. Our solution here is somehow similar to finding the linked list cycle that when there is a repeated pattern is found, we advance a big number of characters. Here is the source code:

    int getMaxRepetitions(string s1, int n1, string s2, int n2) {
            unordered_map<size_t,pair<size_t,size_t>> rep;
            if(s2.empty()) return 0;
            if(n2==0) return 0;
            const auto L1=s1.size(),L2=s2.size();
            size_t matched=0;
            size_t cnt=0;
            bool skip=false;
            for(size_t i=0;i<L1*n1;i++){
                if(s1[i%L1]==s2[matched])
                    matched++;
                if(matched==L2)
                {
                    cnt++;
                    matched=0;
                    if(!skip){
                        auto ite=rep.find(i%L1);
                        if(ite==rep.end())
                        {
                            rep[i%L1]=make_pair(i,cnt);
                        }else{
                            size_t span=i-ite->second.first;
                            size_t cntdiff=cnt-ite->second.second;
                            size_t next=((L1*n1-i-1)/span); // L1*n1-i-1 is the number of remaining characters after and not inclusive the current repeated pattern in s1.
                            cnt+=next*cntdiff;
                            i+=next*span;
                            skip=true; // There is no need to find the repeated pattern anymore. Not sure if this is really necessary.
                        }
                    }
                }
            }
            return cnt/n2;
        }
    

  • 0
    Z

    Great solution!
    I have a minor optimization. In order to remove the inner loop to search for nextIndex[start], which is O(n^2) and n is s2.size(), we can use a hash table visited[k]. It is used to record how many passes of s1 is required to get nextIndex of k. For example, visited[2] = p means after p passes of s1, the index in string s2 is 2. And the vector<int> nextIndex is not necessary any more.
    Also prefixCount + suffixCount = repeatCount[start + (n1 - start) % (k - start)], which can be simplified.

    class Solution {
    public:
        int getMaxRepetitions(string s1, int n1, string s2, int n2) {
            int m1 = s1.size(), m2 = s2.size();
            vector<int> repeated(m2+1,0), visited(m2+1, -1); 
            visited[0] = 0;
            int k = 0, cnt = 0;
            for (int i = 1; i <= n1; i++) {
                for (int j = 0; j < m1; j++) {
                    if (s1[j] == s2[k]) {
                        k++;
                        if (k == m2) {
                            k = 0;
                            cnt++;
                        }
                    }
                }
                if (visited[k] == -1) {
                    repeated[i] = cnt;
                    visited[k] = i;
                }
                else {
                    int start = visited[k], p = i-start, t = cnt-repeated[start];
                    int ans = (n1-start)/p*t + repeated[(n1-start)%p+start];
                    return ans/n2;
                }
            }
            return cnt/n2;
        }
    };
    

  • 0
    R

    There is an error in this line

    int patternCount = (repeatCount[k] - repeatCount[start]) * (n1 - start) / (k - start);
    

    It should calculate (n1-start)/(k-start) first

    int patternCount = (repeatCount[k] - repeatCount[start]) * ((n1 - start) / (k - start));
    

  • 0
    R

    @lzl124631x you get most of it right. Just this line
    int patternCount = (repeatCount[k] - repeatCount[start]) *(n1 - start) / (k - start);
    Correct to:
    int patternCount = (n1 - start) / (k - start) * (repeatCount[k] - repeatCount[start]);
    for example
    3 * 3 / 2 ( 4) != 3/2 * 3 (3)


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.