The Math Reasoning behind the Most Voted Algorithm


  • 1
    R

    The algorithm posted in https://discuss.leetcode.com/topic/68206/easy-python-solution-with-explaination is the smartest algorithm I've seen. Also, algorithms using KMP such as https://discuss.leetcode.com/topic/67590/java-o-n/10 are tricky. However, it is hard to understand why they work clearly. Here I share a proof of the first algorithm. One may use a similar idea (the lemma below) to prove algorithms using KMP.

    The most voted algorithm is as follows.

    public boolean repeatedSubstringPattern(String s) {
    			int n = s.length();
    			String t = (s + s).substring(1, n + n - 1);
    			return t.indexOf(s) != -1;
    		}
    

    -- Proof of correctness --
    [Definition] We say a non-empty string s can be generated by a substring ss of s if s can be constructed by appending multiple copies of ss.

    [Theorem] A non-empty string s can be generated by a substring of s if and only if this.algorithm.return=true.

    To prove [Theorem], we need the following lemma.

    [Lemma] A non-empty string s can be generated by a substring of s if and only if there exist non-empty substrings s1, s2 of s such that s=s1+s2=s2+s1 (i.e., s is equal to a string constructed by rotating s (to left or right) by nonzero units). Moreover, if the latter holds, then s can be generated by s.substring(0, g), where g=gcd(s1.length, s2.length) is the greatest common divisor; While in particular, s, s1, and s2 can be generated by s.substring(0, g) = s1.substring(0, g) = s2.substring(0, g).

    [Proof of Lemma] It is trivial for the "only if" part, since if a non-empty string s can be generated by a substring ss of s, then s can be writen as s=ss+r=r+ss, where r=s.substring(ss.length), which contains ss as a substring, while ss is non-empty since s is not.

    For the "if" part, the "While" statement directly follows from its previous statement. It remains to show:
    [Statement] If there exist non-empty substrings s1, s2 of s such that s=s1+s2=s2+s1, then s can be generated by s.substring(0, gcd(s1.length, s2.length)).

    We will prove the statement by induction on n, the length of s. When n=2, the result is clear. Assuming that [Statement] is true for all strings of length less than n, we shall prove the case that for a string s having length n, there exist non-empty substrings s1, s2 of s such that s=s1+s2=s2+s1.

    Let li=si.length, for i=1,2. WLOG, assume l1>=l2. Since s1+s2=s2+s1, s1 can be written as s1=s2+s3, where s3=s1.substring(l2). If s3 is empty, then the proof is done, since s=s2+s2. Otherwise, s=s1+s2=s2+s1=s2+s3+s2=s2+s2+s3. If s3.length>=l2, then s3 can be written as s3=s2+s4, where s4=s3.substring(l2). If s4 is empty, then the proof is done, since s=s2+s2+s2. Otherwise, s=s2+s2+s4+s2=s2+s2+s2+s4.

    Repeating the process until s=(s2+...+s2)+s*+s2=s2+(s2+...+s2)+s*, with s*.length<l2, we have, again, if s*.length=0, then it is done. Otherwise, we have that s*+s2=s2+s*, both strings are non-empty, and the string s'=s*+s2 has length less than n. Employing the induction hypothesis, we obtain that s' can be generated by s'.substring(0, g), where g=gcd(s*.length, l2). Notice that gcd(s*.length, l2)=gcd(l1, l2)=g, since s1=(s2+...+s2)+s*. Also, we see that s'.substring(0, g)=s2.substring(0, g)=s.substring(0, g). Hence, s=(s2+...+s2)+s*+s2 can be generated by s.substring(0, gcd(l1, l2)), proving the case for a string of length n.

    By the induction principle, [Statement] holds. Therefore, [Lemma] holds.

    [Proof of Theorem] If a non-empty string s can be generated by a substring of s, then by [Lemma], s=s1+s2=s2+s1 for non-empty substrings s1, s2. Thus, (s+s)=s1+s2+s1+s2=s1+s+s2. Since s1, s2 are non-empty, this.algorithm.return=true.

    Conversely, if this.algorithm.return=true, then let i=t.indexOf(s)>=0, s1=s.substring(0, i+1), and s2=s.substring(i+1). Then s=s1+s2. Since t.indexOf(s)=i and (s+s)=s1+s2+s1+s2, we have that s=s2+s1. Also, s1=s.substring(0, i+1) and i>=0 imply that s1 is non-empty. Since (s+s)=s1+s2+s1+s2, from the last s2 and the formation of t, we have that s2 contains the last char of s, meaning that it is non-empty. Hence by [Lemma], s can be generated by s.substring(0, gcd(s1.length, s2.length)).

    (One may prove that this generator is the shortest, and that gcd(s1.length, s2.length)=s1.length=(i+1). Since otherwise, t.indexOf(s) would be smaller than i.)


  • 0
    C

    Great solution. Thanks for sharing.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.