Repeated String Match


  • 0

    Click here to see the full article post


  • 0
    F

    Why not take the lengths and do two string finds? One in A * ceil(B_length/A_length) and one in A * (ceil(B_length/A_length)+1). If both fail, there is no other way B can be a substring of any string in the set [A_repeated]. O(2) => O(1) time and O(2) => O(1) strings of length max(A, B) space.

    Subsequence is a different beast altogether.


  • 0
    W

    I think the time complexity for Approach 1 should be O(M*N).
    Besides, how do you get space complexity O(M+N)
    Could you please explain that clearer?


  • 0
    G

    I guess space for approach 1 should be $O(N)$. the string B will be concat at most $N/M+1$ times, so S.length should be at most $O(M*(N/M)+1)=O(N)$.


  • 0
    G

    em .. I just made a mistake before. number of concatenations was right, but the total length ofB should be M * (N/M+1), and when using big-Oh notation it would be O(N+M)


  • 0
    W

    @g63 Yeah, that's right. What about time complexity for approach 1 ?


  • 0
    G

    @wqmbisheng
    As the java code shows, the for segment use time of O(N/M) , since A would be concatenated at most N/M + 1 times.
    The first if segment use an indexOf method. I dont know how java's indexOf is implemented but i think it's just a brute-force string match like std::string.find in cpp. So the first if use time of O(N*S.length), while M*(N/M) <= S.length <= M*(N/M+1). so the complexity of this segment is at least O(N^2) , at most O(N*(M+N))
    The second if segment looks like the same, but now the S is an M longer than before. so the inequality becomes M*(N/M+1) <= S.length <= M*(N/M+2), and the complexity is O(N*(M+N)).
    so the complexity is O(M*(M+N)) when adding them together.


  • 0
    W

    For time complexity, the first "if" use time of O(N*(S.length - N)), because when the pointer points to S.length-N + 1,there is no need to go further. The left length of S is certainly smaller than N, so there would be no substring equals to B.
    In this way, the time complexity is O(N*M). I don't know whether this is correct.


  • 0
    H

    It took me more than I care to admit to figure out that this
    q = (len(B) - 1) // len(A) + 1
    is just a Ceil operation
    q = ceil(len(B) / len(A)) :D
    thank you for your explanation


  • 0
    B

    Why choose MOD = 1_000_000_007 ?


  • 0
    This post is deleted!

  • 1
    J

    Here is my take at supplementing the logic explained in Approach 1: If B has even a chance of being a sub-string of A, then the first character of B must exist somewhere in A. Actually, if you disregard the order, every character of B should exist somewhere in A, because A is the source of all the characters of S (because S is built out of copies of A). However, since the first character of B must exist in A, and B is a sub-string of S, then the index in S that marks the beginning of that sub-string must lie somewhere between 0 and one less than the length of A, or 0 <= i < len(A). If we want to know the maximum length that S needs to grow to before we can say that B is never going to be a sub-string of A, then we need to look at the case where the first character of B occurs at the greatest possible index of S, namely len(A) - 1.

    I realise this is not a complete explanation of Approach 1, but hopefully (if correct) it will at least provide another perspective that some may find useful. I struggled with understanding why we only needed to check up to q + 1 instances of A before we could determine that B was definitely not a sub-string of S. After reading Approach 1 several times and thinking about it for a bit (and writing out a few examples) it suddenly clicked, and I thought I should share my findings. In any case, just writing this out has helped me solidify my own understanding of the approach.


  • 0
    J

    @ben1205 I believe MOD should be (1) a fairly large number so that hash values are spread out more leading to less collisions, and (2) a number that is co-prime to 'p' so that it is possible to calculate the modular inverse.


  • 0
    J

    Here is my JavaScript implementation of Approach 2 (Rabin-Karp Rolling Hash). It took me a while to decipher the given Java implementation, so hopefully this code will make it much clearer what the given implementation does.

    function repeatedStringMatch(A, B) {
      
      if(!A || A.length === 0 || !B || B.length === 0) {
        return -1;
      }
    
      var q = Math.ceil(B.length / A.length);
      
      var a = new CyclicRollingHash(A, B.length);
      var b = new CyclicRollingHash(B, B.length);
      
      var max1 = A.length * q;
      var max2 = A.length + max1;
      
      while(a.end < max2) {
        
        if(a.hash === b.hash && isSubstring(A, B, a.start)) {
          return a.end < max1 ? q : q + 1;
        }
        
        a.roll();
      }
      
      return -1;
      
    }
    
    function isSubstring(A, B, offset) {
      for(var i = 0; i < B.length; ++i) {
        if(B[i] !== A[(offset + i) % A.length]) {
          return false;
        }
      }
      return true;
    }
    
    function CyclicRollingHash(text, width) {
      this.text = text;
      this.start = 0;
      this.end = width - 1;
      this.hash = 0;
      
      this.p = 113;
      this.MOD = 1000003; //smaller than 1000000007 so it doesn't take forever to compute inverse
      this.pInverse = this.getModularInverse(this.p, this.MOD);
      this.power = 1;
      
      for(var i = this.start; i <= this.end; ++i) {
        this.hash = (this.hash + this.power * this.text.codePointAt(i % this.text.length)) % this.MOD;
        this.power = (this.power * this.p) % this.MOD;
      }
      
      this.power = (this.power * this.pInverse) % this.MOD; //reverts last power 'increment' at the end of the hash loop
    }
    
    CyclicRollingHash.prototype.roll = function() {
      this.hash -= this.text.charCodeAt(this.start++ % this.text.length);
      this.hash *= this.pInverse;
      this.hash += this.power * this.text.charCodeAt(++this.end % this.text.length);
      this.hash %= this.MOD;
    }
    
    //https://rosettacode.org/wiki/Modular_inverse#C.2B.2B
    //no idea how it works
    CyclicRollingHash.prototype.getModularInverse =function(a, b) {
      var b0 = b, t, q;
      var x0 = 0, x1 = 1;
      if(b===1) return 1;
      while(a>1) {
        q = Math.floor(a/b);
        t = b, b = a % b, a = t;
        t = x0, x0 = x1 - q * x0, x1 = t;
      }
      if(x1 < 0) x1 += b0;
      return x1;
    }
    
    //slower but easier to understand version of getModularInverse
    CyclicRollingHash.prototype.getModularInverseNaive = function(num, mod) {
      num %= mod;
      
      for(var numInverse = 1; numInverse < mod; ++numInverse) {
        result = (num * numInverse) % mod;
        if(result === 1) {
          return numInverse;
        }
      }
      
      return -1;
    }
    
    //////////////////////////////////
    
    var t = console.assert;
    var f = repeatedStringMatch;
    
    function test() {
      t(f('abc','ca') === 2);
      t(f('abc','cabca') === 3);
      t(f('abc','d') === -1);
      t(f('abc','abc') === 1);
      t(f('','') === -1);
      t(f('a','aaaaa') === 5);
      t(f('a','') === -1);
      t(f('','a') === -1);
      t(f('abadc','bcaba') === -1);
      t(f('abc','abcabc') === 2);
      t(f('aabc','abca') === 2);
      t(f('aaabc','aabca') === 2);
      t(f('accdc','cabcd') === -1);
      t(f('axaxaya','axaya') === 1);
    }
    
    test();
    

  • 0
    K

    Here is my Solution

    class Solution {
    public int repeatedStringMatch(String A, String B) {
    StringBuilder mainString = new StringBuilder(A);
    int i=1;
    for(; mainString.length() < (B.length() + 2 * A.length()); i++ ){
    if( mainString.toString().contains(B)){
    return i;
    }else{
    mainString.append( A );
    }
    }

        return -1;
    }
    

    }


  • 0
    K

    Here's my java implementation.

    class Solution {
        public int repeatedStringMatch(String A, String B) {
            StringBuilder mainString = new StringBuilder(A);
            int i=1;
            for(; mainString.length() < (B.length() + 2 * A.length()); i++ ){
                if( mainString.toString().contains(B)){
                    return i;
                }else{
                    mainString.append( A );
                }  
            }
            
            return -1;
        }
    }
    

  • 0
    U

    Without considering language specific functions, I think the time complexity in approach 1 should be O(M*N) because:

    1. The first brutal force check of B against S[0:] will take N character-by-character matching tries.
    2. A second try will need to compare B against S[1:] if the 1st try fails, again will take N character matching tries.
    3. The worst situation is that we have to keep going until comparing B against S[M-1:], so the total iteration will be M, times the number of tries in each iteration N, and the final result is M*N character-by-character comparison.

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.