Python, Straightforward with Explanation (Bonus O(N) solution)


  • 14

    We perform a "center expansion" among all possible centers of the palindrome.

    Let N = len(S). There are 2N-1 possible centers for the palindrome: we could have a center at S[0], between S[0] and S[1], at S[1], between S[1] and S[2], at S[2], etc.

    To iterate over each of the 2N-1 centers, we will move the left pointer every 2 times, and the right pointer every 2 times starting with the second (index 1). Hence, left = center / 2, right = center / 2 + center % 2.

    From here, finding every palindrome starting with that center is straightforward: while the ends are valid and have equal characters, record the answer and expand.

    def countSubstrings(self, S):
        N = len(S)
        ans = 0
        for center in xrange(2*N - 1):
            left = center / 2
            right = left + center % 2
            while left >= 0 and right < N and S[left] == S[right]:
                ans += 1
                left -= 1
                right += 1
        return ans
    

    Bonus: Implementing Manacher's algorithm can give a linear time solution with one additional line. I invite the reader to Google "Manacher's Algorithm" if interested in the explanation.

    def countSubstrings(self, S):
        def manachers(S):
            A = '@#' + '#'.join(S) + '#$'
            Z = [0] * len(A)
            center = right = 0
            for i in xrange(1, len(A) - 1):
                if i < right:
                    Z[i] = min(right - i, Z[2 * center - i])
                while A[i + Z[i] + 1] == A[i - Z[i] - 1]:
                    Z[i] += 1
                if i + Z[i] > right:
                    center, right = i, i + Z[i]
            return Z
    
        return sum((v+1)/2 for v in manachers(S))
    

  • 1
    S

    Could you please share some pointers to understand why there're 2*N - 1 centers, for first algorithm?


  • 1

    @parvez.h.shaikh Sure, I edited my post to explain it better.


  • 0
    8

    For "Manacher's Algorithm", I don't understand why the result is

    return sum((v+1)/2 for v in manachers(S))
    

    Can u explain it ?


  • 0

    @8aAsr5 Each integer in manachers(S) represents the maximum half-length of a palindrome at some center. For example, if the center is at the capital letter, 'A' will be 1, 'bAb' will be 3, 'cbAbc' will be 5. If the center is between two letters, say at |, then 'a|a' will be 2, 'ba|ab' will be 4, etc. From these numbers, (v+1)/2 is the correct count of the number of palindromes with this center.


  • 0
    B
    This post is deleted!

  • 0
    B
    This post is deleted!

  • 0
    S

    @awice said in Python, Straightforward with Explanation (Bonus O(N) solution):

    epresents the maximum half-length of a palindrome at some center.

    I still can not understand why (v + 1) / 2 is the count of the number of palindromes; instead of giving instances, could you give me any intuition behind this? thank you!


  • 0
    J

    Thanks for the nice solution.
    I have a question,since I was asked a similar problem in an interview, and that one is almost the same except that we need to return the number of distinct palindromes.
    My solution was quite naive, by using a hashmap to check whether we counted this palindrome or not, but I wonder, is there a better and tricky solution?


  • 0

    @shaniavina said in Python, Straightforward with Explanation (Bonus O(N) solution):

    @awice said in Python, Straightforward with Explanation (Bonus O(N) solution):

    epresents the maximum half-length of a palindrome at some center.

    I still can not understand why (v + 1) / 2 is the count of the number of palindromes; instead of giving instances, could you give me any intuition behind this? thank you!

    Say the longest palindrome with some given center C has radius R. Then, the substring with center C and radius R-1, R-2, R-3, ..., 0 are also palindromes. Example: abcdedcba is a palindrome with center e, radius 4: but e, ded, cdedc, bcdedcb, and abcdedcba are all palindromes.

    We are dividing by 2 because we were using half-lengths instead of lengths. For example we actually had a#b#c#d#e#d#c#b#a so our length is twice as big.


  • 0

    @JadenPan For an interview, what you did is quite fine. I don't think any interviewer expects Manacher's algorithm. So the rest of this post is purely academic, just out of interest in what is actually possible.


    Now, let's modify Manacher's algorithm.

    def manachers(S):
        A = '@#' + '#'.join(S) + '#$'
        Z = [0] * len(A)
        center = right = 0
        for i in range(1, len(A) - 1):
            if i < right:
                Z[i] = min(right - i, Z[2 * center - i])
            while A[i - Z[i] - 1] == A[i + Z[i] + 1]:
                Z[i] += 1
            if i + Z[i] > right:
                center, right = i, i + Z[i]
        return Z
    

    First, let's discuss this implementation of Manacher's algorithm further, including why it is linear.

    Our loop invariants will be that center, right is our knowledge of the palindrome with the largest right-most boundary with center < i, centered at center with right-boundary right. Also, i > center, and we've already computed all Z[j]'s for j < i.

    When i < right, we reflect i about center to be at some coordinate j = 2 * center - i. Then, limited to the interval with radius right - i and center i, the situation for Z[i] is the same as for Z[j].

    For example, if at some time center = 7, right = 13, i = 10, then for a string like A = '@#A#B#A#A#B#A#$', the center is at the '#' between the two middle 'A''s, the right boundary is at the last '#', i is at the last 'B', and j is at the first 'B'.

    Notice that limited to the interval [center - (right - center), right] (the interval with center center and right-boundary right), the situation for i and j is a reflection of something we have already computed. Since we already know Z[j] = 3, we can quickly find Z[i] = min(right - i, Z[j]) = 3.

    Now, why is this algorithm linear? The while loop only checks the condition more than once when Z[i] = right - i. In that case, for each time Z[i] += 1, it increments right, and right can only be incremented up to 2*N+2 times.


    Whew, that was a lot of information. Anyways, back to our original problem, of knowing the count of how many distinct palindromes there are, in linear time.

    In the case that we do not increment Z[i], it is because the palindrome found was already a match to the mirrored situation. Since we have counted those (mirrored) palindromes already, then we don't need to update our count.

    Otherwise, every Z[i]++ represents potentially a new palindrome found, and we find an amount of them linear to N (where N is the size of the string), since we proved that Z[i]++ only happens a linear number of times. In these cases, we should count that substring with indices in range [i - Z[i], i + Z[i]]. But dealing with strings S[i:j] is too expensive and would add an O(N) factor, making our algorithm O(N^2).

    We can mitigate the problem by dealing with string hashes instead of substrings. For those not familiar with this advanced concept, here is a primer: Let the hash of a word hash(W) be sum( a^i * W[i] for i in range(len(W)) ) % P, where P is some large prime, and a is some constant coprime to P. Then, if we compute in linear time prefix sums pre[i] = hash(W[:i]), we can compute (in constant time) queries hash(W[i:j]) = (pre[j] - pre[i]) * (a^{-1})^{i} % P where a^{-1} is the modular inverse of a mod P.

    This means (up to hash collision), we can know the number of unique substrings - it's just the number of unique hashes we've found. Note in the below implementation I didn't precompute pow(PinvQ, L, Q) etc. which doesn't make it a constant time query, but we could easily precompute these hashes (there are order N of them.)

    class StringHash(object):
        P = 41
        Q = 10**9 + 33
        R = 10**9 + 123  #I'm using two primes Q and R to reduce the chance of collision
        PinvQ = pow(P, Q-2, Q)
        PinvR = pow(P, R-2, R)
        
        def __init__(self, S):
            self.S = S
            self.prefix = [(0,0)]
            cq = cr = 0
            q = r = 1
            for i, x in enumerate(self.S):
                cq = (cq + ord(x) * q) % StringHash.Q 
                cr = (cr + ord(x) * r) % StringHash.R
                q = q * StringHash.P % StringHash.Q
                r = r * StringHash.P % StringHash.R
                self.prefix.append((cq, cr))
            
        def query(self, L, R):
            cq2, cr2 = self.prefix[R+1]
            cq1, cr1 = self.prefix[L]
            return ((cq2 - cq1) * pow(StringHash.PinvQ, L, StringHash.Q) % StringHash.Q, 
                    (cr2 - cr1) * pow(StringHash.PinvR, L, StringHash.R) % StringHash.R)
    
    def solve(S):
        A = '@#' + '#'.join(S) + '#$'
        shash = StringHash(A)
        seen = set(shash.query(i, i) for i in xrange(2, len(A)-1, 2))
        Z = [0] * len(A)
        center = right = 0
        for i in range(1, len(A) - 1):
            if i < right:
                Z[i] = min(right - i, Z[2 * center - i])
            while A[i - Z[i] - 1] == A[i + Z[i] + 1]:
                Z[i] += 1
                if A[i-Z[i]] != '#':
                    seen.add(shash.query(i - Z[i], i + Z[i]))
            if i + Z[i] > right:
                center, right = i, i + Z[i]
        return len(seen)
    

  • 1
    J

    @awice Awesome, string hash is a wonderful way to get rid of the O(N) factor.


  • 0
    F

    I don't understand how an O(n) solution is considered a "Bonus"? If it performs better, it's not a bonus, it's the correct answer...


  • 0

    @FlorenceMachine Well, because at the time I didn't want to write a lengthy explanation, so I offered the solution without comment. As I was writing one anyways for my editorial on this question, it made sense now to paste some of that explanation here.


  • 0
    D

    Thanks for your awesome solution. I just didn't understand what's the meaning of non-integers index in a string, like S[1/2] .........


  • 0

    @dingshiyao0217 I mean if the original string is say S = "abcd", we have S[0] = 'a', S[1] = 'b', S[2] = 'c', S[3] = 'd'; but with a new string of T = "a#b#c#d", we have T[0] = 'a', T[2] = 'b', T[4] = 'c', T[6] = 'd'.


  • 0
    D

    @awice Thanks very much for your detialed explanation. I understood the key idea of 'center' which includes the gap of string. like for a string with N length, the number of gaps will be N-1 . so the number of center is N + (N-1) = 2*N -1 .

    But in your algorithm
    for center in xrange(2*N - 1):
    left = center / 2
    right = left + center % 2
    while left >= 0 and right < N and S[left] == S[right]:

    if N=2 , so the center would be 0, 1,2 ,accordingly the left will be 0, 1/2 ,1 .
    So when the left is equal to 1/2 , the S[left] will be S[1/2]. So I don't understand what's the meaning of S[1/2] ???


  • 0

    @dingshiyao0217 In Python2, / represents integer division. Sorry for the confusion, I should have used //. Every variable is an integer.


  • 0
    D

    @awice  So it is ! I See, Thank you very much ! ! !


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.