Improving the runtime from O(n) to O(log n)


  • 45

    Let f[i][j][k] denote the # of valid sequences of length i where:

    1. There can be at most j A's in the entire sequence.
    2. There can be at most k trailing L's.

    We give the recurrence in the following code, which should be self-explanatory, and the final answer is f[n][1][2].

    public int checkRecord(int n) {
        final int MOD = 1000000007;
        int[][][] f = new int[n + 1][2][3];
    
        f[0] = new int[][]{{1, 1, 1}, {1, 1, 1}}; // the empty sequence is valid for every (j, k)
        for (int i = 1; i <= n; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 3; k++) {
                    int val = f[i - 1][j][2]; // ...P
                    if (j > 0) val = (val + f[i - 1][j - 1][2]) % MOD; // ...A
                    if (k > 0) val = (val + f[i - 1][j][k - 1]) % MOD; // ...L
                    f[i][j][k] = val;
                }
        return f[n][1][2];
    }
    

    The runtime of this solution is clearly O(n), using O(n) space, which can easily be reduced to O(1) since the recurrence only reads the previous layer f[i-1][][]; a sketch of that variant follows.
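
    Here is a minimal sketch of that O(1)-space variant (it simply rolls the previous 2x3 layer forward):

    public int checkRecord(int n) {
        final int MOD = 1000000007;
        int[][] prev = {{1, 1, 1}, {1, 1, 1}}; // f[0][][]: the empty sequence
        for (int i = 1; i <= n; i++) {
            int[][] cur = new int[2][3];
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 3; k++) {
                    int val = prev[j][2]; // ...P
                    if (j > 0) val = (val + prev[j - 1][2]) % MOD; // ...A
                    if (k > 0) val = (val + prev[j][k - 1]) % MOD; // ...L
                    cur[j][k] = val;
                }
            prev = cur;
        }
        return prev[1][2];
    }

    Now, let's see how to further improve the runtime.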

    In fact, if we treat f[i][][] and f[i-1][][] as two vectors of length 6, we can represent the recurrence of f[i][j][k] as follows:

    f[i][0][0]   | 0 0 1 0 0 0 |   f[i-1][0][0]
    f[i][0][1]   | 1 0 1 0 0 0 |   f[i-1][0][1]
    f[i][0][2] = | 0 1 1 0 0 0 | * f[i-1][0][2]
    f[i][1][0]   | 0 0 1 0 0 1 |   f[i-1][1][0]
    f[i][1][1]   | 0 0 1 1 0 1 |   f[i-1][1][1]
    f[i][1][2]   | 0 0 1 0 1 1 |   f[i-1][1][2]
    

    Let A be the matrix above; then f[n][][] = A^n * f[0][][], where f[0][][] is the all-ones column vector [1 1 1 1 1 1]^T. The point of this approach is that we can compute A^n using exponentiation by squaring (thanks to @StefanPochmann for the name correction), which takes O(6^3 * log n) = O(log n) time. Therefore, the runtime improves to O(log n), which suffices to handle much larger n, say 10^18.
    Update: The final answer is f[n][1][2], which is the product of the last row of A^n and the all-ones column vector [1 1 1 1 1 1]^T. Interestingly, it also equals A^(n+1)[5][2], since the third column of A is exactly that all-ones vector. Credit to @StefanPochmann.

    Java Code:

    final int MOD = 1000000007;
    final int M = 6;
    
    int[][] mul(int[][] A, int[][] B) {
        int[][] C = new int[M][M];
        for (int i = 0; i < M; i++)
            for (int j = 0; j < M; j++)
                for (int k = 0; k < M; k++)
                    C[i][j] = (int) ((C[i][j] + (long) A[i][k] * B[k][j]) % MOD);
        return C;
    }
    
    
    int[][] pow(int[][] A, int n) {
        int[][] res = new int[M][M];
        for (int i = 0; i < M; i++)
            res[i][i] = 1; // start from the identity matrix
        while (n > 0) {
            if (n % 2 == 1)
                res = mul(res, A);
            A = mul(A, A);
            n /= 2;
        }
        return res;
    }
    
    public int checkRecord(int n) {
        int[][] A = {
                {0, 0, 1, 0, 0, 0},
                {1, 0, 1, 0, 0, 0},
                {0, 1, 1, 0, 0, 0},
                {0, 0, 1, 0, 0, 1},
                {0, 0, 1, 1, 0, 1},
                {0, 0, 1, 0, 1, 1},
        };
        return pow(A, n + 1)[5][2]; // = f[n][1][2], see the update above
    }
    

  • 4

    Very nice. Here's a Python solution based on it:

    import numpy as np
    
    class Solution(object):
        def checkRecord(self, n):
            A = np.matrix([[0, 0, 1, 0, 0, 0],
                           [1, 0, 1, 0, 0, 0],
                           [0, 1, 1, 0, 0, 0],
                           [0, 0, 1, 0, 0, 1],
                           [0, 0, 1, 1, 0, 1],
                           [0, 0, 1, 0, 1, 1]])
            power = A
            mod = 10**9 + 7
            while n:
                if n & 1:
                    power = (power * A) % mod
                A = A**2 % mod
                n //= 2
            return int(power[5, 2])
    

    Python's pow function even supports a modulo parameter, but unfortunately NumPy matrices don't support it, so I had to write the exponentiation myself.

    I compute A^(n+1) because then the answer is simply one matrix element rather than the sum of the last row.

    Btw, I don't think this is "binary search". Wikipedia's whole Exponentiation by squaring article doesn't say "search", not even once.


  • 2

    Your pow (and mul) would be simpler if your mul created a result matrix and returned it:

    int[][] mul(int[][] A, int[][] B) {
        int[][] C = new int[6][6];
        for (int i = 0; i < 6; i++)
            for (int j = 0; j < 6; j++)
                for (int k = 0; k < 6; k++)
                    C[i][j] = (int) ((C[i][j] + (long) A[i][k] * B[k][j] % MOD) % MOD);
        return C;
    }
    
    
    int[][] pow(int[][] A, int n) {
        int[][] res = new int[6][6];
        for (int i = 0; i < 6; i++)
            res[i][i] = 1;
        while (n > 0) {
            if (n % 2 == 1)
                res = mul(res, A);
            A = mul(A, A);
            n /= 2;
        }
        return res;
    }

  • 1

    @StefanPochmann Thanks! I have corrected the name.
    Good point about returning A^(n+1)[5][2]! We didn't even notice that the third column of A contains all ones.

    Regarding the mul function, indeed, I explicitly reuse the space so that the program strictly uses O(1) space rather than O(log n). It is true that the code would be handier that way, but sometimes I find that these intermediate matrix allocations/destructions can be a large overhead, especially for large matrices. Anyway, it might not be an issue here... :)
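
    For what it's worth, a rough sketch of what reusing a preallocated scratch matrix could look like (mulInto is just an illustrative name):

    // Sketch only: multiply A * B into a caller-provided scratch buffer, then
    // copy the result back into A, so no new matrices are allocated per call.
    void mulInto(int[][] A, int[][] B, int[][] scratch) {
        for (int i = 0; i < M; i++)
            for (int j = 0; j < M; j++) {
                long sum = 0;
                for (int k = 0; k < M; k++)
                    sum = (sum + (long) A[i][k] * B[k][j]) % MOD;
                scratch[i][j] = (int) sum;
            }
        for (int i = 0; i < M; i++)
            System.arraycopy(scratch[i], 0, A[i], 0, M);
    }

    Then pow would call mulInto(res, A, scratch) and mulInto(A, A, scratch) with a single scratch matrix allocated up front.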


  • 0

    @lixx2100 Yeah, with so small and so few matrices, I don't think it's an issue and I prefer the simplicity. Then again, if I want simplicity, I won't use Java in the first place :-P

    But wouldn't it matter less for large matrices? I mean, the larger the matrices get, the more the cubic complexity of the multiplication should outweigh that overhead, no?

    "We didn't even notice that the third column of A contains all ones."

    Are you a team or a king? :-)

    Wow, while Ruby's matrices apparently don't support element-wise modulo, it's fast enough that I can just work with the large numbers and do the modulo at the end. This "oneliner" gets accepted in about 1100 ms:

    require 'matrix'
    
    def check_record(n)
      (Matrix[[0, 0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1]]**(n+1))[5, 2] % (10**9 + 7)
    end
    

  • 1

    @StefanPochmann Oh my... I didn't even notice that I wrote "we didn't even notice that." LOL

    I find Ruby code quite elegant in some ways. Specifically, I quite like the champion's Q4 code in Contest 27.


  • 0

    Darn, Python is a bit too slow here for big integers, but it works if I use "self-moduloing" integers. This gets accepted in about 430 ms:

    import numpy as np
    
    class Mint:
        def __init__(self, value):
            self.value = value % (10**9 + 7)
        def __add__(self, other):
            return Mint(self.value + other.value)
        def __mul__(self, other):
            return Mint(self.value * other.value)
    
    class Solution(object):
        def checkRecord(self, n):
            O, I = Mint(0), Mint(1)
            A = np.matrix([[O, O, I, O, O, O],
                           [I, O, I, O, O, O],
                           [O, I, I, O, O, O],
                           [O, O, I, O, O, I],
                           [O, O, I, I, O, I],
                           [O, O, I, O, I, I]], dtype=object)
            return (A**(n+1))[5, 2].value

  • 1

    @lixx2100 Brilliant!


  • 1

    @lixx2100 said in Improving the runtime from O(n) to O(log n):

    I quite like the champion's Q4 code in Contest 27.

    That's indeed nice, though I wrote almost the same in Python so I'd say it doesn't show a Ruby advantage.

    Anyway, with my noobish Ruby knowledge and some googling I hacked self-moduloing in there as well now. There's definitely a better way, but this works and is decently fast (accepted in about 170 ms):

    require 'matrix'
    
    class Fixnum
      alias_method :add, :+
      alias_method :mul, :*
      def +(b)
        add(b) % 1000000007
      end
      def *(b)
        mul(b) % 1000000007
      end
    end
    
    def check_record(n)
      (Matrix[[0, 0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1]]**(n+1))[5, 2]
    end
    

  • 0

    @lixx2100 Hi, I am not understanding f[0] = new int[][]{{1, 1, 1}, {1, 1, 1}};. Why is the base case like this? Since the length is 0, shouldn't all these entries be 0?


  • 2

    @johirbuet When i = 0, there is a unique sequence, namely the empty sequence. Moreover, the empty sequence contains 0 A's and ends with 0 trailing L's, so it satisfies every (j, k) constraint. Therefore, f[0][j][k] = 1 for 0 <= j < 2 and 0 <= k < 3. For example, this gives f[1][1][2] = f[0][1][2] + f[0][0][2] + f[0][1][1] = 3, matching the three length-1 records "P", "A", and "L".


  • 0

    Thank you for sharing!!

            /**
             *                              1 1 0 1 0 0
             *                              1 0 1 1 0 0
             * dp_n[0,1,2,3,4,5]      *     1 0 0 1 0 0     = dp_n+1[0,1,2,3,4,5]
             *                              0 0 0 1 1 0
             *                              0 0 0 1 0 1
             *                              0 0 0 1 0 0
             */
    

    And here is my C++ version, based on the matrix above and the recurrence posted by @jasonshieh (Link). I think your solution would be faster :), but this formulation is easier to understand. Maybe it's because I am not smart lol.

    dp[n][0][0] = dp[n-1][0][0] + dp[n-1][0][1] + dp[n-1][0][2] 
    dp[n][0][1] = dp[n-1][0][0]
    dp[n][0][2] = dp[n-1][0][1]
    dp[n][1][0] = (dp[n-1][0][0] + dp[n-1][0][1] + dp[n-1][0][2]) + (dp[n-1][1][0] + dp[n-1][1][1] + dp[n-1][1][2])
    dp[n][1][1] = dp[n-1][1][0]
    dp[n][1][2] = dp[n-1][1][1]
    
    class Solution {
    
    const int mod = 1000000007;
    const int size = 6;
    public:
        int checkRecord(int n) {
            vector<vector<int>> matrix = {{1,1,0,1,0,0},{1,0,1,1,0,0},{1,0,0,1,0,0},{0,0,0,1,1,0},{0,0,0,1,0,1},{0,0,0,1,0,0}};
             if(n==1)
                return 3;
             vector<vector<int>> res = getMatrix(matrix,n-1);
             int count = 0;
             for(int j=0;j<size;++j){
                 count = (count + res[0][j] % mod) % mod;
             }
             return count % mod;
        }
        
        vector<vector<int>> getMatrix(vector<vector<int>> &m1,int n){
            vector<vector<int>> res = {{1,1,0,1,0,0}};
            while(n){
                if(n%2){
                    res = MatrixMultiplication(res,m1);
                }
                m1 = MatrixMultiplication(m1,m1);
                n /= 2;
            }
            return res;
        }
        
        
        vector<vector<int>> MatrixMultiplication(vector<vector<int>> &m1,vector<vector<int>> &m2){
            int rs = m1.size(), cs = m2[0].size(), mid = m1[0].size(); // assumes the matrices have valid dimensions
            vector<vector<int>> m3(rs,vector<int>(cs));
            for(int i = 0 ; i < rs ; ++i){//row
                for(int j = 0 ; j < cs ; ++j){//col
                    for(int k = 0 ; k < mid ; ++k){ // C[i][j] = A[i][k] * B[k][j];
                        m3[i][j] = ((m3[i][j]) + (long)m1[i][k]  * (long)m2[k][j] % mod) % mod;
                    }
                }
            }
            return m3;
        }
    };
    

    Also, according to 311. Sparse Matrix Multiplication, we can optimize the matrix multiplication function. But in this problem, it doesn't bring any obvious improvement in runtime.

    vector<vector<int>> matrixMutiply(vector<vector<int>> &m1,vector<vector<int>> &m2){
            int rs = m1.size(),mid = m1[0].size(),cs = m2[0].size();
            vector<vector<int>> m3(rs,vector<int>(cs));
            for(int i = 0 ; i < rs ; ++i){
                for(int j = 0 ; j < mid ; ++j){
                    if(m1[i][j] == 0)
                        continue;
                    for(int k = 0 ; k < cs ; ++k){
                        if(m2[j][k] == 0)
                            continue;
                        m3[i][k] = ((m3[i][k]) + (long)m1[i][j] * (long)m2[j][k] % mod ) % mod;
                    }
                }
            }
            return m3;
        }
    

  • 1

    @lixx2100 @StefanPochmann You guys blow my mind


  • 3

    For those of you befuddled by how the initial matrix is derived:

    f[i][0][0]   | 0 0 1 0 0 0 |   f[i-1][0][0]
    f[i][0][1]   | 1 0 1 0 0 0 |   f[i-1][0][1]
    f[i][0][2] = | 0 1 1 0 0 0 | * f[i-1][0][2]
    f[i][1][0]   | 0 0 1 0 0 1 |   f[i-1][1][0]
    f[i][1][1]   | 0 0 1 1 0 1 |   f[i-1][1][1]
    f[i][1][2]   | 0 0 1 0 1 1 |   f[i-1][1][2]
    

    The top row says that:

    f[i][0][0] = f[i-1][0][2]
    

    Why? We want at most 0 trailing L's, so we take the previous (i-1) case that allows 0, 1, or 2 trailing L's and simply add a 'P'; that gets us the current (i) case. We don't need any of the f[i-1][1][x] cases, since we don't allow A's. And we don't need f[i-1][0][0] or f[i-1][0][1] separately, because those counts are already included in f[i-1][0][2].

    Moving on:

    f[i][0][1] = f[i-1][0][0] + f[i-1][0][2]
    

    Why? We want at most 1 trailing L, so we can either take a case with no trailing L's (f[i-1][0][0]) and add an 'L', or take one with up to 2 trailing L's (f[i-1][0][2]) and add a 'P'. f[i-1][0][1] isn't included here because adding an 'L' to it could give us 2 trailing L's, which is outside the desired k of 1, and adding a 'P' to it is already covered by f[i-1][0][2], which covers 0, 1, or 2 trailing L's.
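
    If it helps, here is a tiny sketch that builds the matrix mechanically from the recurrence (rows and columns indexed by 3*j + k; buildA is just an illustrative name). It reproduces exactly the 6x6 matrix above:

    // Derive A[row][col] from f[i][j][k] = f[i-1][j][2] (+P) + f[i-1][j-1][2] (+A) + f[i-1][j][k-1] (+L).
    int[][] buildA() {
        int[][] A = new int[6][6];
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 3; k++) {
                int row = 3 * j + k;
                A[row][3 * j + 2]++;                  // append 'P': comes from f[i-1][j][2]
                if (j > 0) A[row][3 * (j - 1) + 2]++; // append 'A': comes from f[i-1][j-1][2]
                if (k > 0) A[row][3 * j + (k - 1)]++; // append 'L': comes from f[i-1][j][k-1]
            }
        return A;
    }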


  • 0

    @lixx2100 said in Improving the runtime from O(n) to O(log n):

    if (k > 0) val = (val + f[i - 1][j][k - 1]) % MOD; // ...L

    Why, when the new record ends with 'L', is val only increased by f[i - 1][j][k - 1]? From my understanding, part of f[i - 1][j][k] should also be valid, as long as the final 'L' doesn't push the number of contiguous trailing 'L's over the limit.


  • 0

    @lixx2100 you blew my mind again! Thank you for your awesome code!


  • 0

    @li61 Pay attention to the word "trailing", and note that f counts at most k trailing L's: every valid predecessor for appending an 'L' (i.e., one with at most k-1 trailing L's) is already included in f[i-1][j][k-1], while records whose new last character is 'P' or 'A' are covered by the other two branches.


  • 0

    Nice idea. I've seen similar ideas used for the Fibonacci problem, where we can turn the recurrence into matrix multiplication and use the same trick of computing pow(x, n) to reduce the time complexity from O(n) to O(log n).
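
    For example, a minimal sketch of that Fibonacci trick, using the standard 2x2 matrix [[1,1],[1,0]] (fib and mul2 are just illustrative names; add a modulus inside mul2 if n is large):

    // [[1,1],[1,0]]^n = [[F(n+1),F(n)],[F(n),F(n-1)]], so F(n) sits at [0][1].
    long fib(long n) {
        long[][] res = {{1, 0}, {0, 1}}; // identity
        long[][] base = {{1, 1}, {1, 0}};
        while (n > 0) {
            if ((n & 1) == 1) res = mul2(res, base);
            base = mul2(base, base);
            n >>= 1;
        }
        return res[0][1];
    }

    long[][] mul2(long[][] a, long[][] b) {
        long[][] c = new long[2][2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 2; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }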


  • 2

    this is really not self-explanatory


  • 0

    @Avidiax Thank you so much for this explanation!! Very, very useful in addition to the original answer...

    I was trying to figure this out until I scrolled down and saw this explanation.

    Thanks! It makes it much easier to understand the thinking process behind this solution.

