Share my O(n) C++ accumulation solution with thinking process and explanation


  • 0
    K

    1. Problem


    Given a binary array, find the maximum length of a contiguous subarray with equal number of 0 and 1.


    2. Thinking process


    2.1 Calculating number 1 in a contiguous subarray


    Before introducing the way to calculate number of 0 and 1 in a contiguous subarray, we focus on the number of 0 and 1 in the whole array.


    Since the array is a binary array, suppose the array is

    a(1), a(2), ..., a(n), a(k) = 0 or 1, 1 ≤ k ≤ n.

    It can be inferred that

    The number of 1 in a(n) is S(n) = a(1) + a(2) + ... + a(n).

    which has a recursion formula

    S(n) = a(1), n = 1

    S(n) = a(n) + S(n - 1), n > 1


    Now we focus on a contiguous subarray from a(i) to a(j) in a(n).

    Suppose the summation of the subarray is

    T(i, j) = a(i) + a(i + 1) + ... + a(j - 1) + a(j), 1≤ i < j ≤ n.

    It can be inferred that

    T(i, j) = S(j), i = 1.

    T(i, j) = S(j) - S(i - 1), i > 1.

    Since the array is a binary array, it can be inferred that

    The number of 1 in the subarray is T(i, j).


    2.2 Calculating number 0 in a contiguous subarray


    Suppose U(n) is the number of 0 in the array

    a(1), a(2), ..., a(n), a(k) = 0 or 1, 1 ≤ k ≤ n.

    Then, the number of 0 in a contiguous subarray from a(i) to a(j) in a(n) is

    V(i, j) = U(j), i = 1.

    V(i, j) = U(j) - U(i - 1), i > 1.


    2.3 Let number of 0 and 1 be equal in the subarray


    This means

    T(i, j) = V(i, j)

    It can be inferred that

    S(j) = U(j), i = 1

    S(j) - S(i - 1) = U(j) - U(i - 1), i > 1


    The first formula - when i = 1, the subarray starts from the beginning of the array.

    We only need to check whether

    U(j) - S(j) = 0.


    The second formula - when i > 1.

    We get

    S(j) - S(i - 1) = U(j) - U(i - 1)

    If we change the formula, we get

    U(j) - S(j) = U(i - 1) - S(i - 1)

    If we can find 1≤ i < j ≤ n meets the formulas above, the subarray from a(i) to a(j) will meet the requirement.


    2.4 The key point and calculating the maximum length of a required subarray


    Till now, we find the key point is to calculate

    Key(j) = U(j) - S(j) = j - S(j) - S(j) = j - 2 × S(j).


    If we do iteration from j = 1 to j = n, and calculate Key(j) and save it to a hash table as key.

    There are 2 situations.


    The first time Key(j) appears is when j = F(Key(j))

    The last time Key(j) appears is when j = L(Key(j))

    Key(j) = 0

    The maxmium subarray's length is L(0) - from a(1) to a(q).

    Key(j) ≠ 0 appeared n(n > 1) times

    The maxmium subarray's length is L(Key(j)) - F(Key(j)) + 1 - from a(F(Key(j))) to a(L(Key(j))).


    The total maxmium length is

    max{max[Key(j) = 0][L(Key(j)], max[Key(j) ≠ 0][L(Key(j)) - F(Key(j)) + 1]}.


    3. Algorithm


    3.1 Special cases


    If the size of array < 2, there will be no subarray that meets the requirement.

    • The function returns 0.

    3.2 Normal situation


    In order to save memory and make the algothrim simple, when doing iteration from j = 1 to j = n,


    If

    Key(j) = 0 - update the total maximum by j.

    Key(j) ≠ 0 - save F(Key(j)) to the hash table, update the total maximum by j - F(Key(j)).


    4. Complexity analysis


    4.1 Time complexity


    Since the algothrim is one-pass from 1 to n.

    The time complexity is O(n).


    4.2 Space complexity


    As we know

    Key(j) = U(j) - S(j) = j - 2 × S(j), 0 ≤ S(j) ≤ j, 0 ≤ j ≤ n.

    We can infer that

    - j ≤ Key(j) ≤ j, 0 ≤ j ≤ n.

    So

    - n ≤ Key(j) ≤ n.

    That is to say

    The space complexity is O(n).


    5. Runtime optimization


    By using std::map as hash table, the runtime is

    162ms (beats 41.56% of cpp submissions).


    In order to optimize runtime, I use an interger array instead.

    Since Key(j) can be negative integer, and negative integer can't be an index of an array,

    I can

    normalize Key(j) from [- n, n] to [0, 2n] by adding n to Key(j).

    and

    The size of the interger array is 2n + 1.


    After optimization, the runtime is

    99ms (beats 96.50% of cpp submissions).


    6. Code


    6.1 Using std::map [Before Optimization]


    class Solution {
    public:
        int findMaxLength(vector<int>& nums) {
            if(nums.size() < 2) return 0;        
            int maxLength = 0;
            map<int, int> lowIndex;    
        
            //sum-up
            int i = 1;
            while(true)
            {
                //number of 1s when <= i: nums[i - 1]
                //number of 0s and 1s <= i: i
                int diff = i - 2 * nums[i - 1];
                if(diff == 0)
                {
                    if(maxLength < i) maxLength = i;
                }else{
                    if(lowIndex.find(diff) == lowIndex.end())
                    {
                        lowIndex[diff] = i - 1;
                    }else{
                        if(i - 1 - lowIndex[diff] > maxLength) maxLength = i - 1 - lowIndex[diff];
                    }
                }
                if(i == nums.size()) return maxLength;
                nums[i] += nums[i - 1];
                i++;
            }
        }
    };
    

    6.2 Using integer array [After Optimization]


    class Solution {
    public:
        int findMaxLength(vector<int>& nums) {
            if(nums.size() < 2) return 0;        
            int maxLength = 0;        
            int *lowIndex = new int[2 * nums.size() + 1];        
            for(int i = 0; i < 2 * nums.size() + 1; i++) lowIndex[i] = -1;
    
            //sum-up
            int i = 1;
            while(true)
            {
                //number of 1s when <= i: nums[i - 1]
                //number of 0s and 1s <= i: i
                int diff = i - 2 * nums[i - 1];
                if(diff == 0)
                {
                    if(maxLength < i) maxLength = i;
                }else{
                    if(lowIndex[diff + nums.size()] == -1)
                    {
                        lowIndex[diff + nums.size()] = i - 1;
                    }else{
                        if(i - 1 - lowIndex[diff + nums.size()] > maxLength) 
                        maxLength = i - 1 - lowIndex[diff + nums.size()];
                    }
                }
                if(i == nums.size()) return maxLength;
                nums[i] += nums[i - 1];
                i++;
            }
        }
    };
    


  • 0
    C

    Great, thanks! You had me scratching my head for a while at 2.2 when you did not explicitly say U(j) = j - S(j) (because all the elements that are 0 means all the elements all together, except those that are 1.)

    I didn't speed up the O(n) of your solution, but I tightened up, In particular there is no need to check for the special case nums,size() < 2, because it will fall out. Also there is no need to treat the array from nums[0] to nums[i] as special, so we can save a conditional. I put a few repeated expressions in const int or int. Especially the size() method on an array may be implemented as a function call.
    I don't understand why you are storing the tally (key, imbalance) at each position given that this is single pass, so I store it as scalar variable.
    For style I prefer a for loop to a while(true) with a break, and I named some things more explicitly. All together I might be running in half the time as your optimized array solution.

        int findMaxLength(vector<int>& nums) {
            const int nums_size = nums.size();
            const int firsts_size = 2 * nums_size + 2;
            //This will fall out if(nums_size < 2) return 0;        
            int maxLength = 0; 
            // This array holds the first i where we find a given Key
            // The Key is the imbalance. Negative is how many more 0s, Positive is how many more 1s.
            // This is normalized to run from 0 to 2 * nums.size() rather than from -nums.size() to nums.size().
            // Since we are starting at nums[0], any time Key is 0 the subArray from 0 to i is balanced,
            //  and it must be the longest so far because any partial, or any so-far, must be shorter.
            //  But that is also lowIndex[0] so no need to special case it
            // Index of firstTimeWeSaw is nums_size + actual i where it happened
            // Before we add nums[0] is position -1
            // cout << "nums.size() = " << nums_size << "  map firsts size = " << firsts_size << endl;
            int *firstTimeWeSaw = new int[firsts_size];        
            for(int i = 0; i < firsts_size; i++)
                firstTimeWeSaw[i] = -2;
            int i = -1;
            int tally = 0; // KJer stores this in nums[i-1]
            int diff = 2 * tally - i; // 1 means balanced because of how we count from -1
            int key = diff + nums_size;
            firstTimeWeSaw[key] = i; // We are balanced before we start
            // cout << "Before nums[0]  tally = " << tally << " calculated imbalance (1 + 1s - 0s) is " << diff << " key = " << key << endl;
            int length;
    
            for (i = 0; i < nums_size; i++)
            {
                
                //number of 1s in nums[] from 0 to i: tally
                //number of 0s or 1s in nums[] from 0 to i: i
                tally += nums[i];
                // diff = 2 * tally - i; // 1 means balanced because of how we count from -1
                // cout << "i = "<< i << " tally (total 1s) including nums[" << i << "] (" << nums[i] << ") = " 
                //              << tally << " imbalance (1 + 1s - 0s) = " << diff  << endl;
                key = 2 * tally - i + nums_size;
                // Have we seen this key before?
                if(firstTimeWeSaw[key] == -2)
                {
                    // cout << "    This is the first time we saw diff " << diff << " (key = " << key << ")\n";
                    // This is the first time we saw the key
                    firstTimeWeSaw[key] = i;
                }
                else
                {
                    // This key looks familiar
                    // How long since we last saw it? 
                    length = i - firstTimeWeSaw[key];
                    // cout << "    First saw diff " << diff << " was when i was " << firstTimeWeSaw[key] << "  length = " << length << endl;
                    if(length > maxLength) 
                    {
                        // cout << "        New longest length = " << length << " > longest was " << maxLength << endl;
                        maxLength = length;
                    }
                }
            }
            return maxLength;
        }
    

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.