Sliding window, similar to finding longest substring with k distinct characters


  • 45
    H

    The problem says that we can make at most k changes to the string (any character can be replaced with any other character). So, let's say there were no constraints like the k. Given a string convert it to a string with all same characters with minimal changes. The answer to this is

    length of the entire string - number of times of the maximum occurring character in the string

    Given this, we can apply the at most k changes constraint and maintain a sliding window such that

    (length of substring - number of times of the maximum occurring character in the substring) <= k

    class Solution {
    public:
        int characterReplacement(string s, int k) {
            vector<int> counts(26, 0);
            int start = 0;
            int maxCharCount = 0;
            int n = s.length();
            int result = 0;
            for(int end = 0; end < n; end++){
                counts[s[end]-'A']++;
                if(maxCharCount < counts[s[end]-'A']){
                    maxCharCount = counts[s[end]-'A'];
                }
                while(end-start-maxCharCount+1 > k){
                    counts[s[start]-'A']--;
                    start++;
                    for(int i = 0; i < 26; i++){
                        if(maxCharCount < counts[i]){
                            maxCharCount = counts[i];
                        }
                    }
                }
                result = max(result, end-start+1);
            }
            return result;
        }
    };
    

  • 12

    Nice solution! But a little advice, this step:

    for(int i = 0; i < 26; i++){
        if(maxCharCount < counts[i]){
            maxCharCount = counts[i];
        }
    }
    

    is not necessary, because when the sliding window shrinks, the counts array won't get larger. So basically maxCharCount never be updated in this loop. Correct me if I'm wrong :)


  • 2
    S

    @vesion I think this is necessary and i can't find a better way than the posted code. When the window shrinks, macChar can change so maxCharCount must be updated in every shrink step.


  • 0
    Y

    @vesion I don't think so,since when you shrink, the character that has the maximum count might change, say
    if you shrink two twice from the beginning, the most recent will change from A to B, so you do need that update. Do you agree?


  • 4
    M

    Thanks! Here is the Java version based on the solution provided. When we move the start pointer (j), we only need to update maxCnt if the corresponding count is originally maxCnt.

    public int characterReplacement(String s, int k) {
        if (s == null || s.length() == 0) return 0;
        int j = 0;
        int i = 0;
        int[] cnt = new int[26];
        int maxCnt = 0;
        int maxLen = 0;
        while (i < s.length()) {
            if (cnt[s.charAt(i) - 'A']++ >= maxCnt) maxCnt = cnt[s.charAt(i) - 'A'];
            while (i - j + 1 - maxCnt > k) {
                if (cnt[s.charAt(j++) - 'A']-- == maxCnt) {
                    maxCnt--;
                    for (int l = 0; l < 26; l++) {
                        if (cnt[l] > maxCnt)
                            maxCnt = cnt[l];
                    }
                }
            }
            maxLen = Math.max(maxLen, i - j + 1);
            i++;
        }
        return maxLen;
    }

  • 0
    B

    I don't think it will affect the result if you delete the inside loop. Here's my reasoning:

    when the program jumps out of the while loop:

    while(end-start-maxCharCount+1 > k) {...}
    

    it is a guarantee that end-start-maxCharCount+1 <= k, which is clear. So end-start+1 <= maxCharCount+k. So even if we don't update the maxCharCount everytime the window shrinks, the result won't change and should be correct.

    However, in this way, the meaning of the variable maxCharCount will be changed and not easy to understand. Can someone give it a better name?


  • 1
    V

    Nice solutions, however, I don't think we need the while loop here.

    while(end-start-maxCharCount+1 > k) {
    

    instead, simple if would do. Since at each iteration, end increases by 1 and maxCharCount either remains same or gets increased by 1. So end-start-maxCharCount+1 can at max become k+1 which would come down to k in first trimming itself. But again, implementation of while and if are just JZ and JG in hardware assembly so having one or other doesn't cost much.


  • 13
    G

    @vesion Yes, I know what you mean. The for loop didn't update the maxCharCount since it can't be smaller than any element in the counts. We should set maxCharCount as 0 before the for loop.

    However, the original code works. Even more, we can change the while loop as a if clause and delete the while loop. Change it to

        int characterReplacement(string s, int k) {
            vector<int> count(26, 0);
            int max_len = 0, max_occur = 0, start = 0;
            for (int end = 0; end < s.size(); end++) {
                char ch = s[end] - 'A';
                count[ch]++;
                max_occur = max(max_occur, count[ch]);
                if (end - start - max_occur + 1 > k) {
                    count[s[start] - 'A']--;
                    start++;
                }
                max_len = end - start + 1;
            }
            return max_len;
        }
    

    Because when we get a result, we don't need to calculate a new result worse than the first one. So the max_len will increment only.


  • 0
    T

    @googleguang agree with you. The "while" condition can be changed to "if', and inside for loop is not necessary. These will pass the testcase. But, during the process of sliding window, there exists wrong substring even though the substring length is the same. The "while" loop guarantee the substrings are all correct. For example, AABAABA, k = 1;


  • 0
    T

    @googleguang The interesting thing is that using "while" loop cost less time than "if" loop


  • 0
    J

    I want to know the time complexity of this method. I think it is O(2*n) and we can remove coefficient, Thus O(n);


  • 2
    Z

    @googleguang
    I still not very sure why we don't need to update max_occur inside the if statement? since if we shrink the window by 1 (i.e start++), it's possible that max_occur also decrease by 1.
    Could you elaborate more?


  • 3
    G

    @zzwcsong Sure, let's think about it separately by whether s[start] is the majority character.

    1. If s[start] is not the majority character, max_occur won't decrease if we execute start++, right? So we don't need to update max_occur in this case.
    2. if it is the majority character, we won't get a better result if the new max_occur doesn't exceed the former one. Please notice the window size is increased only because we end++ every time. So if the new character coming into the window (s[end+1]) is not the majority character, we won’t get a qualified sequence since we introduce a new character. So we will only get a new result if the majority character count increase further or there comes a new majority character and its count exceeds the former one.

  • 1
    Z

    @googleguang Thanks for your detailed explanation. I think I understand now. so in this implementation, max_occur is actually not the maximum occurrence of a character in current window. (This is why I get quite confused)
    But it actually doesn't matter, since the window size won't decrease in each iteration, if a majority character gone and come with another non-majority character (which maximum occurrence of a character in current window does decrease), even if we don't update max_occur, our condition check (end - start - max_occur + 1 > k) still hold.
    For example, AABCDFEG , k = 1.
    When we shift our window from AABC to ABCD, in ABCD, it keeps max_occur = 2, but (end - start - max_occur + 1 > k) still hold.

    So here max_occur is the maximum occurrence of a character in all substring we see, right? since it never decrease.


  • 3
    G

    @zzwcsong Yes, exactly. The parameter name max_occur here is not good. It should be something like max_majority_occur_till_now.


  • 0
    W

    so clear! really clever mind!


  • 0
    H

    If we take the charset size Σ into consideration, then the complexity of this solution would be O(). A better way for larger Σ is to maintain amultiset< pair<int, char> > rev where the first element represents the count of character and the second stands for the character, which is sorted by first as first key and second as second key. Then every time we also need to update rev, and looking up the char with maximal occurrence is simply rev.rbegin()->first. This algorithm would be O(nlogΣ)

    class Solution {
        struct Cmp
        {
            bool operator() (const pair<int, int>& a, const pair<int, int> &b) const
            {
                return a.first != b.first ? a.first < b.first : a.second < b.second;
            }
        };
    public:
        int characterReplacement(string s, int k) {
            int cnt[26];
            memset(cnt, 0, sizeof(cnt));
            multiset <pair<int, int>, Cmp> rev; // cnt, idx
            for (int i=0; i<26; ++i)
            {
                rev.insert(make_pair(0, i));
            }
            
            queue<char> q;
            size_t ans = 0;
            
            for (size_t i=0; i<s.size(); ++i)
            {
                char c = s[i];
                q.push(c);
                auto it = rev.find(make_pair(cnt[c - 'A'], c - 'A'));
                rev.erase(it);
                
                ++cnt[c-'A'];
                rev.insert(make_pair(cnt[c-'A'], c-'A'));
                
                while (q.size() - rev.rbegin()->first > k)
                {
                    char popC = q.front();
                    q.pop();
                    auto it = rev.find(make_pair(cnt[popC - 'A'], popC - 'A'));
                    rev.erase(it);
                    --cnt[popC - 'A'];
                    rev.insert(make_pair(cnt[popC - 'A'], popC - 'A'));
                }
                ans = max(ans, q.size());
            }
            return ans;
        }
    };
    

  • 0
    O

    The same idea. Concise 8-liner in Python

    class Solution(object):
        def characterReplacement(self, s, k):
            d, lo, ret = collections.defaultdict(int), 0, 0
            for hi,c in enumerate(s):
                d[c] += 1
                while lo<hi and hi-lo+1-max(d.values())>k:
                    d[s[lo]] -= 1
                    lo += 1
                ret = max(ret, hi-lo+1)
            return ret
    

  • 1
    C

    @harunrashidanver nice solution! I took your idea and wrote it in Java

        public int characterReplacement(String s, int k) {
            int[] cArr = new int[26];
            int maxCount = 0, start = 0, maxSize = 0;
            
            for(int end = 0; end < s.length(); end ++){
                cArr[s.charAt(end) - 'A']++;
                maxCount = Math.max(maxCount, cArr[s.charAt(end) - 'A']);
                
                // if max character frequency + distance between start and end is greater than k
                // this means we have considered changing more than k charactres. So time to shrink window
                if(end - start + 1 - maxCount > k){
                    cArr[s.charAt(start) - 'A']--;
                    start ++;
                }
                
                maxSize = Math.max(maxSize, end - start + 1);
            }
            
            return maxSize;
        }
    

  • 2
    T

    @zzwcsong I still don't get it.
    if the left character is the maxCnt character, the maxCnt will be decreased by 1. why does the the formula( end - start - max_occur + 1 > k) still makes sense? what is the purpose of the formula?

    suppose AABCDA k= 2, formula: end-start-maxCnt +1 <= k

    1. we have AABC, start:0, end: 3, maxCnt = 2(AA), k=2. the formula is valid.
    2. slide right: AABCD: start:0, end: 4 maxCnt =2(AA), k=2, the formula is invalid.
    3. shift left: ABCD, start:1, end:4
      • in my understanding, maxCnt = 1(A), k=2, the formula is invalid.
      • in your understanding, maxCnt = 2(no changed) , the formula is valid.

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.