From an intuitive recursive solution (4ms) to an optimized backtracking solution (0ms) in C, both are well-explained


  • 6

    Because we are required to remove the duplicates and still search for the lexicographical smallest, we can then follow the following steps to solve it:

    • first use a counter array to count the frequency for each letter that appears in the string;
    • in this recursive method, we will search for the smallest valid letter in each recursion - validity here means that selecting this letter will not exclude others which can be ensured by the counter we initialized before;
    • after selecting the current valid smallest letter, we then need to prepare a substring for the recursion - removing all the letters before this selected letter and remove all the same letter after this selected letter; why? since we have already get the valid smallest letter without excluding any other letter, the letters before this is useless and invalid now and since we have store this selected letter, the selected letter should also not be allowed to appear in the next recursion, right? Solved simply and directly!

    To save space cost, I used an assistant method to store the result string using a int pointer to record its size.

    • Space cost O(1)
    • Time cost O(n) - since there are at most 26 different letters, the recursions will be limited to O(1).

    //selecting the valid smallest, remove the invalid
    //and then select the next smallest valid;
    void helper(char* s, char* t, int* returnSize)
    {
        int len = strlen(s);
        if(!len) return ;
        int count[26] = {0};
        int index = 0;
        for(int i = 0; i < len; i++) count[s[i]-'a']++;
        for(int i = 0; i < len; i++)
        {
            if(s[i] < s[index]) //search for the smallest;
                index = i;
            if(!(--count[s[i]-'a'])) //ensure every letter appears at least once, do not exclude any letter;
                break;
        }
        char c = s[index];
        *returnSize += 1;
        t[*returnSize-1] = c;
        int newIndex = 0;
        for(int i = index+1; i < len; i++) //reconstructing the left substring by removing the collected character and the letters before the selected since they are invalid after selecting the letter in 'index' position;
            if(s[i] != c)
                s[newIndex++] = s[i];
        s[newIndex] = '\0';
        helper(s, t, returnSize);
    }
    
    //AC - 4ms - quite intuitive solution selecting the smallest one by one;
    char* removeDuplicateLetters(char* s)
    {
        char* t = (char*)malloc(sizeof(char)*27);
        int size = 0;
        helper(s, t, &size);
        t[size] = '\0';
        return t;
    }
    

    Apart from the above recursion solution, there is also a backtracking one which comparably more efficient eliminating redundant traversal for counter or something like that. Also there is a counter for avoiding removing essential letters, an array used to record the status (stored or not - in stack or not) and a char stack to store the valid letter which will be returned as the result. The steps are as follows:

    • initialize the array counter;
    • decrease the count for each corresponding letter while traversing the string ensure the unknowingly exclusion will not happen;
    • if the current letter is already stored in the stack, we just continue the loop letting the follow-up letters to handle it if there is something wrong with the position;
    • if the current letter is not stored then we have to where it should be placed in the stack - checking the value of the letter and counter at the same time - we have to ensure the lexicographical order, remember? but at the same time we have to ensure every unique letter is included that's why we need to check the counter - if the counter is bigger than zero, then there will others of the letter in the un-traversed part, right?
    • store the letter and update its status (stored now).

    Compared with the above recursive solution, this one is so economic - almost waste nothing in space and time! Though pitifully they share the same representation of cost ##

    • Space cost O(n)
    • Time cost O(n)

    //AC - 0ms - using counter to ensure including each letter
    //using visited to accelerate the checking process;
    char* removeDuplicateLetters0(char* s)
    {
        int count[26] = {0};
        bool visited[26] = {0};
        int len = strlen(s);
        for(int i = 0; i < len ;i++) count[s[i]-'a']++; //count the frequency of each character;
        char *t = (char*)malloc(sizeof(char)*27); //used to store the result string;
        int size = 0; //used to indicate the length of the result string;
        for(int i = 0; i < len; i++)
        {
            int index = s[i]-'a';
            count[index]--;
            if(visited[index]) continue; //stored already, needless to check the result string;
            int j = size-1; //not visited before, check it in the result string;
            for(; j > -1; j--)
            {
                int index = t[j]-'a';
                if(s[i] < t[j] && count[index]) visited[index] = false; //if the character t[j] is bigger and there is still such character in the remaining substring, to keep lexicographical order, we have to remove it from the result string and meantime reset visited records;
                else break;
            }
            size = j+1; //update the size of the result string;
            t[size++] = s[i];
            visited[index] = true;
        }
        t[size] = '\0';
        return t;
    }

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.