C++ O(n) solution without sort()


  • 44
    R
    class Solution {
    public:
        string frequencySort(string s) {
            unordered_map<char,int> freq;
            vector<string> bucket(s.size()+1, "");
            string res;
            
            //count frequency of each character
            for(char c:s) freq[c]++;
            //put character into frequency bucket
            for(auto& it:freq) {
                int n = it.second;
                char c = it.first;
                bucket[n].append(n, c);
            }
            //form descending sorted string
            for(int i=s.size(); i>0; i--) {
                if(!bucket[i].empty())
                    res.append(bucket[i]);
            }
            return res;
        }
    };
    

  • 0
    P

    So smart guy!!!! I like this solution. Amazing idea of counting sort.


  • 3
    R

    The idea is great to know.
    However you might want to mention both solutions to interviewer.
    Since this solution requires a lot of memory. the vector can be really large if the input string is big, while the sort solution has stable container size.
    Great idea though.


  • 0
    N

    Such a Smart solution!!!


  • 0
    A
    This post is deleted!

  • 5
    B

    If s is extremely long, this method costs too much memory.
    If s is long, the time cost is reading string instead of sorting frequencies, because sort frequencies is O(nlogn), where n <= # of all char.


  • 0
    C
    This post is deleted!

  • 1
    R

    I ran your code and compare with mine, I don't think it's more efficient compared to these with sort().
    since when we sort, the n in O(nlogn) are not the number of characters but the number of different characters, since the case we are going to deal with is with a lot of repetition, it's not so bad.
    At the same time. the string object in this environment takes 32 bytes. Think about what if the s.size() is very big. the vector<string> bucket will be a disaster.


  • 8
    W

    It is true that the solution will cost a lot of memory when the string is long. A viable solution to bypass it would be to use a second map<int, string>, whose first element is the occurrence of the characters. Since "map" will sort automatically using its key, the final result hence can be obtained by traversing the map backward. Surprisingly this improvement can cut the runtime by half.

        string frequencySort(string s) {
            unordered_map<char, int> freqOfChar;
            for(char c : s) {
                freqOfChar[c] ++;
            }
            
            map<int, string> dupStr;
            for(auto v : freqOfChar) {
                char c = v.first;
                int n = v.second;
                dupStr[n] += string(n, c);
            }
            
            string res;
            for(auto rit = dupStr.rbegin(); rit != dupStr.rend(); ++rit) {
                res += rit->second;
            }
            return res;
        }
    

  • 0
    R

    @watanuo Nice optimization! I think this resolve the huge bucket concern as guys mentioned above.

    Seems I was too careless. Just realized the time complexity(worst case) will become O(nlogn) if map is used. Since each insertion to map takes O(logn), in worst case, there will be n elements need to be push, which takes O(nlogn).

    I am not sure if there is other data structure could save creating auxiliary buckets without sorting. Otherwise, I guess this bucket sort solution is just an alternative way to trade off space for speed.


  • 0

    @RushRab Actually, the excessive usage of space happens in your non-sorting solution only because the vector<string> bucket was allocated too much space for fixed length s.size()+1 regardless of possibly very uneven char frequency distribution of s.

    For the O(N) time complexity without sacrificing for excessive usage of space, in fact, another OJ problem 432 All O`one Data Structure gives the hint. You could use the following two containers:

    1. list<pair<int, unordered_set<char>>> charsByFreq to save chars with same frequency as a "node" in a list sorted by frequency;
    2. unordered_map<char, list<pair<int, unordered_set<char>>::iterator> pos to simply give a quick look-up table while looking for chars in charsByFreq.

    So a single pass to scan chars in s will build both containers in O(N) time. The code might be long, but it is just container manipulation and operation. There are already many excellent solutions under the discussion session of that problem if you are interested.

    Note both containers only use space of number of distinct chars in s eventually after a complete scan of s, so they will not use a fixed space of s.size().


  • 0

    @watanuo I had very similar solution to yours using map to sort frequency. But l think the map only needs to save distinct chars with the same frequency as values instead of the full repetition of characters, which could cut down the space/time usage. Only the final result string to build has the full length.


  • 0
    A

    nice idea man


  • 0
    A

    Is there any reason against dumping the results from the hash table into a bucket of pairs <int,char>. Then sort the bucket, which should be constant time as there are only so many ascii characters. Then we can append to res in order of char * int.

    Still seems like an O(n) result without a huge bucket of empty strings unless I'm missing something.


  • 0

    @andychen152 said in C++ O(n) solution without sort():

    Is there any reason against dumping the results from the hash table into a bucket of pairs <int,char>. Then sort the bucket, which should be constant time as there are only so many ascii characters. Then we can append to res in order of char * int.

    Still seems like an O(n) result without a huge bucket of empty strings unless I'm missing something.

    Actually, nothing against. As you mentioned, it is better to use bucket of pairs <int,char> to sort rather than the original post where vector<string> bucket(s.size()+1, "") costs way too much memory if s is long. You can read previous posts in this thread as many fellow coders have tested that the solution in the original posts actually wasn't that efficient due to memory usage.


  • 0
    N

    @watanuo nice solution. I basically used the same idea. Just want to share mine, which is lengthy but might save more memory.

    class Solution {
    public:
        string frequencySort(string s) {
            map<char,int> char_count;
            for (char c : s )
                char_count[c]++;
            map<int,set<char>> order_count;
            for (auto c_n : char_count){
                char key = c_n.first;
                order_count[char_count[key]].insert(key);
            }
            string ans;
            for ( auto r_itr = order_count.rbegin(); r_itr != order_count.rend(); r_itr++ ){
                for( char c : r_itr->second){
                    for( int i = 0; i < r_itr->first; i++ ) {
                        ans.insert(ans.end(), (char)c);
                    }
                }
            }
            return ans;
        }
    };
    
    

  • 0

    That's a quite nice solution, thank you for sharing.
    but I am thinking that int for(auto& it:freq). the type of it is actually not an iterator, but pair<char, int>.
    (just thinking a name like it will make the interviewer confused.


  • 0
    J

    @RushRab I like this solution ,very good!!!!


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.