10-lines 76ms Easy C++ Solution (Updated Function Signature)


  • 72

    The function signature has been updated to return a more intuitive vector<vector<string>> which treats a single string as a group of anagrams consisting of only itself.

    The idea is to use an unordered_map to store those strings that are anagrams. We use the sorted string as the key and the string itself as the value. The strings are stored in a multiset since there may be duplicates. Moreover, multiset will sort them by default as we desire.

    The code is as follows.

    class Solution {
    public:
        vector<vector<string>> groupAnagrams(vector<string>& strs) {
            unordered_map<string, multiset<string>> mp;
            for (string s : strs) {
                string t = s; 
                sort(t.begin(), t.end());
                mp[t].insert(s);
            }
            vector<vector<string>> anagrams;
            for (auto m : mp) { 
                vector<string> anagram(m.second.begin(), m.second.end());
                anagrams.push_back(anagram);
            }
            return anagrams;
        }
    };
    

    Update: as suggested by yswu1234 in the answer, general sort takes O(nlogn) time. In this problem, since the string only contains lower-case alphabets, we can write a sorting function using counting sort (O(n) time) to speed up the sorting process. I write a string sorting function strSort below and using it to sort the string achieves the overall running time 72ms for this problem.

    class Solution {
    public:
        vector<vector<string>> groupAnagrams(vector<string>& strs) {
            unordered_map<string, multiset<string>> mp;
            for (string s : strs) {
                string t = strSort(s);
                mp[t].insert(s);
            }
            vector<vector<string>> anagrams;
            for (auto m : mp) { 
                vector<string> anagram(m.second.begin(), m.second.end());
                anagrams.push_back(anagram);
            }
            return anagrams;
        }
    private:
        string strSort(string& s) {
            int count[26] = {0}, n = s.length();
            for (int i = 0; i < n; i++)
                count[s[i] - 'a']++;
            int p = 0;
            string t(n, 'a');
            for (int j = 0; j < 26; j++)
                for (int i = 0; i < count[j]; i++)
                    t[p++] += j;
            return t;
        } 
    };

  • 1

    Thank you for your solution-sharing. I have learnt a lot from your solution as usual.


  • 0

    Could tell me where is the function "sorted the mp.second"? For example input "tea","ate","eat", why your output is ["ate","eat","tea"] rather than ["tea","ate","eat"]? Is it because multiset<string>?
    Thank you in advance.


  • 0

    Yes, I mentioned it in the last sentence of my post:

    multiset will sort them by default as we desire.


  • 10
    Y

    Just a small improvement:

    As all inputs will be in lower-case, we can implement our own sorting algorithm to sort a string in order to speed up this sorting. I implemented one similar to counting sort as follows, and the running time was decreased to 68ms.

    string sortLowercase(string s) {
        int charExist[26] = {0};
        for (int i = 0; i < s.size(); i++) {
            charExist[s[i] - 'a']++;
        }
        string res;
        int j = 0;
        while (j < 26) {
            if (charExist[j] == 0) {
                j++;
            }
            else {
                res.push_back(j + 'a');
                charExist[j]--;
            }
        }
        return res;
    }

  • 0

    Hi, yswu1234. Thank you for sharing this idea of further improvements :-) I try to use your sortLowercase function for sorting the strings and test it several times but it still takes 76ms or even more.

    I rewrite a string sorting function strSort below using counting sort and add some minor optimizations (like passing reference instead of a new copy of the string s, doing pre-allocations). It now takes 72ms.

    string strSort(string& s) { 
        int count[26] = {0}, n = s.length();
        for (int i = 0; i < n; i++)
            count[s[i] - 'a']++;
        int p = 0;
        string t(n, 'a');
        for (int j = 0; j < 26; j++)
            for (int i = 0; i < count[j]; i++)
                t[p++] += j;
        return t;
    }

  • 0
    Y

    Maybe there are some factors affecting the running time that is out of our control :)

    Yeah, your optimizations will surely improve the running time.


  • 3
    F

    Hi! This is my solution. It takes only 60ms. I think the improvement comes from the 'swap' function.

    class Solution {
    public:
        vector<vector<string>> groupAnagrams(vector<string>& strs) {
            unordered_map<string, vector<string>> hashMap;
            for(auto &v : strs) {
                string tmp(v);
                sort(tmp.begin(), tmp.end());
                hashMap[tmp].push_back(v);
            }
            vector<vector<string>> result(hashMap.size());
            int k = 0;
            for(auto it = hashMap.begin(); it != hashMap.end(); ++it, ++k) {
                result[k].swap(it->second);
                sort(result[k].begin(), result[k].end());
            }
            return result;
        }
    };

  • 0

    Hi, fastcode. Thanks for your sharing. Well, I guess replacing multiset with vector also reduces the running time since multiset keeps its elements sorted by key and that will certainly take some time?

    BTW, I try to use move semantics like result[k] = move(it->second); but it is not as fast as your code :-) You may try it as well since the running time for this problem seems to be inconsistent as what yswu1234 has pointed out in his answer.


  • 0

    really nice explaination and careful optimization !


  • 2
    D

    You need not use multiset, because the words needn't been sorted


  • 0
    1

    @daben I don't understand why the order is necessary.....
    The wiki told me
    "
    An anagram is direct word switch or word play, the result of rearranging the letters of a word or phrase to produce a new word or phrase, using all the original letters exactly once
    "


  • 0
    A

    @jianchao.li.fighter The problem is your can't have any letter repeated more than 256 times, otherwise you have a collision.


  • 0
    A

    @fastcode The swap is smart indeed.


  • 1

    @yswu1234 Good idea! Tend to be a follow-up question in interview. Thanks for sharing! Here is a Java version based on counting sort. I comment out the quicksort and create counter array in each call for simplicity.

        public List<List<String>> groupAnagrams(String[] strs) {
            Map<String,List<String>> group = new HashMap<>();
            for (String str : strs) {
                char[] chars = str.toCharArray();
                //Arrays.sort(chars);
                sort(chars);
                
                String key = new String(chars);
                if (!group.containsKey(key))
                    group.put(key, new ArrayList<>());
                group.get(key).add(str);
            }
            return new ArrayList<>(group.values());
        }
        
        private void sort(char[] chars) {
            int[] cnt = new int[26];
            for (char c : chars) cnt[c - 'a']++;
            for (int i = 0, j = 0; i < chars.length && j < 26; ) {
                if (cnt[j]-- > 0) chars[i++] = (char) ('a' + j);
                else j++;
            }
        }
    

  • 0
    J

    @jianchao.li.fighter Great Solution! We can even simplify the last for inside strSort by encoding the string, e.g. aabbb to a2b3, then the inner for can be ignored.


  • 0
    F

    we don't need unordered_map<string, multiset<string>>, unordered_map<string, int> is enough. The latter is used to indicate the index of the pattern string in vector<vector<int>> res

    class Solution {
    public:
        string sortWord(string& word) {
            vector<int> nums(26, 0);
            for (int i = 0; i < word.size(); i++)
                nums[word[i] - 'a']++;
            string res(word.size(), 'a');
            int k = 0;
            for (int i = 0; i < nums.size(); i++)
                for (int j = 0; j < nums[i]; j++)
                    res[k++] += i;
            return res;
        }
        vector<vector<string>> groupAnagrams(vector<string>& strs) {
            unordered_map<string, int> patternId;
            vector<vector<string>> res;
            for (int i = 0; i < strs.size(); i++) {
                string p = sortWord(strs[i]);
                if (patternId.count(p) == 0) {
                    patternId[p] = res.size();
                    vector<string> sol(1, strs[i]);
                    res.push_back(sol);
                } else {
                    res[patternId[p]].push_back(strs[i]);
                }
            }
            return res;
        }
    };
    

  • 0
    F
    This post is deleted!

  • 1
    G

    Great answer! I have made several improvements of your code. The new answer takes 52ms

    1. use auto& rather then auto to avoid unnecessary copy
    2. use std::move() to steal vector from map
    3. use vector.reserve() to avoid reallocate
    class Solution {
    public:
        vector<vector<string>> groupAnagrams(vector<string>& strs) {
            int n = strs.size();
            unordered_map<string, vector<string>> map;
            vector<vector<string>> ret;
            for (const auto& s : strs) {
                string t = s;
                sort(t.begin(), t.end());
                map[t].push_back(s);
            }
            ret.reserve(map.size());
            for (auto& p : map) {
                ret.push_back(std::move(p.second));
            }
            return ret;
        }
    };
    

  • 0
    D

    using your strSort() function idea, I improve you code run 39ms, Hope to get your advice

    class Solution {
    public:
        string strSort(string s) {
            int count[26] = {0}, n = s.length();
            for (int i = 0; i < n; ++i)
                ++count[s[i] - 'a'];
            int p = 0;
            string t(n, 'a');
            for (int j = 0; j < 26; ++j)
                for (int i = 0; i < count[j]; ++i)
                    t[p++] += j;
            return t;
        } 
        vector<vector<string>> groupAnagrams(vector<string>& strs) {
            vector<vector<string>> ret;
            map<string,vector<string>> mp;
            for(auto &str : strs){
                string tmp = strSort(str);
                //sort(tmp.begin(),tmp.end());
                mp[tmp].push_back(str);
            }
            for(auto &m:mp)
                ret.push_back(m.second);
    /*        auto begin = mp.cbegin();
            while(begin != mp.cend()){
                ret.push_back(begin->second);
                ++begin;
            }
    */        return ret;
        }
    };

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.