Solution by awice


  • 0

    Approach #1: Categorize by Sorted String [Accepted]

    Intuition

    Two strings are anagrams if and only if their sorted strings are equal.

    Algorithm

    Maintain a map ans : {String -> List} where each key K is a sorted string, and each value is the list of strings from the initial input that when sorted, are equal to K.

    In Java, we will store the key as a string, eg. "code". In Python, we will store the key as a hashable tuple, eg. ('c', 'o', 'd', 'e').

    Java

    class Solution {
        public List<List<String>> groupAnagrams(String[] strs) {
            if (strs.length == 0) return new ArrayList();
            Map<String, List> ans = new HashMap<String, List>();
    
            for (String s : strs) {
                char[] ca = s.toCharArray();
                Arrays.sort(ca);
                String key = String.valueOf(ca);
                if (!ans.containsKey(key)) ans.put(key, new ArrayList());
                ans.get(key).add(s);
            }
            return new ArrayList(ans.values());
        }
    }
    

    Python

    class Solution(object):
        def groupAnagrams(self, strs):
            ans = collections.defaultdict(list)
            for s in strs:
                ans[tuple(sorted(s))].append(s)
            return ans.values()
    

    Complexity Analysis

    • Time Complexity: $$O(NK \log (K) )$$, where $$N$$ is the length of strs, and $$K$$ is the maximum length of a string in strs. The outer loop has complexity $$O(N)$$ as we iterate through each string. Then, we sort each string in $$O(K \log K)$$ time.

    • Space Complexity: $$O(N*K)$$, the total information content stored in ans.


    Approach #2: Categorize by Count [Accepted]

    Intuition

    Two strings are anagrams if and only if their character counts (respective number of occurrences of each character) are the same.

    Algorithm

    We can transform each string s into a character count count, consisting of 26 non-negative integers representing the number of 'a''s, 'b''s, 'c''s, etc. We use these counts as the basis for our hash map.

    In Java, the hashable representation of our count will be a string delimited with '#' characters. For example, "abbccc" will be "#1#2#3#0#0#0...#0" where there are 26 entries total. In python, the representation will be a tuple of the counts. For example, "abbccc" will be "(1, 2, 3, 0, 0, ..., 0)", where again there are 26 entries total.

    Java

    class Solution {
        public List<List<String>> groupAnagrams(String[] strs) {
            if (strs.length == 0) return new ArrayList();
            Map<String, List> ans = new HashMap<String, List>();
            int[] count = new int[26];
            for (String s : strs) {
                Arrays.fill(count, 0);
                for (char c : s.toCharArray()) count[c - 'a']++;
                
                StringBuilder sb = new StringBuilder("");
                for (int i = 0; i < 26; i++) {
                    sb.append('#');
                    sb.append(count[i]);
                }
                String key = sb.toString();
                if (!ans.containsKey(key)) ans.put(key, new ArrayList());
                ans.get(key).add(s);
            }
            return new ArrayList(ans.values());
        }
    }
    

    Python

    class Solution:
        def groupAnagrams(strs):
            ans = collections.defaultdict(list)
            for s in strs:
                count = [0] * 26
                for c in s:
                    count[ord(c) - ord('a')] += 1
                ans[tuple(count)].append(s)
            return ans.values()
    

    Complexity Analysis

    • Time Complexity: $$O(N * K)$$, where $$N$$ is the length of strs, and $$K$$ is the maximum length of a string in strs. Counting each string is linear in the size of the string, and we count every string.

    • Space Complexity: $$O(N*K)$$, the total information content stored in ans.


  • 0

    Edit: it's fixed now.

    Your second Java "solution" is bad. It for example claims that "a" and "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" are anagrams of each other (for input ["a", "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"] it returns [["a","bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"]]).


  • 0

    @StefanPochmann Thanks for the correction, I updated the code. I thought that Arrays.hashCode would have a reasonable hash that did not collide so often, but I reviewed how it is implemented now.


  • 0

    Edit: it's fixed now.

    @awice Well, even if that hashing algorithm weren't so simple, it would still only give you an int and thus likely collisions after only about 65536 strings (birthday paradox). And since 267 > 232, I could also definitely find two different strings just seven letters long which have the same hash code.

    Your new version is about equally easy to break, for example you now fail this input:
    ["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"]

    Your key for both is "9719899100101102103104105106107108109110111112113114115116117118119120121122". For the "a"-string it starts with "971"+"98" and for the "b"-string it starts with "97" + "198".


  • 0

    @StefanPochmann All good points. As for the hash, I meant to delimit each part. I think everything should be good now.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.