O(M * N) algorithm using hash, without sort()


  • 34
    W

    Assign a prime number for a to z, and then multiply all prime numbers together to form a hash value.

        private static final int[] PRIMES = new int[]{2, 3, 5, 7, 11 ,13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 107};
        
        public List<String> anagrams(String[] strs) {
            List<String> list = new LinkedList<>();
            Map<Integer, List<String>> mapString = new HashMap<>();
            int result = -1;
            for (int i = 0; i < strs.length; i++){
                int mapping = 1;
                for (int j = 0, max = strs[i].length(); j < max; j++) {
                    mapping *= PRIMES[strs[i].charAt(j) - 'a'];
                }
                List<String> strings = mapString.get(mapping);
                if (strings == null) {
                    strings = new LinkedList<>();
                    mapString.put(mapping, strings);
                }
                strings.add(strs[i]);
            }
            for (List<String> mapList : mapString.values()){
                if (mapList.size() > 1)
                    list.addAll(mapList);
            }
            return list;
        }

  • 1
    W

    Yes I am also searching for a method that avoid the O(n log n) cost of sorting a string. This is the same hashing idea in C++:

    class Solution {
    public:
        typedef unsigned long long ll;
        vector<string> anagrams(vector<string>& strs) {
            vector<string> rtn;
            table={2, 3, 5, 7, 11 ,13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 107};
            unordered_map<ll, vector<size_t>> table;
            for(size_t i=0; i!=strs.size(); i++){
                size_t hv = hashf(strs[i]);
                if(table.count(hv)==0) table[hv]={i};
                else table[hv].push_back(i);
            }
    
            for(auto itr = table.begin(); itr!=table.end(); itr++){
                if(itr->second.size()>1){
                    for(size_t j = 0; j!=itr->second.size(); j++) rtn.push_back(strs[itr->second[j]]);
                }
            }
            return rtn;
        }
    private:
        ll hashf(const string& s){
            ll v=1;
            for(size_t i=0; i!=s.size(); i++){
                v*=table[s[i]-'a'];
                v%=1000000000000000007;
            }
            return v;
        }
        vector<ll> table;
    };

  • 2
    P

    hey! I just want to know why you use the magic number 1000000000000000007?


  • 0
    L

    may overflow?


  • 4
    P

    multiply prime numbers is a good idea, but I use another way which is more simple but use extra space though.

    private static class ArrayKey {
        int[] array = new int[26];
    
        @Override
        public int hashCode() {
            return Arrays.hashCode(array);
        }
    
        @Override
        public boolean equals(Object obj) {
            return Arrays.equals(array, ((ArrayKey)obj).array);
        }
    }
    
    public List<String> anagrams(String[] strs) {
        HashMap<ArrayKey, Integer> hashMap = new HashMap<ArrayKey, Integer>();
        List<String> res = new ArrayList<String>(strs.length);
    
        for (int i = 0; i < strs.length; i++) {
            String s = strs[i];
            ArrayKey key = new ArrayKey();
            for (int j = 0; j < s.length(); j++) {
                (key.array[s.charAt(j) - 'a'])++;
            }
    
            Integer prev = hashMap.put(key, i);
            if (prev != null) {
                if (strs[prev] != null) {
                    res.add(strs[prev]);
                    strs[prev] = null;
                }
    
                if (strs[i] != null) {
                    res.add(strs[i]);
                    strs[i] = null;
                }
            }
        }
    
        return res;
    }

  • -2
    D

    "aaa" and "bb" ?


  • 0
    H

    I also concern about overflow issue.


  • 0

    Definitely overflows for longer strings("yyyyyyyyyyyyyy"). Not a good idea. But thanks, I couldn't think about this myself. :)


  • 0
    K

    I don't see how this keeps the elements in sublists in lexicographic order. Also, the algorithms seems incorrect because overflows could lead to collisions, you should add an equal comparator for the hashtable.


  • 1

    Good point. But your hashing function take O(n), I think using radix sort could achieve O(n) also and there won't be a overflows.


  • 11
    M

    How feasible is prime number multiplication for large inputs. For strings of size 20 or 50 or 100. Here is a O(M*N) solution without sort and without costly prime multiplication. Most solutions use sort, my idea is to use a counter instead of sort. My hash construction is constant time and does not change with input size, O(26).

    public List<List<String>> groupAnagrams(String[] strs) {
            
            HashMap<String,List<String>> hm = new HashMap<String,List<String>>();
            
            for(int i=0; i<strs.length; i++){
                String eachString = strs[i];
                
                //only lower-case letters. so array of size 26 is enough
                int[] counter = new int[26];
                
                //Iterate the string and increment corresponding index
                //char - 'a' , so the index will be between 0 and 25
                for(int j=0; j<eachString.length(); j++)
                    counter[eachString.charAt(j)-'a']++;
                
                //Now iterate thorugh the counter array and construct the hash 
                //Eg: cat becomes 1a1c1t. caabbt becomes 2a2b1c1t
                StringBuilder sb = new StringBuilder("");
                for(int j=0; j<26; j++){
                    if(counter[j]>0){
                        sb.append(counter[j]);
                        sb.append((char) ('a'+j));
                    }
                }
                
                String hash = sb.toString();
                
                //Chech if an entry exists in hash table already, else add a new one
                if(!hm.containsKey(hash))
                    hm.put(hash,new LinkedList<String>());
                    
                //Add this string to the list pointed by hash
                hm.get(hash).add(eachString);
            }
            
            return new LinkedList<List<String>>(hm.values());
        }
    

  • 0
    A

    Anyone knows where does the magic number 1000000000000000007 come from??


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.