My Python solution, 286ms, how to make it faster?


  • 1
    J
    class Solution(object):
        def groupAnagrams(self, strs):
            """
            :type strs: List[str]
            :rtype: List[List[str]]
            """
            dic = {}
            for s in strs:
                a = list(s)
                a.sort()
                a = ''.join(a)
                if a not in dic:
                    dic[a] = [s]
                else:
                    dic[a].append(s)
            result = []
            for s in dic:
                dic[s].sort()
                result.append(dic[s])
            return result
    

    Thank you in advance!


  • 2
    N

    Sorting every string could be costly

    use a character count method to calculate the key on the fly

    (which require O(n),while sorting require O(nlogn), where n is the length of string)

    chr_primes= [2,   3,  5,  7,  11,  13,  17,
          19,  23,  29,  31,  37,  41,  43,
          47,  53,  59,  61,  67,  71,
          73,  79,  83,  89,  97,  101]
    
    def getStrSig(s):
          ord_a = ord('a')
          hash = 1
          for c in s:
              hash *= chr_primes[ ord(c) - ord_a ]
          return hash
    

    then use the above function to calulate key in dict

    a = getStrSig(s)
    if a not in dic:
       dic[a] = [s]
    else:
       dic[a].append(s)
    

    for further optimization,

    1、if you want to use dic[s] multiple times, don't just use dic[s] every time

    because every time you use ,it will cost a map search.

    you can store dic[s] in a local variable,

    and use that local variable every time you need.

    2、iterating over a dict is slower than iterating over a list.

    If I was to optimize the code you write , it would be something like this:

    def groupAnagrams(self, strs):
        def getStrSig(s):
            #your hash function 
            pass
        
        dic = {}
        result = []
        for s in strs:
            a = getStrSig(s)
            if dic.has_key(a):
                dic[a].append(s)
            else:
                arr = [s]
                dic[a] = arr
                result.append(arr)
        
       for s in result:
          s.sort()
    
       return result
    

  • 0
    P

    can you explain why using the prime numbers to hash?


  • 0
    N

    It's based on the idea of prime factorization.

    Every number has a unique prime factorization.
    That is for every integer num,

    there exists num = P0^k0 * P1^k1 * P2^k2...Pn^kn, where P0,P1...Pn are different prime numbers, k0,k1...kn are the number of occurrences of each prime factor in the factorization

    For example, "aab" can be represented as 2^2 * 3^1, "baa" can also be respresented as 2^2 * 3^1, hence ,these two are anagrams

    (However,this approach has one major defect, that is the product of these primes grows with the length of string, which could cause "integer overflow".
    But it is no problem with python)


  • 0
    P

    Great explanation. Thank you very much


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.