Python O(n lg n) and O(d) solutions


  • 0
    G

    This solution includes both the sorting and hash table methods, and it's smart enough to distinguish when to use which method to maximize efficiency. For some reason, I still get a really slow time in the OJ though. I'd appreciate any insight as to why!

    class Solution(object):
        def arrayPairSum(self, nums):
            """
            :type nums: List[int]
            :rtype: int
            """
            if len(nums) <= 5363:
                # In this case, the O(len(nums) log(len(nums))) solution is faster, given a potential 20001 value spread
                nums.sort()
                sum = 0
                for i in range(len(nums)):
                    if i % 2 == 0:
                        sum += nums[i]
                return sum
                
            else:
                # In this case, the O(d) hash table solution is faster, given a potential 20001 value spread (i.e. d)
                nums_hash = {}
                for i in range(-10000, 10001):
                    nums_hash[i] = 0
                for num in nums:
                    nums_hash[num] += 1
                count = 0
                sum = 0
                for i in range(-10000, 10001):
                    while nums_hash[i] != 0:
                        if count % 2 == 0:
                            sum += i
                        nums_hash[i] -= 1
                        count += 1
                        if count == len(nums):
                            return sum
    

  • 1

    What exactly makes you think that 5363 is a smart switch point? In my local testing I see that at this size, the second method is still a lot slower:

    >>> import random, timeit
    >>> for n in range(5361, 5367):
            nums = random.sample(range(-10000, 10001), n)
            print n, timeit.timeit(lambda: Solution().arrayPairSum(list(nums)), number=1000)
    
    5361 1.48419183077
    5362 1.46775913496
    5363 1.48114897265
    5364 4.51884381608
    5365 4.45902357516
    5366 4.47234039868
    

    Even at the maximum allowed n, the second method is still a lot slower. I changed your 5363 to 9997 and got this:

    >>> for n in range(9995, 10001):
            nums = random.sample(range(-10000, 10001), n)
            print n, timeit.timeit(lambda: Solution().arrayPairSum(list(nums)), number=1000)
    
    9995 2.96072288237
    9996 2.94667080914
    9997 2.99135728539
    9998 6.17817711556
    9999 6.19995547037
    10000 6.20504092249
    

    Btw, it would be simpler and faster (and cooler :-) to use a list instead of a dictionary: nums_hash = [0] * 20001. With that, I get these times:

    9998 3.72009040537
    9999 3.71792552014
    10000 3.69450990514
    

  • 0
    G

    Thanks for the analysis!

    The reason I chose 5363 is because it was the point I calculated where len(nums) * log(len(nums)) > 20001, i.e. where the sort method should theoretically take longer than the hash table solution. But now I realize that calculation assumed log base 10, when maybe log base 2 would be more accurate? In that case, the number should be even smaller, closer to 1844.

    However, maybe the real problem here is that I'm not accounting for the constant factors / overhead involved between these two methods. Perhaps that's why the sort method is indeed faster for most values? Maybe if there were, e.g. 1,000,000 numbers, the hash table method would be quicker because the efficiency would make up for the constant factor difference?

    Your list solution is cool, but again I'm a bit confused as to why it's faster than the hash table (also, why it's "cooler" haha). Is this again a question of overhead / constant factors between using a dictionary vs. using a list? I guess the list doesn't require the hash function to be run each time, and that's why it's faster?


  • 2

    @GravityGrave Yeah, the constant factors are ignored in big-O analysis, but when you want to find a critical point like that, you can't ignore them. Especially when the codes are so different.

    A hash is just more complicated (like using the hash function, as you said), which comes at extra cost. List is as simple as it gets. I called it "cooler" because of using negative indexes in the list, which you can't do in all languages and which not everybody might think of.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.