I've got a solution running one test case in 20ms but still got TLE, so I wonder what the expected running time is for Python. I'm still checking the code though.

```
>> timeit Solution().fourSum([-494,-474,-425,-424,-391,-371,-365,-351,-345,-304,-292,-289,-283,-256,-236,-236,-236,-226,-225,-223,-217,-185,-174,-163,-157,-148,-145,-130,-103,-84,-71,-67,-55,-16,-13,-11,1,19,28,28,43,48,49,53,78,79,91,99,115,122,132,154,176,180,185,185,206,207,272,274,316,321,327,327,346,380,386,391,400,404,424,432,440,463,465,466,475,486,492], -1211)
>> 10 loops, best of 3: 19.9 ms per loop
```

The idea is to cache sum of two numbers in a `dict`

and check if there are two numbers that add up to the remainder using strategy of two sum.

Edit: include by far the fastest code I've gotten

10 loops, best of 3: 158 ms per loop

```
class Solution:
# @return a list of lists of length 4, [[val1,val2,val3,val4]]
def fourSum(self, nums, target):
nums = sorted(nums)
result = set()
for b in xrange(1, len(nums) - 2):
# stores sub_sums for the new num index just moved on
sub_sums = {}
for a in xrange(b):
sub_sums[nums[a] + nums[b]] = [nums[a], nums[b]]
for c in xrange(b + 1, len(nums) - 1):
for d in xrange(c + 1, len(nums)):
remainder = target - nums[c] - nums[d]
if remainder in sub_sums:
result.add(tuple(
sub_sums[remainder] + [nums[c], nums[d]]))
return map(list, result)
```