1-liner in Python, 88ms, with an optimization note


  • 1
    class Solution(object):
        def isScramble(self, s1, s2):
            return s1==s2 or sorted(s1)==sorted(s2) and any(self.isScramble(s1[:i], s2[:i]) and self.isScramble(s1[i:], s2[i:]) or self.isScramble(s1[:i], s2[-i:]) and self.isScramble(s1[i:], s2[:-i]) for i in range(1, len(s1)))
    

    Which is equivalent to:

    class Solution(object):
        def isScramble(self, s1, s2):
            """
            :type s1: str
            :type s2: str
            :rtype: bool
            """
            if s1 == s2:
                return True
            if sorted(s1) != sorted(s2):
                return False
            for i in range(1, len(s1)):
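                # Case 1 (no swap): s1's left part scrambles to s2's left part, right to right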
                if self.isScramble(s1[:i], s2[:i]) and self.isScramble(s1[i:], s2[i:]):
                    return True
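                # Case 2 (swap): s1's left part scrambles to s2's right part, and vice versa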
                if self.isScramble(s1[:i], s2[-i:]) and self.isScramble(s1[i:], s2[:-i]):
                    return True
            return False
    

    I noticed that for shorter strings (which is the case here, but not in problem 242 Valid Anagram), sorted(s1) != sorted(s2) is much faster than collections.Counter(s1) != collections.Counter(s2).

    What do you guys think? @StefanPochmann


  • 1

    @o_sharp Yeah, I had noticed the same. Now I did a little test (on my PC, with Python 2.7.11), testing sorted(s) vs Counter(s) for strings s of different lengths. The table shows the number of calls per second:

     length    sorted   Counter ratio
    ---------------------------------
          1 1580287.4  325402.3  4.86
          2 1719492.1  306741.2  5.61
          4 1558405.9  269471.1  5.78
          8 1250145.3  209603.3  5.96
         16  821293.2  150663.9  5.45
         32  490674.4   98769.0  4.97
         64  232293.7   55583.3  4.18
        128   94124.6   33067.8  2.85
        256   36400.9   17857.0  2.04
        512   15713.3    9232.4  1.70
       1024    7287.5    4688.3  1.55
       2048    3495.2    2322.0  1.51
       4096    1754.9    1200.3  1.46
       8192     853.3     588.7  1.45
      16384     422.0     295.3  1.43
      32768     209.0     147.0  1.42
      65536     105.6      70.6  1.50
     131072      51.6      37.2  1.39
     262144      25.6      16.6  1.54
     524288      11.7       8.5  1.37
    1048576       6.4       4.3  1.47
    2097152       3.1       2.2  1.40
    4194304       1.5       1.1  1.35
    8388608       0.8       0.6  1.42
    

    The code:

    import string, random
    from collections import Counter
    from timeit import timeit
    
    print ' length    sorted   Counter ratio'
    print '---------------------------------'
    for e in range(24):
        n = 2**e
        s = ''.join(random.choice(string.ascii_lowercase) for _ in range(n))
    
        t = ctr = 0
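        # keep timing sorted(s), roughly doubling the number of calls each round, until 1 second total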
        while t < 1:
            t += timeit(lambda: sorted(s), number=ctr+1)
            ctr += ctr + 1
        sorted_ctr = ctr / t
    
        t = ctr = 0
        while t < 1:
            t += timeit(lambda: Counter(s), number=ctr+1)
            ctr += ctr + 1
        Counter_ctr = ctr / t
    
        print '%7d %9.1f %9.1f %5.2f' % (n, sorted_ctr, Counter_ctr, 1.0 * sorted_ctr / Counter_ctr)
    

    And test results for sorted(s) == sorted(s) vs Counter(s) == Counter(s):

     length    sorted   Counter ratio
    ---------------------------------
          1  825679.7  165309.6  4.99
          2  555952.0  155285.3  3.58
          4  807594.4  132861.4  6.08
          8  633594.7  100878.7  6.28
         16  432651.3   75552.0  5.73
         32  234392.6   47605.7  4.92
         64  112044.6   28623.3  3.91
        128   47015.2   16190.6  2.90
        256   18590.3    8684.4  2.14
        512    7828.7    4630.1  1.69
       1024    3648.3    2310.6  1.58
       2048    1764.9    1182.5  1.49
       4096     846.8     588.5  1.44
       8192     428.7     295.0  1.45
      16384     210.4     143.6  1.47
      32768     105.0      74.1  1.42
      65536      51.4      37.5  1.37
     131072      25.8      18.3  1.40
     262144      12.8       8.8  1.44
     524288       6.1       4.5  1.37
    1048576       3.1       2.1  1.45
    2097152       1.6       1.1  1.40
    4194304       0.8       0.5  1.39
    8388608       0.4       0.3  1.33
    

  • 0

    @StefanPochmann Awesome! We can see sorted(s) is never slower than collections.Counter in the average case, which is in line with my test results for Problem 242 Valid Anagram. They both gave me about 110ms.

    However, if I implement the counter myself in Problem 242, either with collections.defaultdict(int) or with a plain dict, the results are as good as 70ms.
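
    For reference, a minimal sketch of that kind of hand-rolled counter (the name is_anagram and the exact shape are just for illustration, not my verbatim submission):

    from collections import defaultdict

    def is_anagram(s, t):
        # anagrams must have the same length
        if len(s) != len(t):
            return False
        counts = defaultdict(int)
        for c in s:
            counts[c] += 1        # count up over s
        for c in t:
            counts[c] -= 1        # count down over t
            if counts[c] < 0:     # t uses some character more often than s
                return False
        return True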

    I can't help but think there is some unnecessary overhead in collections.Counter, as O(n) methods are supposed to beat the crap out of O(n*lg(n)) methods for super long strings.


  • 0

    @o_sharp said in 1-liner in Python, 88ms, with an optimization note:

    O(n) methods are supposed to beat the crap out of O(lg(n)) methods for super long strings.

    You mean out of O(n*lg(n)) methods. And I guess you need to use strings that are more super long :-)

    It also plays a role that we're using just 26 different values here and thus have massive repetition. Here are my results for

    s = range(n)
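    # Python 2: range(n) returns a list here; on Python 3 this would need s = list(range(n))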
    random.shuffle(s)
    

    and sorted(s) vs Counter(s):

     length    sorted   Counter ratio
    ---------------------------------
          1 2696527.4  349440.2  7.72
          2 2573732.2  318736.4  8.07
          4 2266664.2  283053.1  8.01
          8 1686039.1  220469.0  7.65
         16 1046608.9  157290.9  6.65
         32  494841.6  100514.5  4.92
         64  210483.6   58693.7  3.59
        128   86904.1   32388.7  2.68
        256   34704.0   16774.0  2.07
        512   14314.6    8722.6  1.64
       1024    6008.0    4463.6  1.35
       2048    2856.7    2159.3  1.32
       4096    1296.5    1117.0  1.16
       8192     583.0     512.7  1.14
      16384     266.4     254.0  1.05
      32768     123.5     125.9  0.98
      65536      57.7      64.7  0.89
     131072      26.1      32.0  0.82
     262144      12.2      13.1  0.94
     524288       5.4       6.0  0.90
    1048576       2.1       2.5  0.82
    2097152       0.9       1.2  0.76
    4194304       0.4       0.6  0.71
    8388608       0.2       0.3  0.62
    

  • 0

    Oh, and it probably also plays a significant role that sorted is C, while Counter is Python on top of dict (which I think is also C). Just last night I had fun with an O(n^2) algorithm being as fast as (Edit: not true, was a mistake) an O(n log n) algorithm where n was about 9 million! See here.


  • 0

    @StefanPochmann Now that you mention it, I'm starting to think the limited alphabet and Python's Timsort play a big role in why sort() runs faster than I expected: with that much repetition, they can get close to O(n) time even for long strings. Your test results bear this out: sort() runs much slower on the bigger alphabet than on 26 letters, while Counter doesn't.

    However, as for why collections.Counter has so much overhead: I don't think it makes much sense to say "collections.Counter is in Python and dict is in C", because writing my own counter in Python using a dict runs much faster than collections.Counter, which is supposedly built in the exact same way. I mean, there must be some good reason for collections.Counter to be slower than my naive implementation. Or there had better be : )

    Any clue?

    By the way, it's not uncommon in real life that O(n^2) algorithms beat O(n log n) algorithms. My understanding is: an O(n^2) implementation that iterates over arrays consecutively, if not screwed up, is usually very cache-friendly and light on conditional branches. On the other hand, the log(n) usually means sorting, binary search, a BST, a heap or the like, which means more instructions and more conditional branches, and those are harder to make cache-friendly; making them cache-friendly usually means getting even more complicated, and therefore unfriendly to small data.


  • 0

    @o_sharp I think it's very uncommon in real life that O(n^2) beats O(n log2 n) when n is 9 million. And actually it was 10 million there. That's a factor 430043 advantage (10000000 / log2(10000000)). Caching, branch prediction, simplicity and C-vs-Python don't make up for nearly that much. I was just a huge idiot there, testing with Python 3 and thus timing the O(1) range.index instead of the O(n) list.index that I thought I was testing. I've rewritten that StackOverflow answer now, please check it out again, at least the benchmark results. I btw noticed my mistake when, triggered by your comment, I tried to run the experiment in C++ and the O(n^2) thingy was far slower than in Python. So thank you :-)

    About Counter: Yeah, it has lots of overhead. Even just creating empty ones is slow:

    >>> from timeit import timeit
    
    >>> timeit('{}')
    0.020729978152876978
    
    >>> timeit('dict()')
    0.11448962133957075
    
    >>> timeit('defaultdict()', 'from collections import defaultdict')
    0.10418442565443087
    
    >>> timeit('Counter()', 'from collections import Counter')
    1.136008030943536
    
    >>> timeit('MyClass()', 'class MyClass: pass')
    0.1173687182009644
    

    The last one with MyClass shows that it's not the pure object creation that takes the time, but the initialization. And if you check out Counter's initialization, you'll see that it does a whole lot of stuff. And I still believe that if that were done in C, it wouldn't be so slow. That's what I meant. I'd expect it to be similar to the defaultdict time.

    Similarly, Counter's update, which the initializer uses, does a whole lot of stuff. In Python. Stuff that you probably didn't do when you used dict/defaultdict. And stuff that might be fast again if it were done in C.
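
    For a plain iterable, the core of Counter's update in 2.7 is essentially this (simplified from the stdlib source; the real method first handles mappings, None and keyword arguments, which is part of the overhead):

    def update(self, iterable=None, **kwds):
        # ... Mapping / None / kwds handling omitted ...
        self_get = self.get
        for elem in iterable:
            # one pure-Python dict lookup + add + store per element
            self[elem] = self_get(elem, 0) + 1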


  • 0

    @StefanPochmann Thanks for explaining the overheads!

    Another question: doesn't range() give you a generator? How is range.index O(1)?


  • 0

    @o_sharp No, it's not a generator. See this StackOverflow question and its top answers.
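
    For example (Python 3), an index query on a range object is answered arithmetically, roughly (value - start) // step after a containment check, rather than by scanning:

    >>> r = range(0, 10**18, 3)
    >>> r.index(3 * 10**17)  # returns instantly, no iteration over 10**18 values
    100000000000000000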

