speedup trick by tuning divisor order


  • 0
    G

    My implementation is similar to most of the posted solution.
    '''
    if num <= 0: return False
    divisor_list = [30, 5, 3, 2] # or [2, 3, 5]
    for x in divisor_list:
    while num % x == 0:
    num /= x
    return num == 1
    '''
    But it seems that there is not much detailed discussion on tuning the divisor list. In fact, by improving divisors and their order, the runtime might be doubled or more. Here is my explanation:

    When above code takes long runtime and many loop, the main reason would be because the loop of "/" and "%" operation. Assuming:
    num = (2^a) * (3^b) * (5^c) * <1 or k>,
    where k cannot be divided by 2, 3, 5.

    Therefore, before getting out of the loop, the "/" and "%" operations need to be executed for (a+b+c) times.

    if set([x, x+y, x+y+z]) == set([a, b, c]), where, x = min(a, b, c), (x+y+z) = max(a, b, c) and y >0
    if using order of [30, 5, 3, 2], the number of loops will be reduced from a+b+c = 3x+2y+z to x+2y+z+1,
    the last 1 is for the penalty of 1 "%" operation as the quotient after "/30" can only be devided by up to two numbers of [2, 3, 5]

    I reversed [2, 3, 5] to [5, 3, 2] is to scale down the num faster during the loop.

    By playing the speedup trick of tuning divisor order, my python submission constantly reduced the runtime by 50%.

    I also tested divisor order with [810000, 900, 30, 15, 10, 6, 5, 3, 2], and [30, 15, 10, 6, 5, 3, 2]. I didn't see much difference on runtime.


  • 0

    This is pretty bad.

    Yes, including 30 in the divisors does save a little for 1 in 30 numbers, but it costs a little extra for every number. Overall, that makes it slower.

    Reversing [2, 3, 5] to [5, 3, 2], as far as I know, makes absolutely no difference here. Why would it?

    You talk about the submission times even though I already clearly showed you why that is completely wrong.


  • 0
    G

    @StefanPochmann

    I still think include 30 or some numbers other than [2, 3, 5] would have benefit on large number. The benefit will increase, such as 1800 vs 1800^2, while the overhead of "%30" is constant.

    Meanwhile, for integer longer than 32 bit, [5, 3, 2] should be better than [2, 3, 5]


  • 0

    Yes, for 1800^2 you save more, but that is much more rare. Doesn't really play a role. For 1800^3 you save even more, but that is even much more rare.

    Look. Even if you got every 30th number for free, i.e., even if you took zero time for them, that still wouldn't make up for what you pay extra for all the other 29 out of every 30 numbers.

    And I tested very large numbers now, result being that 30, 5, 3, 2 again clearly lost against 2, 3, 5.

    Btw, please format your above code.


  • 0

    @gazersxs said in speedup trick by tuning divisor order:

    I reversed [2, 3, 5] to [5, 3, 2] is to scale down the num faster during the loop.

    I thought a bit more about this now. If you divide by 5 first, you do make the number more smaller for the remaining tests. But... it's also less likely that you can divide by 5. You have a 1/5 chance that the given number gets divided by 5, i.e., that you can shave off a factor of 5. You have a 1/25 chance that you can shave off another factor of 5. And so on. Each factor of 5 that you shave off shaves off log2(5) bits, and you can expect to do that 1/5+1/25+1/125+... times. Which is ∑i=1..(1/5)i = (∑i=0..(1/5)i) - 1 = 1/(1-1/5) - 1 = 1/4 times. So you can expect to shave off log2(5)/4 bits. Same for 2 and 3. This results in:

    2 shaves off log2(2) / 1 = 1.0 bits
    3 shaves off log2(3) / 2 ≈ 0.79 bits
    5 shaves off log2(5) / 4 ≈ 0.58 bits

    So it might actually be more beneficial to handle 2 first, because that shaves off the most and 5 shaves off the least. I say "might" because I haven't fully thought the consequences through yet, but I'd say it's at least something to think about :-)

    In any case, this also explains why the order pretty much doesn't matter for very large numbers, either. If your input number has a million bits, your output number (after taking out all factors of 2, 3 and 5) will have about a million minus 2.37 bits. So all the modulo and division operations pretty much operate on a million bits.

    And for "small" numbers, fitting into 32-bit or 64-bit ints, it doesn't matter since there the modulos and divisions all take the same time as far as I know and as this test shows:

    >>> from timeit import timeit
    
    >>> timeit('a / b', 'a = 1; b = 1', number=10**8)
    3.94301838331842
    >>> timeit('a / b', 'a = 987654321; b = 1357', number=10**8)
    3.806759419565246
    
    >>> timeit('a % b', 'a = 1; b = 1', number=10**8)
    3.750121840338977
    >>> timeit('a % b', 'a = 987654321; b = 1357', number=10**8)
    3.721301417073761
    

  • 0
    G

    @StefanPochmann
    Hi Stefan,

    I realized this part this morning on 5 first or 2 first.

    Meanwhile, you have a basic assumption that the test case is even distributed. Ugly numbers are sparse when the numbers are getting large. I tried to improve the worst scenario with lots of loop, as I thought the runtime critical test cases of Leetcodes would be the ones with lots of loops.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.