This is super weird. Why is logic bit shift soooooooooo unbeliebably slow?


  • 0
    A

    If I use the following line, which is kind of dumb, I get 9ms accepted.

       //for (uint i = 0; m >> i; i += 4) {
       for (uint i = 0; i < 32; i += 4) {
    

    But if I use the bit shift instead, which should be a smarter way than the above, I get TLE.

       for (uint i = 0; m >> i; i += 4) {
       //for (uint i = 0; i < 32; i += 4) {
    

    Why? What is going on? Could m >> i be so slow? This for loop should only be evaluated at most 8+1 times. This doesn't make sense.

    class Solution {
    public:
        int maximumGap(vector<int>& nums) {
            uint e = nums.size(), m = 0;
            if (e < 2) return 0;
            if (e == 2) return abs(nums[1] - nums[0]);
            uint c[17];
            vector<int> v(e);
            for (auto& n : nums) {
                if (n > m) m = n;
            }
            //for (uint i = 0; m >> i; i += 4) {
            for (uint i = 0; i < 32; i += 4) {
                memset(c, 0, sizeof(uint)*16);
                for (auto& n : nums) {
                    c[(n>>i & 0xF)+1]++;
                }
                for (uint i = 1; i < 16; i++) {
                    c[i] += c[i-1];
                }
                for (auto& n: nums) {
                    v[c[n>>i & 0xF]++] = n;
                }
                nums.swap(v);
            }
            m = 0;
            for (uint t, i = 1; i < e; i++) {
                if ((t = nums[i] - nums[i-1]) > m) m = t;
            }
            return m;
        }
    };
    

  • 0
    A

    OK. I figured it out myself.

    The 8086 does not mask the shift count. However, all other IA-32 processors (starting with the Intel 286 processor) do mask the shift count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.

    So the logic is correct, but IA-32 hardware has a limitation on bit shift.
    We have to use the following to solve this limitation.

       for (uint i = 0; i < 32 && m >> i; i += 4) {

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.