# This is super weird. Why is logic bit shift soooooooooo unbeliebably slow?

• If I use the following line, which is kind of dumb, I get 9ms accepted.

``````   //for (uint i = 0; m >> i; i += 4) {
for (uint i = 0; i < 32; i += 4) {
``````

But if I use the bit shift instead, which should be a smarter way than the above, I get TLE.

``````   for (uint i = 0; m >> i; i += 4) {
//for (uint i = 0; i < 32; i += 4) {
``````

Why? What is going on? Could `m >> i` be so slow? This for loop should only be evaluated at most 8+1 times. This doesn't make sense.

``````class Solution {
public:
int maximumGap(vector<int>& nums) {
uint e = nums.size(), m = 0;
if (e < 2) return 0;
if (e == 2) return abs(nums[1] - nums[0]);
uint c[17];
vector<int> v(e);
for (auto& n : nums) {
if (n > m) m = n;
}
//for (uint i = 0; m >> i; i += 4) {
for (uint i = 0; i < 32; i += 4) {
memset(c, 0, sizeof(uint)*16);
for (auto& n : nums) {
c[(n>>i & 0xF)+1]++;
}
for (uint i = 1; i < 16; i++) {
c[i] += c[i-1];
}
for (auto& n: nums) {
v[c[n>>i & 0xF]++] = n;
}
nums.swap(v);
}
m = 0;
for (uint t, i = 1; i < e; i++) {
if ((t = nums[i] - nums[i-1]) > m) m = t;
}
return m;
}
};
``````

• OK. I figured it out myself.

The 8086 does not mask the shift count. However, all other IA-32 processors (starting with the Intel 286 processor) do mask the shift count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.

So the logic is correct, but IA-32 hardware has a limitation on bit shift.
We have to use the following to solve this limitation.

``   for (uint i = 0; i < 32 && m >> i; i += 4) {``

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.