When I saw this problem, what I think first is written calculation algorithm of square-root.

Converting the original algorithm to binary version it should be like this:

```
union LARGE_INTEGER{
long long QuadPart;
struct{
int LowPart;
int HighPart;
};
};
int mySqrt(int x){
int y=0,a,ttl;
union LARGE_INTEGER l;
l.HighPart=0;
l.LowPart=x;
for (ttl=16;ttl;--ttl){
l.QuadPart<<=2;
a=(y<<2)|1;
if (a<=l.HighPart){
l.HighPart-=a;
y=(y<<1)|1;
}else{
y<<=1;
}
}
return y;
}
```

The result will be no more than 16 bits, so it takes 16 steps to determine the result.

This program use only basic ALU command with SLL and RCL, it can run on microcontroller system.