I don't see O(1), either, but at least this Newton solution computes the square root of 2**1000000 in just 19 iterations:

class Solution(object):
    def isPerfectSquare(self, x):
        r = 2 << (x.bit_length() // 2)
        while r*r > x:
            r = (r + x // r) // 2
        return r*r == x
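As a sanity check, here's the same iteration as a free function (a self-contained sketch mirroring the method above), with a few assertions. The initial guess `2 << (bits // 2)` is always strictly above the true root, and each Newton step stays at or above `floor(sqrt(x))`, so the loop exits exactly at the integer square root:

```python
def is_perfect_square(x):
    # Initial guess strictly above sqrt(x): 2**(bit_length//2 + 1)
    r = 2 << (x.bit_length() // 2)
    while r * r > x:
        r = (r + x // r) // 2  # integer Newton step, rounds down
    return r * r == x

# Squares pass, their neighbors fail, and huge inputs are fine:
assert is_perfect_square(2**1000000)
assert not is_perfect_square(2**1000000 + 1)
assert all(is_perfect_square(n * n) for n in range(1, 100))
assert not any(is_perfect_square(n * n + 1) for n in range(2, 100))
```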

Looks like it's O(log(log(x))) iterations:

>>> for e in range(20):
...     x = 2**2**e
...     r = 2 << (x.bit_length() // 2)
...     steps = 0
...     while r*r > x:
...         r = (r + x // r) // 2
...         steps += 1
...     print('x=2**2**%d takes %d steps' % (e, steps))
x=2**2**0 takes 2 steps
x=2**2**1 takes 1 steps
x=2**2**2 takes 2 steps
x=2**2**3 takes 2 steps
x=2**2**4 takes 3 steps
x=2**2**5 takes 4 steps
x=2**2**6 takes 5 steps
x=2**2**7 takes 6 steps
x=2**2**8 takes 7 steps
x=2**2**9 takes 8 steps
x=2**2**10 takes 9 steps
x=2**2**11 takes 10 steps
x=2**2**12 takes 11 steps
x=2**2**13 takes 12 steps
x=2**2**14 takes 13 steps
x=2**2**15 takes 14 steps
x=2**2**16 takes 15 steps
x=2**2**17 takes 16 steps
x=2**2**18 takes 17 steps
x=2**2**19 takes 18 steps
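The pattern above is easy to check programmatically: for x = 2**2**e the loop takes e-1 steps once e ≥ 3, i.e. about log2(log2(x)) iterations, which fits Newton roughly doubling the number of correct leading bits per step. A small sketch wrapping the same loop in a step counter:

```python
def newton_steps(x):
    # Count iterations of the same loop as in the solution above
    r = 2 << (x.bit_length() // 2)
    steps = 0
    while r * r > x:
        r = (r + x // r) // 2
        steps += 1
    return steps

# Matches the transcript above: e-1 steps for e >= 3
for e in range(3, 20):
    assert newton_steps(2**2**e) == e - 1
```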

Though of course the number of iterations is only part of the story when you're going this big. Each iteration multiplies and divides numbers of up to about log2(x) bits, which adds a comparatively large factor, roughly O(log(x)^2) per iteration with schoolbook arithmetic, so I think the total becomes O(log(log(x)) * log(x)^2). Or better, if you're using better multiplication/division.

Then again, this article seems to say Newton square root takes time proportional to the time for one multiplication, which makes sense if you double the working precision each iteration: the per-step costs then form a geometric series dominated by the final full-precision step.
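For what it's worth, Python 3.8+ ships math.isqrt in the stdlib (its CPython implementation is, I believe, an adaptive-precision Newton-style algorithm along those lines). A quick cross-check of the hand-rolled loop against it, on small and huge inputs:

```python
import math

def isqrt_newton(x):
    # Same iteration as above: start above sqrt(x), descend to floor(sqrt(x))
    r = 2 << (x.bit_length() // 2)
    while r * r > x:
        r = (r + x // r) // 2
    return r

# Agrees with the stdlib everywhere we try it
for x in list(range(1, 1000)) + [2**e for e in range(1, 200)]:
    assert isqrt_newton(x) == math.isqrt(x)
```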