# Python solution with detailed explanation

• Solution

UTF-8 Validation https://leetcode.com/problems/utf-8-validation/

Algorithm

• UTF can have only 1 to 4 characters.
• If just one character, then its byte must start with 0. So '0b10010001' is an error.
• If more than one character, then character 2 onwards must start from "10". How do we check that? First mask with 11000000. Then XOR with 10000000.
``````class Solution(object):
def get_first_unset_bit(self, x):
for i in range(8):
if mask >> i & x:
result += 1
else:
break
return result

def validUtf8(self, data):
"""
:type data: List[int]
:rtype: bool
"""
end = 0
while end < len(data):
x = data[end]
first_unset_bit = self.get_first_unset_bit(x)
if first_unset_bit in (1,5,6,7,8):
return False
elif first_unset_bit == 0:
end += 1
else:
if end + first_unset_bit-1 >= len(data):
return False
for i in range(end+1, end + first_unset_bit):
if (data[i] & 0xC0) ^ (0x80) != 0:
return False
end = end + first_unset_bit
return True
``````

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.