Python solution with detailed explanation


  • 0
    G

    Solution

    UTF-8 Validation https://leetcode.com/problems/utf-8-validation/

    Algorithm

    • UTF can have only 1 to 4 characters.
    • If just one character, then its byte must start with 0. So '0b10010001' is an error.
    • If more than one character, then character 2 onwards must start from "10". How do we check that? First mask with 11000000. Then XOR with 10000000.
    class Solution(object):
        def get_first_unset_bit(self, x):
            mask, result = 0x80, 0
            for i in range(8):
                if mask >> i & x:
                    result += 1
                else:
                    break
            return result
    
        def validUtf8(self, data):
            """
            :type data: List[int]
            :rtype: bool
            """
            end = 0
            while end < len(data):
                x = data[end]
                first_unset_bit = self.get_first_unset_bit(x)
                if first_unset_bit in (1,5,6,7,8):
                    return False
                elif first_unset_bit == 0:
                    end += 1
                else:
                    if end + first_unset_bit-1 >= len(data):
                        return False
                    for i in range(end+1, end + first_unset_bit):
                        if (data[i] & 0xC0) ^ (0x80) != 0:
                            return False
                    end = end + first_unset_bit
            return True
    

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.