Bit Manipulation, Java, 6ms


  • 14
    public boolean validUtf8(int[] data) {
    	if(data==null || data.length==0) return false;
    	boolean isValid = true;
    	for(int i=0;i<data.length;i++) {
    		if(data[i]>255) return false; // 1 after 8th digit, 100000000
    		int numberOfBytes = 0;
    		if((data[i] & 128) == 0) { // 0xxxxxxx, 1 byte, 128(10000000)
    			numberOfBytes = 1;
    		} else if((data[i] & 224) == 192) { // 110xxxxx, 2 bytes, 224(11100000), 192(11000000)
    			numberOfBytes = 2;
    		} else if((data[i] & 240) == 224) { // 1110xxxx, 3 bytes, 240(11110000), 224(11100000)
    			numberOfBytes = 3;
    		} else if((data[i] & 248) == 240) { // 11110xxx, 4 bytes, 248(11111000), 240(11110000)
    			numberOfBytes = 4;
    		} else {
    			return false;
    		}
    		for(int j=1;j<numberOfBytes;j++) { // check that the next n bytes start with 10xxxxxx
    			if(i+j>=data.length) return false;
    			if((data[i+j] & 192) != 128) return false; // 192(11000000), 128(10000000)
    		}
    		i=i+numberOfBytes-1;
    	}
    	return isValid;
    }
    

  • 1
    S

    good solution! can you explain what does this line do: i=i+numberOfBytes-1;


  • 1
    S

    @Sophie77
    The line is just to indicate that the numbers are valid till i + numberOfBytes - 1, now let's check the next numbers in the array.


  • 13
       Char. number range  |        UTF-8 octet sequence
          (hexadecimal)    |              (binary)
       --------------------+---------------------------------------------
       0000 0000-0000 007F | 0xxxxxxx
       0000 0080-0000 07FF | 110xxxxx 10xxxxxx
       0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
       0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    

    Rule 2:
    Record the count of consecutive of 1.
    Move the number 5 bit right, if it equals "110" means there is one '1'.
    Move the number 4 bit right, if it equals "1110" means there are two '1'.
    ...
    Move the number 7 bit right, if it equals "1" means it is "10000000" which has no meaning, return false.

    Rule 1:
    If it is not started with "10", return false;

    public boolean validUtf8(int[] data) {
    	int count = 0;
    	for(int d:data){
    	  if(count == 0){
    	    if((d >> 5) == 0b110) count = 1;
    	    else if((d >> 4) == 0b1110) count = 2;
    	    else if((d >> 3) == 0b11110) count = 3;
    	    else if((d >> 7) ==  1) return false;
    	  } else {
    	    if((d>>6) != 0b10) return false;
    	    else count--;
    	  }
    	}
    	return count == 0;
    }
    

  • 0
    N

    @hot13399 Nice solution! You should deserve more upvotes!


  • 0
    N
    This post is deleted!

  • 0
    N

    @fabrizio3 Small changes to your code to speed up:
    """
    """
    public class Solution {
    public boolean validUtf8(int[] data) {
    if(data==null || data.length==0) return false;
    int i = 0;
    while(i<data.length) {
    if(data[i]>255) return false; // 1 after 8th digit, 100000000
    int numberOfBytes = 0;
    if((data[i] & 128) == 0) { // 0xxxxxxx, 1 byte, 128(10000000)
    numberOfBytes = 1;
    } else if((data[i] & 224) == 192) { // 110xxxxx, 2 bytes, 224(11100000), 192(11000000)
    numberOfBytes = 2;
    } else if((data[i] & 240) == 224) { // 1110xxxx, 3 bytes, 240(11110000), 224(11100000)
    numberOfBytes = 3;
    } else if((data[i] & 248) == 240) { // 11110xxx, 4 bytes, 248(11111000), 240(11110000)
    numberOfBytes = 4;
    } else {
    return false;
    }
    for(int j=1;j<numberOfBytes;j++) { // check that the next n bytes start with 10xxxxxx
    if(i+j>=data.length) return false;
    if((data[i+j] & 192) != 128) return false; // 192(11000000), 128(10000000)
    }
    i=i+numberOfBytes;
    }
    return true;
    }
    }
    """
    """
    """


  • 0

    Another Approach.

    public boolean validUtf8(int[] data) {
        int idx = 0;
        while (idx < data.length) {
            int followers = count (data [idx ++]) - 1;
            if (followers == 0 || followers >= 4) return false;
            while (followers -- > 0) if (idx == data.length || count (data [idx ++]) != 1) return false;
        }
        return true;
    }
        
    private int count (int num) {
        int bits = 0;
        for (int idx = 7; idx >= 0; idx --) if ((num >> idx & 1) == 0) break; else bits ++;
        return bits;
    }

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.