Bit Manipulation, Java, 6ms

• ``````public boolean validUtf8(int[] data) {
if(data==null || data.length==0) return false;
boolean isValid = true;
for(int i=0;i<data.length;i++) {
if(data[i]>255) return false; // 1 after 8th digit, 100000000
int numberOfBytes = 0;
if((data[i] & 128) == 0) { // 0xxxxxxx, 1 byte, 128(10000000)
numberOfBytes = 1;
} else if((data[i] & 224) == 192) { // 110xxxxx, 2 bytes, 224(11100000), 192(11000000)
numberOfBytes = 2;
} else if((data[i] & 240) == 224) { // 1110xxxx, 3 bytes, 240(11110000), 224(11100000)
numberOfBytes = 3;
} else if((data[i] & 248) == 240) { // 11110xxx, 4 bytes, 248(11111000), 240(11110000)
numberOfBytes = 4;
} else {
return false;
}
for(int j=1;j<numberOfBytes;j++) { // check that the next n bytes start with 10xxxxxx
if(i+j>=data.length) return false;
if((data[i+j] & 192) != 128) return false; // 192(11000000), 128(10000000)
}
i=i+numberOfBytes-1;
}
return isValid;
}
``````

• good solution! can you explain what does this line do: i=i+numberOfBytes-1;

• @Sophie77
The line is just to indicate that the numbers are valid till i + numberOfBytes - 1, now let's check the next numbers in the array.

• ``````   Char. number range  |        UTF-8 octet sequence
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
``````

Rule 2:
Record the count of consecutive of 1.
Move the number 5 bit right, if it equals "110" means there is one '1'.
Move the number 4 bit right, if it equals "1110" means there are two '1'.
...
Move the number 7 bit right, if it equals "1" means it is "10000000" which has no meaning, return false.

Rule 1:
If it is not started with "10", return false;

``````public boolean validUtf8(int[] data) {
int count = 0;
for(int d:data){
if(count == 0){
if((d >> 5) == 0b110) count = 1;
else if((d >> 4) == 0b1110) count = 2;
else if((d >> 3) == 0b11110) count = 3;
else if((d >> 7) ==  1) return false;
} else {
if((d>>6) != 0b10) return false;
else count--;
}
}
return count == 0;
}
``````

• @hot13399 Nice solution! You should deserve more upvotes!

• This post is deleted!

• @fabrizio3 Small changes to your code to speed up:
"""
"""
public class Solution {
public boolean validUtf8(int[] data) {
if(data==null || data.length==0) return false;
int i = 0;
while(i<data.length) {
if(data[i]>255) return false; // 1 after 8th digit, 100000000
int numberOfBytes = 0;
if((data[i] & 128) == 0) { // 0xxxxxxx, 1 byte, 128(10000000)
numberOfBytes = 1;
} else if((data[i] & 224) == 192) { // 110xxxxx, 2 bytes, 224(11100000), 192(11000000)
numberOfBytes = 2;
} else if((data[i] & 240) == 224) { // 1110xxxx, 3 bytes, 240(11110000), 224(11100000)
numberOfBytes = 3;
} else if((data[i] & 248) == 240) { // 11110xxx, 4 bytes, 248(11111000), 240(11110000)
numberOfBytes = 4;
} else {
return false;
}
for(int j=1;j<numberOfBytes;j++) { // check that the next n bytes start with 10xxxxxx
if(i+j>=data.length) return false;
if((data[i+j] & 192) != 128) return false; // 192(11000000), 128(10000000)
}
i=i+numberOfBytes;
}
return true;
}
}
"""
"""
"""

• Another Approach.

``````public boolean validUtf8(int[] data) {
int idx = 0;
while (idx < data.length) {
int followers = count (data [idx ++]) - 1;
if (followers == 0 || followers >= 4) return false;
while (followers -- > 0) if (idx == data.length || count (data [idx ++]) != 1) return false;
}
return true;
}

private int count (int num) {
int bits = 0;
for (int idx = 7; idx >= 0; idx --) if ((num >> idx & 1) == 0) break; else bits ++;
return bits;
}``````

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.