Wrong test case


  • 1

    My solution gets "Wrong Answer" because it returns false for this test case: [240,130,138,147]. But the bit strings are [11110000, 10000010, 10001010, 10010011], so the actual code point (the 'x' parts) is 000000010001010010011. In hexadecimal that is 0x2293, which is outside the range 0x10000-0x10FFFF for 4-byte characters. So the sequence is invalid (an overlong encoding), and the expected answer, true, is wrong.
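The bit arithmetic in this post can be reproduced with a short Python sketch. The helper name `four_byte_code_point` is hypothetical, not part of the problem. It strips the marker bits from a 4-byte sequence and concatenates the 3 + 6 + 6 + 6 payload bits without validating anything. It also covers the replacement case [240,162,138,147] from the reply below.

```python
def four_byte_code_point(seq):
    """Assemble the 21 'x' bits of a 4-byte UTF-8 sequence (no validation)."""
    lead, *cont = seq
    cp = lead & 0b00000111                 # 3 payload bits from 11110xxx
    for b in cont:
        cp = (cp << 6) | (b & 0b00111111)  # 6 payload bits from each 10xxxxxx
    return cp

for seq in ([240, 130, 138, 147], [240, 162, 138, 147]):
    cp = four_byte_code_point(seq)
    in_range = 0x10000 <= cp <= 0x10FFFF   # valid range for 4-byte characters
    print(hex(cp), in_range)
# prints:
# 0x2293 False
# 0x22293 True
```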


  • 0

    Thanks, Stefan. I've updated the test case to [240,162,138,147].


  • 0

    @1337c0d3r Well, now I just get to the next wrong test case, one containing byte 246, which in binary is 11110110. Even if all the continuation bits were 0, the code point would still be 110000000000000000000, which is 0x180000 and thus far above the 0x10FFFF limit. But the expected answer is true.
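The same arithmetic for lead byte 246, as a quick Python sketch. With all continuation payload bits set to zero, the smallest code point such a sequence can encode is already above U+10FFFF. The last lines list which 11110xxx lead bytes overshoot even at their minimum. This is consistent with RFC 3629, which rules out the bytes 0xF5 to 0xFF entirely (0xF8 and above are not even 4-byte lead bytes).

```python
lead = 246
print(format(lead, '08b'))                  # 11110110

# 11110xxx keeps 3 payload bits; 18 zero continuation bits follow them.
smallest = (lead & 0b111) << 18
print(hex(smallest), smallest > 0x10FFFF)   # 0x180000 True

# Which 4-byte lead bytes (0xF0-0xF7) overshoot U+10FFFF even at their minimum?
too_big = [b for b in range(0xF0, 0xF8) if ((b & 0b111) << 18) > 0x10FFFF]
print(too_big)                              # [245, 246, 247]
```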

    Instead of fixing the test cases, I suggest simplifying the problem: say that we ignore the content of the "x"-bits. There are other exceptions anyway (for example, the surrogate code points U+D800 to U+DFFF are not allowed in UTF-8 either), so even checking the ranges properly would still not be completely correct UTF-8 validation.

    That way, people who do it properly and check the ranges are no longer at a disadvantage compared to people who just ignore them (because everybody knows to ignore them). Also, if you did fix the test cases so that a proper UTF-8 checker gets accepted, then I think my solution would get accepted, but it shouldn't :-). It's just this:

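    class Solution(object):
        def validUtf8(self, data):
            # Let Python's strict UTF-8 decoder do all the work.
            try:
                bytes(data).decode('utf-8')  # bytes() raises ValueError for values > 255
                return True
            except ValueError:  # UnicodeDecodeError is a subclass of ValueError
                return False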
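For comparison, here is a minimal Python sketch of the simplified problem proposed above, where only the bit patterns are checked and the x-bits are ignored. The function name is hypothetical and this is not LeetCode's reference solution; it assumes every input value is already in 0..255. Note that a structure-only check accepts [240,130,138,147], the overlong case from the first post.

```python
def valid_utf8_structure(data):
    """Check only the UTF-8 bit patterns, ignoring the payload ('x') bits.

    Assumes every value is already in 0..255.
    """
    remaining = 0                      # continuation bytes still expected
    for byte in data:
        if remaining:
            if byte >> 6 != 0b10:      # must be 10xxxxxx
                return False
            remaining -= 1
        elif byte >> 7 == 0:           # 0xxxxxxx: 1-byte character
            continue
        elif byte >> 5 == 0b110:       # 110xxxxx: 2-byte character
            remaining = 1
        elif byte >> 4 == 0b1110:      # 1110xxxx: 3-byte character
            remaining = 2
        elif byte >> 3 == 0b11110:     # 11110xxx: 4-byte character
            remaining = 3
        else:                          # 10xxxxxx or 11111xxx cannot start a character
            return False
    return remaining == 0              # no character may be cut off at the end
```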

  • 0

    @StefanPochmann
    Hi Stefan, my issue may be different from yours, but I also have some doubts about the test cases.
    Could you please take a look at the following?

    Example 1 of this problem, namely the case [197, 130, 1],
    in binary [11000101, 10000010, 00000001],
    is valid.

    Then I simply added two bits (10) before 197, so 197 becomes 709.
    In binary, 11000101 becomes 1011000101.
    However, as stated in the problem note, "Only the least significant 8 bits of each integer is used to store the data."

    So I think the case [709, 130, 1] should be the same as the case [197, 130, 1].
    However, the expected answer is false.

    Am I misunderstanding something here?

    Thanks in advance.
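Taking the note literally would mean masking each value with 0xFF before validating. A minimal Python sketch of that reading follows; the helper `low_bytes` is hypothetical, and the judge's expected answer of false suggests it does not do this.

```python
def low_bytes(data):
    # Keep only the least significant 8 bits of each integer.
    return [value & 0xFF for value in data]

print(format(709, 'b'))          # 1011000101
print(low_bytes([709, 130, 1]))  # [197, 130, 1]
```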


  • 0

    @StefanPochmann I would suggest fixing the test cases. It isn't too much more trouble to check the range of the decoded character, since the valid range is given in the problem statement.
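A sketch of what such a range-checking validator might look like in Python, assuming the RFC 3629 rules. Each sequence length has a minimum code point (anything lower is an overlong encoding), nothing may exceed U+10FFFF, and the surrogates U+D800 to U+DFFF are rejected. The function and constant names are hypothetical; this is not LeetCode's reference solution.

```python
# Smallest code point allowed for each sequence length (anything lower is overlong).
MIN_CODE_POINT = {2: 0x80, 3: 0x800, 4: 0x10000}

def valid_utf8_strict(data):
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if not 0 <= lead <= 0xFF:          # not a byte at all
            return False
        if lead < 0x80:                    # 0xxxxxxx: ASCII
            i += 1
            continue
        if lead >> 5 == 0b110:             # 110xxxxx
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:          # 1110xxxx
            length, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:         # 11110xxx
            length, cp = 4, lead & 0x07
        else:                              # stray continuation byte or 11111xxx
            return False
        if i + length > n:                 # sequence cut off at the end
            return False
        for b in data[i + 1:i + length]:
            if not 0 <= b <= 0xFF or b >> 6 != 0b10:
                return False               # continuation must be 10xxxxxx
            cp = (cp << 6) | (b & 0x3F)
        if cp < MIN_CODE_POINT[length] or cp > 0x10FFFF:
            return False                   # overlong or beyond U+10FFFF
        if 0xD800 <= cp <= 0xDFFF:
            return False                   # UTF-16 surrogates are not valid in UTF-8
        i += length
    return True
```

On inputs in 0..255, its verdicts should agree with Python 3's strict decoder, `bytes(data).decode('utf-8')`.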


  • 0

    @1337c0d3r It seems that only numbers < 256 are handled correctly in the test cases.
    My answer for the test case [453, 130, 1] is judged wrong. If "only the least significant 8 bits of each integer is used to store the data", this input should be valid:
    453: 1 11000101
    130: 10000010
    1: 00000001


  • 0

    For the case [250, 145, 145, 145, 145], why is the expected answer false? Can anyone please explain?


  • 1

    @willysde UTF-8 does not have 5-byte characters. The longest character takes 4 bytes.


  • 0

    @maigo Thank you, sir. I added a check that the length is not more than 4, and now my solution passes all test cases.
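The length check described here can be sketched by counting the lead byte's leading 1 bits. The Python helper `sequence_length` is hypothetical, not taken from the original post. A count of 1 marks a continuation byte, and anything above 4 cannot start a UTF-8 character, which is why 250 (11111010) makes the case above false.

```python
def sequence_length(lead):
    """Number of bytes announced by a lead byte, or None if it cannot start a character."""
    ones = 0
    mask = 0x80
    while mask and lead & mask:   # count the leading 1 bits
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1                  # 0xxxxxxx: ASCII
    if ones == 1 or ones > 4:
        return None               # 10xxxxxx is a continuation byte; 5+ ones never occur in UTF-8
    return ones

print(sequence_length(250))       # None (11111010 announces 5 bytes)
```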

