Concise C++ implementation


  • 81
    F
    class Solution {
    public:
        bool validUtf8(vector<int>& data) {
            int count = 0;
            for (auto c : data) {
                if (count == 0) {
                    if ((c >> 5) == 0b110) count = 1;
                    else if ((c >> 4) == 0b1110) count = 2;
                    else if ((c >> 3) == 0b11110) count = 3;
                    else if ((c >> 7)) return false;
                } else {
                    if ((c >> 6) != 0b10) return false;
                    count--;
                }
            }
            return count == 0;
        }
    };
    

  • 0
    B

    @fight.for.dream +1, nice job! I guess the code cannot be shorter...


  • 0
    S

    @fight.for.dream so beautiful!!!


  • 1
    E

    I was wondering why there's no case for count = 4, 5, 6, then noticed that UTF8 is at most 4 bytes long.
    This is very concise. Thanks for sharing!


  • 0
    H

    @braydenCN it can be more shorter, LOL!

    class Solution {
    public:
        bool validUtf8(vector<int>& data) {
            int count = 0;
            for(const auto &val: data){
                if(!count){
                    if((val>>5)==0b110) count = 1;
                    else if((val>>4)==0b1110) count = 2;
                    else if((val>>3)==0b11110) count = 3;
                    else if(val>>7) return false;
                    continue;
                }
                if((val>>6)!=0b10) return false;
                count--;
            }
            return true;
        }
    };
    

  • 0
    B

    @Hermits hmm... really? seems to be just a copy-paste


  • 0
    C

    Can anyone tell me what b mean in 0b1110


  • 1
    H

    @chen2 binary number


  • 0
    K

    I'm confused about that when count ==0 you only judge four cases, why? The problem doesn't give the range of n, so why don't you judge (c>>0 == 0b11111110) and (c>>1 == 0b1111110)?
    Forgive my poor English expression.


  • 1
    A

    @fight.for.dream can you please the explanation for (val>>5)==0b110? I am still not able to get it?


  • 0
    C
        for(const auto &val: data){
            if(!count){
                if((val>>3)==0b11110) count = 3;
                else if((val>>4)==0b1110) count = 2;
                else if((val>>5)==0b110) count = 1;
                else if(val>>7) return false;
                continue;
            }
            if((val>>6)!=0b10) return false;
            count--;
        }
        return count==0;

  • 0
    C

    Hi, I have a question. For an input like [256], the binary is 1 0000 0000. Since only the 8 least significant are used, then it should be true right?


  • 0
    Z

    @fight.for.dream Hey my bro, do you assume the number is only 8 bit? what if I don't think you can handle a number like 1000.
    "The input is an array of integers. Only the least significant 8 bits of each integer is used to store the data."


  • 0
    G

    fabulously concise


  • 4

    Great solution! I don't think it fabulous, you should always handle bits in this problem, this solution is concise and clear. Here is the Java version.

    public class Solution {
        public boolean validUtf8(int[] data) {
            int count = 0;
            for (int c : data) {
                if (count == 0) {
                    if ((c >> 5) == 0b110) count = 1;
                    else if ((c >> 4) == 0b1110) count = 2;
                    else if ((c >> 3) == 0b11110) count = 3;
                    else if ((c >> 7) != 0) return false;
                } else {
                    if ((c >> 6) != 0b10) return false;
                    count--;
                }
            }
            return count == 0;
        }
    }
    

  • 0
    Z

    Great job! So concise and clear!


  • 0
    J

    I think we should add "i = i&255" at the beginning of loop body.
    Since it mentioned: "Only the least significant 8 bits of each integer is used to store the data".
    Also , maybe it is necessary to update the OJ, cause test cases like [256] will give "false", but it's 100000000, least significant 8 bits are 00000000, which should give same result as input is [0].
    That is, if we gave a number bigger than 255, the judge will fail


  • 0
    A

    Interesting... My version was kind of "straightforward" but attempting to let the processor to execute as many instructions in parallel as possible. So the code is longer but faster than this concise one. The results are:

    32bit platform
    Mine: 8.9 sec
    Concise: 9.98 sec

    64bit platform
    Mine: 8.3 sec
    Concise: 11.5 sec

    My code and perf test are below.

    class Solution
    {
    public:
        bool validUtf8(const vector<int>& data) const
        {
            auto end = data.end();
            auto i = data.begin();
            if (i == end)
                return false;
    
            while (i != end)
            {
                int count = 0;
                unsigned char ch = *i;
                while (ch & 128)
                {
                    ch <<= 1;
                    ++count;
                    if (count > 4)
                        return false;
                }
    
                switch (count)
                {
                case 0:
                    ++i; break;
                case 1:
                    return false;
                case 2:
                    if ((end - i) < 2 || Wrong(i + 1))
                        return false;
    
                    i += 2;
                    break;
    
                case 3:
                    if ((end - i) < 3 || Wrong(i + 1) || Wrong(i + 2))
                        return false;
    
                    i += 3;
                    break;
    
                case 4:
                    if ((end - i) < 4 || Wrong(i + 1) || Wrong(i + 2) || Wrong(i + 3))
                        return false;
    
                    i += 4;
                    break;
                }
            }
    
            return true;
        }
    
    private:
        static inline bool Wrong(vector<int>::const_iterator i)
        {
            return (*i & 192) != 128;
        }
    };
    int main()
    {
        Solution s;
        auto r1true = s.validUtf8(vector<int> {197, 130, 1});
        auto r2false = s.validUtf8(vector<int> {235, 140, 4});
        auto r3 = s.validUtf8(vector<int> {197, 130, 1});
        auto r4false = s.validUtf8(vector<int> {191, 130, 1});
        auto r5true = s.validUtf8(vector<int> {223, 130, 1});
        auto r6true = s.validUtf8(vector<int> {239, 130, 130, 1});
        auto r7true = s.validUtf8(vector<int> {247, 130, 130, 130, 1});
        auto r8false = s.validUtf8(vector<int> {251, 130, 130, 130, 130, 1});
        auto r9false = s.validUtf8(vector<int> {247, 130, 130, 1});
        auto r10false = s.validUtf8(vector<int> {247, 130, 130});
    
        auto r11false = s.validUtf8(vector<int> {240, 162, 138, 147, 145 });
        auto r12false = s.validUtf8(vector<int> {247, 130, 130, 255, 1});
    
        vector<int> big;
        for (int i = 0; i < 1000000; ++i)
        {
            big.push_back(247);
            big.push_back(130);
            big.push_back(130);
            big.push_back(130);
            big.push_back(1);
    
            big.push_back(239);
            big.push_back(130);
            big.push_back(130);
            big.push_back(1);
    
            big.push_back(223);
            big.push_back(130);
            big.push_back(1);
        }
    
        int sum = 0;
        for (int i = 0; i < 200; ++i)
        {
            sum += s.validUtf8(big);
            big.push_back(130);
            sum += s.validUtf8(big);
            big.pop_back();
        }
    
        printf("%d", sum); // Should be 200 since one time we pass good vector and another a bad one.
        return 0;
    }
    

  • 0

    This is brilliant. Nice and clean.


  • 0

    I still cannot figure out the meaning of 0b110, 0b11110 or sth. like that. And I am sure I cannot remember them correctly in the interview. So I write it in a lazy way.

    31 = 2^4 + 2^3+2^2+2^1 + 2^0
    15 = 2^3+2^2+2^1 + 2^0
    ...
    It passes all the test cases. Is this solution right?

    public boolean validUtf8(int[] data) {
            int count = 0;
            for (int c: data) {
                if (count == 0) {
                    if (((c >> 3) & 31) == 31) {
                        return false;
                    } else if (((c >> 4) & 15) == 15) {
                        count = 3;
                    } else if (((c >> 5) & 7) == 7) {
                        count = 2;
                    } else if (((c >> 6) & 3) == 3) {
                        count = 1;
                    } else if (((c >> 7) & 1) == 1) {
                        return false;
                    }
                } else {
                    if (((c >> 6) & 2) != 2) {
                        return false;
                    }
                    count--;
                }
            }
            return count == 0;
        }
    

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.