@gorokhovsky Oh my, I haven't realized at all that it doesn't matter whether the highestdiffering bit is zero or one in the mask because it's zero anyway in m
. That certainly simplifies the solution!
Sergey Tachenov
@SergeyTachenov
I've been coding ever since before I was 13 years old. I had no PC, so I would write code on a piece of paper and then execute it in my mind. I work now for Russian Mission Control Center (dealing mainly with International Space Station telemetry), which is fun except that it's in Russia (not my kind of country, as I tend to respect laws and like quiet life). My hobbies are quantum mechanics and foreign languages, but I hardly devote any time to them because most of the time I either code or learn to code better. My dream is to leave Russia for good and live and work for a nice Western company in a nice Western country, but it remains just a dream for now.
Posts made by SergeyTachenov

RE: Java 8 ms oneliner, O(1), no loop, no log

RE: No more Europefriendly contests?
@1337c0d3r I don't have a Facebook account, but as for time... A start time around 11:00 Saturday or Sunday, or in the interval 06:00–08:00 Sunday (all UTC) would be absolutely ideal for me. I'm flexible enough, though, as long as it's daylight time in Europe and not Saturday morning.

RE: Java 8 ms oneliner, O(1), no loop, no log
@jedihy Naturally, I meant no floatingpoint log.

RE: A couple of Java solutions with explanations
@nikhilay That's because
++n
can make the number negative at some point. 
RE: C++ O(nlogk) solution with ordering bits, with O(logk) additional memory for recursion stack
OK, here it is. I think I was able to fix it, but those checks are ugly enough to allow a bug or two to slip in.

A solution based on @bartoszkp's, with missing test cases
Let's say we find the MSB that can be set to 1 in the result. Then we can partition the whole thing into two subsets. One element must be taken from one subset, the other from the other one. Then we move on to next bit that could possibly be set to one, but this time we're restricted to picking elements from different subsets generated at the first step. We do that by partitioning each subset in two subsets again based on the value of the next candidate bit, and we try to combine elements in such a way so that bit is set to one. There are two subsets now in each of the original subsets, so we try to combine elements in two ways based on that next bit: 10 and 01. The whole thing goes on recursively until we run out of bits, and then we just return the maximum, and everyone is happy. Or at least that's the idea.
This idea occurred to me when I was solving this problem at first, but I thought it would be too slow because I may get bad splits, so I switched to prefixes/sets instead. But then I saw this solution based on the same idea, which looked pretty impressive. During the discussion with the author we came to the conclusion that bad splits won't degrade runtime to O(n^2) because recursion depth is limited by the number of bits anyway.
Unlike that solution, my original idea was to use a mask that indicates which bits could be possibly set. Say, if a certain bit is 1 in all numbers, or is 0 in all numbers, there is no way to get 1 in that position. That means we should only consider bits that are set in some numbers, but not in all of them. “Set in some” = OR, “set in all” = AND, ”set in not all” = NOT AND, and therefore the mask for such bits is
or & ~and
.While I was at it, I realized that the original solution by @bartoszkp had a bug that wasn't detected by the OJ. It started with the MSB of the maximum element, but if that bit is set in all numbers, then the very first split will be wrong. The use of my mask incidentally fixed that too. A simple test case to demonstrate the problem:
[4, 6, 7]
.Another interesting test case:
[8, 10, 2]
. The code below fails it if the two lines testing for(mask & msb) == 0
in the helper function are commented out. That's because it tries to partition the array on the mask4
, but that bit is cleared in all numbers. And yet, it passes the OJ too.Now here is one version of the code, that is wrong too. I'm posting it because it clearly demonstrates a flaw with this approach in general.
class Solution { public: int findMaximumXOR(vector<int>& nums) { auto orOp = [](int a, int b) { return a  b; }; auto andOp = [](int a, int b) { return a & b; }; mask = accumulate(nums.cbegin(), nums.cend(), 0, orOp) & ~accumulate(nums.cbegin(), nums.cend(), 0x7FFFFFFF, andOp); auto msb = computeMsb(mask); auto msbSplit = msbPartition(nums.begin(), nums.end(), msb); return findMaximumXor(nums.begin(), msbSplit, msbSplit, nums.end(), msb >> 1); } int computeMsb(int n) { auto msb = n; msb = msb >> 1; msb = msb >> 2; msb = msb >> 4; msb = msb >> 8; msb = msb >> 16; return msb  (msb >> 1); } vector<int>::iterator msbPartition(const vector<int>::iterator &beginIt, const vector<int>::iterator &endIt, int msb) { auto msbSet = [msb](int n) { return (n & msb) != 0; }; return partition(beginIt, endIt, msbSet); } int findMaximumXor(const vector<int>::iterator& beginLeft, const vector<int>::iterator& endLeft, const vector<int>::iterator& beginRight, const vector<int>::iterator& endRight, int msb) { if (distance(beginLeft, endLeft) == 1 && distance(beginRight, endRight) == 1) return *beginLeft ^ *beginRight; if (msb == 0  beginLeft == endLeft  beginRight == endRight) return 0; if ((mask & msb) == 0) return findMaximumXor(beginLeft, endLeft, beginRight, endRight, msb >> 1); auto splitLeft = msbPartition(beginLeft, endLeft, msb); auto splitRight = msbPartition(beginRight, endRight, msb); auto result1 = findMaximumXor(beginLeft, splitLeft, splitRight, endRight, msb >> 1); auto result2 = findMaximumXor(splitLeft, endLeft, beginRight, splitRight, msb >> 1); return max(result1, result2); } private: int mask; };
The test case where it fails is
[14, 15, 9, 3, 2]
(not in the OJ either). It goes like this: first we split it like14, 15, 9 / 3, 2
, then we split the left part as14, 15 / 9
. And then, when we try to match14, 15
with3, 2
we get a problem. Even though the next candidate power of 2 is 1, we can't set it because2
is set in all numbers now. And yet it wasn't set in all numbers to begin with, so our mask fails to skip it.This flaw originally comes from the idea that we match subsets having different values of a certain bit. However, there is no guarantee that subsets even exist for that bit. Our mask only provides guarantee for the MSB, and later on it's just a hint. That means we need to check for that again after we partition. The fixed code is below, and I hope I got it right this time:
class Solution { public: int findMaximumXOR(vector<int>& nums) { auto orOp = [](int a, int b) { return a  b; }; auto andOp = [](int a, int b) { return a & b; }; mask = accumulate(nums.cbegin(), nums.cend(), 0, orOp) & ~accumulate(nums.cbegin(), nums.cend(), 0x7FFFFFFF, andOp); if (mask == 0) return 0; auto msb = computeMsb(mask); auto msbSplit = msbPartition(nums.begin(), nums.end(), msb); return findMaximumXor(nums.begin(), msbSplit, msbSplit, nums.end(), msb >> 1); } int computeMsb(int n) { auto msb = n; msb = msb >> 1; msb = msb >> 2; msb = msb >> 4; msb = msb >> 8; msb = msb >> 16; return msb  (msb >> 1); } vector<int>::iterator msbPartition(const vector<int>::iterator &beginIt, const vector<int>::iterator &endIt, int msb) { auto msbSet = [msb](int n) { return (n & msb) != 0; }; return partition(beginIt, endIt, msbSet); } int findMaximumXor(const vector<int>::iterator& beginLeft, const vector<int>::iterator& endLeft, const vector<int>::iterator& beginRight, const vector<int>::iterator& endRight, int msb) { if (msb == 0  (distance(beginLeft, endLeft) == 1 && distance(beginRight, endRight) == 1)) return *beginLeft ^ *beginRight; if ((mask & msb) == 0) return findMaximumXor(beginLeft, endLeft, beginRight, endRight, msb >> 1); auto splitLeft = msbPartition(beginLeft, endLeft, msb); auto splitRight = msbPartition(beginRight, endRight, msb); auto result = 0; if (distance(beginLeft, splitLeft) > 0 && distance(splitRight, endRight) > 0) result = findMaximumXor(beginLeft, splitLeft, splitRight, endRight, msb >> 1); if (distance(splitLeft, endLeft) > 0 && distance(beginRight, splitRight) > 0) result = max(result, findMaximumXor(splitLeft, endLeft, beginRight, splitRight, msb >> 1)); if (result == 0) // no way to set this bit to 1 result = findMaximumXor(beginLeft, endLeft, beginRight, endRight, msb >> 1); return result; } private: int mask; };
It runs for 26 ms, beating 99%.
The
result == 0
line executes when bothif
s above fail to run. That happens if we have bad splits on both sides, just like in the last mentioned test case.Another last funny test case is
[15, 15, 9, 3, 2]
. The problem description doesn't say numbers can't be duplicated. Well, the code above passes it thanks to themsb == 0
check in the beginning of the recursive function. It looks kind of funny because we just return a XOR of two randomly picked elements from both sides in that case without even checking how many elements are there. However, when we get tomsb == 0
, we have already split both sides based on every possible bit, so it's either that both subsets have size 1, or it's that they are all duplicates, and therefore picking first elements is just as fine.The last, but not least, is the test case where all numbers are duplicate. That is checked by
mask == 0
in the toplevel function. Funny thing, that test could be removed, but only because partitioning happens to use(n & msb) != 0
. If I change it to(n & msb) == 0
(which is OK in general), then we'll have a problem: the first MSB partitioning will generate an N/0 partition, which meansbeginRight
will beend()
, and we'll get UB by trying to dereference it. With(n & msb) != 0
it generates an 0/N partitioning, so bothbeginLeft
andbeginRight
point to the same element, and therefore eventually we return zero.Recap of the test cases to be added:
[4, 6, 7]
,[8, 10, 2]
,[14, 15, 9, 3, 2]
,[15, 15, 9, 3, 2]
. 
RE: C++ O(nlogk) solution with ordering bits, with O(logk) additional memory for recursion stack
Actually, disregard that last (deleted) post with the code. I think this approach has a fatal flaw, at least as it's presented right now. I'll post a separate topic on that because I think it's really worth discussing.

RE: C++ O(nlogk) solution with ordering bits, with O(logk) additional memory for recursion stack
@bartoszkp You're right! I felt like I was missing something, but I didn't think it was something that obvious!
You can do a minor improvement here, by the way. Instead of finding the max element and fiddling around with log2, you could compute OR of all elements ANDed together with the complement of AND of all elements. This way you'll get the mask of all bits that you could possibly set to 1 in the result. That mask essentially says which bits are set in at least one element, but not in all of them. Then you can quickly skip bits set to zero in this mask because they are either all ones or all zeroes, and therefore will be set to zero in any XOR.

RE: C++ O(nlogk) solution with ordering bits, with O(logk) additional memory for recursion stack
OK, what if one bucket contains 35 elements, and the other one several thousand? Or something like that? And suppose the larger bucket is the left one, so recursion continues, and then there's another unlucky split just like that, and so on. So you'll get O(N) levels of recursion with a partition call in each. Surely, that's pretty unlikely, but we're talking about the worst case here, so the question is whether that's possible, not whether that's likely.