This is a 4 partition problem. An easier 2 partition problem is Partition Equal Subset Sum. Thanks to @withacup. Following is the nicest I have

```
bool makesquare(vector<int>& nums) {
int tot = accumulate(nums.begin(),nums.end(),0);
if(!tot || tot%4) return 0;
int edge[4]={0};
sort(nums.begin(),nums.end(),greater<int>()); //try numbers from big to small
return isSqr(0, tot/4, edge,nums);
}
bool isSqr(int p, int sum, int edge[4], vector<int>& nums) {
if(p==nums.size()) return edge[0]==sum && edge[1]==sum && edge[2]==sum;
unordered_set<int> ht;
for(int i=0;i<4;i++) {
const auto &pr = ht.insert(edge[i]); //do not create duplicate states from a state
if(!pr.second || nums[p]+edge[i]>sum) continue; //do not expand to an invalid state
edge[i]+=nums[p];
if(isSqr(p+1,sum,edge,nums)) return 1;
edge[i]-=nums[p];
}
return 0;
}
```

There are 3 optimizations based on brute force.

- Do not expand to an invalid state with a larger than average partition.
- Try the numbers from big to small. This is not as easy as it seems. During DFS, we have 4 bins and we try to put a number in each of them to generate the next state. Worst case is O(4^n) which happens when a state always expands to 4 valid state, that is when expand from small numbers to larger numbers. In this case, we generate a lot of states from the small numbers and figure out they are useless because they cannot accommodate the large numbers. If we search from large numbers, then we may have a filled bin at first try and runtime becomes O(3^n).
- Do not create duplicate states from a state. A state is determined by the 4 bins and the current number. If memory allows, we can have a 5d memoization array to remove redundancy. Since a state is defined by 5 variables, it may not be very often to generate a duplicate state so it is probably ok to go without memorization and use the cheap trick.

What if the numbers are arbitrary? Looks like only #3 still works...