I messed up some calculations in my last post and have now fixed them.
I had mistakenly assumed that bitset<3> takes 3 bytes of memory; it holds 3 bits.
I think I have an idea of where the problem is, and it may help you a little in redesigning the algorithm.
The problem is still a stack overflow, even though you allocated the bs object on the heap. I tried allocating bs on the heap and doing the bit manipulations through a pointer to bs, and I still get a stack overflow. I think the cause could be the two lines of code below:
bs = ~bs;
in &= in << 1;
Both statements need temporary storage on the stack (256 MB), because operator~ and operator<< each return a new bitset by value, and I believe the default stack size is 1 MB.
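For the first statement at least, a temporary can be avoided altogether. Here is a minimal sketch, assuming a small stand-in size kBits (not the real size from the thread) and a helper name invert_in_place that I made up for illustration. std::bitset::flip() modifies the object in place, whereas operator~ builds a full copy first.

```cpp
#include <bitset>
#include <cstddef>

// Small stand-in size so the sketch is cheap to run; the real bitset is far larger.
constexpr std::size_t kBits = 1024;

// Hypothetical helper: same result as `bs = ~bs;`, but flip() works in place,
// so no second bitset of the same size is created as a temporary.
void invert_in_place(std::bitset<kBits>& bs) {
    bs.flip();
}
```

This does not help with the second statement, since `in << 1` still produces a shifted copy of the whole bitset.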
So perhaps the problem can be resolved by doing the bit manipulation chunk by chunk, in chunks of about 1 MB.
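To make the chunked idea concrete, here is a rough sketch, not a drop-in replacement. It assumes the bits are kept in a heap-allocated std::vector of 64-bit words instead of a std::bitset, with global bit i stored in bit (i % 64) of word (i / 64), and it assumes the total bit count is a multiple of 64. The function names are made up for illustration. Each step touches one word at a time, which is the smallest possible chunk, so no large temporary ever lands on the stack.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: equivalent of `bs = ~bs;`, one 64-bit word at a time.
void invert_words(std::vector<std::uint64_t>& words) {
    for (std::uint64_t& w : words) {
        w = ~w;
    }
}

// Hypothetical helper: equivalent of `in &= in << 1;`.
// Bit i of the result is set only if bits i and i-1 were both set before.
// The loop runs from the highest word down, so words[i - 1] still holds its
// original value when its top bit is carried into word i.
void and_with_shift_left_1(std::vector<std::uint64_t>& words) {
    for (std::size_t i = words.size(); i-- > 0;) {
        const std::uint64_t carry = (i > 0) ? (words[i - 1] >> 63) : 0;
        words[i] &= (words[i] << 1) | carry;
    }
}
```

The vector's storage lives on the heap, and the only extra state is the one-word carry. If the bit count were not a multiple of 64, the unused high bits of the last word would have to be masked off after inverting.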