@Mrsuyi good idea to use a stack. I was considering it too, but it was a little trickier to implement. In your solution, you could use binary search (your stack is naturally sorted) instead of popping elements one by one. Once you find a larger element, you can use vector.resize() to quickly remove all the smaller elements. rbegin() and rend() can simplify the logic a bit, as the stack is sorted in descending order.
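
To illustrate the idea, here is a minimal sketch. It assumes the stack is a `std::vector<int>` kept in descending order from bottom (front) to top (back); the helper name `drop_smaller` is hypothetical and not taken from either solution. Whether the comparison should be strict depends on the problem, so treat the choice of `std::lower_bound` here as an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: removes every element strictly smaller than x from a
// vector used as a stack whose contents are in descending order from bottom
// (front) to top (back). Returns the number of elements removed.
std::size_t drop_smaller(std::vector<int>& st, int x) {
    // Seen through reverse iterators the stack is in ascending order, so the
    // default std::lower_bound applies: it finds the first element >= x.
    auto it = std::lower_bound(st.rbegin(), st.rend(), x);
    // Everything before `it` in the reverse view is smaller than x.
    std::size_t removed = static_cast<std::size_t>(it - st.rbegin());
    // One resize() replaces the loop of individual pop_back() calls.
    st.resize(st.size() - removed);
    return removed;
}
```

If you need the last value that would have been popped, it is the element just before `it` in the reverse view, so read it before calling resize().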

I added another solution to my answer based on your code, optimized with binary search.

It looks like LeetCode OJ does not have a test case large enough to show the difference, so I ran both algorithms on a large random input, and the version with binary search was 20% faster.

My initial algorithm, which uses a set, is several times slower. I am guessing that this is due to the overhead of maintaining the BST internally.