I think test cases in this problem are too weak

Designing a test case that fails for an algorithm in O(n) and TLE in O(n log n) is quite impractical. log is a really slow function. E.g. log2(25000) = 14.7 and it would be difficult to differentiate consistently the log factor to a large constant in the big O.
The word consistently is here important. First of all, you can see that testing the same code several times takes different durations. My guess is that @1337c0der does not want false negative. In order to avoid that you would have to find a test that succeeds consistently even with a large constant in the big O, and sometimes fails if there is an extra log factor. This would create a lot of false positive.
The other solution would be to plot the duration of several length and test if it's linear or linear * log. Doing that accurately is f*cking hard and expensive both in time and money. (See for example this discussion on stackOverflow)
In my community (I am a complexity guy), we are often happy enough if we have an algorithm that is optimal up to poly log factors. We even have a notation for that Õ.