The first time I saw the O(n log n) solution, I really wondered where the cleverly designed auxiliary array came from, and how one could even hypothesize that the array is monotonically increasing. But later I figured out there is a path to evolve from the O(n^2) solution to the O(n log n) one.

Step1: the "naive" DP
The O(n^2) DP maintains an array L, with L[k] storing the length of the longest increasing subsequence (abbreviated as LIS below) ending with the kth element of the input. The optimal substructure is L[k] = 1 + max {L[j] : j < k and nums[j] < nums[k]}, with L[k] = 1 when no such j exists.
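The recurrence above can be sketched directly; a minimal Python version (the function name `lis_naive` is my own choice here):

```python
def lis_naive(nums):
    """O(n^2) DP: L[k] = length of the LIS ending at index k."""
    if not nums:
        return 0
    L = [1] * len(nums)
    for k in range(len(nums)):
        for j in range(k):
            # nums[k] can extend any increasing subsequence ending
            # in a strictly smaller element
            if nums[j] < nums[k]:
                L[k] = max(L[k], L[j] + 1)
    return max(L)
```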
Step2: DP with a shorter auxiliary array
The key to this step is to examine what may update L[k], and to realize:
there could be multiple indices j with the same value of L[j], and among them the one whose subsequence has the smallest tail suffices to produce a candidate LIS of length L[j]+1.
This implies that the O(n^2) DP is wasting space, and hints that we should instead cache only the entries that can cause some L[k] to update: they happen to be the increasing subsequences with the smallest tail of each length.
This gives us a new formulation of the optimal substructure: let T[k, l] denote the minimum tail of an increasing subsequence of nums[1...k] of length l. It is straightforward to make the update 1D by maintaining only the last two rows of T.
Now we have a DP of complexity O(n * L[n]), where L[n] is the LIS length. It is already better than the naive DP on a long input whose increasing subsequences are all short. But there is more room to improve.
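This intermediate DP can be sketched with a single tails array, where `tails[l-1]` holds the minimum tail of any increasing subsequence of length l seen so far (a sketch; the name `lis_tails_linear` is mine, and the update position is still found by linear scan at this stage):

```python
def lis_tails_linear(nums):
    """O(n * answer) DP: tails[l-1] = smallest tail of any
    increasing subsequence of length l found so far."""
    tails = []
    for x in nums:
        for l in range(len(tails)):
            if tails[l] >= x:
                # x yields a length-(l+1) subsequence with a smaller tail
                tails[l] = x
                break
        else:
            # x extends the longest subsequence so far
            tails.append(x)
    return len(tails)
```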
Step3: The O(nlogn) solution
A natural question to ask, if we want to optimize the above DP further, is when T[k, l] gets updated to nums[k]. It is easy to see that this happens when T[k-1, l-1] < nums[k] < T[k-1, l]. The left "<" ensures nums[k] can be appended after the subsequence ending at T[k-1, l-1], while the right "<" makes the update an improvement.
This leads us to wonder whether the position to update is easy to search for: if the row T[k-1, ·] is monotonic, we can binary search! At this point it is a matter of conjecturing first and then proving it is indeed true. The proof is short: if T[k, l] >= T[k, l+1] for some l, then dropping the last element of the length-(l+1) subsequence yields a length-l increasing subsequence with a strictly smaller tail, a contradiction. So each row of T is strictly increasing.
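Since each row is strictly increasing, the linear scan from Step 2 can be replaced by binary search, giving the O(n log n) solution. A sketch (using Python's standard `bisect_left`; the name `lis_fast` is mine):

```python
from bisect import bisect_left

def lis_fast(nums):
    """O(n log n) LIS: tails is strictly increasing, so the
    position to update is found by binary search."""
    tails = []
    for x in nums:
        # first index i with tails[i] >= x
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence
        else:
            tails[i] = x      # same length, strictly smaller tail
    return len(tails)
```

Note that `tails` itself is the monotonically increasing auxiliary array that seemed so mysterious at the start: its entries are exactly the minimum tails, one per achievable length.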