To start with, imagine we have a count array that records how many times each value has appeared to the right of the current element; we maintain this array while iterating over nums in reverse.

Given this count array, the number of elements smaller than the current value n is just the prefix sum sum(count[0:n]).
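To make the idea concrete, here is a minimal sketch of this naive version (the helper name `count_smaller_naive` is mine, not part of the final solution):

```python
# Naive sketch: walk nums from right to left, keeping a count array indexed
# by value; the answer for each element is the sum of counts of all strictly
# smaller values seen so far.
def count_smaller_naive(nums):
    lo, hi = min(nums), max(nums)
    count = [0] * (hi - lo + 1)          # count[v - lo] = occurrences of v so far
    ans = []
    for n in reversed(nums):
        ans.append(sum(count[:n - lo]))  # elements strictly smaller than n
        count[n - lo] += 1
    ans.reverse()
    return ans

print(count_smaller_naive([5, 2, 6, 1]))  # [2, 1, 1, 0]
```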

If we implement this naively, the count array takes O(max(nums) - min(nums)) space, and every query for the sum of smaller elements costs O(max(nums) - min(nums)) time as well. That doesn't sound good, since max(nums) - min(nums) can be enormous.

To improve it, we can map the elements of nums to [1, N], where N is the length of nums. This works because nums contains at most N distinct elements, and the mapping preserves their relative order. In this way we optimize O(max(nums) - min(nums)) down to O(N).
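This mapping is the standard coordinate-compression trick; a quick sketch (the helper name `compress` is mine):

```python
# Coordinate compression: replace each value by its 1-based rank among the
# distinct values of nums. Relative order is preserved, and ranks fit in [1, N].
def compress(nums):
    rank = {v: i for i, v in enumerate(sorted(set(nums)), start=1)}
    return [rank[v] for v in nums]

print(compress([1000000000, -5, 7, -5]))  # [3, 1, 2, 1]
```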

Now we have an O(N^2) algorithm, since every query for the count of smaller elements takes O(N) and we call it N times, once per element. This is still pretty bad, because the naive brute force already yields O(N^2). However, on top of this count array we can build a fenwick tree (binary indexed tree), an online data structure that supports both prefix-sum queries and point updates in O(logN). I think a segment tree can do the same thing.
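As a quick reminder of why the fenwick tree is O(logN) per operation: each index i covers a range whose length is its lowest set bit, i & -i, so both an update and a prefix-sum query visit at most O(logN) indices. A small sketch of the index paths (helper names `update_path` and `query_path` are mine, for illustration only):

```python
# The lowest set bit i & -i determines how far each fenwick step jumps,
# so an update or a prefix-sum query touches at most O(log N) indices.
def update_path(i, size):
    path = []
    while i <= size:
        path.append(i)
        i += i & -i   # climb to the next index whose range covers i
    return path

def query_path(i):
    path = []
    while i > 0:
        path.append(i)
        i -= i & -i   # strip the lowest set bit
    return path

print(update_path(3, 16))  # [3, 4, 8, 16]
print(query_path(13))      # [13, 12, 8]
```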

TL;DR: map the elements of nums to [1, N], iterate over nums in reverse, and maintain the count array with a fenwick tree: a point update to record each element's count, and a prefix-sum query to get the count of smaller elements. Total time complexity is O(NlogN), space complexity is O(N).

```python
class Solution(object):
    def countSmaller(self, nums):
        # Coordinate compression: map each distinct value to its 1-based rank.
        cur = 1
        discrete = {}
        for n in sorted(set(nums)):
            discrete[n], cur = cur, cur + 1

        # Fenwick tree over ranks 1..cur-1; index 0 is unused.
        fenwick = [0] * cur

        def add(i):
            # Point update: record one more occurrence of rank i.
            while i < len(fenwick):
                fenwick[i] += 1
                i += i & -i

        def get_sum(i):
            # Prefix sum: total count of ranks in [1, i].
            ans = 0
            while i > 0:
                ans += fenwick[i]
                i -= i & -i
            return ans

        ans = []
        # Iterate in reverse, so everything already added lies to the right.
        for n in [discrete[i] for i in reversed(nums)]:
            ans.append(get_sum(n - 1))  # count of strictly smaller ranks
            add(n)
        ans.reverse()
        return ans
```