Straight forward Python DFS approach. Found it hard to optimize further since it seems copying nums in one way or another is unavoidable. O(n*n!) time complexity.

```
class Solution(object):
def permute(self, nums):
"""
:type nums: List[int]
:rtype: List[List[int]]
"""
self.res = []
self.dfs(nums, [])
return self.res
def dfs(self, nums, path):
if len(nums) == 0:
self.res.append(path)
return
for i in range(len(nums)):
self.dfs(nums[:i] + nums[i+1:], path + [nums[i]])
return
```