Find the Duplicate Number


@sreedharande "Here, we sort nums in place, so the memory footprint is constant. If we cannot modify the input array, then we must allocate linear space for a copy of nums and sort that instead."

@sirotan You are correct; good catch. I'll fix that now. Only that slide needs to be fixed, though, as if the two pointers point to two different locations with the same number, it's obviously the duplicate.

@emptyset I'm still a little confused with the correlation between cycle detection and this problem. In the slide, after phase I, can we directly say "the duplicate is 6" since we have found the intersection (node with the same value) instead of going through phase 2? Why is the entrance of a cycle the duplicate node?

@oyoi2005 The problem is that the slides don't correspond 1:1 to the code. What should actually happen there should be two more moves by tortoise (and 4 more moves by hare) and they should meet at index 3. From there you actually have to perform the second phase to find out that the duplicate is 6.
It's easier to understand why the second phase is needed if you take an array [1, 4, 4, 5, 6, 2, 3]. In this example they would meet at index 5 after the first phase and then find the duplicate 4 in the second phase. It should become quite clear when you draw a graph for this on paper and follow the steps of the algorithm.

@piyush.dez said in Find the Duplicate Number:
Swift (valid input : 1 <= x <= len(array))
REPLY
)
func findDuplicates(array: inout [Int]) {
var set: Set = Set<Int>()
for i in 0..<array.count {
let index = abs(array[i])1if array[index] < 0 { set.insert(abs(array[i])) } else { array[index] = array[index] } } }

@guo.lincode That test case is not valid. The array can contain only integers between
1
andn
.


Here's my understanding of the Floyd's Tortoise and Hare solution, and the analysis of its time complexity.
First of all, when traversing the array described in the problem by always using the current value as the next index to go to, there must be a loop. Let's say C is the length of the loop, which is smaller than the size of the array, aka C < n + 1. Before entering the loop, there are K steps to get from nums[0] to the beginning of the loop. Apparently, K is also smaller than the size of the array, aka K < n + 1.
Now let's see what's happening during the first loop. When Tortoise/Slow first reached the beginning of the loop, it moved K times; Meanwhile, Hare/Fast moved K times past the beginning of the loop, and is now somewhere in the loop. We could take note of Fast's current distance from the beginning of the loop, which is (K % C). At this point, how many moves it would take for Fast to catch up with Slow within the loop? It would surely take less than C moves. In fact, with each move, Fast would shorten the gap of the two by one. We know Fast is (K % C) steps ahead of the start point of the loop, so the gap between the two is (C  (K % C)). This is to say, When Fast catches Slow, aka the first loop exits, Slow has moved (C  (K % C)) steps past the beginning of the loop.
Time complexity of the first loop: Slow moved K times before the loop, and then less than one loop to be caught up. O(K + C) = O(n).
And then we move to the second loop. Slow's position remains the same while Fast is reset to nums[0]. Also Fast now moves at the same speed as Slow does. If Fast and Slow were to meet, they should meet at the beginning of the loop. Now would they? Let's take a look. It would take Fast K moves to get to the beginning of the loop. Where would Slow be after K moves? It would be (C  (K % C) + K) steps past the beginning of the loop. Where the hell is that? Let's mod it with C: (C  (K % C) + K) % C = (C % C)  ((K % C) % C) + (K % C) = 0  (K % C) + (K % C) = 0. So yes, after K moves, Slow is also at the beginning of the loop. Thus, it would take exactly K moves to exit the second loop, and we found the beginning of the loop.
Time complexity of the second loop: O(K) = O(n).
Overall, time complexity of the solution is O(n).

@emptyset If we are already getting the duplicate value when dowhile loop (// Find the intersection point of the two runners) ends then what's the need for that while loop (// Find the "entrance" to the cycle) at the end?

@andy_r_s I think it is because when tortoise == hare, maybe they are the same nums[i], so you can not say you get the duplicate value.

@andy_r_s I think this specific example is confusing in the way that it gives wrong impression to the reader that one pass is enough to find the duplicate, well in this case yes it is, since when it intersects, it is actually the entry of the loop, but giving another example which is not the case:
Given the following array and its value:
0 1 2 3 4 5 6
1 2 3 4 5 6 3
after 1st round:
t: 2
h: 3
after 2nd round:
t: 3
h: 5
after 3rd round:
t: 4
h: 3
after 4th round:
t: 5
h: 5
the first pass finishes sincet==h
Now after the first pass, the value is5
, but5
is not the duplicate, we are inside the cycle but not the entry of the cycle, therefore we have to find the entry of the cycle which is3
.