Proof of Correctness

This algorithm is the first thought that came to my mind when I first saw this problem, but I later decided it incorrect due to the pairing problem. But now that the OJ accepted this solution, I started thinking again the science behind this.

If the intervals stays in the original pairing after sorting, e.g.: [1,2], [3,4], [5, 6], after sorting would be:

start end
1 2
3 4
5 6

the start and end of interval i all stayed in the ith position in array starts or ends. In this situation, what's left to do is easy to understand, just make sure no start[i] is smaller than end[i-1] for i=1..length-1 is okay.

But the subtlety here is when the pairing is actually broken during the sorting.

E.g. if we have: [1,2], [3,4], [0, 6], then the sorting gives us:

start end
0 2
1 4
3 6

The algorithm would still return false in this case, but only due to the fact like 1 < 2, which only states the seemingly trivial fact of intervals[0].start < intervals[0].end. If we are trying to determine that no other interval begins before this interval stops, comparing 1, which is interval[0].start, to 2, which is interval[1].end, does not make any immediate sense.

Incidentally, to help understanding: if we visualize, we have something like:

[ ]
[ ]
[ ]
Lemma 1

If the pairing for interval i is broken during the separate sorting, there must be another interval that completely contains interval i. Naturally, the right answer in this situation would be to return false.

**Proof:** suppose interval[i].start is at position m of the sorted starts array, and interval[i].end is at the position n of the sorted ends array. if m == n, then we know that the pairing is not broken for interval[i]. Otherwise, the pairing is broken for interval[i]. WLOG we suppose m > n, and we know that there are m intervals that starts before interval i starts, and there are n intervals that ends before interval i ends, and immediately we know that there are m - n intervals that starts earlier than interval i, but ends later than interval i. Actually, we only need the existence of one such overdue interval. This overdue interval starts earlier than interval i, and ends later than interval i, which implies that it completely comprehends interval i. This is clearly a situation that you can't attend both interval i and this overarching interval.

If m < n, then it's obvious that there be another m' and n' that belongs to one same interval i', and also satisfying m' > n'. Because if e.g. 3 intervals started before interval i started, and 5 intervals ended before interval i ended, then there can only be at most 3 intervals that started before interval i and also ended before interval i ended. The 2 intervals left must have started after interval i have started, but also ended before interval i ended. These 2 intervals are the intervals with m' > n'. Interval i contains these two intervals in this case.

**End of Proof**

Let's clear up a little: we know that:

if no pairing is broken, this algorithm can determine the right result (true or false).
We know that if any pairing is broken, the right answer should be false. But we do not know yet whether the algorithm can/will spit out false in such a scenario.
Lemma 2

After sorting, for each position m, we have starts[m] <= ends[m]

**Proof: ** The point is, we can't have starts[m] > ends[m] for any m = 0..length-1.

Suppose we do have starts[m] > ends[m]. Then we call the interval corresponding to ends[m] the interval k. We prove that interval k can't exist by contradiction.

Since the arrays are sorted, the m - 1 intervals corresponding to ends[0]..ends[m-1] must all begin before starts[m]: they all ends before interval k, which ends at ends[m], and we also know that ends[m] < starts[m], which in turn means that all these m - 1 intervals all starts before starts[m] (because each interval starts before it ends). More graphically speaking, the m - 1 intervals corresponding to ends[0]..ends[m-1] must also all corresponds to starts[0]..starts[m-1] (although the order within may be broken).

Having this conclusion in hand, we know that there will be no way to arrange interval[k].start: it has to be within the first m - 1 entries of starts (because its start has to be smaller than ends[m], which is smaller than starts[m]), which are already all taken by the m - 1 intervals corresponding to ends[0]..ends[m-1]. Here we have the contradiction.

**End of Proof**

Lemma 3

For interval i, if its start is at position m of the sorted starts array, and its end is at position n of the sorted ends array, then this algorithm (which compares each pair of consecutive intervals) will return false.

**Proof: **

First, let's suppose m > n:

If m = n + 1, then the proof is very easy: we would end up compare starts[n+1] with ends[n], which is comparing interval[i].start with interval[i].end (because again reminder: interval[i].start = starts[m], interval[i].end = ends[n]) , which will return less than, which will make the algorithm return false;

If m is arbitrarily larger than n, things get a little more complicated. We prove the lemma in this situation with contradiction. Suppose the algorithm actually returns true, this means the algorithm finds this conclusion: , for any i = 1..length-1, starts[i] >= ends[i-1].

Remind yourself that starts[m] = interval[i].start, ends[n] = interval[i].end;

We apply this conclusion to the range of i = n+1..m:

starts[n+1] >= ends[n]
starts[n+2] >= ends[n+1]
...
starts[m] >= ends[m-1]

and we aggregate all these m - n inequalities, together with the conclusion from Lemma 2:

starts[n+1] <= ends[n+1]
starts[n+2] <= ends[n+2]
...
starts[m-1] <= ends[m-1]

We easily have (by transitivity):

starts[m] >= ends[n]

Which means

interval[i].start >= interval[i].end

Now, as long as the input is legal, we must have

interval[i].start < interval[i].end

for all i. Thus we have the contradiction.

In the case of m < n, we have a similar symmetric proof as we did for the proof for Lemma 1 (there must be some m' > n' for some interval i', and we apply the proof above to interval i').

**End of Proof**

I hope it's now clear that when combining the three lemmas together, the correctness of this algorithm is elucidated. In my opinion, the algorithm looks deceptively simple, but thinking for the proof of correctness is quite fun.