Detailed explanation of Graham scan in 14 lines (Python)

  • 4

    Graham scan is an O(n log n) algorithm to find the convex hull of a set of points, which is exactly what this problem entails. The idea is to start at one extreme point in the set (I chose the bottom most point on the left edge) and sweep in a circle. Going counterclockwise is convenient due to the convention in trigonometry that polar angles in the unit circle increase as you move counterclockwise with respect to the positive x-axis, but this algorithm could potentially be performed sweeping clockwise as well. As you perform this sweep, add the encountered points to the solution set. After each addition check the last three points in the solution set. If these points rotate in a direction opposite of the direct of your sweep, you know that the second to last point cannot be correct (assuming there are more than 3 points in your solution set). What does it mean for points to rotate? Imagine yourself starting at the first point, walking directly to the second point, and then directly to the third point. The rotation of the points is the rotation you had to make at point 2 in order to face point 3. It is easy to intuit that if you are walking a giant counterclockwise circle around the boundary of the points, then turning clockwise at any point puts you inside the absolute outside boundary (i.e. inside the convex hull).

    There are two details to be sorted out:

    • How do we know what direction the last three points turn in? Let's call them p1,p2, and p3 in order of their appearance in the solution set. Let the vector from p1 to p2 be v1; from p2 to p3, be v2. The cross product of v1 and v2 give the direction that points turn in. If the cross product is negative, it is a right hand / clockwise turn; if it is positive, left hand / counterclockwise turn. If the cross product is zero, the three points are colinear.
    • How do we traverse the points in a circular fashion? Sure you can import trigonometric functions to help you find the polar angle of the line formed between each point and the start. Maybe a simpler, more intuitive approach, is to simply sort the points by the slope of the line made with the start point. Incidentally, this is also why I chose the smallest left point in the set as my start. It is now convenient that all the slopes monotonically increase in the (-infinity, infinity] domain as you traverse counterclockwise from the negative verticle. If two slopes are equivalent, take the point with the higher y-coordinate. If two slopes are both zero, take the point with smaller x-coordinate.
    def outerTrees(self, points):
        # Computes the cross product of vectors p1p2 and p2p3
        # value of 0 means points are colinear; < 0, cw; > 0, ccw
        def cross(p1, p2, p3):
            return (p2.x - p1.x)*(p3.y - p1.y) - (p2.y - p1.y)*(p3.x - p1.x)
        # Computes slope of line between p1 and p2
        def slope(p1, p2):
            return 1.0*(p1.y-p2.y)/(p1.x-p2.x) if p1.x != p2.x else float('inf')
        # Find the smallest left point and remove it from points
        start = min(points, key=lambda p: (p.x, p.y))
        # Sort points so that traversal is from start in a ccw circle.
        points.sort(key=lambda p: (slope(p, start), -p.y, p.x))
        # Add each point to the convex hull.
        # If the last 3 points make a cw turn, the second to last point is wrong. 
        ans = [start]
        for p in points:
            while len(ans) > 2 and cross(ans[-3], ans[-2], ans[-1]) < 0:
        return ans

  • 0

    Nice solution!
    I tried to implement a similar Graham scan but sorting by the cosine derived from the scalar product.

    The problem I had was numerical instability, where angles that are truly identical (say the vector from origin to (6,2) and the vector from origin to (3, 1)) were slightly different.

    How do you avoid that here? It looks as though it could be an issue given the division in the slope function.

  • 0

    @yorkshire In my solution, if two points have the same slope with respect to the starting point, I first check the point with the larger y-coordinate. If two points both have slope zero w.r.t. the start (all lie on a horizontal line), I check the one with the smaller x-coordinate. This is expressed in the line points.sort(key=lambda p: (slope(p, start), -p.y, p.x)). Note this requires you take the lowest point of the far left as your starting point, i.e. you cannot have any points in your search vector that has a slope of negative infinity with respect to the starting point.

  • 0

    @david120 Yes I'm all ok with the neat way you use the tuple of (slope(p, start), -p.y, p.x) to get the ordering correct.

    It's the slope part that I'm wondering if there could be an issue with numerical precision.
    E.g if one point is at (5, 13) and another is at (10, 26) then they have the same slopes as fractions but as decimals they are 0.38461538461538461538461538461538.....
    With an infinite number of decimal places of course they are the same but there must be some truncation to represent them in memory. Is there any guarantee that one will not be rounded slightly up or down? Or more generally a / b is exactly equal to (a * c) / (b * c) ?

    There are plenty of articles about the difficulty of comparing floats such as this.

    The issue with my cosine is because I use the square root to get the length and 2 == (2 ** 0.5) ** 2 returns False because the right hand side is slightly greater than 2.

    Python seems to be ok that 5/13 == 10/26 but not that squaring a square root gets back the same integer. So I'm wondering why/how and what the constraints are.

  • 0

    @yorkshire Ah I see what you're saying now. I think that because the problem limits the grid to be 100 x 100, we can get away with lazy == checking for float division. In the case of using cosines and squares, it looks like (2 ** 0.5) ** 2 = 2.0000000000000004, so I think it would be sufficient to define equality as having two numbers be within some small epsilon of each other, a la math.isclose.

  • 0

    @david120 Yes it seems you are right that for small grids it is ok
    The same issue happens with this problem about points on a line. This discussion has some more detail.
    Apparently it will be safe up to 26 bits (~ 67 million) but beyond that there are examples such as,
    94911149/94911150 == 94911150/94911151 # false
    94911150/94911151 == 94911151/94911152 # true
    94911151/94911152 == 94911152/94911153 # false

Log in to reply

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.