Python A* Solution


  • 2
    E

    Below my solution to the Word Ladder problem using the A* search algorithm. It is far from the fastest solution out there (not even close to two-sided BFS) but A* is both optimal and complete, and in a situation with a larger search space it would quickly outpace most other options (it would also help if I could use the C code for calculating Hamming distance from the pylevenshtien package). I used hamming distance (the number of letter swaps in the same position) as the A* heuristic as it seemed to make the most sense and it's both admissible and consistent.

    Code:

    import operator
    import collections
    import heapq
    
    
    class Solution(object):
        def ladderLength(self, beginWord, endWord, wordList):
            """
            :type beginWord: str
            :type endWord: str
            :type wordList: Set[str]
            :rtype: int
            """        
            # Check inputs
            # Empty or identical
            if len(beginWord)==0 or len(wordList)==0 or beginWord==endWord:
                return 0
            # Single char
            if len(beginWord)==1:
                return 2
            
            
            # Funcs
            # Evertything is definied here in order to avoid penalties for out of scope references
            def make_p2w(words, w_idxs):  
                """Creates a map of all combinations of words with missing letters mapped 
                to all words in the list that match that pattern.
                E.g. hot -> {'_ot': ['hot'], 'h_t': ['hot'], 'ho_': ['hot']}
                """
                # Patterns to words and words to patterns
                p2w = collections.defaultdict(list)
    
                for w in words:
                    for i,j in w_idxs:
                        p = w[:i] + '_' + w[j:]
                        p2w[p].append(w)
                return p2w
                    
            # The search function!!!
            def a_star(start, goal, w_idxs, p2w):
                # Node structure: (int d+h, int d, int h, str word)
                # where d is the depth            
                visited = set([start])
                q = [(0, 1, 0, start)]
    
                while q:
                    # Get the next node
                    score, depth, h, word = heapq.heappop(q)
                    child_depth = depth+1
                                    
                    # Get the node's children
                    for i,j in w_idxs:
                        p = word[:i] + "_" + word[j:]
                        neighbor_words = p2w[p]
    
                        # Iterate through children
                        for nw in neighbor_words:
                            if nw not in visited:
                                # Goal check (before adding to the queue)
                                if nw == goal:
                                    return child_depth
                                visited.add(nw)
                                # Calculate the A* heurisitc
                                # In order to be optimal for a search on a graph the heuristic must be both admissible and 
                                # consistent.  In order to meet those criteria I'm using the number of different letters 
                                # (hamming distance) between the given word and the goal word.  This should never 
                                # overestimate the distance to the goal word and should also maintain the triangle 
                                # inequality for the graph (consistency). 
                                # See Chapter 3 of "Artificial Intelligence a Modern Approach" for details.
                                # Compute the number of swaps between two words by doing a boolean comparision 
                                # on each pair of letters and then using sum to caste the bools to ints.  
                                # E.g. if two words are neighbors, the sum should be 1.         
                                h = sum(map(operator.ne, nw, goal)) 
                                # Add to the q
                                heapq.heappush(q, (child_depth+h, child_depth, h, nw))
                # No route to goal 
                return 0
            
            # Get word length
            wl = len(beginWord)
            w_indexes = zip(range(wl), range(1, wl+1))
            # Preprocess the wordlist to make finding neighbors easy
            patterns2words = make_p2w(wordList, w_indexes)
            # Do the search
            return a_star(beginWord, endWord, w_indexes, patterns2words)
    

    EDIT
    I spent a bit more time with this and realized that I wasn't accounting for the case where a word is encountered a second (or more) time after it has already been encountered before from a longer path. E.g. given a word, w, that we put on the queue as Node(h_w, depth=5, w), we might come across w again from a different direction and have Node(h_w, depth=3, w). This is completely possible and it even happens in a couple of the leetcode test cases. The thing is, none of the test cases actually have the shortest path to the goal pass through any of those words. In fact, changing the code to account for this actually leads to visiting (though not expanding) more nodes than the way the code is above.

    The way this should work is that if a ward that is already in the visited set is encountered and the depth this time is less than when it was added to the queue, the depth and the heuristic score should be updated for that node in the queue and the queue should be re-heapified.

    A Note on Performance
    Even though it's not as fast as BFS it does expand as much as 200+ fewer nodes than an equivalent BFS solution. The problem is that calculating the heuristic is slow. If the search space were much larger A* would win.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.