Python bruteforce using an array of sets


  • 1

    Based on this StackOverflow answer. The idea is to have an array of sets matching the target at position i for each index in the target string. Special cases:

    1. If there are no words with length equal to the length of the target, just replace the whole thing with a number.
    2. If there are such words, but they never match target at a certain position, we can leave just the character at that position. Preference for the first and the last characters because they give us the shortest abbreviations.

    Otherwise, we generate abbreviations of increasing lengths. For each abbreviation we find the intersection of all sets matching the abbreviation at all non-wildcard positions. If the sets don't intersect, it means that there is no word in the dictionary matches the abbreviation completely, and therefore we return the abbreviation. Otherwise move to the next one.

    Submitted twice, the run times were 62-75 ms.

    The algorithm for generating abbreviations can probably be improved. Removing too little characters at certain steps may lead to recursion hitting a dead end, so instead of xrange(1, remove + 1) we could probably have something arcane for the starting value.

    dictionary = [w for w in dictionary if len(w) == len(target)]
    if not dictionary:
        return str(len(target))
    matches = [set() for __ in xrange(len(target))]
    for i, c in enumerate(target):
        for w in dictionary:
            if w[i] == c:  # add to set of words matching target at pos i
                matches[i].add(w)
    re_num = re.compile(r"(?:\d+|\D+)")
    if not matches[0]:
        return target if len(target) == 1 else target[0] + str(len(target) - 1)
    if not matches[-1]:
        return target if len(target) == 1 else str(len(target) - 1) + target[-1]
    if not all(matches):
        first_empty = matches.index(set())
        return str(first_empty) + target[first_empty] + str(len(target) - (first_empty + 1))
    
    def matching(abbr):
        i, s = 0, set()
        for p in re_num.findall(abbr):
            if p[0].isdigit():
                i += int(p)  # skip wildcard
            else:  # non-wildcard part
                for j in xrange(len(p)):
                    if s:
                        s &= matches[i + j]
                        if not s:  # intersection of all matching sets will be empty too
                            return s
                    else:  # first set
                        s = set(matches[i + j])  # copy to avoid modifications in matches
                i += len(p)
        return s
    
    def generate(abbr, pos, remove):
        if remove == 0:
            yield abbr + target[pos:]
        else:
            start = 0 if pos == 0 else pos + 1  # skip 1 char to avoid generating adjacent numbers
            for i in xrange(start, len(target) - remove):
                for r in xrange(1, remove + 1):  # decrease length by r (replace chars [i:r + 1] with a number)
                    for res in generate(abbr + target[pos:i] + str(r + 1), i + r + 1, remove - r):
                        yield res
    
    for L in xrange(2, len(target) + 1):
        for abbr in generate("", 0, len(target) - L):
            if not matching(abbr):
                return abbr
    

  • 1

    Very nice, both the matching part and the generation by increasing length.


  • 0
    W

    I didn't know python has regular expressions until seeing your "re.compile(r"(?:\d+|\D+)")"... Great solution!


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.