Python with bit masks


  • 23

    Gets accepted in ~130 ms.

    def minAbbreviation(self, target, dictionary):
        m = len(target)
        diffs = {sum(2**i for i, c in enumerate(word) if target[i] != c)
                 for word in dictionary if len(word) == m}
        if not diffs:
            return str(m)
        bits = max((i for i in range(2**m) if all(d & i for d in diffs)),
                   key=lambda bits: sum((bits >> i) & 3 == 0 for i in range(m-1)))
        s = ''.join(target[i] if bits & 2**i else '#' for i in range(m))
        return re.sub('#+', lambda m: str(len(m.group())), s)
    

    If the target is apple and the dictionary contains apply, then the abbreviation must include the e as the letter e, not in a number. It's the only letter distinguishing these two words. Similarly, if the dictionary contains tuple, then the abbreviation must include the a or the first p as a letter.

    For each dictionary word (of correct size), I create a diff-number whose bits tell me which of the word's letters differ from the target. Then I go through the 2m possible abbreviations, represented as number from 0 to 2m-1, the bits representing which letters of target are in the abbreviation. An abbreviation is ok if it doesn't match any dictionary word. To check whether an abbreviation doesn't match a dictionary word, I simply check whether the abbreviation number and the dictionary word's diff-number have a common 1-bit. Which means that the abbreviation contains a letter where the dictionary word differs from the target.

    Then from the ok abbreviations I find one that maximizes how much length it saves me. Two consecutive 0-bits in the abbreviation number mean that the two corresponding letters will be encoded as the number 2. It saves length 1. Three consecutive 0-bits save length 2, and so on. To compute the saved length, I just count how many pairs of adjacent bits are zero.

    Now that I have the number representing an optimal abbreviation, I just need to turn it into the actual abbreviation. First I turn it into a string where each 1-bit is turned into the corresponding letter of the target and each 0-bit is turned into #. Then I replace streaks of # into numbers.


  • 0

    short and clear, good job!


  • 5
    C

    Nice and clever! It took me a while to figure out the succinct implementation. I took notes and rewrote it as following

    def minAbbreviation(self, target, dictionary):
        def abbr(target, num):
            word, count = '', 0
            for w in target:
                if num & 1 == 1:
                    if count:
                        word += str(count)
                        count = 0
                    word += w
                else:
                    count += 1
    
                num >>= 1
            if count:
                word += str(count)
            return word
    
        m = len(target)
    
        # Figure out the different bits for a same length word in the dictionary
        diffs = []
        for word in dictionary:
            if len(word) != m:
                continue
    
            # The encoding is opposite
            bits = 0
            for i, char in enumerate(word):
                if char != target[i]:
                    bits += 2 ** i
            diffs += bits,
    
        # No word in dictionary has same length, return the shortest
        if not diffs:
            return str(m)        
        
        abbrs = []
        for i in range(2 ** m):
            # This abbreviation at least has one word different to every words in the dictionary
            if all(d & i for d in diffs):
                abbrs += abbr(target, i),
    
        return min(abbrs, key=lambda x: len(x))
    

  • 0
    O

    This solution is so impressive!

    I rewrote it without regular expression:

        def minAbbreviation(self, target, dictionary):
            n = len(target)
            diffs = [sum((target[i]!=c)<<i for i,c in enumerate(w)) for w in dictionary if len(w)==n]
            if not diffs:
                return str(n)
            x = max((i for i in range(2**n) if all(i&x for x in diffs)), key=lambda x: sum((x>>i)&3==0 for i in range(n-1))) + 2**n
            ret, lo = [], 0
            for hi in range(n+1):
                if (x>>hi)&1:
                    ret += [str(hi-lo)] if hi>lo else []
                    ret += [target[hi]] if hi<n else []
                    lo = hi+1
            return ''.join(ret)
    

  • 0
    L

    @StefanPochmann Any reason why you can just disregard the words that don't have the same length as the target?


  • 0
    Y

    @livelearn Words of different lengths cannot have the same abbreviation.


  • 0
    Y

    @StefanPochmann said in Python with bit masks:

    def minAbbreviation(self, target, dictionary):
    m = len(target)
    diffs = {sum(2i for i, c in enumerate(word) if target[i] != c)
    for word in dictionary if len(word) == m}
    if not diffs:
    return str(m)
    bits = max((i for i in range(2
    m) if all(d & i for d in diffs)),
    key=lambda bits: sum((bits >> i) & 3 == 0 for i in range(m-1)))
    s = ''.join(target[i] if bits & 2**i else '#' for i in range(m))
    return re.sub('#+', lambda m: str(len(m.group())), s)

    Hi, @StefanPochmann

    Very neat solution. But if following test case is used it will encounter TLE

    "usaandchinaarefriends"
    ["aaaaaaaaaaaaaaaaaaaaa"]
    

    It seems that using int to encode the abbreviation makes it too hard to calculate the length of the abbreviated string efficiently to prune the DFS. But I think it's very easy to adapt your solution to cope with this case.

    Just FYI. Anyway, your solution is very inspirational.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.