Lazy Python solution with reuse, regular expressions, and optimizations


  • 1

    Update: Due to an added test case (see discussion in the replies), this no longer gets Accepted but now gets Time Limit Exceeded.

    I'm lazy, so I reused generateAbbreviations and validWordAbbreviation from previous problems. And I was lazy in those as well, heavily using regular expressions and itertools etc. And I use the optimizations I described. Without those optimizations, I get Time Limit Exceeded for input "chinaandusaarefre", ["china","are","and","usa","friends","us","en","ar"]. With those optimizations, that input is trivial, and my solution gets accepted in about 250 ms.

    def minAbbreviation(self, target, dictionary):
    
        # Throw out wrong-length words
        dictionary = [word for word in dictionary
                      if len(word) == len(target)]
        if not dictionary:
            return str(len(target))
    
        # Throw out words not matching the target in any position.
        dictionary = [word for word in dictionary
                      if any(map(operator.eq, word, target))]
        if not dictionary:
            return target if len(target) < 2 else target[0] + str(len(target) - 1)
    
        # Old solution from https://discuss.leetcode.com/topic/32108/python-solutions
        def generateAbbreviations(word):
            return [re.sub('#+', lambda m: str(len(m.group())), ''.join(masked))
                    for masked in itertools.product(*(c+'#' for c in word))]
    
        # "Old" solution from https://discuss.leetcode.com/topic/61353/simple-regex-one-liner-java-python
        def validWordAbbreviation(word, abbr):
            return re.match(re.sub('([1-9]\d*)', r'.{\1}', abbr) + '$', word)
    
        # Try candidate abbreviations from shortest to longest and
        # return the first that doesn't match any dictionary word.
        candidates = sorted(generateAbbreviations(target),
                            key=lambda abbr: len(re.findall('\d+|[a-z]', abbr)))
        return next(abbr for abbr in candidates
                    if not any(validWordAbbreviation(word, abbr)
                               for word in dictionary))

  • 1

    @StefanPochmann Good job Stefan but I think this is way beyond what's expected, even in a tough Google interview. How come this problem is marked as Medium?


  • 0
    V

    @agave That's a good question. It definitely does not like something easy to code (even the TLE solutions take a lot of time to write down)


  • 0

    @agave Wait, what? This is way beyond what's expected? That sounds like this is a somehow very good solution, while I think it's rather lazy and ugly :-). I might try again with a different approach. Don't know why it's marked Medium, wasn't involved in that.

    What do you think is expected?


  • 0

    @StefanPochmann It was not a comment to your code, it was a comment to the problem in general.


  • 0

    @agave Ah, so you mean the problem is too hard for an interview? But it's a simplified version of this problem from an actual interview.


  • 0

    @StefanPochmann Hi, Stefan, I'm afraid your solution will get a TLE for the test case below:

    target = 'internationalizations'
    dictionary = []
    for x in range(len(target)):
        dictionary.append(target[:x] + 'x' + target[x+1:])
    

  • 0

    @在线疯狂 Your input is invalid, it violates log2(n) + m ≤ 20. But with a 16-characters target it's valid and I still get TLE. I'm not surprised, though, didn't think my solution is particularly efficient, like I said I was rather lazy here. And this was/is a contest problem, where all that matters is that it gets accepted (and that you're not cheating, of course). Which it does.

    That said, I think the valid version is a good test case:

    "internationalize"
    ["xnternationalize", "ixternationalize", "inxernationalize", "intxrnationalize", "intexnationalize", "interxationalize", "internxtionalize", "internaxionalize", "internatxonalize", "internatixnalize", "internatioxalize", "internationxlize", "internationaxize", "internationalxze", "internationalixe", "internationalizx"]
    

    Problem is, it might still be a bit too hard in general, for other solutions as well. The judge's Python solution also takes about 1000-1200 ms just for that test case alone. And local testing showed me that my solution takes about 19 times as long for this test case as the judge's solution, so maybe a 15-character target would be better. The judge's solution solves that in about 500-600 ms while mine is still far too slow.

    @1337c0d3r what do you think?


  • 0

    @StefanPochmann I just tested @SergeyTachenov's solution as well now, it's about 6 times faster than the judge's solution on that test case with 16-characters target, so over 100 times faster than mine. (And I think it's a very hard test case for Sergey's solution as well.)


  • 0

    @StefanPochmann Aha, thank you. I ignored an important constraint! I thought this problem was a hard one as well during the contest and failed to pass the system test.


  • 0

    @StefanPochmann said in Lazy Python solution with reuse, regular expressions, and optimizations:

    @1337c0d3r what do you think?

    I have added your test case, it's a good one. I have also extended the time limit such that other brute force solutions can still get Accepted.


Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.