Simple Regex One-liner (Java, Python)


  • 20

    Update

    Much nicer, I just turn an abbreviation like "i12iz4n" into a regular expression like "i.{12}iz.{4}n". Duh.

    Java:

    public boolean validWordAbbreviation(String word, String abbr) {
        return word.matches(abbr.replaceAll("[1-9]\\d*", ".{$0}"));
    }
    

    Python:

    def validWordAbbreviation(self, word, abbr):
        return bool(re.match(re.sub('([1-9]\d*)', r'.{\1}', abbr) + '$', word))
    


    Obsolete original

    (This now gets a memory error, since the exploding testcase I suggested at the end has been added to the test suite.)

    "Clean":

    def validWordAbbreviation(self, word, abbr):
        regex = re.sub('\d+', lambda m: '.' * int(m.group()), abbr)
        return bool(re.match(regex + '$', word)) and not re.search('(^|\D)0', abbr)
    

    "Dirty" (abusing how Python handles the > there):

    def validWordAbbreviation(self, word, abbr):
        regex = re.sub('\d+', lambda m: '.' * int(m.group()), abbr)
        return re.match(regex + '$', word) > re.search('(^|\D)0', abbr)
    

    I turn each number into that many dots to get a regular expression. For example, when asked whether "3t2de" is a valid abbreviation for word "leetcode", I turn "3t2de" into "...t..de" and check whether that regular expression matches "leetcode", which it does. I also need to rule out the number "0" and leading zeros, which I do with another regular expression.

    @1337c0d3r I suggest adding test case "bignumberhahaha", "999999999", as that gets me a fully deserved MemoryError :-)


  • 0

    This might be a stupid question. But how does replaceAll ("[1-9]\d*", ".{$0}") work? I understand that [1-9]\d* matches valid integer, but how did you turn $0 into the length of the replaced integer?


  • 1

    @mandyyo I don't. It's not the length of the replaced integer, it is the replaced integer.


  • 1

    @StefanPochmann Right... And how do you put the integer back? what does $0 mean?


  • 4

    @mandyyo $0 just means the match. In general, you can capture groups with parentheses and refer to the groups like that in the replacement string. For example, "leetcode".replaceAll("(.)(.)", "$2$1") finds pairs of characters and switches their order, resulting in the string "elteoced". Group zero is just the whole match.


  • 0

    @StefanPochmann Thanks a lot!


  • 0

    @StefanPochmann said in Simple Regex One-liner (Java, Python):

    @1337c0d3r I suggest adding test case "bignumberhahaha", "999999999", as that gets me a fully deserved MemoryError :-)

    Done. I've added your test case.


  • 6
    Z

    five body on floor.


  • 0
    M

    For fun, here's a flawed(?) c++ version:

            bool validWordAbbreviation(string word, string abbr)
            {
                regex rx("([1-9][0-9]+)");
                abbr = regex_replace(abbr, rx, ".{$&}");
    
                auto okay = regex_match(word, regex(abbr));
                return okay;
            }
    

    Using your "bignumberhahaha", "999999999" test case, this passes with visual studio, but fails with a runtime error in the leetcode environment. The following hack does pass:

                regex rx("([1-9][0-9]{0,3})");
                abbr = regex_replace(abbr, rx, ".{$&}");
    

    This limits the number of char matches to 9999. Maybe this difference in behavior reflects differences in the respective c++ regex libraries...

    The run time is poor compared to other c++ submissions -- at the lower 1% mark.


  • 1

    @Mix1984 You should use * instead of +, otherwise it's wrong.

    That's not the reason for the problem you observe, though. Apparently the regex library indeed somehow sucks. I tried smaller numbers. "Run Code" with "bignumberhahaha", "90000" still works but takes about 228 ms. With "100000" it already gets "Runtime Error". I also tried it on my PC (gcc 4.9.3 on Windows 10 on an i7-6700), where already regex_match("", regex(".{40000}")); takes about 8 seconds. Weird. Couldn't find something about it, asked on SO now.


  • 0
    M

    Thanks! I was thinking maybe differences between implementations of the Dinkumware C++ standard library used with visual studio/c++ and libstdc++ used by gcc. I will watch the SO thread.


  • 0
    This post is deleted!

  • 0
    L
    This post is deleted!

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.