two-phase (tokenize & dp) solution in python


  • 0
    X

    phase 1: tokenize

    make re expression into a list of pairs (once_or_many, some_char_or_any), called unit.
    e.g. "a*b.c*" -> [('*', 'a'), ('1', 'b'), ('1', '.'), ('*', 'c')]

    phase 2: dp

    make a 2-d array. dp[i][j] means whether string[:i+1] matches units[:j+1]

    1. add <begin> before the string and the units (conceptually). <begin> has index of -1.
    2. dp[-1][-1] = True since match("", "") == True
    3. dp[i][-1] = False since match(any-non-empty-string, "") == False
    4. dp[-1][j] = dp[-1][j-1] and units[j][0] == '*', since match("", re) == True iff re is are all * units
    5. dp[i][j] =
      1. units[j][0] != '*'
        in this case, the j-th unit shall eat a char in string, provided the prev string and units are match.
        dp[i][j] = dp[i-1][j-1] and atom_match(string[i], units[j])
      2. units[j][0] == '*'
        1. dp[i][j-1] == True, and the current * unit matches an empty char. e.g. "ab" and "abc*"

        2. dp[i-1][j] == True, the current * unit has eaten the prev char. it must be able to eat current char, too. we add and atom_match(string[i], units[j]).

          notice always we have dp[i-1][j-1] == dp[i-1][j] if units[j][0] == '*'

    6. return dp[len(string)][len(units)] aka dp[-2][-2]
    import operator
    
    #re unit type: 
    # unit = once atom | many atom
    # atom = char c | any
    def tokenize(re):
        prev_c = None
        for c in re:
            if c == '*':
                yield ('*', prev_c) #many atom
                prev_c = None
            else:
                if prev_c != None:
                    yield ('1', prev_c) #once atom
                prev_c = c
        if prev_c:
            yield ('1', prev_c)
    def atom_match(c, unit):
        return unit[1] == '.' or unit[1] == c
    
    def unit_match_dp(string, units):
        if not string and not units:
            return True
        if not units:
            return False
        if not string: #True if all units are '*'
            return reduce(operator.and_, map(lambda p:p[0] == '*', units))
        #has string and has units
        dp = [[False for _ in range(len(units) + 1)] for _ in range(len(string) + 1)]
        dp[-1][-1] = True
        for j in range(len(units)):
            dp[-1][j] = dp[-1][j-1] and units[j][0] == '*'
        for i in range(len(string)):
            for j in range(len(units)):
                if units[j][0] == '*':
                    dp[i][j] = (dp[i-1][j] and atom_match(string[i], units[j])) or (dp[i][j-1])
                else:
                    dp[i][j] = dp[i-1][j-1] and atom_match(string[i], units[j])
        return dp[-2][-2]
    
    def match(string, re):
        units = list(tokenize(re))
        return unit_match_dp(string, units)
    '''
        <b> a*  a   a   b   c*
    <b> 1   1   0   0   0   0
    a   0   1   1   0   0   0
    a   0   1   0   1   0   0
    b   0   0   0   0   1   0   
    c   0   0   0   0   0   1
    c   0   0   0   0   0   1
    '''
    
    assert match('aa', 'aa')
    assert match('a', 'a*a')
    assert match("abcd", "d*")
    assert match("aaaaaaaaaaaaab", "a*a*a*a*a*a*a*a*a*a*b")
    print('ok!')
    

Log in to reply
 

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.