# two-phase (tokenize & dp) solution in python

• phase 1: tokenize

make re expression into a list of pairs (once_or_many, some_char_or_any), called unit.
e.g. `"a*b.c*" -> [('*', 'a'), ('1', 'b'), ('1', '.'), ('*', 'c')]`

phase 2: dp

make a 2-d array. `dp[i][j]` means whether `string[:i+1]` matches `units[:j+1]`

1. add `<begin>` before the string and the units (conceptually). `<begin>` has index of `-1`.
2. `dp[-1][-1] = True` since `match("", "") == True`
3. `dp[i][-1] = False` since `match(any-non-empty-string, "") == False`
4. `dp[-1][j] = dp[-1][j-1] and units[j][0] == '*'`, since `match("", re) == True` iff `re` is are all `*` units
5. `dp[i][j] =`
1. `units[j][0] != '*'`
in this case, the `j`-th unit shall eat a char in string, provided the prev string and units are match.
`dp[i][j] = dp[i-1][j-1] and atom_match(string[i], units[j])`
2. `units[j][0] == '*'`
1. `dp[i][j-1] == True`, and the current `*` unit matches an empty char. e.g. `"ab"` and `"abc*"`

2. `dp[i-1][j] == True`, the current `*` unit has eaten the prev char. it must be able to eat current char, too. we add `and atom_match(string[i], units[j])`.

notice always we have `dp[i-1][j-1] == dp[i-1][j]` if `units[j][0] == '*'`

6. `return dp[len(string)][len(units)]` aka `dp[-2][-2]`
``````import operator

#re unit type:
# unit = once atom | many atom
# atom = char c | any
def tokenize(re):
prev_c = None
for c in re:
if c == '*':
yield ('*', prev_c) #many atom
prev_c = None
else:
if prev_c != None:
yield ('1', prev_c) #once atom
prev_c = c
if prev_c:
yield ('1', prev_c)
def atom_match(c, unit):
return unit[1] == '.' or unit[1] == c

def unit_match_dp(string, units):
if not string and not units:
return True
if not units:
return False
if not string: #True if all units are '*'
return reduce(operator.and_, map(lambda p:p[0] == '*', units))
#has string and has units
dp = [[False for _ in range(len(units) + 1)] for _ in range(len(string) + 1)]
dp[-1][-1] = True
for j in range(len(units)):
dp[-1][j] = dp[-1][j-1] and units[j][0] == '*'
for i in range(len(string)):
for j in range(len(units)):
if units[j][0] == '*':
dp[i][j] = (dp[i-1][j] and atom_match(string[i], units[j])) or (dp[i][j-1])
else:
dp[i][j] = dp[i-1][j-1] and atom_match(string[i], units[j])
return dp[-2][-2]

def match(string, re):
units = list(tokenize(re))
return unit_match_dp(string, units)
'''
<b> a*  a   a   b   c*
<b> 1   1   0   0   0   0
a   0   1   1   0   0   0
a   0   1   0   1   0   0
b   0   0   0   0   1   0
c   0   0   0   0   0   1
c   0   0   0   0   0   1
'''

assert match('aa', 'aa')
assert match('a', 'a*a')
assert match("abcd", "d*")
assert match("aaaaaaaaaaaaab", "a*a*a*a*a*a*a*a*a*a*b")
print('ok!')
``````

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.