# *Java* HashSet + backtracking (2ms beats 100%)

• The idea might not be so different, but I tried to optimize the solution as much as I could. In concrete:

• Instead of using HashMap, I use a String array to store the character --> String mapping
• Instead of trying all combinations, I only try necessary/possible ones.

I'd like to explain the second point a little bit more: Suppose we are successful in mapping the first i characters in pattern, and we are now at the j location of str. If i+1's character in pattern is not already mapped, then we would want to try all possible substrings in str, that is, the substring could be of length 1 (j's character), or length 2 (j and j+1), etc. Normally we would try up to the end of str, but this is really not necessary, because we have to spare enough space future characters in pattern. If this is not clear enough, let's take the following as an example:

pattern = "abca"
str = "xxxyyzzxxx"

Suppose we have successfully matched a to xxx and b to yy, and now we are at the third character of pattern, i.e., character c. We have not mapped c to anything, so we could try any of the following:

z
zz
zzx
zzxx
zzxxx

But do we really need to try them all? The answer is NO. Because we have already mapped a and b, and under this constraint, we have to save enough space for the fourth character of pattern, i.e., a. In other words, we can at most try z and zz, otherwise we are doomed to return false when we reach the fourth character a. This is what endPoint is about in the code.

Code in Java:

public boolean wordPatternMatch(String pattern, String str) {
String[] map = new String[26]; // mapping of characters 'a' - 'z'
HashSet<String> set = new HashSet<>(); // mapped result of 'a' - 'z'
return wordPatternMatch(pattern, str, map, set, 0, str.length()-1, 0, pattern.length()-1);
}
private boolean wordPatternMatch(String pattern, String str, String[] map, HashSet<String> set, int start, int end, int startP, int endP) {
if(startP==endP+1 && start==end+1) return true; // both pattern and str are exhausted
if((startP>endP && start<=end) || (startP<endP && start>end)) return false; // either of pattern or str is exhausted

char ch = pattern.charAt(startP);
String matched = map[ch-'a'];
if(matched!=null) { // ch is already mapped, then continue
int count = matched.length();
return start+count<=end+1 && matched.equals(str.substring(start, start+count)) // substring equals previously mapped string
&& wordPatternMatch(pattern, str, map, set, start+matched.length(), end, startP+1, endP); // moving forward
}
else {
int endPoint = end;
for(int i=endP; i>startP; i--) {
endPoint -= map[pattern.charAt(i)-'a']==null ? 1 : map[pattern.charAt(i)-'a'].length();
}
for(int i=start; i<=endPoint; i++) { // try every possible mapping, from 1 character to the end
matched = str.substring(start, i+1);
if(set.contains(matched)) continue; // different pattern cannot map to same substring

map[ch-'a'] = matched; // move forward, add corresponding mapping and set content

if(wordPatternMatch(pattern, str, map, set, i+1, end, startP+1, endP)) return true;

else { // backtracking, remove corresponding mapping and set content
map[ch-'a'] = null;
set.remove(matched);
}
}
}
return false; // exhausted
}

If you are interested in my other posts, please feel free to check my Github page here: https://github.com/F-L-A-G/Algorithms-in-Java

• the second optimization point is really cool, thanks! upvoted!
my naive dfs is pretty fast comparing to others' though, but far slower without this optimization...

• Neat! Very efficient solution! upvote!

• Beyond wonderful!! Thanks a lot!

• thank you so much for sharing this nice solution!

• The 2nd optimization is really brilliant! Thanks for sharing!

• @ElementNotFoundException Awesome optimization ideas! I was thinking about the second optimization, but couldn't find an easy way to code it up efficiently, and you did it in two lines, thanks! Same algorithm in C++:

class Solution {
public:
bool wordPatternMatch(string pattern, string str) {
unordered_set<string> unique;
vector<string> tbl(26,"");
return btrk(pattern,str,tbl,unique,0,0);
}
bool btrk(string& pattern, string& str, vector<string>& tbl, unordered_set<string>& unique, int si, int ti) {
if (ti==pattern.size() && si==str.size()) return true;
if (ti==pattern.size() || si==str.size()) return false;

if (tbl[pattern[ti]-'a']!="") { // current pattern char is known
string cp=tbl[pattern[ti]-'a'];

// known pattern char must find match in str, or there's no match for this tbl
if (cp!=str.substr(si,cp.length())) return false;

// if the known pattern char matched, we keep checking the next one
return btrk(pattern,str,tbl,unique,si+cp.length(),ti+1);
} else { // unknown pattern char

// go down the pattern string, and we know the NEXT unknown pattern char
// cannot take up the entire rest of str, so we remove length of all known
// pattern chars from pattern string, so we don't go unecessarily far down str
int end=str.size();
for (int i=ti; i<pattern.size(); i++) end-=tbl[pattern[i]-'a'].length();

for (int i=si; i<end; i++) {
string newkey=str.substr(si,i-si+1);
if (unique.find(newkey)==unique.end()) {
unique.insert(newkey);
tbl[pattern[ti]-'a']=str.substr(si,i-si+1);
if (btrk(pattern,str,tbl,unique,i+1,ti+1)) return true;
tbl[pattern[ti]-'a']="";
unique.erase(newkey);
}
}
}
return false;
}
};

• Wow anybody who can come up with the second optimization point here in a real interview would be a GOD lol.

Looks like your connection to LeetCode Discuss was lost, please wait while we try to reconnect.