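I didn't get my wildcard matching solution accepted (AC) the day before yesterday, so today I kept at it and switched to dynamic programming. Here is the problem once more:
'?' matches any single character; '*' matches a string of any length (including the empty string).
Some examples:
- isMatch("aa", "a") → false
- isMatch("aa", "aa") → true
- isMatch("aaa", "aa") → false
- isMatch("aa", "*") → true
- isMatch("aa", "a*") → true
- isMatch("ab", "?*") → true
- isMatch("aab", "c*a*b") → false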
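The table design is fairly direct. Given the two input strings s and p, with lengths len_s and len_p, set up a len_s × len_p table of bool called PD, where the entry in row i and column j, PD[i][j], stands for isMatch(s[0..i], p[0..j]).
Looking at the pattern, it is not hard to derive:
- If PD[i-1][j]==true or PD[i][j-1]==true, and s[i]=='*' or p[j]=='*', then PD[i][j]==true;
- If PD[i-1][j-1]==true and s[i] matches p[j], then PD[i][j]==true.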
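To make the two rules concrete, here is a minimal sketch that fills the whole table without any memory optimization. It is only an illustration under a few assumptions: the name isMatchFullTable is made up, wildcards are assumed to appear only in p, and the table gets one extra row and column for the empty prefix, so cell (i, j) covers the first i characters of s and the first j characters of p.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the submitted solution): plain full-table DP.
// Assumes wildcards appear only in p. Row 0 and column 0 stand for the
// empty prefix of s and p respectively.
bool isMatchFullTable(const std::string &s, const std::string &p) {
    const std::size_t n = s.size(), m = p.size();
    std::vector<std::vector<char>> pd(n + 1, std::vector<char>(m + 1, 0));
    pd[0][0] = 1;  // empty string matches empty pattern
    for (std::size_t j = 1; j <= m; ++j) {
        // An empty s is matched only by a run of leading '*'.
        pd[0][j] = pd[0][j - 1] && p[j - 1] == '*';
    }
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (p[j - 1] == '*') {
                // Rule 1: '*' absorbs s[i-1] (cell above) or matches nothing (cell to the left).
                pd[i][j] = pd[i - 1][j] || pd[i][j - 1];
            } else {
                // Rule 2: the diagonal cell is true and the two characters match.
                pd[i][j] = pd[i - 1][j - 1] && (p[j - 1] == '?' || p[j - 1] == s[i - 1]);
            }
        }
    }
    return pd[n][m] != 0;
}
```

For example, isMatchFullTable("aa", "a*") is true and isMatchFullTable("aab", "c*a*b") is false, in line with the examples from the problem statement.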
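With that, the code is easy to write. In the discussion board, though, I saw quite a few people hit the memory limit with DP. Memory is easy to optimize: as the rules above show, row PD[i] depends only on row PD[i-1], so it is enough to keep the previous row, which brings the space complexity down to roughly 2*len_p.
Another thing to watch out for is an empty input string. The code is as follows: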
    #include <cstring>
    #include <vector>
    using std::vector;

    class Solution {
    public:
        bool isMatch(const char *s, const char *p) {
            int len_s = strlen(s);
            int len_p = strlen(p);
            // At least one of the strings is empty.
            if (0 == len_s || 0 == len_p) {
                if ('\0' == *s && '\0' == *p) {
                    return true;
                }
                // A leading '*' may match the empty string, so drop it and retry.
                if ('*' == *s) {
                    return isMatch(s + 1, p);
                }
                if ('*' == *p) {
                    return isMatch(s, p + 1);
                }
                return false;
            }
            // Only two rows of PD are kept: last is row i-1, current is row i.
            vector<bool> last(len_p, false);
            vector<bool> current(len_p, false);
            // Row 0: s[0] against every prefix p[0..j].
            if ('*' == *s) {
                last.assign(len_p, true);
            } else {
                last[0] = (*s == *p || '?' == *s || '?' == *p || '*' == *p);
                bool all_stars = ('*' == *p);  // true while p[0..j-1] is nothing but '*'
                for (int j = 1; j < len_p; ++j) {
                    if ('*' == p[j]) {
                        last[j] = last[j-1];
                    } else {
                        last[j] = all_stars && ('?' == *s || '?' == p[j] || *s == p[j]);
                        all_stars = false;
                    }
                }
            }
            for (int i = 1; i < len_s; ++i) {
                current[0] = ('*' == *p || (last[0] && '*' == s[i]));
                for (int j = 1; j < len_p; ++j) {
                    current[j] = ((last[j] || current[j-1]) && ('*' == p[j] || '*' == s[i]))
                              || (last[j-1] && ('?' == s[i] || '?' == p[j] || s[i] == p[j]));
                }
                // References cannot be reseated, so swap the row contents instead.
                last.swap(current);
            }
            return last[len_p-1];
        }
    };
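I submitted it full of confidence, but it still got TLE :(
The test case that failed this time is: s = 32316 copies of 'a', p = '*' followed by 32317 copies of 'a' followed by '*'.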
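For anyone who wants to reproduce this input locally, here is a small sketch that rebuilds it and counts how many table cells the DP has to fill. The helper names (makeText, makePattern, cellCount) are made up for illustration.

```cpp
#include <string>

// Hypothetical helpers that rebuild the failing input described above.
std::string makeText() {
    return std::string(32316, 'a');
}

std::string makePattern() {
    return "*" + std::string(32317, 'a') + "*";
}

// Number of DP cells for this input: len_s * len_p.
long long cellCount() {
    return static_cast<long long>(makeText().size()) *
           static_cast<long long>(makePattern().size());
}
```

The pattern is 32319 characters long, so cellCount() comes to 1,044,420,804, a bit over a billion cell updates. The expected answer is false, since the pattern demands at least 32317 'a' characters and s only has 32316.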
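The time complexity of the DP is len_s*len_p, and this test case really is unfavorable for DP. Still, something was odd: it ran for a very long time without finishing, even though walking 32316 rows by 32319 columns should not take that long. I tried changing vector<bool> to bool* and got a result in about 8 s. I could not figure out why, so I searched on Baidu, and cplusplus explains: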
This is a specialized version of vector, which is used for elements of type bool and optimizes for space.
- The storage is not necessarily an array of bool values, but the library implementation may optimize storage so that each value is stored in a single bit.
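The space optimization is paid for in time. The underlying storage is not required to be a contiguous array of bool; when the values are packed into bits, every subscript access has to locate the right word and mask out one bit, which costs extra time, and it shows most on large vectors. One run with vector<bool> took more than 7000 s, a difference of nearly a thousand times!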
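A rough way to check this kind of difference on your own machine is a micro-benchmark like the sketch below. Everything in it is an assumption for illustration: the names sweep and timeSweepMs are made up, the loop only imitates the shape of the DP (each cell reads the cell above and the cell to its left), and the ratio you see will depend heavily on the compiler and optimization level.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical micro-benchmark: Row is std::vector<bool> or std::vector<char>.
// Runs a DP-like sweep and returns how many cells ended up true, so the
// compiler cannot throw the work away.
template <typename Row>
std::size_t sweep(std::size_t rows, std::size_t cols) {
    Row last(cols, false), current(cols, false);
    last[0] = true;
    std::size_t trues = 0;
    for (std::size_t i = 1; i < rows; ++i) {
        current[0] = last[0];
        for (std::size_t j = 1; j < cols; ++j) {
            current[j] = (last[j] || current[j - 1]) && ((i + j) % 3 != 0);
            if (current[j]) ++trues;
        }
        last.swap(current);  // the row just filled becomes the previous row
    }
    return trues;
}

// Wall-clock time of one sweep, in milliseconds.
template <typename Row>
double timeSweepMs(std::size_t rows, std::size_t cols) {
    auto start = std::chrono::steady_clock::now();
    volatile std::size_t sink = sweep<Row>(rows, cols);
    (void)sink;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeSweepMs<std::vector<bool>>(2000, 32319) and timeSweepMs<std::vector<char>>(2000, 32319) and comparing the two numbers, once without optimization and once with -O2, gives a feel for how much of the slowdown comes from the bit-packed accesses.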
The modified code is:
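    #include <cstdlib>
    #include <cstring>

    class Solution {
    public:
        bool isMatch(const char *s, const char *p) {
            int len_s = strlen(s);
            int len_p = strlen(p);
            // At least one of the strings is empty.
            if (0 == len_s || 0 == len_p) {
                if ('\0' == *s && '\0' == *p) {
                    return true;
                }
                // A leading '*' may match the empty string, so drop it and retry.
                if ('*' == *s) {
                    return isMatch(s + 1, p);
                }
                if ('*' == *p) {
                    return isMatch(s, p + 1);
                }
                return false;
            }
            // Two plain bool rows: last is row i-1 of PD, current is row i.
            bool *last = (bool *)malloc(sizeof(bool) * len_p);
            bool *current = (bool *)malloc(sizeof(bool) * len_p);
            memset(last, false, sizeof(bool) * len_p);
            memset(current, false, sizeof(bool) * len_p);
            // Row 0: s[0] against every prefix p[0..j].
            if ('*' == *s) {
                memset(last, true, sizeof(bool) * len_p);
            } else {
                last[0] = (*s == *p || '?' == *s || '?' == *p || '*' == *p);
                bool all_stars = ('*' == *p);  // true while p[0..j-1] is nothing but '*'
                for (int j = 1; j < len_p; ++j) {
                    if ('*' == p[j]) {
                        last[j] = last[j-1];
                    } else {
                        last[j] = all_stars && ('?' == *s || '?' == p[j] || *s == p[j]);
                        all_stars = false;
                    }
                }
            }
            for (int i = 1; i < len_s; ++i) {
                memset(current, false, sizeof(bool) * len_p);
                if ('*' == *p || (last[0] && '*' == s[i])) {
                    current[0] = true;
                }
                for (int j = 1; j < len_p; ++j) {
                    if (((last[j] || current[j-1]) && ('*' == p[j] || '*' == s[i]))
                        || (last[j-1] && ('?' == s[i] || '?' == p[j] || s[i] == p[j]))) {
                        current[j] = true;
                    }
                }
                // Swap the two row pointers.
                bool *tmp = last;
                last = current;
                current = tmp;
            }
            bool result = last[len_p-1];
            free(last);
            free(current);
            return result;
        }
    };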
Even so, this still gets TLE after submission. A greedy approach might be worth trying.
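As a pointer for that next attempt, here is a sketch of the commonly used greedy idea: scan both strings with two pointers, remember the most recent '*', and when a mismatch occurs let that '*' absorb one more character of s. This is an illustrative sketch only, not code from this post; the name isMatchGreedy is made up, and wildcards are assumed to appear only in p.

```cpp
// Hypothetical greedy sketch: two pointers plus backtracking to the last '*'.
bool isMatchGreedy(const char *s, const char *p) {
    const char *star = nullptr;  // most recent '*' seen in p, if any
    const char *retry = s;       // position in s where that '*' stops absorbing
    while (*s != '\0') {
        if (*p == '*') {
            // Remember the star and first try letting it match nothing.
            star = p++;
            retry = s;
        } else if (*p == '?' || *p == *s) {
            // Single-character match: advance both pointers.
            ++s;
            ++p;
        } else if (star != nullptr) {
            // Mismatch: let the last '*' absorb one more character of s.
            p = star + 1;
            s = ++retry;
        } else {
            return false;
        }
    }
    // s is used up; whatever is left of p must be only '*'.
    while (*p == '*') {
        ++p;
    }
    return *p == '\0';
}
```

On the failing input above, s runs out while the literal 'a' characters of p are still being matched, so the loop ends after a single pass with no backtracking and the function returns false.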