字符串匹配Sunday算法实现

时间:2022-12-26 15:59:31

今天稍稍度娘了一下匹配字符串的算法,主要搜索结果还是源自CSDN。其中一篇《字符串匹配算法总结 》介绍了好几种字符串匹配算法,写的非常好。我挑了其中作者也比较推崇,比较新,效能比较好,算法又比较简单的Sunday算法,稍加揣摩,然后发现贴的实现代码,居然和算法描述的从右往左比对是冲突的,因此我断定贴的代码是错的。仔细看就能发现从左向右比对,很容易会跳过一种匹配,如从a1a1a2寻找a1a2,就会匹配失败(?也许也是对的)。

下面是我做的实现,融入到我的‘类string’类的find实现当中了。对于其他网贴的实现方式略有调整。

//#define SupportLongSubString
#ifdef SupportLongSubString //用于支持较长的sub字符串
typedef unsigned short OCCType;
#else
typedef unsigned char OCCType;
#endif
size_t JinString::find(const char *sub, size_t startPos)
{
if(!data_ || data_->len()<startPos)return STRNPOS;
///[Sun 90] D.M. Sunday: A Very Fast Substring Search Algorithm. Communications of the ACM, 33, 8, 132-142 (1990)
/// http://www-igm.univ-mlv.fr/~lecroq/string/
/// http://www.iti.fh-flensburg.de/lang/algorithmen/pattern/sundayen.htm
/// http://blog.csdn.net/force_eagle/article/details/10340907
///[Sunday算法]根据网上的算法,纠错、调整、改进
//预处理和strlen一起做,原本算法这个数组是要预计算移位数,但是我不先计算长度
//所以重新约定occ[]值v含义:len-(v-1)= len-v+1 = 需移动的字节数. v=0表示无法匹配需要移动len+1字节
OCCType occ[256];
memset(occ,0,256);
int subLen=0;
unsigned char ch;
while((ch = (unsigned char)sub[subLen])){
occ[ch] = (OCCType)++subLen;
}
JAssert(subLen<(1<<(sizeof(OCCType)*8))); //断言subString长度不可越界
if(subLen+startPos>data_->len())return STRNPOS;

for(int matchAt=1;matchAt<=subLen;matchAt++)
{
if(data_->buf()[startPos + subLen - matchAt]
!= sub[subLen - matchAt])
{ //如果不匹配,按算法移动startPos重新匹配
startPos += subLen - occ[(unsigned char) (data_->buf()[startPos + subLen])] + 1;
if(subLen+startPos>data_->len())return STRNPOS;
matchAt = 0;
}
}
return startPos;
}


经过如下测试通过:

void TestJinString::unitTest()
{
JinString strs,strm,strjzy;
strs = "abcdefghijklmn";
strm = "MMMMM";
strjzy = "jzyajzybjzyajzyz";
JinString subjzy("jzyajzyz");

//JinString str("abcdefghijklmnMMMMMjzyajzybjzyajzyzMMMMMabcdefghijklmn")
JinString str;
str += strs;
str = str + strm + strjzy + strm + strs;

JAssert(str.find(subjzy) == strs.length() + strm.length() + 8);
JAssert(str.find(strs) == 0);
JAssert(str.find(strm) == strs.length());
JAssert(str.find(strs,str.find(strm)) == strs.length() + strm.length()*2 + strjzy.length());
JAssert(str.find("jzyajzybjzyc")==STRNPOS);
}