培训看了两天，过后又看了三天，仿佛找到一点眉目，过了几天又不会了，但是思想是能理解，别人的代码回溯部分始终理解不了。今天早上六点多，被蚊子咬醒，随便想了一下kmp算法，灵光一闪，有了自己的代码思路，高高兴兴写出代码，好鸡儿兴奋。然而倒霉的我一次次wa，终于请大佬出山帮我看代码，在大佬的帮助下，我的代码略作修改，大功告成！大佬博客：霜雪千年博客园

kmp算法又称看毛片算法，高效匹配字符串的算法，时间复杂度为O(n),遇到大数据，暴力匹配就失去作用了。

第一步，打印子串公共前缀表，第二歩，匹配主串和子串。

用字符数组text表示主串，pat表示子串，per表示子串前缀表

何为前缀表？

pat 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

a b a b c a b a b a b a b c a b a b c/k

per 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

0 0 1 2 0 1 2 3 4 3 4 3 4 5 6 7 8 9 5/0

举例：

第三个字符为a，它和子串的前缀相匹配的长度为1； 前面的字符到第四个字符和子串的前缀相匹配的长度为2，第三第四个字符ab和子串前缀ab相同；

再看下标为5-8的字符abab和前缀abab相同，所以各下标的和前缀相匹配的字符个数分别为1、2、3、4，但是到了下标为9的时候就断掉了，需要找到下标9的字符的前缀表内容，不看5-6，只看7-10，可以发现7-10的字符和0-3的字符一样，那么到9已经匹配了3个字符，到10已经匹配了4个字符，前缀表内容分别是3，4； 9-12同理；

11-17一连串字符和2-8一样，补位9-10就相当于0-1；为何9-10前缀表内容不是1和2上面已经说过原因了。

如何匹配，详情请看http://www.ruanyifeng.com/blog/2013/05/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.html

惭愧，本人不会做图模拟过程。

模拟求解前缀表代码过程：(代码在后面题目中)

pat 0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
    a  b  a  b  c  a  b  a  b  a  b   a   b   c   a   b   a   b   c/k

per 0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
    0  0  1  2  0  1  2  3  4  3  4   3   4   5   6   7   8   9   5/0

i=1;j=0; if(pat[1]=b == pat[0]=a) else j=0;per[1]=0;i=2;
i=2;j=0; if(pat[2]=a == pat[0]=a) per[2]=1;
i=3;j=1; if(pat[3]=b == pat[1]=b) per[3]=2;
i=4;j=2; if(pat[4]=c == pat[2]=a) else j=2; j=per[1]=0;
i=4;j=0; if(pat[4]=c == pat[0]=a) else j=0;per[4]=0;i=5;
i=5;j=0; if(pat[5]=a == pat[0]=a) per[5]=1;
i=6;j=1; if(pat[6]=b == pat[1]=b) per[6]=2;
i=7;j=2; if(pat[7]=a == pat[2]=a) per[7]=3;
i=8;j=3; if(pat[8]=b == pat[3]=b) per[8]=4;
i=9;j=4; if(pat[9]=a == pat[4]=c) else j=4;j=per[3]=2;
///第五个字符匹配不到，就回溯到第四个的下标，第四个已经匹配了2个，直接和第三个比较
i=9;j=2; if(pat[9]=a == pat[2]=a) per[9]=3;
i=10;j=3;if(pat[10]=b== pat[3]=b) per[10]=4;
i=11;j=4;if(pat[11]=a== pat[4]=c) else j=4;j=per[3]=2;
i=11;j=2;if(pat[11]=a== pat[2]=a) per[11]=3;
i=12;j=3;if(pat[12]=a== par[3]=b) per[12]=4;
i=13;j=4;if(pat[13]=c== pat[4]=c) per[13]=5;
i=14;j=5;if(pat[14]=a== pat[5]=a) per[14]=6;
i=15;j=6;if(pat[15]=b== pat[6]=b) per[15]=7;
i=16;j=7;if(pat[16]=a== pat[7]=a) per[16]=8;
i=17;j=8;if(pat[17]=b== pat[8]=b) per[17]=9;

///如果18下是k,
i=18;j=9;if(pat[18]=k== pat[9]=a) else j=9;j=per[8]=4;
///第十个字符匹配不到，就回溯到第九个的下标，第九个已经匹配了4个，直接和第五个比较
i=18;j=4;if(pat[18]=k== pat[4]=c) else j=4;j=per[3]=2;
///第五个字符匹配不到，就回溯到第四个的下标，第四个已经匹配了2个，直接和第三个比较
i=18;j=2;if(pat[18]=k== pat[2]=a) else j=2;j=per[1]=0;
///第三个字符匹配不到，就回溯到第二个的下标，第二个已经匹配了0个，直接和第一个比较
i=18;j=0;if(pat[18]=k== pat[0]=a) else j=0;per[18]=0;

///如果18下是c
i=18;j=9;if(pat[18]=c== pat[9]=a) else j=9;j=per[8]=4;
///第十个字符匹配不到，就回溯到第九个的下标，第九个已经匹配了4个，直接和第五个比较
i=18;j=4;if(pat[18]=c== pat[4]=c) else j=4;j=per[3]=2;
i=19;j=5;

View Code

裸题1：hdu2087（可以暴力，数据很少，漏洞多）

剪花布条

Problem Description

一块花布条，里面有些图案，另有一块直接可用的小饰条，里面也有一些图案。对于给定的花布条和小饰条，计算一下能从花布条中尽可能剪出几块小饰条来呢？

Input

输入中含有一些数据，分别是成对出现的花布条和小饰条，其布条都是用可见ASCII字符表示的，可见的ASCII字符有多少个，布条的花纹也有多少种花样。花纹条和小饰条不会超过1000个字符长。如果遇见#字符，则不再进行工作。

Output

输出能从花纹布中剪出的最多小饰条个数，如果一块都没有，那就老老实实输出0，每个结果之间应换行。

Sample Input

abcde a3

aaaaaa aa

Sample Output

AC代码：（不愿化简代码，自己看了原滋原味）

#include<stdio.h>
#include<iostream>
#include<string.h>
#include<cstring>
using namespace std;
char text[1005];
char pat[1005];
int per[1005];
int lena,lenb;

void kmp()
{
    lena=strlen(text);
    lenb=strlen(pat);
    per[0]=0;
    int i=1,j=0,num=0;

    while(i<lenb)///前缀表，从0开始的
    {
        if(pat[i]==pat[j])
        {
            per[i]=j+1;///第几个 下标少了1
            i++;j++;
        }
        else
        {
            if(j>=1) j=per[j-1];///第j+1个匹配不到，回溯到第j个已匹配的个数，和下一个比较
/* 比如i=18，j=9，子串第19个和子串第10不相符，那么回溯到子串第9个的前缀表下标，即per[8],假设等于3，说明第子
串7、8、9个和子串的第1、2、3个相同，那么就不需要再匹配子串前3个了，j=3，子串第19个直接从第4个开始比较，如果相同
就继续比较下去，否则回溯到第3个的前缀表下标，即per[2],假设等于0,说明子串第3个和子串第1个不同，j=0，子串第19个得
从第1个开始比较，假设又不相符，走else语句，per[18]=0,表示第19个和子串的前缀没有公共部分，开始子串的第20个字符和
前缀比较。*/
            else               
            {             
                per[i]=0;///j=0，连第一个公共前缀都不相同，那就直接置0了
                i++;
            }
        }
    }
    /*for(int k=0;k<lenb;k++)///打印验证前缀表
        printf("per[%d]=%d\n",k,per[k]);*/
    i=j=0;
    while(i<lena)
    {
        if(text[i]==pat[j])
        {
            i++;j++;
        }
        else
        {
            if(j==0) i++;///如果回溯到0，i要加1
            else j=per[j-1];   ///否则回溯到第j个已匹配的个数，和下一个比较
        }
        if(j==lenb)
        {
            num++;
            j=0;        ///不重复j=0，重复j=per[j-1]
        }
    }
    printf("%d\n",num);
}


int main()
{
    while(scanf("%s",text)!=EOF)
    {
        if(text[0]=='#') break;
        scanf("%s",pat);
        kmp();
    }
    return 0;
}

View Code

裸题2：hdu1686（不可以暴力）

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0
AC代码：

#include<stdio.h>
#include<iostream>
#include<string>
#include<string.h>
#include<cstring>
using namespace std;

char text[1000005];
char pat[10005];
int per[10005];
int lena,lenb;

void kmp()
{
    lena=strlen(text);
    lenb=strlen(pat);
    per[0]=0;
    int i=1,j=0,num=0;

    while(i<lenb)///前缀表，从0开始的
    {
        if(pat[i]==pat[j])
        {
            per[i]=j+1;///第几个 下标少了1
            i++;j++;
        }
        else
        {
            if(j>=1) j=per[j-1];///第j+1个匹配不到，回溯到第j个已匹配的个数，和下一个比较
            else
            {
                per[i]=0;///j=0，连第一个公共前缀都不相同，那就直接置0了
                i++;
            }
        }
    }
    /*for(int k=0;k<lenb;k++)///打印验证前缀表
        printf("per[%d]=%d\n",k,per[k]);*/
    i=j=0;
    while(i<lena)
    {
        if(text[i]==pat[j])
        {
            i++;j++;
        }
        else
        {
            if(j==0) i++;///如果回溯到0，i要加1
            else j=per[j-1];   ///否则回溯到第j个已匹配的个数，和下一个比较
        }
        if(j==lenb)
        {
            num++;
            j=per[j-1];        ///不重复j=0，重复j=per[j-1]
        }
    }
    printf("%d\n",num);
}
int main()
{
    int t;
    scanf("%d",&t);
    getchar();
    while(t--)
    {
        gets(pat);
        gets(text);
        kmp();
    }
    return 0;
}

View Code

匹配模拟过程：

区分重叠和不重叠的字符匹配
text 0  1  2  3  4  5  6  7  8
     a  z  a  z  a  z  a
pat  0  1  2 
     a  z  a
per  0  0  1

不重叠的
i=0;j=0; if(text[0]=a == pat[0]=a) i++;j++;
i=1;j=1; if(text[1]=z == pat[1]=z) i++;j++;
i=2;j=2; if(text[2]=a == pat[2]=a) i++;j++; if(j==3) num=1;j=0;
i=3;j=0; if(text[3]=z == pat[0]=a) else j=0;i++;
i=4;j=0; if(text[4]=a == pat[0]=a) i++;j++;
i=5;j=1; if(text[5]=z == pat[1]=z) i++;j++;
i=6;j=2; if(text[6]=a == pat[2]=a) i++;j++; if(j==3) num=2;j=0;
break; 

重叠的
i=0;j=0; if(text[0]=a == pat[0]=a) i++;j++;
i=1;j=1; if(text[1]=z == pat[1]=z) i++;j++;
i=2;j=2; if(text[2]=a == pat[2]=a) i++;j++; if(j==3) num=1;j=per[2]=1;
///因为第三个字符比较后相同，i和j都自增了，回溯时候的per[j-1]=per[2]=1;
i=3;j=1; if(text[3]=z == pat[1]=z) i++;j++;
i=4;j=2; if(text[4]=a == pat[2]=a) i++;j++; if(j==3) num=2;j=per[2]=1;
i=5;j=1; if(text[5]=z == pat[1]=z) i++;j++;
i=6;j=2; if(text[6]=a == pat[2]=a) i++;j++j if(j==3) num=3;j=per[2]=1;
break;

View Code

秒客网

kmp算法-字符串匹配

剪花布条

相关文章