KMP字符串匹配算法

时间:2023-01-06 22:13:01

笔者介绍:姜雪伟,IT公司技术合伙人,IT高级讲师,CSDN社区专家,特邀编辑,畅销书作者,已出版书籍:《手把手教你架构3D游戏引擎》电子工业出版社和《Unity3D实战核心技术详解》电子工业出版社等。

CSDN视频网址:http://edu.csdn.net/lecturer/144

算法在各个领域都有非常广泛的应用,尤其现在比较流行的人工智能,深度学习以及搜索算法等等,当然游戏开发领域也需要算法的支撑,只是这些算法被一些库已封装好,不需要开发者重新编写,但是这不等于大家就可以不用学习算法了。本篇博客给读者介绍一种KMP模型搜索算法,学过数据结构的读者知道,搜索算法很多的,比如二分查找,二叉树搜索等等。

KMP算法是一种改进的字符串匹配算法,由D.E.Knuth,J.H.Morris和V.R.Pratt同时发现,因此人们称它为克努特——莫里斯——普拉特操作(简称KMP算法)。KMP算法的关键是利用匹配失败后的信息,尽量减少模式串与主串的匹配次数以达到快速匹配的目的。具体实现就是实现一个next()函数,函数本身包含了模式串的局部匹配信息。时间复杂度O(m+n)。网上很多关于KMP算法的解释,在这里就不多说了,直接通过案例给读者分享,通过代码的分析可以了解其算法的运行原理。


案例如下所示:

输入:

  txt[] =  "THIS IS A TEST TEXT"  pat[] = "TEST"

输出:

Pattern found at index 10

输入:

  txt[] =  "AABAACAADAABAAABAA"  pat[] = "AABA"

输出:

   Pattern found at index 0   Pattern found at index 9   Pattern found at index 13
首先给你一个txt数组,还有一个pat数组,根据pat数组内容从txt数组中找到它的匹配的索引位置。下面给读者展示用C语言实现的算法:
#include<stdio.h>#include<string.h>#include<stdlib.h> void computeLPSArray(char *pat, int M, int *lps); void KMPSearch(char *pat, char *txt){    int M = strlen(pat);    int N = strlen(txt);     // create lps[] that will hold the longest prefix suffix    // values for pattern    int *lps = (int *)malloc(sizeof(int)*M);    int j  = 0;  // index for pat[]     // Preprocess the pattern (calculate lps[] array)    computeLPSArray(pat, M, lps);     int i = 0;  // index for txt[]    while (i < N)    {      if (pat[j] == txt[i])      {        j++;        i++;      }       if (j == M)      {        printf("Found pattern at index %d \n", i-j);        j = lps[j-1];      }       // mismatch after j matches      else if (i < N && pat[j] != txt[i])      {        // Do not match lps[0..lps[j-1]] characters,        // they will match anyway        if (j != 0)         j = lps[j-1];        else         i = i+1;      }    }    free(lps); // to avoid memory leak} void computeLPSArray(char *pat, int M, int *lps){    int len = 0;  // length of the previous longest prefix suffix    int i;     lps[0] = 0; // lps[0] is always 0    i = 1;     // the loop calculates lps[i] for i = 1 to M-1    while (i < M)    {       if (pat[i] == pat[len])       {         len++;         lps[i] = len;         i++;       }       else // (pat[i] != pat[len])       {         if (len != 0)         {           // This is tricky. Consider the example            // AAACAAAA and i = 7.           len = lps[len-1];            // Also, note that we do not increment i here         }         else // if (len == 0)         {           lps[i] = 0;           i++;         }       }    }} // Driver program to test above functionint main(){   char *txt = "ABABDABACDABABCABAB";   char *pat = "ABABCABAB";   KMPSearch(pat, txt);   return 0;}

现在Python语言也是比较流行的,再用Python实现一遍代码如下所示:
# Python program for KMP Algorithmdef KMPSearch(pat, txt):    M = len(pat)    N = len(txt)     # create lps[] that will hold the longest prefix suffix     # values for pattern    lps = [0]*M    j = 0 # index for pat[]     # Preprocess the pattern (calculate lps[] array)    computeLPSArray(pat, M, lps)     i = 0 # index for txt[]    while i < N:        if pat[j] == txt[i]:            i += 1            j += 1         if j == M:            print "Found pattern at index " + str(i-j)            j = lps[j-1]         # mismatch after j matches        elif i < N and pat[j] != txt[i]:            # Do not match lps[0..lps[j-1]] characters,            # they will match anyway            if j != 0:                j = lps[j-1]            else:                i += 1 def computeLPSArray(pat, M, lps):    len = 0 # length of the previous longest prefix suffix     lps[0] # lps[0] is always 0    i = 1     # the loop calculates lps[i] for i = 1 to M-1    while i < M:        if pat[i]==pat[len]:            len += 1            lps[i] = len            i += 1        else:            # This is tricky. Consider the example.            # AAACAAAA and i = 7. The idea is similar             # to search step.            if len != 0:                len = lps[len-1]                 # Also, note that we do not increment i here            else:                lps[i] = 0                i += 1 txt = "ABABDABACDABABCABAB"pat = "ABABCABAB"KMPSearch(pat, txt)

当然Java语言也是必不可少的,Java编写的代码如下所示:
class KMP_String_Matching{    void KMPSearch(String pat, String txt)    {        int M = pat.length();        int N = txt.length();         // create lps[] that will hold the longest        // prefix suffix values for pattern        int lps[] = new int[M];        int j = 0;  // index for pat[]         // Preprocess the pattern (calculate lps[]        // array)        computeLPSArray(pat,M,lps);         int i = 0;  // index for txt[]        while (i < N)        {            if (pat.charAt(j) == txt.charAt(i))            {                j++;                i++;            }            if (j == M)            {                System.out.println("Found pattern "+                              "at index " + (i-j));                j = lps[j-1];            }             // mismatch after j matches            else if (i < N && pat.charAt(j) != txt.charAt(i))            {                // Do not match lps[0..lps[j-1]] characters,                // they will match anyway                if (j != 0)                    j = lps[j-1];                else                    i = i+1;            }        }    }     void computeLPSArray(String pat, int M, int lps[])    {        // length of the previous longest prefix suffix        int len = 0;        int i = 1;        lps[0] = 0;  // lps[0] is always 0         // the loop calculates lps[i] for i = 1 to M-1        while (i < M)        {            if (pat.charAt(i) == pat.charAt(len))            {                len++;                lps[i] = len;                i++;            }            else  // (pat[i] != pat[len])            {                // This is tricky. Consider the example.                // AAACAAAA and i = 7. The idea is similar                 // to search step.                if (len != 0)                {                    len = lps[len-1];                     // Also, note that we do not increment                    // i here                }                else  // if (len == 0)                {                    lps[i] = len;                    i++;                }            }        }    }     // Driver program to test above function    public static void main(String args[])    {        String txt = "ABABDABACDABABCABAB";        String pat = "ABABCABAB";        new KMP_String_Matching().KMPSearch(pat,txt);    }}

算法也是检验开发者处理问题的能力,在业余时间没事的时候,可以编写一下就当放松了。。。。。。。