POJ 3461 Oulipo

Time Limit: 1000MS Memory Limit: 65536K

【Description】

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

【Input】

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

【Output】

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

【Sample Input】

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

【Sample Output】

1
3
0

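【Solution】

This is an introductory KMP problem. The core idea of KMP is to reuse the characters that have already been compared, so that no comparison is repeated needlessly. Concretely, the next array records, for each prefix of the word, the length of the longest proper prefix that is also a suffix of it; on a mismatch the word index jumps to that value instead of restarting from zero. (More detailed explanations are easy to find online, so they are not repeated here.)

The main thing to watch is handling the string boundaries correctly.

【Code C++】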
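To make the role of the next array concrete, here is a minimal sketch of how such a table can be built. The helper name buildNext and the use of std::string and std::vector are assumptions made for illustration; the post's own code uses global arrays. The convention follows the post: entry 0 is -1, and entry i (for i >= 1) is the length of the longest proper prefix of the first i characters that is also their suffix.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): builds the shifted
// failure table, where table[0] = -1 and, for i >= 1, table[i] is the length
// of the longest proper prefix of w[0..i-1] that is also its suffix.
std::vector<int> buildNext(const std::string& w) {
    const int m = static_cast<int>(w.size());
    std::vector<int> table(m + 1);
    int i = 0, j = -1;
    table[0] = -1;
    while (i < m) {
        if (j == -1 || w[i] == w[j]) {
            table[++i] = ++j;   // extend the current border by one character
        } else {
            j = table[j];       // fall back to the next shorter border
        }
    }
    return table;
}
```

For the word AZAZ this yields -1, 0, 0, 1, 2: after matching AZA the reusable border is A (length 1), and after matching AZAZ it is AZ (length 2).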
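One way to keep those boundaries explicit is sketched below. It is a variant under stated assumptions, not the post's method: the names countOccurrences and fail are hypothetical, the inputs are std::string values, and the end of the word is detected with an explicit length check rather than a sentinel character. The two boundary cases it handles are the left edge (index -1, meaning no prefix can be reused) and the right edge (a full match, after which the search continues from the longest border so that overlapping occurrences are counted).

```cpp
#include <string>
#include <vector>

// Hypothetical function name; counts possibly overlapping occurrences of
// word in text with KMP, using explicit bounds checks instead of a sentinel.
int countOccurrences(const std::string& word, const std::string& text) {
    const int m = static_cast<int>(word.size());
    const int n = static_cast<int>(text.size());
    if (m == 0 || m > n) return 0;

    // fail[i]: length of the longest proper prefix of word[0..i-1]
    // that is also its suffix; fail[0] = -1 marks "nothing to reuse".
    std::vector<int> fail(m + 1);
    fail[0] = -1;
    for (int i = 0, j = -1; i < m; ) {
        if (j == -1 || word[i] == word[j]) fail[++i] = ++j;
        else j = fail[j];
    }

    int found = 0;
    int iw = 0;  // number of characters of word currently matched
    for (int it = 0; it < n; ) {
        if (iw == -1 || word[iw] == text[it]) {
            ++iw; ++it;
            if (iw == m) {       // right boundary of word: a full match
                ++found;
                iw = fail[m];    // keep the longest border so overlaps count
            }
        } else {
            iw = fail[iw];       // mismatch: shift the word, never rewind text
        }
    }
    return found;
}
```

Checking iw == -1 before indexing word matters: the comparison short-circuits, so word[-1] is never read.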

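#include <cstdio>
#include <cstring>

#define mx 1000005

char w[mx], t[mx];      // w: word (pattern), t: text
int next[mx], wED, tED; // next: failure table; wED, tED: lengths of w and t

// Build the failure table: for i >= 1, next[i] is the length of the longest
// proper prefix of w[0..i-1] that is also its suffix; next[0] = -1.
void rdy(){
    int i = 0, j;
    next[0] = j = -1;
    while (i < wED){
        if (j == -1 || w[i] == w[j]) next[++i] = ++j;
        else j = next[j];
    }
    w[wED] = '#'; // sentinel outside 'A'..'Z': matching always stops at the end of w
}

// Count the (possibly overlapping) occurrences of w in t.
int count(){
    int opt = 0, iw = 0, it = 0;
    while (it < tED){
        // test iw == -1 first so that w[-1] is never read
        while (iw == -1 || w[iw] == t[it]) ++iw, ++it;
        if (iw == wED) ++opt;
        iw = next[iw];
    }
    return opt;
}

int main(){
    int n;
    scanf("%d", &n);
    while (n--){
        // both strings contain only 'A'..'Z', so %s reads one per line
        scanf("%s%s", w, t);
        wED = strlen(w); tED = strlen(t);
        rdy();
        printf("%d\n", count());
    }
    return 0;
}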
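For checking a KMP solution on small inputs, a brute-force counter can serve as a reference. The sketch below is an assumption-laden testing aid, not part of the original solution: the name countNaive is hypothetical, and because it tries every start position it takes O(|W| * |T|) time, which is likely too slow for the largest inputs allowed by the problem.

```cpp
#include <string>

// Hypothetical reference implementation for testing only: compares the word
// against the text at every possible start position.
int countNaive(const std::string& word, const std::string& text) {
    if (word.empty() || word.size() > text.size()) return 0;
    int found = 0;
    for (std::size_t start = 0; start + word.size() <= text.size(); ++start) {
        // compare(pos, len, str) returns 0 when the substring equals str
        if (text.compare(start, word.size(), word) == 0) ++found;
    }
    return found;
}
```

Comparing its results with the KMP output on short random strings over a small alphabet is a quick way to catch off-by-one errors at the string boundaries.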

Thanks to 飘过的小牛, whose code provided the compact approach used here.
