Suffix Array---Milk Patterns

Time: 2022-05-07 21:37:17

POJ 3261

Description

Farmer John has noticed that the quality of milk given by his cows varies from day to day. On further investigation, he discovered that although he can't predict the quality of milk from one day to the next, there are some regular patterns in the daily milk quality.

To perform a rigorous study, he has invented a complex classification scheme by which each milk sample is recorded as an integer between 0 and 1,000,000 inclusive, and has recorded data from a single cow over N (1 ≤ N ≤ 20,000) days. He wishes to find the longest pattern of samples which repeats identically at least K (2 ≤ K ≤ N) times. This may include overlapping patterns -- 1 2 3 2 3 2 3 1 repeats 2 3 2 3 twice, for example.

Help Farmer John by finding the longest repeating subsequence in the sequence of samples. It is guaranteed that at least one subsequence is repeated at least K times.

Input

Line 1: Two space-separated integers: N and K
Lines 2..N+1: N integers, one per line; the quality of the milk on day i appears on the ith line.

Output

Line 1: One integer, the length of the longest pattern which occurs at least K times

Sample Input

8 2
1
2
3
2
3
2
3
1

Sample Output

4

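Problem summary: Given N and K, followed by N integers (N ≤ 20,000, each between 0 and 1,000,000), find the length of the longest substring that occurs at least K times in the sequence (K ≥ 2). Occurrences may overlap, and at least one such substring is guaranteed to exist.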
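
To make the counting rule concrete (overlapping occurrences count separately), here is a minimal brute-force sketch. It is not part of the original solution; the function name bruteForceLongest is hypothetical, and the approach is only practical for very small inputs.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical reference checker, not the author's method.
// For every window length, count how often each window's contents occur
// (overlaps allowed) and keep the largest length that reaches k occurrences.
int bruteForceLongest(const std::vector<int>& s, int k) {
    const int n = static_cast<int>(s.size());
    int best = 0;
    for (int len = 1; len <= n; ++len) {
        std::map<std::vector<int>, int> seen;  // window contents -> occurrences
        int most = 0;
        for (int start = 0; start + len <= n; ++start) {
            std::vector<int> window(s.begin() + start, s.begin() + start + len);
            most = std::max(most, ++seen[window]);
        }
        if (most >= k) best = len;
    }
    return best;
}
```

On the sample sequence 1 2 3 2 3 2 3 1 with K = 2, the window 2 3 2 3 starts at positions 1 and 3 (0-based), so the sketch returns 4, matching the sample output.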

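Approach: Binary-search the substring length len. The problem then becomes a decision question: is there a substring of length at least len that occurs at least K times? Scan height[] in suffix-array order and treat each maximal run of adjacent entries with height[] ≥ len as one group; a run of m such entries covers m + 1 suffixes. The number of suffixes in a group is the number of occurrences of a common substring of length at least len. This works because the longest common prefix of the suffixes ranked i and j (i < j) equals the minimum of height[i+1], height[i+2], ..., height[j]. Grouping on height[] ≥ len therefore guarantees that any two suffixes in the same group share a prefix of length at least len, and that this length-len prefix is common to all of them. Since every substring is a prefix of some suffix, len is feasible exactly when some group contains at least K suffixes.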
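
The grouping argument above can be sketched in isolation as follows. This is a simplified illustration, not the full solution: it builds the suffix array by direct comparison sorting instead of prefix doubling, it uses no sentinel, and the names naiveSuffixArray, naiveHeight, feasible and longestRepeated are all hypothetical. It assumes k ≥ 2, as the problem guarantees.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Naive suffix array: sort start positions by comparing suffixes directly.
// Chosen only to keep the sketch short; far slower than prefix doubling.
std::vector<int> naiveSuffixArray(const std::vector<int>& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return std::lexicographical_compare(s.begin() + a, s.end(),
                                            s.begin() + b, s.end());
    });
    return sa;
}

// height[i] = LCP of the suffixes ranked i-1 and i; height[0] = 0.
std::vector<int> naiveHeight(const std::vector<int>& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(s.size());
    std::vector<int> height(n, 0);
    for (int i = 1; i < n; ++i) {
        int a = sa[i - 1], b = sa[i], p = 0;
        while (a + p < n && b + p < n && s[a + p] == s[b + p]) ++p;
        height[i] = p;
    }
    return height;
}

// Decision step: does some maximal run of height[] >= len span k suffixes?
// Assumes k >= 2.
bool feasible(const std::vector<int>& height, int len, int k) {
    int group = 1;  // suffixes in the current group
    for (int i = 1; i < static_cast<int>(height.size()); ++i) {
        group = (height[i] >= len) ? group + 1 : 1;
        if (group >= k) return true;
    }
    return false;
}

// Binary search on len; feasibility is monotone (a feasible len implies len-1).
int longestRepeated(const std::vector<int>& s, int k) {
    const std::vector<int> sa = naiveSuffixArray(s);
    const std::vector<int> height = naiveHeight(s, sa);
    int lo = 0, hi = static_cast<int>(s.size());
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;
        if (feasible(height, mid, k)) lo = mid; else hi = mid - 1;
    }
    return lo;
}
```

For the sample sequence the sorted suffixes start at 7, 0, 5, 3, 1, 6, 4, 2 and height[] is 0 1 0 2 4 0 1 3. With len = 4 only height[4] qualifies, giving a group of two suffixes, which is enough for K = 2.
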
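#include <iostream>
#include <algorithm>
#include <cstdio>
#include <cstring>
#include <map>
#define rep(i,n) for(int i = 0;i < n; i++)
using namespace std;
// size must be at least N + 2 (N <= 20,000 plus one sentinel); INF is unused below.
const int size=20010,INF=1<<30;
// Written as ::size so the name does not collide with std::size under C++17.
int rk[::size],sa[::size],height[::size],w[::size],wa[::size],res[::size];
int N,K;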
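// Build sa[] for w[0..len-1] by prefix doubling with counting sort; keys lie in [0, up).
void getSa (int len,int up) {
    int *k = rk,*id = height,*r = res, *cnt = wa;
    // Counting sort of the suffixes by their first element.
    rep(i,up) cnt[i] = 0;
    rep(i,len) cnt[k[i] = w[i]]++;
    rep(i,up) cnt[i+1] += cnt[i];
    for(int i = len - 1; i >= 0; i--) {
        sa[--cnt[k[i]]] = i;
    }
    int d = 1,p = 0;
    while(p < len){
        // id[]: suffixes ordered by their second key (the rank at offset d).
        for(int i = len - d; i < len; i++) id[p++] = i;
        rep(i,len) if(sa[i] >= d) id[p++] = sa[i] - d;
        // Stable counting sort by the first key.
        rep(i,len) r[i] = k[id[i]];
        rep(i,up) cnt[i] = 0;
        rep(i,len) cnt[r[i]]++;
        rep(i,up) cnt[i+1] += cnt[i];
        for(int i = len - 1; i >= 0; i--) {
            sa[--cnt[r[i]]] = id[i];
        }
        // After the swap, r holds the old ranks and k receives the new ones.
        swap(k,r);
        p = 0;
        k[sa[0]] = p++;
        rep(i,len-1) {
            if(sa[i]+d < len && sa[i+1]+d < len && r[sa[i]] == r[sa[i+1]] && r[sa[i]+d] == r[sa[i+1]+d])
                k[sa[i+1]] = p - 1;
            else k[sa[i+1]] = p++;
        }
        if(p >= len) return ;  // all ranks are distinct, so sa[] is final
        d *= 2,up = p, p = 0;
    }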
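}

// height[i] = LCP of the suffixes ranked i-1 and i, computed in text order.
void getHeight(int len) {
    rep(i,len) rk[sa[i]] = i;
    height[0] = 0;
    for(int i = 0,p = 0; i < len - 1; i++) {
        // The sentinel at len-1 has rank 0, so rk[i] >= 1 for every i here.
        int j = sa[rk[i]-1];
        while(i+p < len && j+p < len && w[i+p] == w[j+p]) {
            p++;
        }
        height[rk[i]] = p;
        p = max(0,p - 1);
    }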
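}

// Copy s into w, append a 0 sentinel, then build sa[] and height[].
int getSuffix(int s[]) {
    int len = N,up = 0;
    for(int i = 0; i < len; i++) {
        w[i] = s[i];
        up = max(up,w[i]);
    }
    w[len++] = 0;  // sentinel, smaller than every compressed value
    getSa(len,up+1);
    getHeight(len);
    return len;
}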
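void solve() /// binary search on the pattern length
{
    int i,cnt,ans,mid,min,max;
    min=1,max=N;  // length 1 is guaranteed feasible; length N cannot occur K >= 2 times
    for(;;)
    {
        mid = (max + min) / 2;
        if(mid==min)
            break;
        ans=cnt=0;
        for(i=1;i<=N;i++)
        { /// cnt = number of suffixes in the current run of height[] >= mid
            if(height[i]<mid)
            {
                if(cnt>ans)
                    ans=cnt;
                cnt=0;
            }
            else
            {
                if(!cnt)
                    cnt=2;  // the first qualifying height[] joins two suffixes
                else
                    ++cnt;
            }
        }
        if(cnt > ans)
            ans = cnt;
        if(ans >= K)
            min = mid;
        else
            max = mid;
    }
    printf("%d\n", mid);
}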
map<int,int>q;  // original value -> compressed rank
int main()
{
    int s[::size],a[::size];
    while(scanf("%d%d",&N,&K)!=EOF)
    {
        for(int i=0;i<N;i++)
        {
            scanf("%d",&s[i]);
            a[i]=s[i];
        }
        sort(a,a+N);
        int pre=-1,tot=0; /// coordinate compression; -1 lies outside the input range
        for(int i=0;i<N;i++)
        {
            if(a[i]==pre)
            {
                a[i]=tot;
                q[pre]=tot;
            }
            else
            {
                pre=a[i];
                a[i]=++tot;  // ranks start at 1, leaving 0 for the sentinel
                q[pre]=tot;
            }
        }
        for(int i=0;i<N;i++)
        {
            s[i]=q[s[i]];
        }
        getSuffix(s);
        solve();
    }
}
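
The coordinate compression step in main maps each value to its 1-based rank among the distinct values, so that 0 stays free for the sentinel and the counting sort needs at most N + 1 buckets. As an alternative sketch (not the author's code; the helper name compress is hypothetical), the same mapping can be written with sort, unique and lower_bound, without a map:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: replace each value by its 1-based rank among the
// distinct values, preserving relative order and leaving 0 unused.
std::vector<int> compress(const std::vector<int>& values) {
    std::vector<int> sorted(values);
    std::sort(sorted.begin(), sorted.end());
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());
    std::vector<int> ranks;
    ranks.reserve(values.size());
    for (int v : values) {
        auto it = std::lower_bound(sorted.begin(), sorted.end(), v);
        ranks.push_back(static_cast<int>(it - sorted.begin()) + 1);
    }
    return ranks;
}
```

For example, the values 1000000 0 5 0 would map to 3 1 2 1. Because only the relative order of the values matters for suffix sorting, the answer is unchanged by this mapping.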