How do I correctly detect local maxima and curve windows in semi-complex scenarios?

Time: 2022-09-23 22:37:26

I have a series of data and need to detect peak values within a certain number of readings (the window size) while excluding a certain level of background "noise." I also need to capture the start and stop points of each appreciable curve (i.e., where it starts ticking up and where it stops ticking down).

The data are high-precision floats.

Here's a quick sketch that captures the most common scenarios I'm up against:

[image: sketch of the scenarios]

One method I attempted was to pass a window of size X backwards along the curve to detect the peaks. It started off working well, but it missed a lot of conditions I had not initially anticipated. Another method I started to work out was a growing window that would discover the longer-duration curves. Yet another used a more calculus-based approach that watches the velocity/gradient of the signal. None seemed to hit the sweet spot, probably due to my lack of experience in statistical analysis.

Perhaps I should use a statistical analysis package to cover my bases rather than writing my own algorithm? Or is there an efficient way to tackle this directly in SQL with some kind of local-max technique? I'm simply not sure how to approach this efficiently. With each method I try, I keep missing various thresholds, detecting too many peak values, or not capturing entire events (reporting a peak data point too early in the reading process).

Ultimately this is implemented in Ruby, so advice on the most efficient and correct way to approach the problem in Ruby would be appreciated. I'm open to a language-agnostic algorithmic approach as well. Or is there a library that addresses the issues I'm running into when detecting the maximum peaks?

4 Answers

#1


2  

My idea is simple: once you have your windows of interest, find all the peaks in each window by comparing every value with the one before it and the one after it. That tells you where the peaks occur, and you can then decide which peak is the best.

I wrote a simple MATLAB script to show the idea.

My example uses a waveform from an audio file :-)

waveFile = 'Chick_eco.wav';

% wavread was removed from newer MATLAB releases; audioread replaces it there.
[y, fs, nbits] = wavread(waveFile);

subplot(2,2,1); plot(y); legend('Original signal');

% Cut a 100-sample window out of the signal.
startIndex = 15000;
WindowSize = 100;
endIndex = startIndex + WindowSize - 1;
frame = y(startIndex:endIndex);

nframe = length(frame)

% Find the peaks: a positive sample that is greater than its left
% neighbour and at least as large as its right neighbour.
peaks = zeros(nframe,1);

k = 3;
while (k <= nframe - 1)
    y1 = frame(k - 1);
    y2 = frame(k);
    y3 = frame(k + 1);
    if (y2 > 0)
        if (y2 > y1 && y2 >= y3)
            peaks(k) = frame(k);
        end
    end
    k = k + 1;
end

% Copy of the peaks with non-peak entries set to NaN so they are not plotted.
peaks2 = peaks;
peaks2(peaks2 <= 0) = nan;

subplot(2,2,2); plot(frame); legend('Get Window Length = 100');
subplot(2,2,3); plot(peaks); legend('Where are the PEAKS');
subplot(2,2,4); plot(frame); legend('Peaks in the Window');
hold on; plot(peaks2, '*');

% Print the position and value of every peak.
for j = 1 : nframe
    if (peaks(j) > 0)
        fprintf('Local=%i\n', j);
        fprintf('Value=%i\n', peaks(j));
    end
end

% Where the largest local maximum occurs
[maxivalue, maxi] = max(peaks)

You can see all the peaks and where they occur:

Local=37
Value=3.266296e-001
Local=51
Value=4.333496e-002
Local=65
Value=5.049438e-001
Local=80
Value=4.286804e-001
Local=84
Value=3.110046e-001
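Since the question mentions Ruby, here is a minimal sketch of the same neighbour comparison in that language. It is an illustration rather than a port of the script: the method name `local_peaks`, the `floor` keyword, and the sample data are made up, and indices are 0-based instead of MATLAB's 1-based.

```ruby
# Return [index, value] pairs for samples that exceed `floor`, are strictly
# greater than the previous sample, and are at least as large as the next one.
# `floor` is a hypothetical cutoff playing the role of the `y2 > 0` test.
def local_peaks(samples, floor: 0.0)
  peaks = []
  (1...samples.length - 1).each do |k|
    prev, cur, nxt = samples[k - 1], samples[k], samples[k + 1]
    peaks << [k, cur] if cur > floor && cur > prev && cur >= nxt
  end
  peaks
end

frame = [0.0, 0.2, 0.1, 0.05, 0.5, 0.5, 0.3, -0.1, 0.0]
peaks = local_peaks(frame)               # every local peak in the window
best  = peaks.max_by { |_, value| value } # the tallest of them
```

On a flat top (the two 0.5 samples here) only the first sample of the plateau is reported, because the left comparison is strict and the right one is not.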

#2


2  

I'll propose a couple of different ideas. One is to use discrete wavelets; the other is to use the geographer's concept of prominence.

Wavelets: Apply some sort of wavelet decomposition to your data. There are multiple choices, with Daubechies wavelets being the most widely used. You want the low-frequency peaks, so zero out the high-frequency wavelet coefficients, reconstruct your data, and look for local extrema.
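As a rough illustration of the "zero out the high frequencies and reconstruct" step, the sketch below uses the Haar wavelet, the simplest member of the Daubechies family, in its unnormalized average/difference form. Everything here is an assumption made for the example: the method names, the `levels` parameter, and the requirement that the signal length be a multiple of `2**levels`. A real project would more likely rely on an existing wavelet library and a smoother wavelet.

```ruby
# One Haar analysis step: pairwise averages (low-pass) and half-differences (high-pass).
def haar_step(signal)
  pairs = signal.each_slice(2).to_a
  approx = pairs.map { |a, b| (a + b) / 2.0 }
  detail = pairs.map { |a, b| (a - b) / 2.0 }
  [approx, detail]
end

# Inverse of haar_step: each (average, half-difference) pair yields two samples.
def haar_inverse(approx, detail)
  approx.zip(detail).flat_map { |s, d| [s + d, s - d] }
end

# Decompose `levels` times, discard every detail band, and reconstruct.
def haar_smooth(signal, levels:)
  unless (signal.length % (2**levels)).zero?
    raise ArgumentError, "signal length must be a multiple of 2**levels"
  end
  approx = signal
  levels.times { approx = haar_step(approx).first }
  # Rebuild with the discarded detail coefficients treated as zero.
  levels.times { approx = haar_inverse(approx, Array.new(approx.length, 0.0)) }
  approx
end

noisy = [0.0, 0.5, 0.25, -0.25, 4.0, 5.0, 3.5, 4.5]
smoothed = haar_smooth(noisy, levels: 2)
```

With Haar and all details dropped, the result is simply a block average over `2**levels` samples, so the smoothed signal is piecewise constant. Local extrema are then searched for on `smoothed` instead of on the raw data.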

Prominence: Those noisy peaks and valleys are of key interest to geographers. They want to know exactly which of a mountain's multiple little peaks is tallest, and the exact location of the lowest point in the valley. Find the local minima and maxima in your data set. You should have a sequence of min/max/min/max/.../min. (You might want to add arbitrary end points that are lower than your global minimum.) Consider each min/max/min triple. Rank each triple by the difference between the max and the larger of the two minima. Make a reduced sequence that replaces the triple with the smallest difference by the smaller of its two minima. Iterate until you get down to a single min/max/min triple. In your example, you want the next layer down, the min/max/min/max/min sequence.
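The reduction described above can be sketched in Ruby as follows. This is one possible reading of the procedure, not a reference implementation: the method names, the `keep` parameter (how many maxima to stop at), and the choice of `samples.min - 1.0` for the artificial end points are all assumptions, and the input is assumed to be non-empty.

```ruby
# Build the alternating min/max/.../min sequence as [index, value] pairs.
# Artificial end points below the global minimum are added on both sides;
# they get the out-of-range indices -1 and samples.length.
def alternating_extrema(samples)
  floor = samples.min - 1.0
  padded = [floor, *samples, floor]
  # Collapse runs of equal values so neighbouring entries always differ.
  distinct = padded.each_with_index.chunk_while { |(a, _), (b, _)| a == b }.map(&:first)
  kept = distinct.each_with_index.select do |(value, _), j|
    next true if j.zero? || j == distinct.length - 1
    left, right = distinct[j - 1][0], distinct[j + 1][0]
    (value > left && value > right) || (value < left && value < right)
  end
  kept.map { |(value, idx), _| [idx - 1, value] }
end

# Repeatedly remove the least prominent maximum until `keep` maxima remain.
def prominent_peaks(samples, keep:)
  ext = alternating_extrema(samples)
  while (ext.length - 1) / 2 > keep
    # Prominence of a max: its height above the higher of its two neighbouring minima.
    i = 1.step(ext.length - 2, 2).min_by { |j| ext[j][1] - [ext[j - 1][1], ext[j + 1][1]].max }
    lower_min = [ext[i - 1], ext[i + 1]].min_by { |_, value| value }
    ext[(i - 1)..(i + 1)] = [lower_min]   # replace the triple with its lower minimum
  end
  ext.each_with_index.select { |_, j| j.odd? }.map(&:first)
end

samples = [0.0, 2.0, 1.0, 4.0, 0.5, 30.0, 27.0, 29.0, 1.0, 20.0, 0.0]
top_two = prominent_peaks(samples, keep: 2)
```

In this made-up series the sample of height 29 is discarded even though it is taller than the one of height 20: it sits on the shoulder of the 30 peak, so its prominence is only 2.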

#3


2  

Note: I'm going to describe the algorithmic steps as if each pass were distinct. Obviously, in a specific implementation you can combine steps where it makes sense for your application; keeping them separate just makes the explanation a little clearer.

I'm going to make some assumptions about your problem:

  1. The windows of interest (the signals that you are looking for) cover a fraction of the entire data space (i.e., it's not one long signal).
  2. The windows have significant scope (i.e., they aren't one pixel wide on your picture).
  3. The windows have a minimum peak of interest (i.e., even if the signal exceeds the background noise, the peak must have an additional signal excess of the background).
  4. The windows will never overlap (i.e., each can be examined as a distinct sub-problem out of context of the rest of the signal).

Given those, you can first look through your data stream for a set of candidate windows of interest. Make a first pass through the data: moving from left to right, look for noise-threshold crossing points. If the signal was below the noise floor and exceeds it on the next sample, that's a candidate starting point for a window (and vice versa for the candidate end point).
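A minimal Ruby sketch of this first pass might look like the following. The method name `candidate_windows`, the `noise_floor` value, and the sample data are illustrative assumptions. Windows are returned as inclusive `[start, stop]` index pairs.

```ruby
# Scan left to right and record an inclusive [start, stop] index pair for
# every run of samples that stays above `noise_floor`.
def candidate_windows(samples, noise_floor)
  windows = []
  start = nil
  samples.each_with_index do |value, i|
    above = value > noise_floor
    if above && start.nil?
      start = i                       # upward crossing: candidate start
    elsif !above && start
      windows << [start, i - 1]       # downward crossing: candidate end
      start = nil
    end
  end
  # The data may end while the signal is still above the floor.
  windows << [start, samples.length - 1] if start
  windows
end

signal = [0.1, 0.2, 0.6, 0.2, 0.1, 0.7, 1.5, 2.4, 1.8, 0.9, 0.3, 0.1, 0.2, 0.1]
windows = candidate_windows(signal, 0.5)
```

For this series the pass yields two candidates: a one-sample blip at index 2 and a wider run from index 5 to 9.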

Now make a pass through your candidate windows: compare the extent and contents of each window with the minimum width and minimum peak assumed above. To use your picture as an example, the small peaks on the left of the image barely exceed the noise floor and do so for too short a time. However, the window in the center of the picture clearly has a wide time extent and a significant max value. Keep the windows that meet your minimum criteria and discard those that are trivial.
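The filtering pass could be sketched in Ruby like this, assuming windows are given as inclusive `[start, stop]` index pairs. The method name and the `min_width` and `min_peak` thresholds are hypothetical. Suitable values depend entirely on the data.

```ruby
# Keep only the windows that are wide enough and whose maximum is high enough.
def significant_windows(samples, windows, min_width:, min_peak:)
  windows.select do |start, stop|
    width = stop - start + 1
    peak = samples[start..stop].max
    width >= min_width && peak >= min_peak
  end
end

signal = [0.1, 0.2, 0.6, 0.2, 0.1, 0.7, 1.5, 2.4, 1.8, 0.9, 0.3, 0.1, 0.2, 0.1]
candidates = [[2, 2], [5, 9]]   # runs of `signal` above a noise floor of 0.5
kept = significant_windows(signal, candidates, min_width: 3, min_peak: 1.0)
```

Here the one-sample blip at index 2 is discarded for being both too narrow and too low, while the window from 5 to 9 survives.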

Now examine your remaining windows in detail (remember, they can be treated individually). The peak is easy to find: pass through the window and keep the local max. With respect to the leading and trailing edges of the signal, you can see in the picture that the window you want is slightly larger than the span over which the signal exceeds the noise floor. In this case, you can use a finite difference approximation to calculate the first derivative of the signal. You know that the leading edge will be somewhat to the left of the window on the chart: look for a point at which the first derivative exceeds a positive noise floor of its own (the slope turns upwards sharply). Do the same for the trailing edge (which will always be to the right of the window).
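One way to sketch this last pass in Ruby is shown below. It uses a forward difference as the derivative estimate and walks outward from the window while the slope stays steeper than a threshold. The method names, the `slope_floor` parameter, and the hash layout of the result are assumptions made for the example.

```ruby
# Forward finite difference: slopes[i] approximates the derivative between
# sample i and sample i + 1 (unit spacing assumed).
def slopes(samples)
  samples.each_cons(2).map { |a, b| b - a }
end

# Locate the peak inside an inclusive [start, stop] window and extend the
# edges outward while the signal is still rising (left) or falling (right)
# faster than `slope_floor`.
def describe_window(samples, start, stop, slope_floor:)
  d = slopes(samples)
  peak = (start..stop).max_by { |i| samples[i] }

  leading = start
  leading -= 1 while leading > 0 && d[leading - 1] > slope_floor

  trailing = stop
  trailing += 1 while trailing < samples.length - 1 && d[trailing] < -slope_floor

  { leading: leading, peak: peak, trailing: trailing }
end

signal = [0.1, 0.2, 0.6, 0.2, 0.1, 0.7, 1.5, 2.4, 1.8, 0.9, 0.3, 0.1, 0.2, 0.1]
event = describe_window(signal, 5, 9, slope_floor: 0.15)
```

For the window from index 5 to 9, the edges widen to 4 and 11, which are the points where the curve starts ticking up and stops ticking down, and the peak is at index 7.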

Result: a set of time windows, the leading and trailing edges of the signals, and the peak that occurred in each window.

#4


0  

It looks like the definition of a window is the range of x over which y is above the threshold. So use that to determine the size of the window. Within that, locate the largest value, thus finding the peak.

If that fails, then what additional criteria do you have for defining a region of interest? You may need to nail down your implicit assumptions to more than 'that looks like a peak to me'.
