I've got a whole bunch of data (10,000 to 50,000 values for each series of measurements), and I'm interested in automatically identifying the local maxima/minima of the density estimate of the distribution of these values. In fact, I assume there should usually be two peaks separated by a pit, and I'd like to find the pit that separates the two peaks so that I can split the data into two parts for further processing. If possible, I'd also like to know where the peaks are located.
Because the density estimate may contain very small local fluctuations, I'd like to be able to adjust the "sensitivity". The best I have found so far is this solution by @Tommy: https://*.com/a/6836924/1003358

Here is an example:
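```r
library(ggplot2)

# localMaxima() is the helper from @Tommy's linked answer and must be defined first.
d <- density(faithful$eruptions, bw = "sj")
loc.max <- d$x[localMaxima(d$y)]
ggplot(faithful, aes(eruptions)) + geom_density(adjust = 1/2) +
  geom_vline(xintercept = loc.max, colour = "red") +
  xlab("Measured values")
```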
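`localMaxima()` is not part of base R or ggplot2; it comes from the linked answer. If you want the snippets to run on their own, a minimal base-R stand-in could look like the following. This is a sketch, not @Tommy's code: the helper names are hypothetical, and because the comparisons are strict, flat plateaus (runs of exactly equal values) are not reported.

```r
# Hypothetical stand-ins for the localMaxima() helper from the linked answer.
# A strict local maximum is where the slope changes sign from + to -;
# a strict local minimum ("pit") is where it changes from - to +.
# Plateaus of exactly equal values are not reported.
localMaxima <- function(x) {
  which(diff(sign(diff(x))) == -2) + 1
}

localMinima <- function(x) {
  which(diff(sign(diff(x))) == 2) + 1
}

d <- density(faithful$eruptions, bw = "sj")
loc.max <- d$x[localMaxima(d$y)]  # x positions of the peaks
loc.min <- d$x[localMinima(d$y)]  # x positions of the pits
```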
Now, my data are much noisier:
```r
d <- density(my.df$Values, bw = "sj")
loc.max <- d$x[localMaxima(d$y)]
ggplot(my.df, aes(Values)) + geom_density(adjust = 1/2) +
  geom_vline(xintercept = loc.max, colour = "red") +
  xlab("Measured values")
```
Trying to adjust the parameters (note that two "unwanted" peaks in the tail have been found):
```r
d <- density(my.df$Values, bw = "nrd", adjust = 1.2)
loc.max <- d$x[localMaxima(d$y)]
ggplot(my.df, aes(Values)) + geom_density(adjust = 1/2) +
  geom_vline(xintercept = loc.max, colour = "red") +
  xlab("Measured values")
```
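One way to make the "sensitivity" explicit is to treat `adjust` as the knob and increase it until no more than the expected number of peaks remains. The sketch below assumes a Gaussian kernel (the `density()` default); for Gaussian smoothing the number of modes should not increase as the bandwidth grows, so the search ends once small tail bumps have merged into the main peaks. The function name, the step factor, and the simulated data are illustrative assumptions, not part of the question.

```r
# Hypothetical helper: multiply density()'s `adjust` by `step` until the
# estimate has at most `k` strict local maxima.
smoothToKPeaks <- function(x, k = 2, bw = "nrd0", adjust = 1,
                           step = 1.1, max.iter = 200) {
  for (i in seq_len(max.iter)) {
    d <- density(x, bw = bw, adjust = adjust)
    peaks <- which(diff(sign(diff(d$y))) == -2) + 1
    if (length(peaks) <= k) {
      return(list(density = d, adjust = adjust, peaks = d$x[peaks]))
    }
    adjust <- adjust * step  # smooth a little more and try again
  }
  stop("still more than k peaks after max.iter steps")
}

# Illustrative bimodal sample (a stand-in for my.df$Values).
set.seed(1)
x <- c(rnorm(20000, mean = 0, sd = 1), rnorm(15000, mean = 8, sd = 1))
res <- smoothToKPeaks(x, k = 2)
res$adjust  # smallest tried multiplier that leaves <= 2 peaks
res$peaks   # x positions of the remaining peaks
```

Comparing `res$adjust` across measurement series gives a rough idea of how much smoothing each one needs; if `k` peaks are only reached with very heavy smoothing, real structure may have been smoothed away as well.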
So the questions are:
1) How can I automatically identify the real peaks in such a noisy dataset?
2) How can I reliably find the pits that separate those peaks?
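For question 2, once spurious maxima have been smoothed or filtered away, one simple approach is to take the two tallest local maxima and look for the lowest point of the density between them; that x value can then serve as the split point. This is a sketch that assumes the two tallest maxima are the real peaks; `findPeaksAndPit()` is a hypothetical name, and the example uses the built-in `faithful` data with the default bandwidth rather than the asker's data.

```r
# Hypothetical helper: the two tallest strict local maxima of a density
# estimate and the lowest point ("pit") between them.
findPeaksAndPit <- function(d) {
  y <- d$y
  peaks <- which(diff(sign(diff(y))) == -2) + 1
  if (length(peaks) < 2) stop("fewer than two local maxima found")
  # Keep the two tallest maxima, in left-to-right order.
  top2 <- sort(peaks[order(y[peaks], decreasing = TRUE)[1:2]])
  # The pit is the minimum of the density between the two peaks.
  between <- top2[1]:top2[2]
  pit <- between[which.min(y[between])]
  list(peaks = d$x[top2], pit = d$x[pit])
}

d <- density(faithful$eruptions)  # default bandwidth, already fairly smooth
res <- findPeaksAndPit(d)
res$peaks  # locations of the two peaks
res$pit    # location of the separating pit

# Split the data at the pit for further processing.
lower <- faithful$eruptions[faithful$eruptions < res$pit]
upper <- faithful$eruptions[faithful$eruptions >= res$pit]
```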
#1
My favorite is `pastecs::turnpoints`. But you're correct that you'll have to do some subjective filtering to distinguish spiky noise from true peaks. One way to do this is to require either the raw or the spline-smoothed data to remain above some threshold for N consecutive values.
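The threshold idea can be sketched in base R (this sketch does not use pastecs). It optionally spline-smooths the density estimate, then keeps one peak per run of at least `N` consecutive grid points above `thr`: shorter excursions above the threshold are treated as spiky noise, and bumps that never reach it (such as small tail peaks) are ignored. The function name, threshold, `N`, `spar`, and the simulated data are assumptions for illustration and will need the subjective tuning mentioned above. Note that the threshold must lie below the lower real peak but above the pit between the peaks; otherwise both peaks fall into one run.

```r
# Hypothetical helper: one peak per run of >= N consecutive values above thr.
# Returns the index of the highest point in each qualifying run.
persistentPeaks <- function(y, thr, N) {
  r <- rle(y > thr)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1L
  keep <- which(r$values & r$lengths >= N)  # runs above thr that are long enough
  vapply(keep, function(i) {
    idx <- starts[i]:ends[i]
    idx[which.max(y[idx])]
  }, integer(1))
}

# Illustrative noisy bimodal sample with a sparse tail (stand-in for real data).
set.seed(42)
x <- c(rnorm(20000, 10, 2), rnorm(15000, 25, 3), runif(300, 40, 50))
d <- density(x, bw = "nrd", adjust = 0.3)  # deliberately under-smoothed

# Optional: smooth the estimate further with a smoothing spline.
ys <- predict(smooth.spline(d$x, d$y, spar = 0.5), d$x)$y

pk <- persistentPeaks(ys, thr = 0.25 * max(ys), N = 10)
d$x[pk]  # peak locations

# The pit between the two accepted peaks is the minimum in between.
pit <- pk[1] - 1L + which.min(ys[pk[1]:pk[2]])
d$x[pit]
```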