Calculate the max value contained between the beginning of the day and the min value

Time: 2021-11-06 07:38:20

I need to calculate the max value contained between the beginning of the day and the moment when the min value happened. This is a toy example of my dataset for one day and one dendro:

             TIMESTAMP year DOY ring dendro diameter
1  2013-05-02 00:00:00 2013 122    1      1     3405
2  2013-05-02 00:15:00 2013 122    1      1     3317
3  2013-05-02 00:30:00 2013 122    1      1     3217
4  2013-05-02 00:45:00 2013 122    1      1     3026
5  2013-05-02 01:00:00 2013 122    1      1     4438
6  2013-05-03 00:00:00 2013 123    1      1     3444
7  2013-05-03 00:15:00 2013 123    1      1     3410
8  2013-05-03 00:30:00 2013 123    1      1     3168
9  2013-05-03 00:45:00 2013 123    1      1     3373
10 2013-05-02 00:00:00 2013 122    2      4     5590
11 2013-05-02 00:15:00 2013 122    2      4     5602
12 2013-05-02 00:30:00 2013 122    2      4     5515
13 2013-05-02 00:45:00 2013 122    2      4     4509
14 2013-05-02 01:00:00 2013 122    2      4     5566
15 2013-05-02 01:15:00 2013 122    2      4     6529

First, I calculated the MIN diameter for each day (DOY = day of the year) in each dendro (each dendro belongs to one ring), also getting the time at which that min value happened:

library(plyr)
# For each year/DOY/ring/dendro group, keep the row with the smallest diameter
dailymin <- ddply(datamelt, .(year, DOY, ring, dendro), function(x) x[which.min(x$diameter), ])

Now, my problem is that I want to calculate the MAX diameter for each day. However, sometimes the max value occurs after the min value. I am only interested in the max value that occurs BEFORE the min value, not in the overall daily max if it happened after the min. Therefore, I need the max value (for each DAY) WITHIN THE TIME INTERVAL FROM THE BEGINNING OF THE DAY (00:00:00) TO THE TIME OF THE MIN DIAMETER. As with the min, I also need to know at what time that max value happened. This is what I want from the previous df:

  year DOY ring dendro             timeMin  min             timeMax  max
1 2013 122    1      1 2013-05-02 00:45:00 3026 2013-05-02 00:00:00 3405
2 2013 123    1      1 2013-05-03 00:30:00 3168 2013-05-03 00:00:00 3444
3 2013 122    2      4 2013-05-02 00:45:00 4509 2013-05-02 00:15:00 5602

As you can see, the min value is the actual min value. However, the max value I want is not the max value of the day; it is the max value that happened between the beginning of the day and the min value. My first attempt was unsuccessful: it returns the max value of the day, even if it is outside the desired time interval:

    dailymax <- ddply(datamelt, .(year, DOY, ring, dendro),
function(x)x[which.max(x$diameter[1:which.min(datamelt$diameter)]), ]) 

Any ideas?

1 Solution

#1


In a data.table, you could write:

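DT[, {
  # Row index of the minimum diameter within this group
  istar <- which.min(diameter)
  list(
    dmin    = diameter[istar],
    # Largest diameter among the rows up to and including the minimum
    # (this relies on the rows of each group being in time order)
    prevmax = max(diameter[1:istar])
  )
}, by='year,DOY,ring,dendro']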

#    year DOY ring dendro dmin prevmax
# 1: 2013 242    6      8  470   477.2

I assume that a similar function can be written with the **ply functions you are already using.
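
The per-group logic does not depend on data.table. Here is a minimal sketch of it in base R, using the DOY 122 rows from the question's toy data. The helper name `before_min` and the result name `dailyext` are made up for illustration, and the sort by TIMESTAMP is an added precaution, not something the question asked for.

```r
# Toy rows from the question: DOY 122 for dendro 1 (ring 1) and dendro 4 (ring 2)
datamelt <- data.frame(
  TIMESTAMP = as.POSIXct(c(
    "2013-05-02 00:00:00", "2013-05-02 00:15:00", "2013-05-02 00:30:00",
    "2013-05-02 00:45:00", "2013-05-02 01:00:00",
    "2013-05-02 00:00:00", "2013-05-02 00:15:00", "2013-05-02 00:30:00",
    "2013-05-02 00:45:00", "2013-05-02 01:00:00", "2013-05-02 01:15:00"),
    tz = "UTC"),
  year     = 2013,
  DOY      = 122,
  ring     = c(rep(1, 5), rep(2, 6)),
  dendro   = c(rep(1, 5), rep(4, 6)),
  diameter = c(3405, 3317, 3217, 3026, 4438,
               5590, 5602, 5515, 4509, 5566, 6529)
)

# Hypothetical helper: summarise one group (one day of one dendro)
before_min <- function(x) {
  x <- x[order(x$TIMESTAMP), ]                  # make sure rows are in time order
  imin <- which.min(x$diameter)                 # row of the daily minimum
  imax <- which.max(x$diameter[seq_len(imin)])  # max among rows up to the minimum
  data.frame(
    year = x$year[1], DOY = x$DOY[1], ring = x$ring[1], dendro = x$dendro[1],
    timeMin = x$TIMESTAMP[imin], min = x$diameter[imin],
    timeMax = x$TIMESTAMP[imax], max = x$diameter[imax]
  )
}

# Base R split-apply-combine over the four grouping columns
groups <- split(
  datamelt,
  list(datamelt$year, datamelt$DOY, datamelt$ring, datamelt$dendro),
  drop = TRUE
)
dailyext <- do.call(rbind, lapply(groups, before_min))
rownames(dailyext) <- NULL
print(dailyext)
```

With plyr loaded, the same helper should also work as `ddply(datamelt, .(year, DOY, ring, dendro), before_min)`. Note that in dendro 4 the daily maximum (6529 at 01:15) comes after the minimum and is correctly ignored in favour of 5602.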

EDIT1: where DT comes from...

library(data.table)
# read.table() parses the pasted text; the header has one name fewer than the
# data rows have fields, so the first field (1928419, ...) becomes the row names
DT <- data.table(read.table(header=TRUE, text='
date TIMESTAMP year DOY ring dendro diameter
1928419 2013-08-30 00:00:00 2013 242    6      8    471.5
1928420 2013-08-30 01:30:00 2013 242    6      8    477.2
1928421 2013-08-30 03:00:00 2013 242    6      8    474.7
1928422 2013-08-30 04:30:00 2013 242    6      8    470.0
1928423 2013-08-30 06:00:00 2013 242    6      8    475.6
1928424 2013-08-30 08:30:00 2013 242    6      8    478.7'))

Your "TIMESTAMP" has a space in it, so I'm reading it as two columns, with the first called "date". Paste them together if you like. Next time, you can look into making a "reproducible example", as described here: How to make a great R reproducible example?
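
If you do want the two pieces back in one column, pasting them together could look like the sketch below. It rebuilds two of the rows by hand; the column name `datetime` and the UTC time zone are assumptions, so use whatever zone your logger actually records in.

```r
library(data.table)

# Two of the rows above, with the timestamp split into a date part and a time part
DT <- data.table(
  date      = c("2013-08-30", "2013-08-30"),
  TIMESTAMP = c("00:00:00", "01:30:00"),
  diameter  = c(471.5, 477.2)
)

# Paste the pieces together and parse them into a single date-time column
# (`datetime` and tz = "UTC" are illustrative choices)
DT[, datetime := as.POSIXct(paste(date, TIMESTAMP), tz = "UTC")]
print(DT)
```

A real date-time column also lets you sort each group chronologically before taking `1:istar`.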

EDIT2: For the time of the max and min:

DT[,{
  istar   <- which.min(diameter)
  istar2  <- which.max(diameter[1:istar])
  list(
    dmin     = diameter[istar],
    tmin     = TIMESTAMP[istar],
    dmax     = diameter[istar2],
    tmax     = TIMESTAMP[istar2]
)},by='year,DOY,ring,dendro']

#    year DOY ring dendro dmin     tmin  dmax     tmax
# 1: 2013 242    6      8  470 04:30:00 477.2 01:30:00

As mentioned in EDIT1, I don't have both pieces of your TIMESTAMP variable in a single column because you did not provide them that way. To add more columns, just add new expressions in the list() above. The idea behind the code is that the {} expression is a code block where you can work with the variables in the chunk of data associated with each year,DOY,ring,dendro combination and return a list of new columns.
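
To illustrate adding more expressions to the list(), here is a sketch that also returns how much the diameter dropped and how long the drop took. It assumes a single POSIXct column, called `datetime` here; `shrink`, `lag_hours` and `res` are made-up names as well.

```r
library(data.table)

# Same six readings as above, with a combined date-time column (assumed name)
DT <- data.table(
  datetime = as.POSIXct(c(
    "2013-08-30 00:00:00", "2013-08-30 01:30:00", "2013-08-30 03:00:00",
    "2013-08-30 04:30:00", "2013-08-30 06:00:00", "2013-08-30 08:30:00"),
    tz = "UTC"),
  year = 2013, DOY = 242, ring = 6, dendro = 8,
  diameter = c(471.5, 477.2, 474.7, 470.0, 475.6, 478.7)
)

# The index arithmetic below relies on each group being in time order
setorder(DT, year, DOY, ring, dendro, datetime)

res <- DT[, {
  istar  <- which.min(diameter)                   # row of the daily minimum
  istar2 <- which.max(diameter[seq_len(istar)])   # max among rows up to the minimum
  list(
    dmin      = diameter[istar],
    tmin      = datetime[istar],
    dmax      = diameter[istar2],
    tmax      = datetime[istar2],
    # Extra columns: size of the drop and the time it took
    shrink    = diameter[istar2] - diameter[istar],
    lag_hours = as.numeric(difftime(datetime[istar], datetime[istar2],
                                    units = "hours"))
  )
}, by = .(year, DOY, ring, dendro)]
print(res)
```

Each element of the list becomes one column of the result, with one row per group.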
