df1中的总和值基于df2中的日期范围

时间:2022-08-25 18:59:51

I am trying to return the sum of values of one data frame between two dates in another data frame. The answers provided in Stack don't seem to work for my application. I have tried using data.table but to no avail, so here goes.

我试图在另一个数据框中的两个日期之间返回一个数据帧的值之和。 Stack中提供的答案似乎不适用于我的应用程序。我试过使用data.table但无济于事,所以这里。

Create the Date Ranges

MeanRemaining <- seq(as.Date("2017-01-01"),as.Date("2017-02-28"),2)
MeanRemaining<-as.data.frame(cbind(MeanRemaining,lag(MeanRemaining)))
colnames(MeanRemaining)<-c("InspDate", "PrevInspDate")
MeanRemaining$InspDate<-as.Date(MeanRemaining$InspDate, origin = "1970/01/01")
MeanRemaining$PrevInspDate<-as.Date(MeanRemaining$PrevInspDate, origin = "1970/01/01")

Important to note that the date ranges are not actually fixed like they are above and could be any range of possibilities up to about a week apart.

重要的是要注意,日期范围实际上并不像它们在上面那样固定,并且可以是相隔大约一周的任何可能范围。

Create the values to sum

DailyTonnes <- as.data.frame(cbind(as.data.frame(seq(as.Date
+ ("2016-12-01"),as.Date("2017-03-28"),1)),(replicate(1,sample(abs(rnorm(118))*1000,rep=TRUE)))))
colnames(DailyTonnes)<-c("date","Vol")

The Aim

I want to sum 'Vol' from 'DailyTonnes' between each of the date ranges in 'MeanRemaining' and return the total 'Vol' to the corresponding row in 'MeanRemaining'.

我想在'MeanRemaining'中的每个日期范围之间将'Daily'与'DailyTonnes'相加并将总'Vol'返回到'MeanRemaining'中的相应行。

With some assistance of similar questions I've tried

在我试过的类似问题的帮助下

library(data.table)
setDT(MeanRemaining)
setDT(DailyTonnes)

MeanRemaining[DailyTonnes[MeanRemaining, sum(Vol), on = .(date >= InspDate, date <= PrevInspDate),
            by = .EACHI], TotalVol := V1, on = .(InspDate=date)]

However this returns NA values.

但是,这会返回NA值。

Any advice would be much appreciated.

任何建议将不胜感激。

1 个解决方案

#1


0  

I believe your question had all the pieces you needed for the answer.

我相信你的问题包含了答案所需的所有部分。

I polished your code a bit and change the last line (which was the only wrong one). The join in this last line was overly complicated and I don't think it could bring any memory/performance gain.

我稍稍修改了你的代码并改变了最后一行(这是唯一错误的一行)。最后一行中的连接过于复杂,我认为它不会带来任何内存/性能提升。

library(data.table)
# Create MeanRemaining
MeanRemaining <-
  data.table(InspDate = seq(as.Date("2017-01-01"), as.Date("2017-02-28"), 2))
# I changed lag by shift, I think it is clearer this way
MeanRemaining[, PrevInspDate := shift(InspDate, type = "lead", fill = 1000000L)]

# set seed for repetibility
set.seed(13)
# Create DailyTonnes, I changed the end date to generate empty intervals
DailyTonnes <- data.table(date = seq(as.Date("2016-12-01"),
                                     as.Date("2017-01-28"), 1),
                          Vol = sample(abs(rnorm(118)) * 1000, rep = TRUE))

# I changed the <= condition to <, I think it fits PrevInspDate better
# This should be your final result if I'm not wrong
SingleCase <-
  DailyTonnes[MeanRemaining, sum(Vol), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]

# SingleCase has two variables called date, this may be a small bug in data.table
print(names(SingleCase))

# change the names of the data.table to suit your needs
names(SingleCase) <- c("InspDate", "PrevInspDate", "TotalVol")

Edit: Recover multiple variables from table MeanRemaining

The case in which you retrieve multiple variables from MeanRemaining is quite tricky. It is easily solvable for a small amount of variables:

从MeanRemaining中检索多个变量的情况非常棘手。对于少量变量,它很容易解决:

# Add variables to MeanRemaining
for (i in 1:100) {
  MeanRemaining[, paste0("extracol", i) := sample(.N)]
}

# Two variable case
smallmultiple <-
  DailyTonnes[MeanRemaining, list(TotalVol = sum(Vol),
                                  extracol1 = i.extracol1 ,
                                  extracol2 = i.extracol2), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]

# Correct date names
names(smallmultiple)[1:2] <- c("InspDate", "PrevInspDate")

When it comes to a lot of variable it becomes hard. There is this feature request in github that would solve yout problem but it is not available at the moment. This question faces a similar issue but cannot be used in your case.

当谈到很多变量时,变得很难。 github中有此功能请求可以解决您的问题,但目前无法使用。此问题面临类似问题,但不能在您的情况下使用。

The way around for a big amount of variables is:

大量变量的方法是:

# obtain names of variables to be kept in the later join
joinkeepcols <-
  setdiff(names(MeanRemaining),  c("InspDate", "PrevInspDate"))

# the "i" indicates the table to take the variables from
joinkeepcols2 <- paste0("i.", joinkeepcols)

# Prepare a expression for the data.table environment
keepcols <-
  paste(paste(joinkeepcols, joinkeepcols2, sep = " = "), collapse = ", ")

# Complete expression to be evaluated in data.table
evalexpression <- paste0("list(
                         TotalVol = sum(Vol),",
                         keepcols, ")")

# The magic comes here (you can assign it to MeanRemaining)
bigmultiple <-
  DailyTonnes[MeanRemaining, eval(parse(text = evalexpression)), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]

# Correct date names
names(bigmultiple)[1:2] <- c("InspDate", "PrevInspDate")

#1


0  

I believe your question had all the pieces you needed for the answer.

我相信你的问题包含了答案所需的所有部分。

I polished your code a bit and change the last line (which was the only wrong one). The join in this last line was overly complicated and I don't think it could bring any memory/performance gain.

我稍稍修改了你的代码并改变了最后一行(这是唯一错误的一行)。最后一行中的连接过于复杂,我认为它不会带来任何内存/性能提升。

library(data.table)
# Create MeanRemaining
MeanRemaining <-
  data.table(InspDate = seq(as.Date("2017-01-01"), as.Date("2017-02-28"), 2))
# I changed lag by shift, I think it is clearer this way
MeanRemaining[, PrevInspDate := shift(InspDate, type = "lead", fill = 1000000L)]

# set seed for repetibility
set.seed(13)
# Create DailyTonnes, I changed the end date to generate empty intervals
DailyTonnes <- data.table(date = seq(as.Date("2016-12-01"),
                                     as.Date("2017-01-28"), 1),
                          Vol = sample(abs(rnorm(118)) * 1000, rep = TRUE))

# I changed the <= condition to <, I think it fits PrevInspDate better
# This should be your final result if I'm not wrong
SingleCase <-
  DailyTonnes[MeanRemaining, sum(Vol), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]

# SingleCase has two variables called date, this may be a small bug in data.table
print(names(SingleCase))

# change the names of the data.table to suit your needs
names(SingleCase) <- c("InspDate", "PrevInspDate", "TotalVol")

Edit: Recover multiple variables from table MeanRemaining

The case in which you retrieve multiple variables from MeanRemaining is quite tricky. It is easily solvable for a small amount of variables:

从MeanRemaining中检索多个变量的情况非常棘手。对于少量变量,它很容易解决:

# Add variables to MeanRemaining
for (i in 1:100) {
  MeanRemaining[, paste0("extracol", i) := sample(.N)]
}

# Two variable case
smallmultiple <-
  DailyTonnes[MeanRemaining, list(TotalVol = sum(Vol),
                                  extracol1 = i.extracol1 ,
                                  extracol2 = i.extracol2), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]

# Correct date names
names(smallmultiple)[1:2] <- c("InspDate", "PrevInspDate")

When it comes to a lot of variable it becomes hard. There is this feature request in github that would solve yout problem but it is not available at the moment. This question faces a similar issue but cannot be used in your case.

当谈到很多变量时,变得很难。 github中有此功能请求可以解决您的问题,但目前无法使用。此问题面临类似问题,但不能在您的情况下使用。

The way around for a big amount of variables is:

大量变量的方法是:

# obtain names of variables to be kept in the later join
joinkeepcols <-
  setdiff(names(MeanRemaining),  c("InspDate", "PrevInspDate"))

# the "i" indicates the table to take the variables from
joinkeepcols2 <- paste0("i.", joinkeepcols)

# Prepare a expression for the data.table environment
keepcols <-
  paste(paste(joinkeepcols, joinkeepcols2, sep = " = "), collapse = ", ")

# Complete expression to be evaluated in data.table
evalexpression <- paste0("list(
                         TotalVol = sum(Vol),",
                         keepcols, ")")

# The magic comes here (you can assign it to MeanRemaining)
bigmultiple <-
  DailyTonnes[MeanRemaining, eval(parse(text = evalexpression)), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]

# Correct date names
names(bigmultiple)[1:2] <- c("InspDate", "PrevInspDate")