Using data in one data.frame to generate values for a new column in another data.frame in R

Time: 2021-10-13 12:43:05

I have two data frames. One contains a timestamp and the air temperature:

air_temp  time_stamp
85.1      1396335600
85.4      1396335860

And another, which contains startTime, endTime, location coordinates, and a canonical name.

startTime    endTime       location.lat    location.lon    name
1396334278   1396374621    37.77638        -122.4176       Work
1396375256   1396376369    37.78391        -122.4054       Work

For each row in the first data frame, I want to identify which time range in the second data frame it lies in. For example, if the timestamp 1396335600 is between the startTime 1396334278 and the endTime 1396374621, add that range's location and name values to the row in the first data.frame.

The start and end times in the second data frame don't overlap and are in increasing order. However, they are not perfectly continuous, so if a timestamp falls between two time bands, I need to mark its location as NA. If it does fall between a start and end time, I want to add the location.lat, location.lon, and name columns to the first data frame.
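Because the intervals are sorted and non-overlapping, the lookup can also be vectorised with base R's findInterval() instead of scanning the second data frame once per timestamp. This is a sketch under stated assumptions: startTime is sorted ascending, both interval ends are inclusive, and match_interval is a hypothetical helper name, not part of any package.

```r
# Sample data from the question
DF1 <- data.frame(air_temp   = c(85.1, 85.4),
                  time_stamp = c(1396335600, 1396335860))
DF2 <- data.frame(startTime    = c(1396334278, 1396375256),
                  endTime      = c(1396374621, 1396376369),
                  location.lat = c(37.77638, 37.78391),
                  location.lon = c(-122.4176, -122.4054),
                  name         = c("Work", "Work"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: DF2 row index for each timestamp, NA when it falls in a gap.
# Assumes `starts` is sorted ascending and the intervals do not overlap.
match_interval <- function(ts, starts, ends) {
  # Index of the last start <= ts (0 if ts precedes every start)
  idx <- findInterval(ts, starts)
  idx[idx == 0] <- NA
  # Past the end of that interval means the timestamp is in a gap
  idx[!is.na(idx) & ts > ends[idx]] <- NA
  idx
}

idx <- match_interval(DF1$time_stamp, DF2$startTime, DF2$endTime)

# Indexing with NA returns an all-NA row, so gap rows get NA location and name
result <- cbind(DF1, DF2[idx, c("location.lat", "location.lon", "name")])
rownames(result) <- NULL
result
```

findInterval() does a binary search per timestamp, so this avoids building a full comparison for every pair of rows.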

Appreciate your help.

3 solutions

#1


0  

Interesting problem... It turned out to be more complicated than I originally thought!! Step 1: Set up the data!

DF1 <- read.table(text="air_temp  time_stamp
85.1      1396335600
85.4      1396335860",header=TRUE)

DF2 <- read.table(text="startTime    endTime       location.lat    location.lon    name
1396334278   1396374621    37.77638        -122.4176       Work
1396375256   1396376369    37.78391        -122.4054       Work",header=TRUE)

Step 2: For each time_stamp in DF1, compute the index of the matching row in DF2:

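index <- sapply(DF1$time_stamp, 
       function(i) {
         # rows of DF2 whose [startTime, endTime] interval contains i
         dec <- which(i >= DF2$startTime & i <= DF2$endTime)
         # intervals don't overlap, so at most one match; NA if i falls in a gap
         if (length(dec) == 0) NA_integer_ else dec[1]
         }
       )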
index
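A slightly stricter variant of this step, as a sketch: vapply() with FUN.VALUE = integer(1) guarantees exactly one integer per timestamp (it stops with an error instead of silently returning a list), and returning NA_integer_ rather than a logical NA keeps the type consistent. The third timestamp below is a hypothetical value placed in the gap between the two intervals.

```r
DF2 <- data.frame(startTime = c(1396334278, 1396375256),
                  endTime   = c(1396374621, 1396376369))

# Two timestamps from the question plus a hypothetical one in the gap
ts <- c(1396335600, 1396335860, 1396375000)

index <- vapply(ts, function(i) {
  # Rows of DF2 whose [startTime, endTime] contains i
  dec <- which(i >= DF2$startTime & i <= DF2$endTime)
  # At most one match since intervals don't overlap; NA when i is in a gap
  if (length(dec) == 0) NA_integer_ else dec[1]
}, FUN.VALUE = integer(1))
index
```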

Step 3: Merge the two data frames:

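# an NA index selects an all-NA row, so gap timestamps get NA location and name
DF1 <- cbind(DF1, DF2[index, c("location.lat", "location.lon", "name")])
row.names(DF1) <- 1:nrow(DF1)  # renumber rows; the DF2 subset carries names like "1.1"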
DF1
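To see what this step does for a timestamp that falls in a gap, here is a sketch with a hypothetical third reading (air_temp 85.9 at 1396375000, between the two intervals). Indexing DF2 with NA returns an all-NA row, so cbind() fills that reading's location and name with NA, which is what the question asks for.

```r
DF1 <- data.frame(air_temp   = c(85.1, 85.4, 85.9),  # 85.9 is a hypothetical reading
                  time_stamp = c(1396335600, 1396335860, 1396375000))
DF2 <- data.frame(startTime    = c(1396334278, 1396375256),
                  endTime      = c(1396374621, 1396376369),
                  location.lat = c(37.77638, 37.78391),
                  location.lon = c(-122.4176, -122.4054),
                  name         = c("Work", "Work"),
                  stringsAsFactors = FALSE)

index <- c(1L, 1L, NA)  # what Step 2 returns for these three timestamps
out <- cbind(DF1, DF2[index, 3:5])
row.names(out) <- 1:nrow(out)
out
```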

Hope this helps!!

#2


1  

Try this. Not tested.

# element-wise comparison: row i of data1 is tested against row i of data2 (recycled)
inrange <- data1$time_stamp >= data2$startTime & data1$time_stamp <= data2$endTime
newdata <- data2[inrange, 3:5]
data1 <- cbind(data1[inrange, ], newdata)

Rows whose time_stamp isn't between startTime and endTime are dropped rather than set to NA, so in theory the returned dataset could be shorter than the original. To keep the two pieces the same length, I subset data1 with the same TRUE/FALSE vector as data2.
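One caveat, shown as a sketch with the question's sample data: a comparison like the one above is element-wise, so row i of data1 is only tested against row i of data2 (with recycling when the lengths differ). A crossed comparison with outer() tests every timestamp against every interval instead; hits and idx are hypothetical names.

```r
data1 <- data.frame(air_temp   = c(85.1, 85.4),
                    time_stamp = c(1396335600, 1396335860))
data2 <- data.frame(startTime = c(1396334278, 1396375256),
                    endTime   = c(1396374621, 1396376369))

# Element-wise: reading 2 is compared with interval 2 only, so it is
# reported as out of range even though it lies inside interval 1
paired <- data1$time_stamp >= data2$startTime & data1$time_stamp <= data2$endTime
paired

# Crossed: hits[i, j] is TRUE when timestamp i lies inside interval j
hits <- outer(data1$time_stamp, data2$startTime, ">=") &
        outer(data1$time_stamp, data2$endTime, "<=")

# First matching interval per reading, NA when it falls in a gap
idx <- apply(hits, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)
idx
```

The crossed matrix has one cell per (reading, interval) pair, so it suits modest sizes; idx can then be used to index data2 as in the first answer.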

#3


0  

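rowidx <- sapply(dfrm1$time_stamp, function(x) {
  hit <- which(dfrm2$startTime <= x & dfrm2$endTime >= x)  # at most one row, since intervals don't overlap
  if (length(hit) == 0) NA_integer_ else hit[1] })           # NA when x falls in a gap
cbind(dfrm1, dfrm2[rowidx, c("location.lat", "location.lon", "name")])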

Mine isn't tested either and looks substantially similar to CCurtis's, so give him the check if it works.
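Why a gap needs special handling in this approach: when a timestamp falls between intervals, which() returns integer(0), and sapply() can then no longer simplify the results to a vector, so it returns a list that can't be used directly to index rows. A small sketch with a hypothetical gap timestamp, mapping empty matches to NA:

```r
starts <- c(1396334278, 1396375256)
ends   <- c(1396374621, 1396376369)
ts     <- c(1396335600, 1396375000)  # the second value is a hypothetical gap timestamp

raw <- sapply(ts, function(x) which(starts <= x & ends >= x))
class(raw)  # "list": integer(0) for the gap prevents simplification

# Replace empty matches with NA so the result can index data frame rows
rowidx <- vapply(raw, function(h) if (length(h) > 0) h[1] else NA_integer_, integer(1))
rowidx
```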
