Merge data frames by time interval in R

Time: 2022-07-06 02:52:56

I have two Data Frames. One is an Eye Tracking data frame with subject, condition, timestamp, xposition, and yposition. It has over 400,000 rows. Here's a toy data set for an example:

   subid condition time xpos ypos
1      1         1 1.40  195  140
2      1         1 2.50  138  147
3      1         1 3.40  140  162
4      1         1 4.10  188  150
5      1         2 1.10  131  194
6      1         2 2.10  149  111

eyedata <- data.frame(subid = rep(1:2, each = 8),
           condition = rep(rep(1:2, each = 4),2),
           time = c(1.4, 2.5, 3.4, 4.1, 
                    1.1, 2.1, 3.23, 4.44, 
                    1.33, 2.3, 3.11, 4.1,
                    .49, 1.99, 3.01, 4.2),
           xpos = round(runif(n = 16, min = 100, max = 200)),
           ypos = round(runif(n = 16, min = 100, max = 200)))

Then I have a Data Frame with subject, condition, a trial number, and a trial begin and end time. It looks like this:

   subid condition trial begin end
1      1         1     1  1.40 2.4
2      1         1     2  2.50 3.2
3      1         1     2  3.21 4.5
4      1         2     1  1.10 1.6
5      1         2     2  2.10 3.3
6      1         2     2  3.40 4.1
7      2         1     1  0.50 1.1
8      2         1     1  1.44 2.9
9      2         1     2  2.97 3.3
10     2         2     1  0.35 1.9
11     2         2     1  2.12 4.5
12     2         2     2  3.20 6.3

trials <- data.frame(subid = rep(1:2, each = 6),
                     condition = rep(rep(1:2, each = 3),2),
                     trial= c(rep(c(1,rep(2,2)),2),rep(c(rep(1,2),2),2)),
                     begin = c(1.4, 2.5, 3.21, 
                               1.10, 2.10, 3.4, .50,
                               1.44,2.97,.35,2.12,3.20),
                     end = c(2.4,3.2,4.5,1.6,
                             3.3,4.1,1.1,2.9,
                             3.3,1.9,4.5,6.3))

The number of trials in a condition is variable, and I want to add a column to my eye-tracking data frame that specifies the correct trial based on whether the timestamp falls within the trial's time interval. The time intervals do not overlap, but there will be many rows of eye-tracking data that fall between trials. In the end I'd like a data frame like this:

subid condition trial time xpos ypos
    1      1        1 1.40  198  106
    1      1        2 2.50  166  139
    1      1        2 3.40  162  120
    1      1        2 4.10  113  164
    1      2        1 1.10  162  120
    1      2        2 2.10  162  120

I've seen data.table rolling joins, but would prefer a solution with dplyr or fuzzyjoin. Thanks in advance.

1 solution

#1


Here's what I tried, but I can't figure out the discrepancies with your expected output, so it is likely an incomplete answer. Rows 12 and 13 of this result look like an overlap in time: the sample at time 4.20 (subid 2, condition 2) falls inside both 2.12-4.5 and 3.20-6.3. Also, when using random generation functions such as runif, please call set.seed first. Here xpos and ypos have no bearing on the result, so it's not an issue.

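library(dplyr)

# pair every sample with every trial of the same subid/condition,
# then keep only the pairs where time lies inside [begin, end]
eyedata %>%
  left_join(trials, by = c("subid", "condition")) %>%
  filter(time >= begin & time <= end)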

#    subid condition time xpos ypos trial begin end
# 1      1         1 1.40  143  101     1  1.40 2.4
# 2      1         1 2.50  152  173     2  2.50 3.2
# 3      1         1 3.40  185  172     2  3.21 4.5
# 4      1         1 4.10  106  119     2  3.21 4.5
# 5      1         2 1.10  155  165     1  1.10 1.6
# 6      1         2 2.10  169  154     2  2.10 3.3
# 7      1         2 3.23  166  134     2  2.10 3.3
# 8      2         1 2.30  197  171     1  1.44 2.9
# 9      2         1 3.11  140  135     2  2.97 3.3
# 10     2         2 0.49  176  139     1  0.35 1.9
# 11     2         2 3.01  187  180     1  2.12 4.5
# 12     2         2 4.20  147  176     1  2.12 4.5
# 13     2         2 4.20  147  176     2  3.20 6.3          
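
The filter above drops every eye-tracking sample that falls between trials, and the question says there will be many of those. A minimal sketch that keeps them, assuming dplyr 1.1.0 or later (which provides join_by() and its between() helper), might look like the following. The seed value and the names labelled and overlaps are my own choices, not from the original post. The second step is an extra check that surfaces overlapping intervals like the one behind rows 12 and 13.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0, needed for join_by()

set.seed(1)  # arbitrary seed so xpos/ypos are reproducible

# Toy data from the question
eyedata <- data.frame(
  subid = rep(1:2, each = 8),
  condition = rep(rep(1:2, each = 4), 2),
  time = c(1.4, 2.5, 3.4, 4.1, 1.1, 2.1, 3.23, 4.44,
           1.33, 2.3, 3.11, 4.1, 0.49, 1.99, 3.01, 4.2),
  xpos = round(runif(16, min = 100, max = 200)),
  ypos = round(runif(16, min = 100, max = 200))
)

trials <- data.frame(
  subid = rep(1:2, each = 6),
  condition = rep(rep(1:2, each = 3), 2),
  trial = c(1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2),
  begin = c(1.4, 2.5, 3.21, 1.1, 2.1, 3.4, 0.5, 1.44, 2.97, 0.35, 2.12, 3.2),
  end = c(2.4, 3.2, 4.5, 1.6, 3.3, 4.1, 1.1, 2.9, 3.3, 1.9, 4.5, 6.3)
)

# Non-equi left join: match on subid and condition, and require
# begin <= time <= end. Samples outside every interval keep trial = NA.
labelled <- eyedata %>%
  left_join(trials,
            by = join_by(subid, condition, between(time, begin, end))) %>%
  select(subid, condition, trial, time, xpos, ypos)

# Flag intervals that begin before the previous interval in the same
# subid/condition group has ended.
overlaps <- trials %>%
  group_by(subid, condition) %>%
  arrange(begin, .by_group = TRUE) %>%
  mutate(prev_end = lag(end)) %>%
  ungroup() %>%
  filter(!is.na(prev_end), begin < prev_end)

labelled
overlaps
```

On the toy data, overlaps should list the 3.20-6.3 interval for subid 2, condition 2, which starts before the 2.12-4.5 interval ends. That is why the sample at time 4.20 is matched twice. If the real trial intervals truly never overlap, each sample gets at most one trial and the join returns exactly one row per eye-tracking sample.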
