How to compare two consecutive rows with a reference value in R?

I have a data frame of vehicle trajectories. Here's a snapshot:

> head(df)
  vehicle frame globalx class velocity lane
1       2    43 6451214     2    37.76    2
2       2    44 6451217     2    37.90    2
3       2    45 6451220     2    38.05    2
4       2    46 6451223     2    38.18    2
5       2    47 6451225     2    38.32    2
6       2    48 6451228     2    38.44    2

where vehicle = vehicle ID (it repeats because the same vehicle is observed in several time frames), frame = ID of the time frame in which it was observed, globalx = x coordinate of the front center of the vehicle, class = type of vehicle (1 = motorcycle, 2 = car, 3 = truck), velocity = speed of the vehicle in feet per second, and lane = lane number (there are 6 lanes). I think the following illustration explains the problem better:

[illustration]

Each 'frame' is one tenth of a second, i.e. one frame is 0.1 seconds long. At frame 't' the vehicle has globalx coordinate x(t), and at frame 't-1' (0.1 seconds earlier) it was at x(t-1). The reference location is 'U' (globalx = 6451179.1116). I simply want a new column in df called 'u' that has 'yes' in the row where the vehicle's globalx is greater than the reference coordinate at 'U' AND the vehicle's previous consecutive globalx coordinate was less than the reference coordinate at 'U'. This means that if df has 100 vehicles, there will be 100 'yes' values in the 'u' column, because every vehicle meets this criterion exactly once. I have tried to do this with ifelse inside a function and also with a for loop, but neither works for me. The output should have one new column:

vehicle frame globalx class velocity lane u

I tried using ifelse inside a for loop and inside a function, but it doesn't work for me.

2 Answers

#1

I assume the data frame is sorted primarily by vehicle and secondarily by globalx. If it isn't, you can sort it with:

idx <- with(df,order(vehicle,globalx))
df <- df[idx,]

Now you can add the column with the following vectorized operations:

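# example reference line
U <- 6451220
# adding the extra column
# samecar: TRUE for every row except the first row of each vehicle (relies on df being sorted by vehicle)
samecar <- duplicated(df[,"vehicle"])
# passU: TRUE where sign(globalx - U) switches from negative to positive between consecutive rows;
# the 1e-10 offset means a position exactly equal to U already counts as past U
passU <- c(FALSE,diff(sign(df[,"globalx"]-U+1e-10))>0)
df[,"u"] <- ifelse(samecar & passU,"yes","no")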
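To see what these steps produce, here is a self-contained run on the six rows shown in the question, with the same example reference line U <- 6451220. Only the data frame construction is new; the three vectorized lines are the ones from this answer.

```r
# Sketch: the six sample rows from head(df) in the question
df <- data.frame(
  vehicle  = rep(2, 6),
  frame    = 43:48,
  globalx  = c(6451214, 6451217, 6451220, 6451223, 6451225, 6451228),
  class    = rep(2, 6),
  velocity = c(37.76, 37.90, 38.05, 38.18, 38.32, 38.44),
  lane     = rep(2, 6)
)
# Sort by vehicle, then globalx, as the answer assumes
df <- df[with(df, order(vehicle, globalx)), ]

U <- 6451220  # example reference line
samecar <- duplicated(df[, "vehicle"])
passU <- c(FALSE, diff(sign(df[, "globalx"] - U + 1e-10)) > 0)
df[, "u"] <- ifelse(samecar & passU, "yes", "no")

print(df[, c("frame", "globalx", "u")])
```

Here the single 'yes' lands on frame 45, whose globalx equals U exactly: because of the 1e-10 offset, a position of exactly U is treated as already past the reference line. Under the strict "greater than U" wording of the question, the crossing row would instead be frame 46, so it is worth deciding which convention you want.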

#2

Here is my solution:

First, create dummy data based on the data you provided (I saved it to data.txt on my desktop), then duplicate it so that there are two cars with identical data but different vehicle IDs:

library(plyr)
df <- read.table("~/Desktop/data.txt", header = TRUE)
df.B <- df; df.B$vehicle <- 3 # For demonstration: a second vehicle with identical data
df <- rbind(df, df.B); rm(df.B)
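If you'd rather not save the data to a file first, the same dummy data can be built inline with read.table(text = ...), using the six rows from head(df) in the question. This is a sketch of an alternative to reading ~/Desktop/data.txt; `text` and `header` are standard arguments of read.table.

```r
# Sketch: the question's six rows, read from an inline string instead of a file
df <- read.table(header = TRUE, text = "
vehicle frame globalx class velocity lane
2 43 6451214 2 37.76 2
2 44 6451217 2 37.90 2
2 45 6451220 2 38.05 2
2 46 6451223 2 38.18 2
2 47 6451225 2 38.32 2
2 48 6451228 2 38.44 2
")
# Second, identical trajectory with a different vehicle id (for demonstration)
df.B <- df
df.B$vehicle <- 3
df <- rbind(df, df.B)
rm(df.B)
```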

Then we can build a function that does the processing:

mvt <- function(xref=NULL,...,data=df){
  if(!is.numeric(xref)) #Input must be numeric
    stop("xref must be numeric",call.=FALSE)
  xref = xref[1]

  ##Split on vehicle and process.
  ddply(data,"vehicle",function(d){
    L = nrow(d) #Number of Rows
    d$u = FALSE #Default to Not crossing

    #One or more rows can be checked.
    if(L == 1)
      d$u = (d$globalx > xref)
    else if(L > 1){
      ix <- which(d$globalx[2:L] > xref & d$globalx[1:(L-1)] <= xref)
      if(length(ix) > 0)
        d$u[ix + 1] = TRUE
    }

    #done
    return(d)
  })
}

It can be used as follows:

mvt(6451216)
mvt(6451217)
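For comparison, the same per-vehicle check can be sketched in base R without plyr. This is not part of the answer: mark_crossing() and sample_df are hypothetical names. The sketch orders rows by vehicle and frame (so "previous row" means the previous time frame), and, like mvt(), it treats a previous position equal to xref as not yet past it, so a vehicle that lands exactly on the reference line is still detected once. Unlike mvt(), it writes 'yes'/'no' as the question asks, and a vehicle observed in only one frame always gets 'no'.

```r
# Hypothetical base-R helper: mark the row where a vehicle moves from
# at-or-before xref to past xref between two consecutive frames.
mark_crossing <- function(data, xref) {
  # Consecutive rows must be consecutive frames of the same vehicle
  data <- data[order(data$vehicle, data$frame), ]
  crossed <- ave(data$globalx, data$vehicle, FUN = function(x) {
    prev <- c(NA, head(x, -1))  # previous position; NA on a vehicle's first row
    !is.na(prev) & prev <= xref & x > xref
  })
  # ave() stores the logical result as 0/1 in a numeric vector
  data$u <- ifelse(crossed == 1, "yes", "no")
  data
}

# Two vehicles with identical trajectories, as in the dummy data
sample_df <- data.frame(
  vehicle = rep(c(2, 3), each = 6),
  frame   = rep(43:48, times = 2),
  globalx = rep(c(6451214, 6451217, 6451220, 6451223, 6451225, 6451228), times = 2)
)
res <- mark_crossing(sample_df, xref = 6451216)
print(res[res$u == "yes", ])
```

With xref = 6451216 (the value passed to mvt() above), this marks frame 44 for both vehicle 2 and vehicle 3.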