I have trip data that looks something like this:
ClientID <- c("45675")
Date <- c("10/10/2016")
PickUpAddress <- c("123 Street", "45 Way", "66 Blvd")
DropOffAddress <- c("45 Way", "66 Blvd", "123 Street")
PickUpTime <- c("08:00", "17:00", "18:00")
DropOffTime <- c("8:30", "17:30", "19:00")
df <- data.frame(ClientID, Date, PickUpAddress, DropOffAddress, PickUpTime, DropOffTime)
df
  ClientID       Date PickUpAddress DropOffAddress PickUpTime DropOffTime
1    45675 10/10/2016    123 Street         45 Way      08:00        8:30
2    45675 10/10/2016        45 Way        66 Blvd      17:00       17:30
3    45675 10/10/2016       66 Blvd     123 Street      18:00       19:00
The real data has thousands of records, with a varying number of trips per client throughout the year.
The third row in this example is the return trip (the trip back to the original pickup address). I would like to remove all return trips from the data.
Any suggestions?
1 solution
#1
You can try the following solution, which is based on defining a home address for each client and dropping every trip that ends there.
library(dplyr)
library(lubridate)
# create date/time format variables
df$Date_PickUpTime <- paste(df$Date, df$PickUpTime, sep = " ")
df$Date_DropOffTime <- paste(df$Date, df$DropOffTime, sep = " ")
df$Date_PickUpTime <- mdy_hm(df$Date_PickUpTime)
df$Date_DropOffTime <- mdy_hm(df$Date_DropOffTime)
str(df) # as you can see Date_PickUpTime and Date_DropOffTime are in POSIXct format
# define the client home address
df %>%
  group_by(ClientID) %>%                      # group by client
  arrange(Date_PickUpTime) %>%                # order the data by Date_PickUpTime
  mutate(HomeAddress = PickUpAddress[1])      # home address = the client's first PickUpAddress

# ... then add a filter to the above code
df %>%
  group_by(ClientID) %>%                      # group by client
  arrange(Date_PickUpTime) %>%                # order the data by Date_PickUpTime
  mutate(HomeAddress = PickUpAddress[1]) %>%  # client home address
  filter(DropOffAddress != HomeAddress)       # keep trips whose DropOffAddress differs from
                                              # HomeAddress; the return trip (row 3) is dropped
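
Since the real data covers a whole year, a client's first pickup of the year may not be where every later day starts. Here is a minimal sketch of a per-day variant, assuming that the first pickup of each day is that day's origin (this assumption is mine, not something stated in the question). The `trips` data frame, its extra rows, and the second client are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data: two clients, two days (not from the original post).
# stringsAsFactors = FALSE keeps the addresses as character vectors, so the
# != comparison below also works on R versions before 4.0, where data.frame()
# converted strings to factors by default.
trips <- data.frame(
  ClientID       = c("45675", "45675", "45675", "45675", "45675", "90210", "90210"),
  Date           = c("10/10/2016", "10/10/2016", "10/10/2016",
                     "10/11/2016", "10/11/2016", "10/10/2016", "10/10/2016"),
  PickUpAddress  = c("123 Street", "45 Way", "66 Blvd",
                     "123 Street", "9 Ave", "7 Road", "1 Plaza"),
  DropOffAddress = c("45 Way", "66 Blvd", "123 Street",
                     "9 Ave", "123 Street", "1 Plaza", "7 Road"),
  PickUpTime     = c("08:00", "17:00", "18:00", "09:00", "15:00", "10:00", "12:00"),
  stringsAsFactors = FALSE
)

outbound <- trips %>%
  mutate(PickUp = mdy_hm(paste(Date, PickUpTime))) %>%  # parse pickup date-time
  arrange(ClientID, PickUp) %>%                         # chronological order per client
  group_by(ClientID, Date) %>%                          # one group per client per day
  mutate(HomeAddress = first(PickUpAddress)) %>%        # assumed origin: first pickup that day
  filter(DropOffAddress != HomeAddress) %>%             # drop trips that end at the origin
  ungroup()

outbound
```

Grouping by `ClientID` alone, as in the code above, is the better choice if every client always starts from the same address; the per-day grouping only matters when the starting point can change from one day to the next.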