Example data frame
date name speed acceleration
1/1/17 bob 5 NA
1/1/15 george 5 NA
1/1/15 bob NA 4
1/1/17 bob 4 NA
I want to condense all rows with the same name into one row, keeping the newest non-NA value in each of the speed and acceleration columns.
Desired output
date name speed acceleration
1/1/17 bob 5 4
1/1/15 george 5 NA
2 Solutions
#1
3
You can do it this way:
library(dplyr)
library(lubridate)

input <- read.table(text =
"date name speed acceleration
1/1/17 bob 5 NA
1/1/15 george 5 NA
1/1/15 bob NA 4
1/1/17 bob 4 NA",
header = TRUE, stringsAsFactors = FALSE)

output <- input %>%
  mutate(date = mdy(date)) %>%   # or maybe dmy, depending on your date format
  group_by(name) %>%
  arrange(desc(date)) %>%        # newest rows first
  # Keep the first non-NA value of every column within each name.
  # across() replaces summarise_all(funs(...)): funs() is deprecated in current dplyr.
  summarise(across(everything(), ~ na.omit(.x)[1]))

output
# # A tibble: 2 × 4
# name date speed acceleration
# <chr> <date> <int> <int>
# 1 bob 2017-01-01 5 4
# 2 george 2015-01-01 5 NA
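The "mdy or dmy" comment matters because a string such as 2/1/17 parses to different days under the two conventions, which would change which row counts as newest. A minimal sketch in base R, assuming two-digit years and using as.Date() format strings that correspond to lubridate's mdy() and dmy(); the sample string is made up for illustration:

```r
# Hypothetical ambiguous date string (not from the question's data)
x <- "2/1/17"

# Month/day/year reading, the counterpart of lubridate::mdy()
as_mdy <- as.Date(x, format = "%m/%d/%y")

# Day/month/year reading, the counterpart of lubridate::dmy()
as_dmy <- as.Date(x, format = "%d/%m/%y")

format(as_mdy)  # "2017-02-01"
format(as_dmy)  # "2017-01-02"
```

The sample data in the question only contains 1/1 dates, so both readings agree there; the choice only shows up on real data.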
#2
0
Here is an option using data.table. Convert the 'data.frame' to a 'data.table' (setDT(input)), order the rows by 'date' in descending order after converting it to Date class, then, grouped by 'name', loop through the columns and take the first non-NA element of each.
library(data.table)
library(lubridate)
setDT(input)[order(-mdy(date)), lapply(.SD, function(x) x[!is.na(x)][1]), name]
# name date speed acceleration
#1: bob 1/1/17 5 4
#2: george 1/1/15 5 NA
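The same steps (sort newest first, group by name, take the first non-NA value per column) can also be sketched without extra packages. This is only an illustration in base R, assuming the dates are month/day/year; first_non_na is a hypothetical helper name, not part of any library:

```r
# Same sample data as in the question
input <- read.table(text =
"date name speed acceleration
1/1/17 bob 5 NA
1/1/15 george 5 NA
1/1/15 bob NA 4
1/1/17 bob 4 NA",
header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: first non-NA element of a vector, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

# Parse the dates (assumed month/day/year) and sort newest first.
# order() leaves ties in their original order, so the first 1/1/17 row stays on top.
input$date <- as.Date(input$date, format = "%m/%d/%y")
sorted <- input[order(-as.numeric(input$date)), ]

# Split by name and collapse each group to a single row
collapsed <- do.call(rbind, lapply(split(sorted, sorted$name), function(g) {
  as.data.frame(lapply(g, first_non_na), stringsAsFactors = FALSE)
}))
rownames(collapsed) <- NULL
collapsed
```

Unlike the data.table call above, this version stores the parsed Date in the result rather than the original string.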