Max of a data.table column based on other columns

Time: 2021-01-02 08:04:29

I have an R data.table:

library(data.table)
DT = data.table(x=rep(c("b","a",NA_character_),each=3), y=rep(c('A', NA_character_, 'C'), each=3), z=c(NA_character_), v=1:9)
DT
#    x  y  z v
#1:  b  A NA 1
#2:  b  A NA 2
#3:  b  A NA 3
#4:  a NA NA 4
#5:  a NA NA 5
#6:  a NA NA 6
#7: NA  C NA 7
#8: NA  C NA 8
#9: NA  C NA 9

For each column, I want the maximum of column v taken over the rows where that column is not NA. I am using:

sapply(DT, function(x) { ifelse(all(is.na(x)), NA_integer_, max(DT[['v']][!is.na(x)])) })
 #x  y  z  v 
 #6  9 NA  9

Is there a simpler way to achieve this?

2 Solutions

#1


3  

Here is a way that gives you -Inf (and a warning) if all values of a column are NA (you can replace that with NA afterwards if you prefer):

DT[, lapply(.SD, function(x) max(v[!is.na(x)]))]
#    x y    z v
# 1: 6 9 -Inf 9
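
If you go with the -Inf version, the later replacement by NA could look roughly like the sketch below. This is only one possible way to do it, not part of the original answer: the names res, col and idx are made up for illustration, and the warning from max() on an empty vector is silenced deliberately with suppressWarnings().

```r
library(data.table)

DT <- data.table(
  x = rep(c("b", "a", NA_character_), each = 3),
  y = rep(c("A", NA_character_, "C"), each = 3),
  z = NA_character_,
  v = 1:9
)

# max() of an empty vector returns -Inf with a warning; silenced here on purpose
res <- suppressWarnings(DT[, lapply(.SD, function(x) max(v[!is.na(x)]))])

# Replace infinite entries with NA, one column at a time (res, col, idx are illustrative names)
for (col in names(res)) {
  idx <- which(is.infinite(res[[col]]))
  if (length(idx)) set(res, i = idx, j = col, value = NA)
}
res
```

Only column z is touched here, because it is the only column with no non-NA values.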

As suggested by @DavidArenburg, to handle columns where all values are NA cleanly (no warning, and NA returned directly), you can do:

DT[, lapply(.SD, function(x) {
  temp <- v[!is.na(x)] 
  if(!length(temp)) NA else max(temp)
})]
#   x y  z v
#1: 6 9 NA 9

#2


1  

We can use summarise_each from dplyr:

library(dplyr)
DT %>%
   summarise_each(funs(max(v[!is.na(.)])))
#    x y    z v
#1: 6 9 -Inf 9
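
Note that summarise_each() and funs() are deprecated in recent dplyr releases. A rough equivalent with across(), assuming dplyr 1.0.0 or later, might look like the sketch below. It refers to DT$v explicitly rather than to the bare column name, and it returns NA for an all-NA column instead of -Inf; both are choices made for this sketch, not part of the original answer.

```r
library(data.table)
library(dplyr)

DT <- data.table(
  x = rep(c("b", "a", NA_character_), each = 3),
  y = rep(c("A", NA_character_, "C"), each = 3),
  z = NA_character_,
  v = 1:9
)

res <- DT %>%
  summarise(across(everything(), ~ {
    # values of v on the rows where the current column is not NA
    vals <- DT$v[!is.na(.x)]
    # return NA instead of calling max() on an empty vector
    if (length(vals)) max(vals) else NA_integer_
  }))
res
```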
