Create a new column based on two other columns, taking the average when both are observed

Time: 2021-12-19 09:51:59

I have two numeric columns, score.a and score.b. I want to create a new variable, score.c, that carries over the observed score from a or b; when a score is observed in both, I need to take the average.

help <- data.frame(deid = c(5, 7, 12, 15, 25, 32, 42, 77, 92, 100, 112, 113),
               score.a = c(NA, 2, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA),
               score.b = c(4, NA, NA, 4, 4, 4, NA, NA, 4, 4, NA, 4))

This creates:

    deid score.a score.b
1     5      NA       4
2     7       2      NA
3    12       2      NA
4    15       2       4
5    25      NA       4
6    32      NA       4
7    42      NA      NA
8    77      NA      NA
9    92      NA       4
10  100      NA       4
11  112       2      NA
12  113      NA       4

I am hoping to create a data frame that looks like this:

     deid score.a score.b score.c
1     5      NA       4     4
2     7       2      NA     2
3    12       2      NA     2
4    15       2       4     3
5    25      NA       4     4
6    32      NA       4     4
7    42      NA      NA     NA
8    77      NA      NA     NA
9    92      NA       4     4
10  100      NA       4     4
11  112       2      NA     2
12  113      NA       4     4

For example, row 4 takes the mean of the two scores.

My attempt used help %>% group_by(deid) %>% mutate(score.c = (score.a + score.b)/2), but this only handled the rows where a score was observed in both columns.
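The attempt leaves score.c empty for most rows because arithmetic on NA propagates the NA. A minimal base R sketch of that behaviour, using the two score vectors from the question (the name naive is only for illustration):

```r
# Scores copied from the question's data frame
score.a <- c(NA, 2, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA)
score.b <- c(4, NA, NA, 4, 4, 4, NA, NA, 4, 4, NA, 4)

# Any sum that involves an NA is itself NA, so only rows where
# both scores are observed end up with a value
naive <- (score.a + score.b) / 2

# Row positions that received a value
which(!is.na(naive))
```

Only the row with both scores observed gets a result, which is why an NA-aware function is needed.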

2 Answers

#1

Try

  help$score.c <- rowMeans(help[2:3], na.rm=TRUE)
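One detail that may matter: for rows where both scores are missing, rowMeans() with na.rm=TRUE returns NaN rather than NA. is.na() treats NaN as missing, so this is often harmless, but if a literal NA is wanted as in the desired output, a sketch along these lines could convert it (selecting the columns by name instead of position is a choice made here, not part of the answer):

```r
help <- data.frame(deid = c(5, 7, 12, 15, 25, 32, 42, 77, 92, 100, 112, 113),
                   score.a = c(NA, 2, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA),
                   score.b = c(4, NA, NA, 4, 4, 4, NA, NA, 4, 4, NA, 4))

# Row-wise mean over the two score columns, ignoring missing values
help$score.c <- rowMeans(help[c("score.a", "score.b")], na.rm = TRUE)

# Rows with no observed score come back as NaN (mean of zero values);
# replace those with NA to match the desired output
help$score.c[is.nan(help$score.c)] <- NA

help
```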

Or a possible approach with dplyr (not tested thoroughly)

 library(dplyr)
 help %>%
     # name the result score.c, as asked in the question
     mutate(score.c = (pmax(score.a, score.b, na.rm = TRUE) +
                       pmin(score.a, score.b, na.rm = TRUE)) / 2)
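Since the answer describes this approach as not thoroughly tested, here is a small base R check of the pmax/pmin expression against the output requested in the question (pmax and pmin are base functions, so dplyr is not needed for the check; result and expected are illustrative names):

```r
score.a <- c(NA, 2, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA)
score.b <- c(4, NA, NA, 4, 4, 4, NA, NA, 4, 4, NA, 4)

# With na.rm = TRUE, pmax/pmin both return the single observed value when
# only one score is present, and NA when both are missing
result <- (pmax(score.a, score.b, na.rm = TRUE) +
           pmin(score.a, score.b, na.rm = TRUE)) / 2

# score.c column from the question's desired output
expected <- c(4, 2, 2, 3, 4, 4, NA, NA, 4, 4, 2, 4)

all.equal(result, expected)
```

Note that averaging the maximum and minimum equals the mean only for two columns; with three or more score columns, rowMeans() is the safer choice.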

#2

A data.table solution would be:

library(data.table)
setDT(help)
# := adds score.c to help by reference instead of returning a separate one-column table
help[, score.c := rowMeans(.SD, na.rm = TRUE), .SDcols = c('score.a', 'score.b')]
