Assigning values to a vector when using dplyr case_when

I would like to assign values to a named vector depending on the values in a df that I'm looping through by row. According to the documentation, the RHS of case_when must be a vector, whereas what I'm trying to do is have the RHS be an actual assignment step. Is this possible? case_when would be much more elegant here than a series of if statements:

test.df <- data.frame(cat1 = c('label1', 'label2', 'label3'), 
                  cat2 = c('label3', '', ''),
                  cat3 = c('', 'label2', 'label1'))

test.lst <- apply(test.df, 1, function(x){
                test.vec <- c(label1 = 0, label2 = 0, label3 = 0)

                case_when(
                  x[['cat1']]=='label1' | x[['cat2']]=='label1' | x[['cat3']]=='label1' ~ test.vec['label1'] <- 1,
                  x[['cat1']]=='label2' | x[['cat2']]=='label2' | x[['cat3']]=='label2' ~ test.vec['label2'] <- 1,
                  x[['cat1']]=='label3' | x[['cat2']]=='label3' | x[['cat3']]=='label3' ~ test.vec['label3'] <- 1
                )
              })

2 Solutions

#1

You can use the transmute function from the dplyr package to keep only the columns created or modified in the function call, so in effect you create an entirely new data frame. It would look like this:

library(dplyr)

test.lst <- test.df %>% 
  transmute(label1 = case_when(
    cat1 == "label1" | cat2 == "label1" | cat3 == "label1" ~ 1,
    TRUE ~ 0
  ),
  label2 = case_when(
    cat1 == "label2" | cat2 == "label2" | cat3 == "label2" ~ 1,
    TRUE ~ 0
  ),
  label3 = case_when(
    cat1 == "label3" | cat2 == "label3" | cat3 == "label3" ~ 1,
    TRUE ~ 0
  ))

Your output would look like this:

  label1 label2 label3
1      1      0      1
2      0      1      0
3      1      0      1

As a note, the dplyr package and most of its functions are vectorized, so they already perform the desired operation on every row without the need for a for loop or an apply/map function. This usually has the added benefits of speeding up your code and making it more readable.
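
To make "vectorized" concrete, here is a minimal sketch in base R only, using the test.df from the question. The name `has_label1` is made up for illustration; the point is just that one comparison covers every row at once, which is what the case_when conditions above rely on.

```r
# Rebuild the data frame from the question
test.df <- data.frame(cat1 = c("label1", "label2", "label3"),
                      cat2 = c("label3", "", ""),
                      cat3 = c("", "label2", "label1"))

# Each == comparison is evaluated for all rows at once,
# and | combines the resulting logical vectors element by element
has_label1 <- test.df$cat1 == "label1" |
              test.df$cat2 == "label1" |
              test.df$cat3 == "label1"
has_label1
# [1]  TRUE FALSE  TRUE

# Converting the logical vector gives the 1/0 column directly
as.integer(has_label1)
# [1] 1 0 1
```

No explicit loop over rows is written anywhere; R iterates over the column internally.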

#2

case_when is not necessary; here is an alternative solution:

sapply(paste0('label', 1:3), function(x) sign(rowSums(as.matrix(test.df) == x)) )
#      label1 label2 label3
# [1,]      1      0      1
# [2,]      0      1      0
# [3,]      1      0      1
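
To see why this one-liner works, it may help to run its pieces separately. The sketch below rebuilds test.df from the question; the intermediate names (`m`, `hits`, `flags`, `row.list`) are made up for illustration. The last step, which splits the result into one named vector per row, is only an assumption about the shape the question was after.

```r
# Rebuild the data frame from the question
test.df <- data.frame(cat1 = c("label1", "label2", "label3"),
                      cat2 = c("label3", "", ""),
                      cat3 = c("", "label2", "label1"))

# Character matrix with one row per data frame row
m <- as.matrix(test.df)

# m == "label2" is a logical matrix; rowSums counts the matches in each row
hits <- rowSums(m == "label2")
hits
# [1] 0 2 0

# sign() caps every positive count at 1, giving a 0/1 flag per row
sign(hits)
# [1] 0 1 0

# Repeat for every label; sapply binds the results into a matrix
# whose columns are named after the labels
flags <- sapply(paste0("label", 1:3), function(x) sign(rowSums(m == x)))

# Assumed final shape: a list holding one named vector per row
row.list <- lapply(seq_len(nrow(flags)), function(i) flags[i, ])
row.list[[1]]
# label1 label2 label3 
#      1      0      1 
```

Because the comparison runs over the whole matrix, adding more label values only means extending the vector passed to sapply.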
