I would like to assign values to a named vector depending on the values in a df that I'm looping through by row. According to the documentation, the RHS of case_when is a vector, whereas what I'm trying to do is to have the RHS be an actual assignment step. Is this possible? case_when is really much more elegant here than having to use if statements:
test.df <- data.frame(cat1 = c('label1', 'label2', 'label3'),
                      cat2 = c('label3', '', ''),
                      cat3 = c('', 'label2', 'label1'))
test.lst <- apply(test.df, 1, function(x){
  test.vec <- c(label1 = 0, label2 = 0, label3 = 0)
  case_when(
    x[['cat1']]=='label1' | x[['cat2']]=='label1' | x[['cat3']]=='label1' ~ test.vec['label1'] <- 1,
    x[['cat1']]=='label2' | x[['cat2']]=='label2' | x[['cat3']]=='label2' ~ test.vec['label2'] <- 1,
    x[['cat1']]=='label3' | x[['cat2']]=='label3' | x[['cat3']]=='label3' ~ test.vec['label3'] <- 1
  )
})
2 Answers
#1
4
You can use the transmute function from the dplyr package to keep only the columns created or modified in the function call, so you can in effect create an entirely new data frame. It would look like this:
library(dplyr)

test.lst <- test.df %>%
  transmute(
    label1 = case_when(
      cat1 == "label1" | cat2 == "label1" | cat3 == "label1" ~ 1,
      TRUE ~ 0
    ),
    label2 = case_when(
      cat1 == "label2" | cat2 == "label2" | cat3 == "label2" ~ 1,
      TRUE ~ 0
    ),
    label3 = case_when(
      cat1 == "label3" | cat2 == "label3" | cat3 == "label3" ~ 1,
      TRUE ~ 0
    )
  )
and your output would look like this:
  label1 label2 label3
1      1      0      1
2      0      1      0
3      1      0      1
As a note, the dplyr package and most of its functions are vectorized, so they already perform the desired operation on every row without the need for a for loop or an apply/map function. This has the added benefits of speeding up your code and making it more readable.
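To make the point about vectorization concrete, here is a minimal base R sketch. It reuses test.df from the question; the variable name label1_flag is only illustrative and does not appear in the original posts. A comparison such as cat1 == "label1" is evaluated for the whole column at once, which is why no explicit row loop is needed.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps plain strings
test.df <- data.frame(cat1 = c("label1", "label2", "label3"),
                      cat2 = c("label3", "", ""),
                      cat3 = c("", "label2", "label1"),
                      stringsAsFactors = FALSE)

# A comparison on a column returns one logical value per row
test.df$cat1 == "label1"
# [1]  TRUE FALSE FALSE

# Combining the three column-wise comparisons with | is also element-wise,
# so the whole 0/1 column is computed in one step (illustrative name)
label1_flag <- as.integer(test.df$cat1 == "label1" |
                          test.df$cat2 == "label1" |
                          test.df$cat3 == "label1")
label1_flag
# [1] 1 0 1
```

The case_when calls in the answer above work on the same principle: each condition is a logical vector with one element per row.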
#2
0
case_when is not necessary; here is an alternative solution:
sapply(paste0('label', 1:3), function(x) sign(rowSums(as.matrix(test.df) == x)) )
# label1 label2 label3
# [1,] 1 0 1
# [2,] 0 1 0
# [3,] 1 0 1
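The one-liner above may be easier to follow when split into steps. The sketch below is only an unpacked version of the same call for a single label; the intermediate names m, hits, counts and flags are illustrative and not part of the original answer.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps plain strings
test.df <- data.frame(cat1 = c("label1", "label2", "label3"),
                      cat2 = c("label3", "", ""),
                      cat3 = c("", "label2", "label1"),
                      stringsAsFactors = FALSE)

m <- as.matrix(test.df)   # 3 x 3 character matrix
hits <- m == "label2"     # logical matrix: TRUE where a cell equals "label2"
counts <- rowSums(hits)   # matches per row; row 2 has two, so this is 0 2 0
flags <- sign(counts)     # collapse any positive count to 1, giving 0 1 0

# Repeating the steps for every label yields one 0/1 column per label
result <- sapply(paste0("label", 1:3), function(x) sign(rowSums(m == x)))
```

Row 2 shows why sign() is applied: "label2" appears in two columns there, so rowSums alone would return 2 rather than a 0/1 indicator.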