Look up a specific column in R based on another column

Time: 2022-05-25 22:55:58

In an experiment, people had four candidates to choose from; sometimes they're male, other times they're female. In the below dataframe, C1 means Candidate 1, C2 means Candidate 2, and so on. F denotes female while M denotes male. A response of 1 indicates the person chose C1, a response of 2 indicates the person chose C2, and so on.

C1    C2    C3    C4    response
F     F     M     M     2
M     M     F     M     1

I want a new column "ChooseFemale" which equals 1 if the person chose a female candidate, and 0 otherwise. So the first row should have ChooseFemale equal to 1, while the second row should have ChooseFemale equal to 0.

This would require me to look up a certain column depending on the value of "response" column.

How can I do this?

6 answers

#1


1  

Another base R solution:

x <- df[["response"]]

df$ChooseFemale <- as.integer(df[cbind(seq_along(x), x)] == "F")
  C1 C2 C3 C4 response ChooseFemale
1  F  F  M  M        2            1
2  M  M  F  M        1            0

Data:

Lines <- "C1    C2    C3    C4    response
F     F     M     M     2
M     M     F     M     1"

df <- read.table(text = Lines, header = TRUE, stringsAsFactors = FALSE)
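The one-liner above relies on matrix indexing: a two-column matrix of (row, column) pairs picks one cell per row. The sketch below spells that out step by step. It is a variation, not the answer's exact code: it rebuilds the question's two rows with data.frame() and indexes only the candidate columns (the names idx, cand and picked are made up for illustration).

```r
# Toy data copied from the question
df <- data.frame(C1 = c("F", "M"), C2 = c("F", "M"),
                 C3 = c("M", "F"), C4 = c("M", "M"),
                 response = c(2L, 1L), stringsAsFactors = FALSE)

# Each row of idx is one (row number, column number) pair to extract
idx <- cbind(seq_len(nrow(df)), df$response)

# Work on the candidate columns only, as a character matrix
cand <- as.matrix(df[, c("C1", "C2", "C3", "C4")])

# Matrix indexing returns one cell per row of idx
picked <- cand[idx]

# Convert the comparison to 0/1 as the question asks
df$ChooseFemale <- as.integer(picked == "F")
print(df)
```

Restricting the lookup to the candidate columns keeps the result independent of where the response column happens to sit in the data frame.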

#2


0  

# create dataframe
my.df <- data.frame(c1=c('f','m'),
                    c2=c('f','m'),
                    c3=c('m','f'),
                    c4=c('m','m'),
                    resp=c(2, 1))

# add column
my.df$ChooseFemale <- NA

# loop over rows
for (row in seq_len(nrow(my.df))){

  # extract the column to check from response column
  col <- paste0('c', my.df$resp[row])

  # fill in new column
  my.df$ChooseFemale[row] <- ifelse(my.df[row, col]=='f', 1, 0)
}
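The same name-based lookup can also be written without growing a column inside a loop. This is only a sketch under the same assumptions as the loop above (lower-case column names c1..c4 and a resp column); the variable choose_female is a name introduced here for illustration.

```r
# Same toy data as in the loop version
my.df <- data.frame(c1 = c("f", "m"), c2 = c("f", "m"),
                    c3 = c("m", "f"), c4 = c("m", "m"),
                    resp = c(2, 1), stringsAsFactors = FALSE)

# vapply() checks that every row yields exactly one integer
choose_female <- vapply(seq_len(nrow(my.df)), function(row) {
  col <- paste0("c", my.df$resp[row])   # e.g. "c2" when resp is 2
  as.integer(my.df[row, col] == "f")    # 1 if the chosen candidate is female
}, integer(1))

my.df$ChooseFemale <- choose_female
print(my.df)
```

vapply() is used instead of sapply() so that an unexpected return type raises an error rather than silently changing the column's type.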

#3


0  

apply(df,1,function(x) ifelse(x[[as.numeric(x[['response']])]]=='F',1,0))
[1] 1 0

Here is the basic idea: select the column using the value in response, then use apply with MARGIN=1 to apply this function row by row.

df[1,'response']
[1] 2

df[1,df[1,'response']]
[1] F
Levels: F M

data

df <- read.table(text = "
  C1    C2    C3    C4    response
   F     F     M     M     2
   M     M     F     M     1
",header=T)
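One thing to watch when loading this kind of data with read.table(): a column that contains only "F" is parsed as logical FALSE, because "F" and "T" are accepted as logical literals, and a later comparison with "F" then never matches. The sketch below uses a hypothetical variant of the data in which C1 is "F" in both rows; the names txt, naive and safe are made up for illustration.

```r
# Hypothetical variant of the question's data: C1 is "F" in every row
txt <- "C1 C2 C3 C4 response
F F M M 2
F M F M 1"

# Default type conversion turns the all-"F" column into logical FALSE
naive <- read.table(text = txt, header = TRUE)
print(class(naive$C1))

# Declaring the column types keeps the candidate columns as character
safe <- read.table(text = txt, header = TRUE,
                   colClasses = c(rep("character", 4), "integer"))
print(class(safe$C1))

# The lookup now behaves as intended
idx <- cbind(seq_len(nrow(safe)), safe$response)
choose_female <- as.integer(as.matrix(safe[, 1:4])[idx] == "F")
print(choose_female)
```

The data in the question does not trigger this, since every candidate column there contains at least one "M", but larger data sets might.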

#4


0  

You can create a simple function that checks whether the candidate picked by the response number is "F", and then apply it to every row.

A tidyverse approach:

library(tidyverse)

mydata <- data.frame(C1=sample(c("F","M"),10,replace = T),
                     C2=sample(c("F","M"),10,replace = T),
                     C3=sample(c("F","M"),10,replace = T),
                     C4=sample(c("F","M"),10,replace = T),
                     response=sample(c(1:4),10,replace = T),
                     stringsAsFactors = FALSE)

   C1 C2 C3 C4 response
1   M  M  M  M        1
2   F  F  F  M        4
3   M  F  M  M        2
4   F  M  M  F        2
5   M  M  M  F        1
6   M  F  M  F        4
7   M  M  M  F        3
8   M  M  M  M        2
9   M  F  M  M        3
10  F  F  M  F        4

A custom function to check whether the chosen candidate is "F":

female_choice <- function(C1, C2, C3, C4, response) {

    c(C1, C2, C3, C4)[response] == "F"

}   

Then use mutate() to add the column and pmap_lgl() to pass the rows, one by one, as the set of arguments to female_choice(). The function returns a logical value, so pmap_lgl() is the matching variant:

mydata %>% 
    mutate(ChooseFemale = pmap_lgl(., female_choice))

   C1 C2 C3 C4 response ChooseFemale
1   M  M  M  M        1        FALSE
2   F  F  F  M        4        FALSE
3   M  F  M  M        2         TRUE
4   F  M  M  F        2        FALSE
5   M  M  M  F        1        FALSE
6   M  F  M  F        4         TRUE
7   M  M  M  F        3        FALSE
8   M  M  M  M        2        FALSE
9   M  F  M  M        3        FALSE
10  F  F  M  F        4         TRUE
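For readers without purrr, roughly the same row-wise call can be sketched in base R with mapply(), which also passes the columns in parallel as named arguments. The three rows below are copied from the top of the sample output above; converting the logical result with as.integer() gives the 0/1 coding the question asks for. This is an illustrative alternative, not part of the original answer.

```r
# Same helper as above: pick the chosen candidate and test for "F"
female_choice <- function(C1, C2, C3, C4, response) {
  c(C1, C2, C3, C4)[response] == "F"
}

# First three rows of the sample data shown above
mydata <- data.frame(C1 = c("M", "F", "M"),
                     C2 = c("M", "F", "F"),
                     C3 = c("M", "F", "M"),
                     C4 = c("M", "M", "M"),
                     response = c(1L, 4L, 2L),
                     stringsAsFactors = FALSE)

# do.call() spreads the columns as named arguments to mapply(),
# so each call to female_choice() receives one row
chose_f <- do.call(mapply,
                   c(list(FUN = female_choice, USE.NAMES = FALSE), mydata))

mydata$ChooseFemale <- as.integer(chose_f)
print(mydata)
```

As with pmap(), this only works while the data frame's column names match the function's argument names exactly.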

#5


0  

Here is one way to do it using tidyverse packages. As specified in the question, this takes into account both which candidate was chosen (C1-C4) and sex of the candidate (F/M):

# loading needed libraries
library(tidyverse)

# data
df <- utils::read.table(text = "C1    C2    C3    C4    response
                 F     F     M     M     2
                 M     M     F     M     1", header = TRUE) %>%
  tibble::as_tibble(x = .) %>%
  tibble::rowid_to_column(.)

# manipulation
dplyr::full_join(
# creating dataframe with the new chooseFemale variable
  x = df %>%
    tidyr::gather(
      data = .,
      key = "candidate",
      value = "choice",
      C1:C4
    ) %>%
    dplyr::mutate(choice_new = paste("C", response, sep = "")) %>%
# creating the needed column by checking both the candidate chosen and 
# the sex of the candidate
    dplyr::mutate(chooseFemale = dplyr::case_when((choice_new == candidate) &
                                                    (choice == "F") ~ 1,
                                                  (choice_new == candidate) &
                                                    (choice == "M") ~ 0
    )) %>%
    dplyr::select(.data = ., -choice_new) %>%
    tidyr::spread(data = ., key = candidate, value = choice) %>%
    dplyr::filter(.data = ., !is.na(chooseFemale)) %>%
    dplyr::select(.data = ., -c(C1:C4)),
# original dataframe
  y = df,
  by = c("rowid", "response")
) %>% # removing the redundant row id
  dplyr::select(.data = ., -rowid) %>% # rearranging the columns 
  dplyr::select(.data = ., C1:C4, response, chooseFemale)

#> # A tibble: 2 x 6
#>   C1    C2    C3    C4    response chooseFemale
#>   <fct> <fct> <fct> <fct>    <int>        <dbl>
#> 1 F     F     M     M            2            1
#> 2 M     M     F     M            1            0

Created on 2018-08-24 by the reprex package (v0.2.0.9000).

#6


-1  

I'll provide an answer in the tidyr format. Your data is in a "wide" format. This makes it very human readable, but not necessarily machine readable. The first step to making it more tidy is to convert the data to long format. In other words, let's transform the data so that we don't have to do calculations across multiple columns in a single row.

Tidy format allows you to use grouping variables, create summaries, etc.

library(dplyr)
library(tidyr)

df <- data.frame(C1 = c("F","M"),
           C2 = c("F","M"),
           C3 = c("M","F"),
           C4 = c("M","M"),
           stringsAsFactors = FALSE)
> df
  C1 C2 C3 C4
1  F  F  M  M
2  M  M  F  M

Let's add an "id" field so we can keep track of each unique row. This is the same as the row number...but we are going to be converting the wide data to long data with different row numbers. Then use gather to convert from wide data to long data.

df_long <- df %>%
  mutate(id = row_number()) %>%
  gather(key = "key", value = "value",C1:C4)
> df_long
  id key value
1  1  C1     F
2  2  C1     M
3  1  C2     F
4  2  C2     M
5  1  C3     M
6  2  C3     F
7  1  C4     M
8  2  C4     M

Now it is possible to use group_by() to group based on variables, perform summaries, etc.

For what you've asked, group by the id column and then perform calculations on each group. In this case we will count the values in each group that are "F". Then we ungroup and spread back to the wide, human-readable format.

df_long <- df_long %>%
  group_by(id) %>%
  mutate(response = sum(value=="F",na.rm=TRUE)) %>%
  ungroup()
> df_long
# A tibble: 8 x 4
     id key   value response
  <int> <chr> <chr>    <int>
1     1 C1    F            2
2     2 C1    M            1
3     1 C2    F            2
4     2 C2    M            1
5     1 C3    M            2
6     2 C3    F            1
7     1 C4    M            2
8     2 C4    M            1

To get the data back in wide format once you are done doing all calculations that you need in long format:

df <- df_long %>%
  spread(key,value) 
> df
# A tibble: 2 x 6
     id response C1    C2    C3    C4   
  <int>    <int> <chr> <chr> <chr> <chr>
1     1        2 F     F     M     M    
2     2        1 M     M     F     M

To get the data back in the order you had it:

df <- df %>%
  select(-id) %>%
  select(C1:C4,everything())
> df
# A tibble: 2 x 5
  C1    C2    C3    C4    response
  <chr> <chr> <chr> <chr>    <int>
1 F     F     M     M            2
2 M     M     F     M            1

You can of course use the pipes to do this all in one step.

df <- df %>%
  mutate(id = row_number()) %>%
  gather(key = "key", value = "value",C1:C4) %>%
  group_by(id) %>%
  mutate(response = sum(value=="F",na.rm=TRUE)) %>%
  ungroup() %>%
  spread(key,value) %>%
  select(-id) %>%
  select(C1:C4,everything())
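
Note that the grouped sum above counts every "F" in a row; it does not look at which candidate was chosen. If the long-format idea is to answer the original question, one possibility is to keep the question's response column and retain only the long row whose key matches it. The sketch below does this in base R as a stand-in for gather(), so it runs without extra packages; the names cols, long and chosen are made up for illustration.

```r
# Question's data, including the response column
df <- data.frame(C1 = c("F", "M"), C2 = c("F", "M"),
                 C3 = c("M", "F"), C4 = c("M", "M"),
                 response = c(2L, 1L), stringsAsFactors = FALSE)
cols <- c("C1", "C2", "C3", "C4")

# Long format: one row per (respondent, candidate) pair
long <- data.frame(
  id       = rep(seq_len(nrow(df)), times = length(cols)),
  key      = rep(cols, each = nrow(df)),
  value    = unlist(df[cols], use.names = FALSE),
  response = rep(df$response, times = length(cols)),
  stringsAsFactors = FALSE
)

# Keep only the row for the candidate each respondent actually chose
chosen <- long[long$key == paste0("C", long$response), ]
chosen <- chosen[order(chosen$id), ]

# Back in wide format: 1 if the chosen candidate is female, else 0
df$ChooseFemale <- as.integer(chosen$value == "F")
print(df)
```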
