Creating a new column using a for/nested loop in R

Time: 2020-12-22 14:30:17

I'm just getting started with R and need some help understanding how to apply for/nested loops.

StudyID<-c(1:5)
SubjectID<-c(1:5)

df<-data.frame(StudyID=rep(StudyID, each=5), SubjectID=rep(SubjectID, each=1))

How can I create a new column called ID that combines StudyID and SubjectID into a unique ID?

So for this data, the unique IDs should run from 1 to 25.

So the final data should look like this:

UniqueID<- c(1:25)

df<-cbind(df,UniqueID)

View(df)
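For comparison, here is a minimal sketch of what a nested for loop for this task could look like, together with a loop-free arithmetic version. Both assume the data built above (StudyID and SubjectID each running from 1 to 5); the names dat, ID_loop, ID_arith and n_subjects are illustrative only. The arithmetic formula is an extra technique not taken from the answers below, and it only holds when every study has subjects numbered 1 to n_subjects.

```r
# Rebuild the question's data: 5 studies x 5 subjects, sorted by StudyID
dat <- data.frame(StudyID = rep(1:5, each = 5), SubjectID = rep(1:5, times = 5))

# Nested loops: visit every (StudyID, SubjectID) pair in sorted order and
# give all rows holding that pair the next integer ID.
dat$ID_loop <- NA_integer_
next_id <- 1L
for (s in sort(unique(dat$StudyID))) {
  for (j in sort(unique(dat$SubjectID))) {
    rows <- which(dat$StudyID == s & dat$SubjectID == j)
    if (length(rows) > 0) {
      dat$ID_loop[rows] <- next_id
      next_id <- next_id + 1L
    }
  }
}

# Loop-free version: assumes SubjectID runs 1..n_subjects within every study
n_subjects <- 5L
dat$ID_arith <- (dat$StudyID - 1L) * n_subjects + dat$SubjectID

head(dat)
```

The loop scans the whole data frame once per pair, so it slows down quickly as the data grows; vectorised approaches avoid that.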

Is there a faster, more efficient way than looping?

3 answers

#1


Using the dplyr package, you could do:

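library(dplyr)
# Passing the grouping columns directly, as in group_indices(df, StudyID, SubjectID),
# is deprecated since dplyr 1.0.0; group first, then take the group indices.
df$Id <- df %>% group_by(StudyID, SubjectID) %>% group_indices()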

This returns:

#StudyID   SubjectID   Id
#   1         1        1
#   1         2        2
#   1         3        3
#   1         4        4
#   1         5        5
#   2         1        6
#   2         2        7
#   2         3        8
#   2         4        9
#   2         5       10
#   3         1       11
#   3         2       12
#   3         3       13
#   3         4       14
#   3         5       15
#   4         1       16
#   4         2       17
#   4         3       18
#   4         4       19
#   4         5       20
#   5         1       21
#   5         2       22
#   5         3       23
#   5         4       24
#   5         5       25

#2


Another way to do this without loading any package (base R), assuming the data frame is sorted by the two columns:

StudyID<-c(1:5)
SubjectID<-c(1:5)
df<-data.frame(StudyID=rep(StudyID, each=5), SubjectID=rep(SubjectID, each=1))

# Increments the ID each time a (StudyID, SubjectID) pair appears for the first time
df$uniqueID <- cumsum(!duplicated(df[1:2]))
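The sorting assumption matters. As a hedged illustration (the shuffled data and the names df_sorted, df_shuffled, id_unsorted and df_fixed are hypothetical), on unsorted rows cumsum(!duplicated()) numbers pairs in order of first appearance, and sorting with order() first restores the intended numbering:

```r
# The question's pairs, with the rows shuffled on purpose (hypothetical example)
set.seed(1)
df_sorted <- data.frame(StudyID = rep(1:5, each = 5), SubjectID = rep(1:5, times = 5))
df_shuffled <- df_sorted[sample(nrow(df_sorted)), ]

# Every row is a new pair here, so this is just 1..25 in row order,
# regardless of which (StudyID, SubjectID) pair each row holds.
id_unsorted <- cumsum(!duplicated(df_shuffled[1:2]))

# Sorting by both columns first gives the StudyID-then-SubjectID numbering.
df_fixed <- df_shuffled[order(df_shuffled$StudyID, df_shuffled$SubjectID), ]
df_fixed$uniqueID <- cumsum(!duplicated(df_fixed[1:2]))
```

Note that sorting changes the row order; if the original order must be kept, an order-independent method is the safer choice.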

Alternatively, you can use this solution, suggested in the comments (I prefer it over the first one):

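# Paste the two ID columns into one key per row and number the keys via factor levels.
# Selecting the columns explicitly keeps any existing uniqueID column out of the key.
df$uniqueID <- as.integer(factor(do.call(paste, df[c("StudyID", "SubjectID")])))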

The output would be:

> print(df, row.names = FALSE)
#StudyID  SubjectID  uniqueID
#   1         1          1
#   1         2          2
#   1         3          3
#   1         4          4
#   1         5          5
#   2         1          6
#   2         2          7
#   2         3          8
#   2         4          9
#   2         5         10
#   3         1         11
#   3         2         12
#   3         3         13
#   3         4         14
#   3         5         15
#   4         1         16
#   4         2         17
#   4         3         18
#   4         4         19
#   4         5         20
#   5         1         21
#   5         2         22
#   5         3         23
#   5         4         24
#   5         5         25
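One caveat with the paste() and factor() version: factor() sorts the pasted keys as text, so once an ID has two or more digits the numbering no longer follows numeric order. A small sketch with made-up data (d, id_paste and id_inter are illustrative names), using interaction() as a numeric-aware alternative:

```r
# Hypothetical data with a two-digit StudyID
d <- data.frame(StudyID = c(2, 10, 2, 10), SubjectID = c(1, 1, 2, 2))

# factor() sorts the pasted strings as text: "10 1" < "10 2" < "2 1" < "2 2"
id_paste <- as.integer(factor(do.call(paste, d[c("StudyID", "SubjectID")])))
# id_paste is c(3, 1, 4, 2): study 10 is numbered before study 2

# interaction() uses each column's own (numerically sorted) factor levels;
# lex.order = TRUE makes StudyID vary slowest, and drop = TRUE removes
# combinations that never occur so the IDs stay consecutive.
id_inter <- as.integer(interaction(d$StudyID, d$SubjectID, lex.order = TRUE, drop = TRUE))
# id_inter is c(1, 3, 2, 4)
```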

#3


You could use interaction() in base R:

df$uniqueID <- with(df, as.integer(interaction(StudyID,SubjectID)))

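For example (this example better illustrates what you are after; note that sample() results can differ between R versions because the default sampling algorithm changed in R 3.6.0, so your StudyID values may not match the output below exactly):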

set.seed(10)
df <- data.frame(StudyID = sample(5, 10, replace = TRUE), SubjectID = rep(1:5, times = 2))
df$uniqueID <- with(df, as.integer(interaction(StudyID,SubjectID)))

     # StudyID SubjectID uniqueID
# 1        3         1        3
# 2        2         2        6
# 3        3         3       11
# 4        4         4       16
# 5        1         5       17
# 6        2         1        2
# 7        2         2        6
# 8        2         3       10
# 9        4         4       16
# 10       3         5       19
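By default, interaction() numbers combinations with the first factor varying fastest, and combinations that never occur still take up level numbers, which is why the IDs above are not consecutive. A minimal sketch (df_q, sparse and the id_* names are illustrative) of how lex.order = TRUE and drop = TRUE change that:

```r
df_q <- data.frame(StudyID = rep(1:5, each = 5), SubjectID = rep(1:5, times = 5))

# Default: StudyID varies fastest in the level order, so the pair
# (StudyID = 1, SubjectID = 2) in row 2 gets ID 6 rather than 2.
id_default <- with(df_q, as.integer(interaction(StudyID, SubjectID)))

# lex.order = TRUE makes StudyID vary slowest: here this reproduces 1:25.
id_lex <- with(df_q, as.integer(interaction(StudyID, SubjectID, lex.order = TRUE)))

# Sparse pairs: (2, 2) never occurs, but without drop = TRUE it still
# occupies a level number, leaving a gap in the IDs.
sparse <- data.frame(StudyID = c(2, 3, 3), SubjectID = c(1, 1, 2))
id_gap <- with(sparse, as.integer(interaction(StudyID, SubjectID, lex.order = TRUE)))
id_dense <- with(sparse, as.integer(interaction(StudyID, SubjectID,
                                                lex.order = TRUE, drop = TRUE)))
# id_gap is c(1, 3, 4); id_dense is c(1, 2, 3)
```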
