I have the following data frame in R:
ID Information
1 Yes
1 NA
1 NA
1 Yes
2 No
2 NA
2 NA
3 NA
3 NA
3 Maybe
3 NA
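To make the table above reproducible, here is a minimal sketch that rebuilds it. The object name df1 and the choice of character (rather than factor) storage for Information are assumptions, not something stated in the post.

```r
# Rebuild the example table; `df1` is an assumed name.
# Information is kept as character (assumption); the original may be a factor.
df1 <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Information = c("Yes", NA, NA, "Yes", "No", NA, NA, NA, NA, "Maybe", NA),
  stringsAsFactors = FALSE
)

# Count missing values per ID to see what needs filling.
na_per_id <- tapply(is.na(df1$Information), df1$ID, sum)
print(na_per_id)
```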
I need to fill in the rows that contain NA with whatever information is present in another row for the same ID. I would like to end up with this:
ID Information
1 Yes
1 Yes
1 Yes
1 Yes
2 No
2 No
2 No
3 Maybe
3 Maybe
3 Maybe
3 Maybe
As far as I know, the information (i.e. Yes/No/Maybe) never conflicts within an ID, but it may be repeated. (Sorry about the ugly format; I am a newbie and am not allowed to post pictures.)
Thank you!
5 Solutions
#1
5
One option is to use data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'ID', assign (:=) 'Information' as the unique non-NA element.
library(data.table)#v1.9.5+
setDT(df1)[, Information:=unique(Information[!is.na(Information)]), by = ID]
df1
# ID Information
# 1: 1 Yes
# 2: 1 Yes
# 3: 1 Yes
# 4: 1 Yes
# 5: 2 No
# 6: 2 No
# 7: 2 No
# 8: 3 Maybe
# 9: 3 Maybe
# 10: 3 Maybe
# 11: 3 Maybe
Or we can join the dataset with its own unique rows after removing the 'NA' rows. Here, I use the devel version of data.table. Note that df1['ID'] relies on data.frame column selection, so this form assumes df1 has not already been converted by setDT().
setDT(unique(na.omit(df1)))[df1['ID'], on='ID']
Or we use dplyr: grouped by 'ID', we arrange 'Information' so that the 'NA' values come last, then set 'Information' to the first value of 'Information' in each group.
library(dplyr)
df1 %>%
  group_by(ID) %>%
  arrange(Information) %>%
  mutate(Information = first(Information))
#2
3
Here is an option using na.locf (from zoo) with ddply (from plyr):
library(zoo)
library(plyr)
ddply(d, .(ID), mutate, Information = na.locf(Information))
# ID Information
#1 1 Yes
#2 1 Yes
#3 1 Yes
#4 1 Yes
#5 2 No
#6 2 No
#7 2 No
#8 3 Maybe
#9 3 Maybe
#10 3 Maybe
#11 3 Maybe
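One thing to watch with a carry-forward fill: on its own it cannot fill NAs that come before the first known value in a group, as in ID 3 here. Below is a rough base R sketch of filling in both directions within each ID. The helper names fill_forward and fill_both are made up for illustration, and the code assumes Information is a character vector, not a factor.

```r
# Example data; `d` as in the answer above, stored as character (assumption).
d <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Information = c("Yes", NA, NA, "Yes", "No", NA, NA, NA, NA, "Maybe", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-NA value forward.
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))       # position of the most recent non-NA value (0 = none yet)
  c(NA, x[!is.na(x)])[idx + 1]   # index 1 is an NA placeholder for leading NAs
}

# Hypothetical helper: fill forward, then fill the remaining leading NAs backward.
fill_both <- function(x) rev(fill_forward(rev(fill_forward(x))))

# Apply the two-way fill separately within each ID.
d$Information <- ave(d$Information, d$ID, FUN = fill_both)
print(d)
```

A group that contains only NAs stays NA under this sketch.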
#3
2
Or in base R:
uniqueCombns <- unique(dat[complete.cases(dat),])
merge(dat["ID"], uniqueCombns, by = "ID", all.x = TRUE)
where dat is your data frame.
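For completeness, here is the merge approach run end to end on the example data, as a sketch. The duplicate-ID check is an addition of mine, not part of the answer: if an ID ever had two different non-NA values, the merge would silently produce extra rows. Also keep in mind that merge sorts the result by the key by default, so the row order can differ from the input when the IDs are not already sorted.

```r
# Example data; character storage is an assumption.
dat <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Information = c("Yes", NA, NA, "Yes", "No", NA, NA, NA, NA, "Maybe", NA),
  stringsAsFactors = FALSE
)

# One row per distinct (ID, Information) pair, NA rows dropped.
uniqueCombns <- unique(dat[complete.cases(dat), ])

# Added guard (not in the original answer): each ID must map to one value.
stopifnot(!anyDuplicated(uniqueCombns$ID))

# Left join the ID column back onto the lookup table.
filled <- merge(dat["ID"], uniqueCombns, by = "ID", all.x = TRUE)
print(filled)
```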
#4
1
Since DF$Information is a valid "factor" and there are no conflicts, you could also do (unless I'm overlooking something):
levels(DF$Information)[approxfun(DF$ID, DF$Information, method = "constant")(DF$ID)]
# [1] "Yes" "Yes" "Yes" "Yes" "No" "No" "No" "Maybe" "Maybe" "Maybe" "Maybe"
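If the interpolation trick feels too indirect, the same "look up the known value for each ID" idea can be sketched with match(). This is a different technique from approxfun, swapped in only for illustration; the variable name known is made up, and character storage is assumed here (the indexing would also work on a factor).

```r
# Example data; `DF` as in the answer above, stored as character (assumption).
DF <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Information = c("Yes", NA, NA, "Yes", "No", NA, NA, NA, NA, "Maybe", NA),
  stringsAsFactors = FALSE
)

# Rows whose Information is known.
known <- !is.na(DF$Information)

# For every row, find the first known row with the same ID and take its value.
DF$Information <- DF$Information[known][match(DF$ID, DF$ID[known])]
print(DF$Information)
```

An ID with no known value at all gets NA from match(), so such rows are left as NA.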
#5
1
Assuming there is exactly one non-NA value in each group, we can simply omit the NAs and assign the remaining value to all the other rows, group by group. No packages are used:
transform(df, Information = ave(Information, ID, FUN = na.omit))
giving:
ID Information
1 1 Yes
2 1 Yes
3 1 Yes
4 1 Yes
5 2 No
6 2 No
7 2 No
8 3 Maybe
9 3 Maybe
10 3 Maybe
11 3 Maybe
If there can be more than one non-NA in each group but they are all the same, then replace na.omit with function(x) na.omit(x)[1].
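Here is a small sketch of that variant on the example data. The extra ID 4 with only NAs is made up to show what happens when a group has nothing to fill from, and the helper name first_known is hypothetical. Character storage is assumed.

```r
# Example data plus a made-up ID 4 that has no known value.
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4),
  Information = c("Yes", NA, NA, "Yes", "No", NA, NA,
                  NA, NA, "Maybe", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA value of a group, or NA if there is none.
first_known <- function(x) na.omit(x)[1]

# ave() applies the helper per ID and recycles the single value over the group.
res <- transform(df, Information = ave(Information, ID, FUN = first_known))
print(res)
```

Groups made up entirely of NAs stay NA, which is probably the safest outcome when there is nothing to copy.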