Using R to fill NAs in a data frame with the information from one of the rows within the same patient ID

I have the following data frame in R:

ID  Information
1    Yes
1    NA
1    NA
1    Yes
2    No
2    NA
2    NA
3    NA
3    NA
3    Maybe
3    NA

I need to fill in the rows that contain NA with whatever information is contained in one of the other rows for the same ID. I would like to end up with this:

ID  Information
1   Yes
1   Yes
1   Yes
1   Yes
2   No
2   No
2   No
3   Maybe
3   Maybe
3   Maybe
3   Maybe

As far as I know, the information (i.e. Yes/No/Maybe) never conflicts within an ID, but it may be repeated. (Sorry about the ugly format; I am a newbie and cannot post pictures yet.)
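For anyone who wants to reproduce this, the input above might be built roughly as follows. This is only a sketch: the object name df1 and the choice to store Information as character (rather than factor) are assumptions, not something stated in the question.

```r
# Sample input from the question.
# Assumption: the object name `df1` and character storage for Information.
df1 <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Information = c("Yes", NA, NA, "Yes", "No", NA, NA, NA, NA, "Maybe", NA),
  stringsAsFactors = FALSE
)

# Quick look at how many values are missing per ID
table(df1$ID, is.na(df1$Information))
```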

Thank you!

5 Solutions

#1

One option is to use data.table. We convert the 'data.frame' to a 'data.table' with setDT(df1) and then, grouped by 'ID', assign (:=) the unique non-NA element to 'Information'.

library(data.table)  # v1.9.5+
setDT(df1)[, Information := unique(Information[!is.na(Information)]), by = ID]
df1
#     ID Information
#  1:  1         Yes
#  2:  1         Yes
#  3:  1         Yes
#  4:  1         Yes
#  5:  2          No
#  6:  2          No
#  7:  2          No
#  8:  3       Maybe
#  9:  3       Maybe
# 10:  3       Maybe
# 11:  3       Maybe

Or we can join the dataset with its own unique rows after removing the 'NA' rows. Here, I use the devel version of data.table.

 setDT(unique(na.omit(df1)))[df1['ID'], on='ID']

Or we can use dplyr: grouped by 'ID', we arrange by 'Information' so that the 'NA' values come last, then set 'Information' to its first value within each group.

 library(dplyr)
 df1 %>%
    group_by(ID) %>% 
    arrange(Information) %>% 
    mutate(Information= first(Information))

#2

Here is an option using na.locf (from zoo) with ddply (from plyr):

library(zoo)
library(plyr)

ddply(d, .(ID), mutate, Information = na.locf(Information))

#   ID Information
#1   1         Yes
#2   1         Yes
#3   1         Yes
#4   1         Yes
#5   2          No
#6   2          No
#7   2          No
#8   3       Maybe
#9   3       Maybe
#10  3       Maybe
#11  3       Maybe

#3

Or in base R:

# keep one complete (ID, Information) row per ID, then look it up for every row
uniqueCombns <- unique(dat[complete.cases(dat), ])
merge(dat["ID"], uniqueCombns, by = "ID", all.x = TRUE)

where dat is your data frame.
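As a self-contained sketch of the above on the question's sample data (the construction of dat, the character storage of Information, and the name filled are assumptions for illustration):

```r
# Sample data from the question (assumed construction)
dat <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Information = c("Yes", NA, NA, "Yes", "No", NA, NA, NA, NA, "Maybe", NA),
  stringsAsFactors = FALSE
)

# One complete (ID, Information) row per ID
uniqueCombns <- unique(dat[complete.cases(dat), ])

# Left join: every row of dat["ID"] picks up the value recorded for its ID
filled <- merge(dat["ID"], uniqueCombns, by = "ID", all.x = TRUE)
print(filled)
```

Note that merge() sorts the result by ID, so the original row order survives here only because the input is already sorted by ID.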

#4

Since DF$Information is a valid "factor" and there are no conflicts, you could also do the following (unless I'm overlooking something):

levels(DF$Information)[approxfun(DF$ID, DF$Information, method = "constant")(DF$ID)]
# [1] "Yes"   "Yes"   "Yes"   "Yes"   "No"    "No"    "No"    "Maybe" "Maybe" "Maybe" "Maybe"

#5

Assuming there is exactly one non-NA in each group, we can simply omit the NAs and, group by group, assign the remaining value to all the other rows. No packages are used:

transform(df, Information = ave(Information, ID, FUN = na.omit))

giving:

   ID Information
1   1         Yes
2   1         Yes
3   1         Yes
4   1         Yes
5   2          No
6   2          No
7   2          No
8   3       Maybe
9   3       Maybe
10  3       Maybe
11  3       Maybe

If there can be more than one non-NA in each group but they are all the same, then replace na.omit with function(x) na.omit(x)[1].
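A minimal sketch of this variant on the question's sample data, assuming Information is stored as character; the way df is built and the names first_non_na and filled are for illustration only:

```r
# Sample data from the question (assumed construction)
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Information = c("Yes", NA, NA, "Yes", "No", NA, NA, NA, NA, "Maybe", NA),
  stringsAsFactors = FALSE
)

# Take the first non-NA value of a vector (NA if there is none)
first_non_na <- function(x) na.omit(x)[1]

# ave() applies the function per ID and recycles the single value over the group
filled <- transform(df, Information = ave(Information, ID, FUN = first_non_na))
print(filled)
```

A group with no non-NA value at all simply stays NA, because indexing an empty vector with [1] yields NA.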
