I have two columns in a dataset after merging two separate datasets. I would like to merge these columns into one column, BNR.x.
For the cases (rows) listed below, my preferred outcomes would be:
1. Nothing. BNR.x has data; that's fine.
2. Nothing. The data in both columns is the same; that's fine.
3. The data from BNR.y is copied to BNR.x.
4. Nothing. Same as 2.
5. The data in the two columns differs. Preferably I'd get an extra column with a flag (1 or FALSE) in this row as a warning.
6. No data. Preferably I'd get a warning here as well, to notify me that I don't have any data for this item.
+----+-------+-------+
| ID | BNR.x | BNR.y |
+----+-------+-------+
| 1 | 123 | NA |
| 2 | 234 | 234 |
| 3 | NA | 345 |
| 4 | 456 | 456 |
| 5 | 678 | 677 |
| 6 | NA | NA |
+----+-------+-------+
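To try the answers below, the table above can be rebuilt as a data frame. This is a sketch: the name `d` is an assumption (it matches the first answer), and the values are copied straight from the table.

```r
# Example data from the table above; the name `d` is an assumption
d <- data.frame(
  ID    = 1:6,
  BNR.x = c(123, 234, NA, 456, 678, NA),
  BNR.y = c(NA, 234, 345, 456, 677, NA)
)

str(d)  # 6 obs. of 3 variables: ID, BNR.x, BNR.y
```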
Is there a method or package that will do this for me?
3 answers
#1
If your data are in a data frame called d, you can do:
## Copy BNR.y if BNR.x is missing
d$BNR.x[is.na(d$BNR.x)] <- d$BNR.y[is.na(d$BNR.x)]
## List the indices of BNR.x that are still missing
which(is.na(d$BNR.x))
## List the indices where BNR.x is different from BNR.y
which(d$BNR.x != d$BNR.y)
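Applied to the example table, the three steps of this answer behave as sketched below (the data frame is rebuilt from the question's table). Note that `which()` silently skips comparisons that evaluate to `NA`, which is why row 1 (123 vs. NA) is not reported as a mismatch.

```r
d <- data.frame(
  ID    = 1:6,
  BNR.x = c(123, 234, NA, 456, 678, NA),
  BNR.y = c(NA, 234, 345, 456, 677, NA)
)

# Copy BNR.y into BNR.x only where BNR.x is missing (rows 3 and 6)
miss <- is.na(d$BNR.x)
d$BNR.x[miss] <- d$BNR.y[miss]

# Case 6: rows with no value in either column
still_missing <- which(is.na(d$BNR.x))
# Case 5: rows where both values exist and differ; NA comparisons are dropped by which()
mismatch <- which(d$BNR.x != d$BNR.y)

print(still_missing)  # [1] 6
print(mismatch)       # [1] 5
```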
#2
Here's a proposal, where dat is the name of the data frame:
idx <- is.na(dat$BNR.x) # create logical index for NAs in BNR.x
dat$BNR.x[idx] <- dat$BNR.y[idx] # replace NAs with values from BNR.y
# Add a logical column:
dat <- transform(dat, warn = is.na(BNR.x) | (BNR.x != BNR.y & !is.na(BNR.y)))
The result:
ID BNR.x BNR.y warn
1 1 123 NA FALSE
2 2 234 234 FALSE
3 3 345 345 FALSE
4 4 456 456 FALSE
5 5 678 677 TRUE
6 6 NA NA TRUE
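The question asks for two different signals: a flag when the columns disagree (case 5) and a warning when there is no data at all (case 6). The single `warn` column above combines both. A sketch of a variant that keeps them apart; the column names `mismatch` and `no_data`, and the use of `message()` as the notification, are illustrative choices rather than part of this answer.

```r
dat <- data.frame(
  ID    = 1:6,
  BNR.x = c(123, 234, NA, 456, 678, NA),
  BNR.y = c(NA, 234, 345, 456, 677, NA)
)

# Case 3: fill BNR.x from BNR.y where BNR.x is missing
idx <- is.na(dat$BNR.x)
dat$BNR.x[idx] <- dat$BNR.y[idx]

# Case 5: both values present and different.
# FALSE & NA is FALSE, so this column never contains NA.
dat$mismatch <- !is.na(dat$BNR.x) & !is.na(dat$BNR.y) & dat$BNR.x != dat$BNR.y

# Case 6: still no value after filling
dat$no_data <- is.na(dat$BNR.x)

# Notify about rows without any data
if (any(dat$no_data)) {
  message("No BNR value for ID(s): ", paste(dat$ID[dat$no_data], collapse = ", "))
}

print(dat)
```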
#3
Starting from:
df
V1 V2 V3
1 1 123 NA
...
df[which(is.na(df$V2)),]$V2 <- df[which(is.na(df$V2)),]$V3
df$warn <- 0
df[which(is.na(df$V2)),]$warn <- 1
df[which(df$V2 != df$V3 & !is.na(df$V3)),]$warn <- 1
OK, I overuse which, and transform is nicer, but I have to start somewhere :)
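As noted, `which()` is not needed here. A sketch of the same steps with plain logical indexing, keeping this answer's `V1`/`V2`/`V3` column names and its 0/1 `warn` convention. Besides being shorter, it avoids the `df[which(...), ]$warn <- 1` pattern, which can raise an error when no row matches (the selection then has zero rows, but the replacement value has length one).

```r
df <- data.frame(
  V1 = 1:6,
  V2 = c(123, 234, NA, 456, 678, NA),
  V3 = c(NA, 234, 345, 456, 677, NA)
)

# Fill V2 from V3 where V2 is missing, using a logical index instead of which()
na_v2 <- is.na(df$V2)
df$V2[na_v2] <- df$V3[na_v2]

# 1 = warning, 0 = no warning
df$warn <- 0
# Case 6: still no data in V2 (assigning to an all-FALSE index is a no-op, not an error)
df$warn[is.na(df$V2)] <- 1
# Case 5: both values present and different; this index contains no NA
df$warn[!is.na(df$V2) & !is.na(df$V3) & df$V2 != df$V3] <- 1

print(df)
```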
P.S. Am I wrong, or does
d$BNR.x[is.na(d$BNR.x)] <- d$BNR.y
not work, because it would place misaligned d$BNR.y values into the positions of the d$BNR.x NAs?
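The concern is justified. With the example data, the unindexed right-hand side fills the two `NA` slots with the first two elements of `BNR.y` rather than the matching rows, and R warns about the replacement length. Indexing both sides with the same logical vector, as the first answer does, keeps the rows aligned. A small demonstration (the names `wrong` and `right` are only for illustration):

```r
d <- data.frame(
  ID    = 1:6,
  BNR.x = c(123, 234, NA, 456, 678, NA),
  BNR.y = c(NA, 234, 345, 456, 677, NA)
)
miss <- is.na(d$BNR.x)  # TRUE for rows 3 and 6

# Misaligned: 6 replacement values for 2 slots; only the first two (NA, 234) are used
wrong <- d$BNR.x
suppressWarnings(wrong[miss] <- d$BNR.y)

# Aligned: the same index selects both the targets and the replacements
right <- d$BNR.x
right[miss] <- d$BNR.y[miss]

print(wrong)  # row 3 stays NA, row 6 wrongly receives 234
print(right)  # row 3 receives 345, row 6 stays NA
```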