Disambiguating non-unique elements in a character vector

Date: 2021-04-04 04:44:52

Given a vector of non-unique patient initials:

init = c("AA", "AB", "AB", "AB", "AC")

I am looking for a disambiguation like this:

init1 = c("AA", "AB01", "AB02", "AB03", "AC")

That is, unique initials should be left unchanged, while non-unique ones are disambiguated by appending a two-digit number.

1 solution

#1


Use the function shown below with ave, grouping the vector by its own values:

uniquify <- function(x) if (length(x) == 1) x else sprintf("%s%02d", x, seq_along(x))
ave(init, init, FUN = uniquify)
## [1] "AA"   "AB01" "AB02" "AB03" "AC"  
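To make the role of ave clearer: it splits its first argument into groups defined by the second, applies FUN to each group, and writes the results back into the original positions. The sketch below spells this out with base R's split and unsplit. The unsorted vector `init2` is a made-up example, not data from the question.

```r
# Same helper as above: singletons pass through, larger groups get 01, 02, ...
uniquify <- function(x) if (length(x) == 1) x else sprintf("%s%02d", x, seq_along(x))

# Hypothetical unsorted input, used only for illustration
init2 <- c("AB", "AA", "AB", "AC", "AB")

via_ave <- ave(init2, init2, FUN = uniquify)

# Roughly what ave does: split by group, transform each group, reassemble in place
groups <- split(init2, init2)
via_split <- unsplit(lapply(groups, uniquify), init2)

print(via_ave)
## [1] "AB01" "AA"   "AB02" "AC"   "AB03"
```

Because the results are written back by position, the duplicates do not need to be adjacent in the input.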

If the only requirement is unique output, then make.unique(x) or make.unique(x, sep = "0"), as discussed in another answer and a comment, are more concise. However, if the output must be exactly as in the question, they do not give the same result, and with 10 or more duplicates their output diverges even further. The solution here still produces the requested format in that case. Here is a further example with 10 or more duplicates:

xx <- rep(c("A", "B", "C"), c(1, 10, 2))
ave(xx, xx, FUN = uniquify)
## [1] "A"   "B01" "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B09" "B10" "C01" "C02"
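For comparison, this is what the two make.unique variants mentioned above return on the same inputs. The variable names `mu_dot`, `mu_zero` and `mu_long` are chosen here for illustration only.

```r
init <- c("AA", "AB", "AB", "AB", "AC")
xx <- rep(c("A", "B", "C"), c(1, 10, 2))

# Default separator: the first occurrence is left bare, later ones get .1, .2, ...
mu_dot <- make.unique(init)
print(mu_dot)
## [1] "AA"   "AB"   "AB.1" "AB.2" "AC"

# sep = "0" looks closer, but the first "AB" is still unnumbered
mu_zero <- make.unique(init, sep = "0")
print(mu_zero)
## [1] "AA"   "AB"   "AB01" "AB02" "AC"

# With 10 copies of "B" the counter only reaches 9; the first "B" and "C" stay bare
mu_long <- make.unique(xx, sep = "0")
print(mu_long)
## [1] "A"   "B"   "B01" "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B09" "C"   "C01"
```

Since sep = "0" simply places a literal "0" before the counter, an eleventh "B" would be labelled "B010" rather than "B11", which is why the mismatch grows with more duplicates.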

The make.unique solution could be rescued like this:
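One possible way to do this, offered as a sketch rather than as the answer's own code: let make.unique assign its counters, then renumber them into the two-digit form. The helper name `rescue_make_unique` is hypothetical, and the sketch assumes the input contains no values that already look like make.unique output (such as "AB.1"), since make.unique would then skip counters.

```r
# Hypothetical helper (not from the original answer): renumber make.unique output
rescue_make_unique <- function(x) {
  u <- make.unique(x, sep = ".")
  # Text after "<original>." is the counter make.unique added ("" for first occurrences)
  suffix <- substring(u, nchar(x) + 2)
  k <- integer(length(x))                  # 0 for first occurrences
  has_suffix <- nzchar(suffix)
  k[has_suffix] <- as.integer(suffix[has_suffix])
  # Only values that occur more than once receive a number
  repeated <- duplicated(x) | duplicated(x, fromLast = TRUE)
  ifelse(repeated, sprintf("%s%02d", x, k + 1L), x)
}

init <- c("AA", "AB", "AB", "AB", "AC")
print(rescue_make_unique(init))
## [1] "AA"   "AB01" "AB02" "AB03" "AC"
```

This is more code than the ave approach, so it mainly serves to show that make.unique alone does not produce the requested format.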
