For a sample dataframe:
df <- structure(list(code = c("a1", "a1", "b2", "v4", "f5", "f5", "h7",
"a1"), name = c("katie", "katie", "sally", "tom", "amy", "amy",
"ash", "james"), number = c(3.5, 3.5, 2, 6, 4, 4, 7, 3)), .Names = c("code",
"name", "number"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-8L), spec = structure(list(cols = structure(list(code = structure(list(), class = c("collector_character",
"collector")), name = structure(list(), class = c("collector_character",
"collector")), number = structure(list(), class = c("collector_double",
"collector"))), .Names = c("code", "name", "number")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
I want to highlight all the records that share a 'code' value with at least one other record. I know I could use:
df[duplicated(df$code), ]
But this only returns the second and later occurrences. I want every row whose code value is duplicated (i.e. all 3 a1s and both f5s).
Any ideas?
3 Solutions
#1
8
df[duplicated(df$code) | duplicated(df$code, fromLast=TRUE), ]
  code  name number
1   a1 katie    3.5
2   a1 katie    3.5
5   f5   amy    4.0
6   f5   amy    4.0
8   a1 james    3.0
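To see why the two duplicated() calls are combined, it may help to look at each logical vector on its own. A minimal sketch, using a plain character vector `codes` as a stand-in for df$code (the variable names are only for illustration):

```r
# Stand-in for df$code from the question
codes <- c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1")

# TRUE for every occurrence after the first one
forward <- duplicated(codes)

# TRUE for every occurrence before the last one
backward <- duplicated(codes, fromLast = TRUE)

# A value that occurs more than once is flagged by at least one of the two
in_dup_group <- forward | backward

# Positions of all elements whose value is repeated somewhere in the vector
which(in_dup_group)
```

Neither call alone catches the whole group: the forward scan misses the first occurrence and the backward scan misses the last one.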
Another solution inspired by Alok VS:
ta <- table(df$code)
df[df$code %in% names(ta[ta > 1]), ]
# Edit: this should be faster:
df[df$code %in% names(ta)[ta > 1], ]
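The table() approach can be broken into steps to show what each piece holds. This is only a sketch on a stand-in vector `codes` (not the original data frame), and `dup_values` is a name introduced here for illustration:

```r
# Stand-in for df$code from the question
codes <- c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1")

# Count how often each distinct value occurs
ta <- table(codes)

# Keep the names of the values that occur more than once
dup_values <- names(ta)[ta > 1]

# TRUE for every element whose value is one of the repeated ones
keep <- codes %in% dup_values
which(keep)
```

Note that table() drops NA values by default, so missing codes would not be counted as duplicates of each other with this approach.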
Edit: If you are OK with leaving base R, gdata::duplicated2() is more concise.
library(gdata)
df[duplicated2(df$code), ]
#2
1
I've come up with a crude solution:
temp <- aggregate(df$code, by = list(df$code), FUN = length)
temp <- temp[temp$x > 1, ]
df[df$code %in% temp$Group.1, ]
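The column names Group.1 and x are the defaults that aggregate() assigns when the grouping list is unnamed and the input is a plain vector. As a hedged variant of the same idea, naming the list element gives a more readable intermediate table. The small data frame and the names `counts` and `dup_codes` below are assumptions made for this sketch:

```r
# Reduced stand-in for the data frame in the question
df <- data.frame(
  code = c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1"),
  stringsAsFactors = FALSE
)

# One row per distinct code; the count ends up in column "x"
counts <- aggregate(df$code, by = list(code = df$code), FUN = length)

# Codes that occur more than once (as.character guards against factor columns)
dup_codes <- as.character(counts$code[counts$x > 1])

# All rows whose code is in the repeated set
result <- df[df$code %in% dup_codes, , drop = FALSE]
```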
#3
1
Turn the duplicated indexes into values, then check which 'code' entries match those values:
df[df$code %in% df$code[duplicated(df$code)], ]
  code  name number
1   a1 katie    3.5
2   a1 katie    3.5
5   f5   amy    4.0
6   f5   amy    4.0
8   a1 james    3.0