Sample data
sessionid qf Office
12 3 LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3
12 4 DEL2,DEL1,LON1,DEL1
13 5 MAn1,LON1,DEL1,LON1
Here I want to remove the duplicate values in the "Office" column within each row.
Expected Output
sessionid qf Office
12 3 LON1,LON2,SEA2,SEA3
12 4 DEL2,DEL1,LON1
13 5 MAn1,LON1,DEL1
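For anyone who wants to try the answers, the sample data can be rebuilt roughly like this. It is only a sketch: the name `df1` is taken from the first answer, and storing `Office` as character (not factor) is an assumption, since the question does not show the column types.

```r
# Rebuild the sample data from the question.
# Assumption: Office is a character column, not a factor.
df1 <- data.frame(
  sessionid = c(12L, 12L, 13L),
  qf        = c(3L, 4L, 5L),
  Office    = c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
                "DEL2,DEL1,LON1,DEL1",
                "MAn1,LON1,DEL1,LON1"),
  stringsAsFactors = FALSE
)

# The second answer calls the same data `df`.
df <- df1
```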
2 Solutions
#1
We could use tidyverse. Split 'Office' by the delimiter and expand to 'long' format, then get the distinct rows and, grouped by 'sessionid' and 'qf', paste the contents of 'Office' back together.
library(tidyverse)
separate_rows(df1, Office) %>%
distinct() %>%
group_by(sessionid, qf) %>%
summarise(Office = toString(Office))
# A tibble: 3 x 3
# Groups: sessionid [?]
# sessionid qf Office
# <int> <int> <chr>
#1 12 3 LON1, LON2, SEA2, SEA3
#2 12 4 DEL2, DEL1, LON1
#3 13 5 MAn1, LON1, DEL1
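Note that `toString()` joins with ", " (comma plus space), while the expected output in the question has no space. A possible variant that joins with a plain comma is sketched below; the explicit `sep = ","` and `.groups = "drop"` (which needs dplyr 1.0.0 or later) are my additions, not part of the answer above, and the data frame is rebuilt on the assumption that `Office` is character.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (assumed to be a character column)
df1 <- data.frame(
  sessionid = c(12L, 12L, 13L),
  qf        = c(3L, 4L, 5L),
  Office    = c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
                "DEL2,DEL1,LON1,DEL1",
                "MAn1,LON1,DEL1,LON1"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  separate_rows(Office, sep = ",") %>%   # one row per office code
  distinct() %>%                         # keep first occurrence of each code
  group_by(sessionid, qf) %>%
  summarise(Office = paste(Office, collapse = ","), .groups = "drop")

res$Office
# "LON1,LON2,SEA2,SEA3" "DEL2,DEL1,LON1" "MAn1,LON1,DEL1"
```

One thing to keep in mind: this groups by 'sessionid' and 'qf', so two input rows that share both values would be merged into one. That does not happen with the sample data, but it may matter on real data.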
#2
Here is a base R way of doing it. It works as you'd expect: first split Office by the comma, remove the duplicates, then paste the pieces back together again.
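# as.character() guards against Office being a factor (an assumption about df)
df$Office <- sapply(lapply(strsplit(as.character(df$Office), ","),
                           unique),        # drop repeats within each row
                    paste, collapse = ",") # rejoin into one string per row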
Or with %>% (from magrittr):
library(magrittr)  # provides %>%
df$Office <- df$Office %>%
  strsplit(",") %>%               # split each row on commas
  lapply(unique) %>%              # drop repeats within each row
  sapply(paste, collapse = ",")   # rejoin into one string per row
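If this is needed in more than one place, the same split / unique / paste steps could be wrapped in a small helper. This is just a sketch: `dedupe_csv` is a made-up name, and `vapply()` is swapped in for `sapply()` so the result is always a character vector, even for zero-length input.

```r
# Hypothetical helper: remove repeated items inside each delimited string,
# keeping the first occurrence of every item.
dedupe_csv <- function(x, sep = ",") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(parts,
         function(p) paste(unique(p), collapse = sep),
         character(1))
}

office <- c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
            "DEL2,DEL1,LON1,DEL1",
            "MAn1,LON1,DEL1,LON1")

deduped <- dedupe_csv(office)
deduped
# "LON1,LON2,SEA2,SEA3" "DEL2,DEL1,LON1" "MAn1,LON1,DEL1"
```

With a data frame this would be used as `df$Office <- dedupe_csv(df$Office)`. The comparison is case-sensitive, so "MAn1" and "MAN1" would count as different values.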