How to remove duplicate values within a single column for multiple rows at once in R

Sample data

           sessionid             qf      Office
                12                3       LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3
                12                4       DEL2,DEL1,LON1,DEL1
                13                5       MAn1,LON1,DEL1,LON1

Here I want to remove the duplicate values in the "Office" column within each row.

Expected Output

            sessionid             qf      Office
                12                3       LON1,LON2,SEA2,SEA3
                12                4       DEL2,DEL1,LON1
                13                5       MAn1,LON1,DEL1

2 solutions

#1

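We could use the tidyverse. Split 'Office' by the delimiter and expand to 'long' format, get the distinct rows, then group by 'sessionid' and 'qf' and paste the contents of 'Office' back together. Note that toString() joins the values with ", " (a comma followed by a space), so the result has a space after each comma.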

library(tidyverse)
separate_rows(df1, Office) %>%
  distinct() %>%
  group_by(sessionid, qf) %>%
  summarise(Office = toString(Office))
# A tibble: 3 x 3
# Groups:   sessionid [?]
#  sessionid    qf                 Office
#      <int> <int>                  <chr>
#1        12     3 LON1, LON2, SEA2, SEA3
#2        12     4       DEL2, DEL1, LON1
#3        13     5       MAn1, LON1, DEL1
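If the result should match the expected output exactly (commas with no spaces), a variant along these lines may work. It is only a sketch: it assumes df1 holds the sample data with Office stored as character, and a dplyr version that supports the .groups argument (1.0.0 or later). The explicit sep = "," and paste(collapse = ",") are swapped in here in place of the defaults used above.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (df1 is the name used in this answer)
df1 <- data.frame(
  sessionid = c(12L, 12L, 13L),
  qf        = c(3L, 4L, 5L),
  Office    = c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
                "DEL2,DEL1,LON1,DEL1",
                "MAn1,LON1,DEL1,LON1"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  separate_rows(Office, sep = ",") %>%   # one row per office code
  distinct() %>%                         # drop repeated (sessionid, qf, Office) rows
  group_by(sessionid, qf) %>%
  summarise(Office = paste(Office, collapse = ","),  # rejoin without spaces
            .groups = "drop")
```

Because distinct() compares whole rows, two original rows that share the same sessionid and qf would be merged into one here, just as they would be by the group_by() step.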

#2

Here is a base R way of doing it. It works as you'd expect: first split Office on the comma, remove the duplicates, then paste the pieces back together again.

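# Split each string on commas, drop repeated values, then rejoin.
# Office must be character; wrap it in as.character() if it is a factor.
df$Office <- sapply(lapply(strsplit(df$Office, ","), unique),
                    function(x) paste(x, collapse = ","),
                    simplify = TRUE)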

Or with %>% (from magrittr, which dplyr also provides):

df$Office <- df$Office %>%
  strsplit(",") %>%
  lapply(unique) %>%
  sapply(function(x) paste(x, collapse = ","), simplify = TRUE)
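For reuse, the same split / unique / paste steps can be wrapped in a small function. The sketch below is one possible way to do that; the helper name dedupe_delimited is hypothetical, and vapply() is swapped in for sapply() so that the result is always a character vector. It assumes the data frame is called df, as above.

```r
# Hypothetical helper: drop repeated items from each delimited string,
# keeping the first occurrence of every item in its original position
dedupe_delimited <- function(x, sep = ",") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)  # as.character() also handles factors
  vapply(parts,
         function(p) paste(unique(p), collapse = sep),
         character(1))
}

# Sample data from the question
df <- data.frame(
  sessionid = c(12L, 12L, 13L),
  qf        = c(3L, 4L, 5L),
  Office    = c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
                "DEL2,DEL1,LON1,DEL1",
                "MAn1,LON1,DEL1,LON1"),
  stringsAsFactors = FALSE
)

df$Office <- dedupe_delimited(df$Office)
```

unique() is case-sensitive, so values such as "MAn1" and "MAN1" would be kept as two separate entries.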
