R: find the values in one column that appear for every unique value of another column

Time: 2022-11-06 10:04:29

Here's some sample data:

sample = data.frame("col1" = c("val1", "val1", "val1", "val1", "val2", "val2", "val2", "val3", "val3", "val3", "val3"),
                    "col2" = c("this", "that", "some", "thing", "thing", "that", "some", "diff", "some", "this", "that"))

I would like to find every entry of column col2 that appears for every unique value of column col1. Is this possible? For the sample data, this would be the result:

result = c("that", "some")

Thanks in advance.

5 solutions

#1


1  

What you need is intersect. Here's a quick and dirty way:

CODE

library(data.table)
dt <- as.data.table(sample) 

# Split data.table into different chunks based on unique values in col1
# output is a list where each entry is a data.table 
l <- split(dt, by = "col1")

# Find the intersection of all values in col2
Reduce(intersect, lapply(l, function(chunk) as.character(chunk$col2)))

OUTPUT

> Reduce(intersect, lapply(l, function(chunk) as.character(chunk$col2)))
[1] "that" "some"

#2


1  

This is one way to do it using dplyr:

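library(dplyr)

# Split by col1, then inner-join the chunks on col2 so only shared values remain
split(sample, sample$col1) %>%
  Reduce(function(dtf1, dtf2) inner_join(dtf1, dtf2, by = "col2"), .) %>%
  select(col2) %>%
  print()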

  col2
1 that
2 some

#3


1  

A (quick and dirty) solution in base R:

基础R中的(快速和脏)解决方案:

sample_list <- split(sample, sample$col1)
for (i in 1:length(sample_list)) sample_list[[i]] <- sample_list[[i]]$col2
Reduce(intersect, sample_list)
[1] "that" "some"
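
The loop above only pulls col2 out of each chunk, so the same idea can probably be written more compactly by splitting the col2 vector directly. This is a minimal sketch, assuming the columns are plain character vectors (hence the explicit stringsAsFactors = FALSE); the names groups and common are illustrative and not part of the original answer:

```r
# Sample data from the question, rebuilt so this block runs on its own
sample <- data.frame(
  col1 = c("val1", "val1", "val1", "val1", "val2", "val2", "val2",
           "val3", "val3", "val3", "val3"),
  col2 = c("this", "that", "some", "thing", "thing", "that", "some",
           "diff", "some", "this", "that"),
  stringsAsFactors = FALSE
)

# One character vector of col2 values per unique col1 value
groups <- split(sample$col2, sample$col1)

# Successively intersect the vectors; only values present in every group survive
common <- Reduce(intersect, groups)
print(common)
```

Because intersect keeps the order of its first argument, the result follows the order in which the values appear in the first group.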

EDIT:

A data.table solution inspired by Matt's dplyr answer:

library(data.table)
setDT(sample)
n <- uniqueN(sample$col1)
sample[, .N, by = .(col1, col2)][, .N, by = col2][N == n, col2]
[1] that some

This solution should be fast on a large dataset, since it only groups and counts.

EDIT 2:

Playing around with dcast which is available in data.table:

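# Cast to wide format: one row per col1 value, one column per col2 value (NA where absent)
present_in <- colSums(!is.na(dcast(sample, col1 ~ col2, value.var = "col2")))
# Keep columns filled in every row; [-1] drops the col1 id column, which is never NA
names(present_in)[present_in == uniqueN(sample$col1)][-1]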
[1] "some" "that"

#4


1  

Here is a somewhat roundabout way using dplyr.

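library(dplyr)

# Number of distinct groups in col1
sets <- length(unique(sample$col1))

s <- sample %>%
    group_by(col2) %>%
    # Count the distinct col1 groups in which each col2 value occurs
    summarise(n = n_distinct(col1)) %>%
    filter(n == sets)

result <- s$col2
result
[1] some that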

#5


0  

Another dirty base R solution:

names(which(table(unlist(aggregate(sample$col2, list(sample$col1), unique)[, 2])) == length(unique(sample$col1))))

[1] "some" "that"
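
The one-liner above is fairly dense. The same counting idea can be sketched more readably with a cross-tabulation: count how many col1 groups each col2 value occurs in, and keep the values that occur in all of them. This is only an illustration, not the answer's own code; the names counts, present, and common are assumptions introduced here:

```r
# Sample data from the question, rebuilt so this block runs on its own
sample <- data.frame(
  col1 = c("val1", "val1", "val1", "val1", "val2", "val2", "val2",
           "val3", "val3", "val3", "val3"),
  col2 = c("this", "that", "some", "thing", "thing", "that", "some",
           "diff", "some", "this", "that"),
  stringsAsFactors = FALSE
)

# Rows are col2 values, columns are col1 values, cells are occurrence counts
counts <- table(sample$col2, sample$col1)

# TRUE where a col2 value occurs at least once in a col1 group,
# so duplicated rows in the data do not inflate the count
present <- counts > 0

# Keep the col2 values that are present in every col1 group
common <- rownames(counts)[rowSums(present) == ncol(counts)]
print(common)
```

Unlike the intersect-based answers, this returns the values in sorted order rather than in order of first appearance.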
