How do I match pairs by the values in two columns and merge a data frame?

Time: 2022-09-08 12:19:21

Let's say I have a data frame:

x    y    val
A    B    5
A    C    3
B    A    7
B    C    9
C    A    1

As you can see, there are two pairs of rows whose x and y values match in reverse order:

Pair 1: A B 5 and B A 7

Pair 2: A C 3 and C A 1

I would like to merge them into A B 12 and A C 4, and leave B C 9 unchanged because it has no matching pair (C B).

The final dataframe should look like this:

x    y    val
A    B    12
A    C    4
B    C    9

How can I achieve this in R?

3 solutions

#1


2  

Here's one solution with dplyr:

library(dplyr)

df %>% 
  mutate(var = paste(pmin(x, y), pmax(x, y))) %>% 
  group_by(var) %>% 
  summarise(val = sum(val))
# A tibble: 3 x 2
#   var     val
#   <chr> <int>
# 1 A B      12
# 2 A C       4
# 3 B C       9

If you want separate x and y columns, add separate(var, c("x", "y")) (from tidyr) to the end of the chain, as Melissa Key mentions.
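A minimal sketch of the full chain with separate() appended might look like the following. The explicit "|" separator is an assumption added here and is not part of the original answer; it keeps the split unambiguous if the x or y values themselves contain spaces. The sample data is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt inline
df <- data.frame(
  x = c("A", "A", "B", "B", "C"),
  y = c("B", "C", "A", "C", "A"),
  val = c(5L, 3L, 7L, 9L, 1L),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Build an order-independent key: smaller value first, larger second
  mutate(var = paste(pmin(x, y), pmax(x, y), sep = "|")) %>%
  group_by(var) %>%
  summarise(val = sum(val)) %>%
  # Split the key back into x and y (sep is a regular expression, so escape "|")
  separate(var, c("x", "y"), sep = "\\|")

print(result)
```

Choosing a separator that cannot occur in the data is the main design decision here; any such character would work.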

#2


2  

First make sure that x and y are character vectors (giving DF_c), then sort each pair so that the smaller value is in x and the larger in y (giving DF_s). Finally, aggregate val over the sorted pairs. No packages are used. Here DF is the input data frame. The first line is not needed if x and y are already character.

DF_c <- transform(DF, x = as.character(x), y = as.character(y))
DF_s <- transform(DF_c, x = pmin(x, y), y = pmax(x, y))
aggregate(val ~ x + y, DF_s, sum)

giving:

  x y val
1 A B  12
2 A C   4
3 B C   9
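The three steps above can be wrapped into a small helper, sketched below in base R. The function name merge_pairs is hypothetical and is introduced only for illustration. The input is deliberately built with factor columns to show that the as.character() conversion lets pmin() and pmax() compare the values as strings.

```r
# Hypothetical helper: sums val over unordered (x, y) pairs
merge_pairs <- function(data) {
  # Convert to character so pmin/pmax compare strings rather than factors
  x <- as.character(data$x)
  y <- as.character(data$y)
  # Put the smaller value in x and the larger in y for every row
  sorted <- data.frame(
    x = pmin(x, y),
    y = pmax(x, y),
    val = data$val,
    stringsAsFactors = FALSE
  )
  # Sum val within each sorted pair
  aggregate(val ~ x + y, sorted, sum)
}

# Sample data from the question, with x and y stored as factors
DF <- data.frame(
  x = factor(c("A", "A", "B", "B", "C")),
  y = factor(c("B", "C", "A", "C", "A")),
  val = c(5L, 3L, 7L, 9L, 1L)
)

res <- merge_pairs(DF)
print(res)
```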

#3


0  

One can group by row_number() so that each row is processed on its own, sort that row's x and y values, and paste them together in sorted order to create an order-independent key for the pair.

Note: The solution below can be extended to pair on more than two columns as well, e.g. treating A B C, A C B, and B C A as the same group.

library(dplyr)
library(tidyr)
df %>%
  group_by(row_number()) %>%
  mutate(xy = paste0(sort(c(x,y)),collapse=",")) %>%
  group_by(xy) %>%
  summarise(val = sum(val)) %>% 
  separate(xy, c("x","y"))

## A tibble: 3 x 3
#  x     y       val
#* <chr> <chr> <int>
#1 A     B        12
#2 A     C         4
#3 B     C         9
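As a rough illustration of the note about more than two columns, here is a base R sketch that treats any ordering of three values as the same group. The data frame df3, its z column, and the key column are made up for this example and do not come from the question.

```r
# Hypothetical three-column data: the first three rows are permutations of A, B, C
df3 <- data.frame(
  x = c("A", "A", "B", "D"),
  y = c("B", "C", "C", "E"),
  z = c("C", "B", "A", "F"),
  val = c(1L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

key_cols <- c("x", "y", "z")

# For each row, sort the values and paste them into one order-independent key
df3$key <- apply(df3[key_cols], 1, function(r) paste(sort(r), collapse = ","))

# Sum val within each key
result3 <- aggregate(val ~ key, df3, sum)
print(result3)
```

The key can be split back into separate columns afterwards, for example with strsplit() or tidyr's separate(), in the same way as in the two-column case.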

Data:

df <- read.table(text = 
"x    y    val
A    B    5
A    C    3
B    A    7
B    C    9
C    A    1",
header = TRUE, stringsAsFactors = FALSE)
