Let's say I have a data frame:
x y val
A B 5
A C 3
B A 7
B C 9
C A 1
As you can see, there are two pairs that match on x and y:
Pair 1: A B 5 and B A 7
Pair 2: A C 3 and C A 1
I would like to merge them into A B 12 and A C 4, and leave B C 9 unchanged, as it doesn't have a pair (C B).
The final dataframe should look like this:
x y val
A B 12
A C 4
B C 9
How can I achieve this in R?
3 solutions
#1
2
Here's one solution with dplyr:
library(dplyr)
df %>%
  mutate(var = paste(pmin(x, y), pmax(x, y))) %>%
  group_by(var) %>%
  summarise(val = sum(val))
# A tibble: 3 x 2
#   var     val
#   <chr> <int>
# 1 A B      12
# 2 A C       4
# 3 B C       9
Add separate(var, c("x", "y")) (from tidyr) to the end of the chain if you want separate x and y columns, as Melissa Key mentions.
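For reference, a sketch of the complete chain with that step added might look like the following. It assumes the sample data from the question with x and y read as character columns, and that both dplyr and tidyr are installed; the name result and the explicit sep = " " argument are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question; x and y are read as character columns
df <- read.table(text = "x y val
A B 5
A C 3
B A 7
B C 9
C A 1", header = TRUE, stringsAsFactors = FALSE)

result <- df %>%
  # Build an order-independent key: smaller label first, larger second
  mutate(var = paste(pmin(x, y), pmax(x, y))) %>%
  group_by(var) %>%
  summarise(val = sum(val)) %>%
  # Split the key back into x and y; " " matches the default separator of paste()
  separate(var, c("x", "y"), sep = " ")
```

Here result should hold one row per unordered pair, with the val entries of mirrored rows added together.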
#2
2
First ensure that x and y are character, giving DF_c, and then sort them within each row, giving DF_s. Finally, perform the aggregation. No packages are used. The first line would not be needed if x and y were already character.
DF_c <- transform(DF, x = as.character(x), y = as.character(y))
DF_s <- transform(DF_c, x = pmin(x, y), y = pmax(x, y))
aggregate(val ~ x + y, DF_s, sum)
giving:
x y val
1 A B 12
2 A C 4
3 B C 9
#3
0
One can group by row_number() so that sort is applied to each row separately, and then combine the columns in sorted order to create an order-independent pair.
Note: The solution below can be extended to work for pairing across more than 2 columns as well, e.g. treating A B C, A C B or B C A as the same group.
library(dplyr)
library(tidyr)
df %>%
  group_by(row_number()) %>%
  mutate(xy = paste0(sort(c(x, y)), collapse = ",")) %>%
  group_by(xy) %>%
  summarise(val = sum(val)) %>%
  separate(xy, c("x", "y"))
## A tibble: 3 x 3
# x y val
#* <chr> <chr> <int>
#1 A B 12
#2 A C 4
#3 B C 9
Data:
df <- read.table(text =
"x y val
A B 5
A C 3
B A 7
B C 9
C A 1",
header = TRUE, stringsAsFactors = FALSE)
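As a rough illustration of the note in this answer about more than two columns, the same row-wise sorting idea could be applied to three columns as sketched below. The data frame df3, its z column, its values, and the names key and result3 are all made up for this example and do not come from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical three-column data, invented for illustration only
df3 <- data.frame(
  x = c("A", "A", "B", "A"),
  y = c("B", "C", "C", "B"),
  z = c("C", "B", "A", "D"),
  val = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

result3 <- df3 %>%
  # One group per row, so sort() sees only that row's labels
  group_by(row = row_number()) %>%
  mutate(key = paste(sort(c(x, y, z)), collapse = ",")) %>%
  # Regroup by the order-independent key and add up val
  group_by(key) %>%
  summarise(val = sum(val)) %>%
  separate(key, c("x", "y", "z"), sep = ",")
```

In this made-up data, the first three rows are permutations of A, B and C, so they should collapse into a single row, while the A B D row stays on its own.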