Merging rows in a data frame based on row.names

Time: 2020-12-22 07:22:52

Apologies if this question has already been answered, but all the information I have been able to find is about merging data frames themselves or merging in a different way. I'd really appreciate any thoughts.

I have a very large but very simple data frame with approx. 22500 rows and 48 columns. I would like to merge some of the rows within the data frame based on the row names and am wondering if there is any way to do this.

A portion of the data frame looks like this:

                     Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
    Nasvi2EG000001t1         28         43         33         25         64
    Nasvi2EG000002t2          0          3          0          0          4
    Nasvi2EG000002t5          0          0          0          0          0
    Nasvi2EG000002t6          0          0          0          0          0
    Nasvi2EG000004t1          1          0          0          0          0
    Nasvi2EG000009t1          0          4          2          0          4
    Nasvi2EG000013t1         21          8         17         19          7
    Nasvi2EG000014t1          0          3          0          0          4
    Nasvi2EG000014t2          0          4          0          0          3

As you can see, rows 2, 3 and 4 have identical names up to the digit after the "t", and the same goes for rows 8 and 9. I'd like to merge the similarly named rows together...

What I'd like to end up with is this:

                     Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
    Nasvi2EG000001t1         28         43         33         25         64
    Nasvi2EG000002            0          3          0          0          4
    Nasvi2EG000004t1          1          0          0          0          0
    Nasvi2EG000009t1          0          4          2          0          4
    Nasvi2EG000013t1         21          8         17         19          7
    Nasvi2EG000014            0          7          0          0          7

where the values in the rows that have been merged are summed.

Would be very grateful for any thoughts.

Thanks!

1 solution

#1


4  

Assuming your data.frame is called "SODF", create a vector from the row.names that strips out the "t+some digit" from the end of the row.names and use that as your aggregation variable.
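
To see what that stripping step does before applying it to the whole data frame, here is a minimal sketch on a few row names copied from the question. The vector names `ids` and `stripped` are made up for illustration, and `sub()` is used because only one match per name is expected.

```r
# A few row names copied from the question's sample data.
ids <- c("Nasvi2EG000001t1", "Nasvi2EG000002t2", "Nasvi2EG000002t5",
         "Nasvi2EG000014t1", "Nasvi2EG000014t2")

# Remove a trailing "t" followed by one or more digits; the "$" anchors
# the match to the end of the string.
stripped <- sub("t[0-9]+$", "", ids)

print(stripped)
print(table(stripped))  # how many original rows fall into each group
```

Names that share the same prefix end up with the same label, which is what makes the label usable as a grouping variable.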

> aggvar <- gsub("(t[0-9]+$)", "", rownames(SODF))
> aggregate(. ~ aggvar, SODF, sum)
          aggvar Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
1 Nasvi2EG000001         28         43         33         25         64
2 Nasvi2EG000002          0          3          0          0          4
3 Nasvi2EG000004          1          0          0          0          0
4 Nasvi2EG000009          0          4          2          0          4
5 Nasvi2EG000013         21          8         17         19          7
6 Nasvi2EG000014          0          7          0          0          7
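
For anyone who wants to reproduce this, the following is a sketch rather than part of the original answer. It rebuilds the nine sample rows from the question and sums them with base R's `rowsum()` instead of `aggregate()`, which keeps the group labels as row names. The last step, which restores the original name for groups containing only one row (as in the question's desired output), is an optional addition, and the names `merged`, `n_per_group`, `single` and `original` are illustrative.

```r
# Sample rows copied from the question; "SODF" is the name used in the answer.
SODF <- data.frame(
  Treatment1 = c(28, 0, 0, 0, 1, 0, 21, 0, 0),
  Treatment2 = c(43, 3, 0, 0, 0, 4, 8, 3, 4),
  Treatment3 = c(33, 0, 0, 0, 0, 2, 17, 0, 0),
  Treatment4 = c(25, 0, 0, 0, 0, 0, 19, 0, 0),
  Treatment5 = c(64, 4, 0, 0, 0, 4, 7, 4, 3),
  row.names = c("Nasvi2EG000001t1", "Nasvi2EG000002t2", "Nasvi2EG000002t5",
                "Nasvi2EG000002t6", "Nasvi2EG000004t1", "Nasvi2EG000009t1",
                "Nasvi2EG000013t1", "Nasvi2EG000014t1", "Nasvi2EG000014t2")
)

# Grouping label: the row name without its trailing "t<digits>".
aggvar <- sub("t[0-9]+$", "", rownames(SODF))

# rowsum() adds up the rows within each group and uses the group
# labels as the row names of the result.
merged <- rowsum(SODF, group = aggvar)

# Optional: give groups that contain a single row their original name back.
n_per_group <- table(aggvar)
single <- names(n_per_group)[as.vector(n_per_group) == 1]
original <- rownames(SODF)[match(single, aggvar)]
rownames(merged)[match(single, rownames(merged))] <- original

print(merged)
```

On the full table of roughly 22500 rows it may be worth timing both approaches. Note also that the formula interface of `aggregate()` drops rows containing `NA` by default, whereas `rowsum()` keeps them unless `na.rm = TRUE` is set.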
