Apologies if this question has already been answered, but all the information I have been able to find deals with merging data frames themselves or with merging in a different way. I'd really appreciate any thoughts.
I have a very large but very simple data frame with approx. 22500 rows and 48 columns. I would like to merge some of the rows within the data frame based on the row names and am wondering if there is any way to do this.
A portion of the data frame looks like this:
                 Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
Nasvi2EG000001t1         28         43         33         25         64
Nasvi2EG000002t2          0          3          0          0          4
Nasvi2EG000002t5          0          0          0          0          0
Nasvi2EG000002t6          0          0          0          0          0
Nasvi2EG000004t1          1          0          0          0          0
Nasvi2EG000009t1          0          4          2          0          4
Nasvi2EG000013t1         21          8         17         19          7
Nasvi2EG000014t1          0          3          0          0          4
Nasvi2EG000014t2          0          4          0          0          3
As you can see, rows 2, 3 and 4 have identical names up to the digit after the "t", and the same is true of rows 8 and 9. I'd like to merge the similarly named rows together...
What I'd like to end up with is this:
                 Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
Nasvi2EG000001t1         28         43         33         25         64
Nasvi2EG000002            0          3          0          0          4
Nasvi2EG000004t1          1          0          0          0          0
Nasvi2EG000009t1          0          4          2          0          4
Nasvi2EG000013t1         21          8         17         19          7
Nasvi2EG000014            0          7          0          0          7
where the values in the rows that have been merged are summed.
Would be very grateful for any thoughts.
Thanks!
1 solution
#1
Assuming your data.frame is called "SODF", create a vector from the row.names that strips the trailing "t" plus one or more digits from the end of each name, and use that as your aggregation variable.
> aggvar <- gsub("(t[0-9]+$)", "", rownames(SODF))
> aggregate(. ~ aggvar, SODF, sum)
          aggvar Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
1 Nasvi2EG000001         28         43         33         25         64
2 Nasvi2EG000002          0          3          0          0          4
3 Nasvi2EG000004          1          0          0          0          0
4 Nasvi2EG000009          0          4          2          0          4
5 Nasvi2EG000013         21          8         17         19          7
6 Nasvi2EG000014          0          7          0          0          7
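Note that the output above drops the "t" suffix from every row, whereas the output requested in the question keeps it on rows that were not merged. The following is a minimal, self-contained sketch of one way to get that form. It is not part of the original answer: it rebuilds only the nine sample rows from the question as a stand-in for the real SODF, swaps in base R's rowsum() in place of aggregate() so that the group names end up as row names, and the names `merged`, `n_per_gene` and `single` are made up for illustration.

```r
# Stand-in for the real data: only the nine sample rows shown in the question.
SODF <- data.frame(
  Treatment1 = c(28, 0, 0, 0, 1, 0, 21, 0, 0),
  Treatment2 = c(43, 3, 0, 0, 0, 4, 8, 3, 4),
  Treatment3 = c(33, 0, 0, 0, 0, 2, 17, 0, 0),
  Treatment4 = c(25, 0, 0, 0, 0, 0, 19, 0, 0),
  Treatment5 = c(64, 4, 0, 0, 0, 4, 7, 4, 3),
  row.names = c("Nasvi2EG000001t1", "Nasvi2EG000002t2", "Nasvi2EG000002t5",
                "Nasvi2EG000002t6", "Nasvi2EG000004t1", "Nasvi2EG000009t1",
                "Nasvi2EG000013t1", "Nasvi2EG000014t1", "Nasvi2EG000014t2")
)

# Same idea as the answer: strip the trailing "t<digits>" to get a group key.
aggvar <- sub("t[0-9]+$", "", rownames(SODF))

# rowsum() sums the rows within each group and uses the key as the row name.
merged <- rowsum(SODF, group = aggvar)

# Assumption about the desired output: groups with a single member keep
# their original row name (e.g. "Nasvi2EG000001t1").
n_per_gene <- table(aggvar)
single <- names(n_per_gene)[n_per_gene == 1]
rownames(merged)[match(single, rownames(merged))] <-
  rownames(SODF)[match(single, aggvar)]

print(merged)
```

On a data frame of roughly 22500 rows either approach should be workable; rowsum() is used here mainly because it keeps the result keyed by row name rather than adding an `aggvar` column.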