I have a data frame structured like this:
exdataframe <- data.frame(c(rep("ma1",4),rep("ma2",3),rep("ma3",2),rep("ma4",1)),
c(rep("1",4),rep("2",3),rep("3",2),rep("1",1)),
c(rep("xxx",4),rep("yyyy",3),rep("zz",2),rep("xxx",1)),
c("2018-05-27","2018-06-24", "2018-07-01" ,"2018-07-08","2018-06-24", "2018-07-01" ,"2018-07-08","2018-05-27","2018-06-24", "2018-07-01"),
c(112,1,3,0,0,0,3,19,45,9),
c(1000,0,0,0,200,300,8,90.9,0,1))
colnames(exdataframe) <- c("ID","classid","classname","date","x","y")
I want to group this data frame by the column "ID", summing the columns x and y while keeping all of the other columns. When I do:
exdataframe_gr <- exdataframe %>% group_by(ID) %>% filter(x == sum(x),y == sum(y))
I get a data frame with only one row, which corresponds to a single entry in the original data frame. The output that I want is:
ID ClassID Classname Date X Y
ma1 1 xxx "could be anything" 116 1000
ma2 2 yyyy "could be anything" 3 508
ma3 3 zz "could be anything" 64 90.9
ma4 1 xxx "could be anything" 9 1
The date column could be anything; I don't care about its value. My original data is much bigger than this: 2000 rows, 45 columns.
I searched the internet and this site but could not find a similar example. Any help is appreciated, as I cannot find a solution.
2 Answers
#1
0
library(tidyverse)
exdataframe %>% group_by(ID)%>% mutate_if(is.factor,as.character) %>% nest() %>%
mutate(classid = map_chr(data,function(x) as.character(x[,'classid'][1,])),
classname = map_chr(data,function(x) as.character(x[,'classname'][1,])),
date = map_chr(data, function(x) paste(x[,'date'][1], collapse = " | ")),
x = map_dbl(data,function(x)sum(x[,'x'])),
y = map_dbl(data,function(x)sum(x[,'y']))) %>%
select(-data)
# A tibble: 4 x 6
ID classid classname date x y
<fct> <chr> <chr> <chr> <dbl> <dbl>
1 ma1 1 xxx "c(\"2018-05-27\", \"2018-06-24\", \"2018-~ 116 1.00e3
2 ma2 2 yyyy "c(\"2018-06-24\", \"2018-07-01\", \"2018-~ 3.00 5.08e2
3 ma3 3 zz "c(\"2018-05-27\", \"2018-06-24\")" 64.0 9.09e1
4 ma4 1 xxx 2018-07-01 9.00 1.00e0
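The date column above ends up as a deparsed vector because `x[,'date'][1]` selects the whole column of the nested tibble rather than its first value. As a possible simplification, the same result can probably be had without `nest()` and `map_*()` by taking `first()` of each non-numeric column inside `summarise()`. This is only a minimal sketch: the name `first_per_id` is made up for illustration, and `stringsAsFactors = FALSE` is set so the columns are plain character vectors.

```r
library(dplyr)

# Rebuild the example data from the question.
exdataframe <- data.frame(
  ID        = c(rep("ma1", 4), rep("ma2", 3), rep("ma3", 2), "ma4"),
  classid   = c(rep("1", 4), rep("2", 3), rep("3", 2), "1"),
  classname = c(rep("xxx", 4), rep("yyyy", 3), rep("zz", 2), "xxx"),
  date      = c("2018-05-27", "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-05-27", "2018-06-24", "2018-07-01"),
  x         = c(112, 1, 3, 0, 0, 0, 3, 19, 45, 9),
  y         = c(1000, 0, 0, 0, 200, 300, 8, 90.9, 0, 1),
  stringsAsFactors = FALSE
)

# One row per ID: first value of each label column, sum of x and y.
# `first_per_id` is a hypothetical name used only in this sketch.
first_per_id <- exdataframe %>%
  group_by(ID) %>%
  summarise(
    classid   = first(classid),
    classname = first(classname),
    date      = first(date),
    x         = sum(x),
    y         = sum(y)
  )

print(first_per_id)
```

Here the date kept for each ID is simply the first one that appears in that group, which fits the "could be anything" requirement.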
#2
0
Tell me if this works for you. Unfortunately there is no date column in the result, but since you say it "could be anything", I suppose you don't need it.
exdataframe %>%
group_by(ID, classid, classname) %>%
summarise(x = sum(x),y=sum(y))
# A tibble: 4 x 5
# Groups: ID, classid [?]
ID classid classname x y
<fct> <fct> <fct> <dbl> <dbl>
1 ma1 1 xxx 116 1000
2 ma2 2 yyyy 3 508
3 ma3 3 zz 64 90.9
4 ma4 1 xxx 9 1
A solution that keeps all columns:
exdataframe_gr <- exdataframe %>%
group_by(ID) %>%
mutate(x = sum(x),y=sum(y)) %>%
ungroup() %>%
distinct(ID, .keep_all = TRUE)
# A tibble: 4 x 6
ID classid classname date x y
<fct> <fct> <fct> <fct> <dbl> <dbl>
1 ma1 1 xxx 2018-05-27 116 1000
2 ma2 2 yyyy 2018-06-24 3 508
3 ma3 3 zz 2018-05-27 64 90.9
4 ma4 1 xxx 2018-07-01 9 1
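With 45 columns, listing every column by hand gets tedious. One possible alternative, assuming dplyr 1.0.0 or later (for `across()` and `.groups`), is to sum every numeric column and keep the first value of everything else. This is a sketch under the assumption that all numeric columns should be summed, which may not hold for the real data; `summed_all` is a name invented for this example.

```r
library(dplyr)

# Rebuild the example data from the question.
exdataframe <- data.frame(
  ID        = c(rep("ma1", 4), rep("ma2", 3), rep("ma3", 2), "ma4"),
  classid   = c(rep("1", 4), rep("2", 3), rep("3", 2), "1"),
  classname = c(rep("xxx", 4), rep("yyyy", 3), rep("zz", 2), "xxx"),
  date      = c("2018-05-27", "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-05-27", "2018-06-24", "2018-07-01"),
  x         = c(112, 1, 3, 0, 0, 0, 3, 19, 45, 9),
  y         = c(1000, 0, 0, 0, 200, 300, 8, 90.9, 0, 1),
  stringsAsFactors = FALSE
)

# Assumption: every numeric column is meant to be summed per ID.
# `summed_all` is a hypothetical name used only in this sketch.
summed_all <- exdataframe %>%
  group_by(ID) %>%
  summarise(
    across(where(is.numeric), sum),     # x and y (and any other numeric column)
    across(!where(is.numeric), first),  # first value of each remaining column
    .groups = "drop"
  ) %>%
  select(all_of(names(exdataframe)))    # restore the original column order

print(summed_all)
```

If some numeric columns should not be summed, they would have to be excluded from the first `across()` call by name.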