如何在dplyr中执行groupby并计算出现次数

时间:2021-11-21 09:08:47

I have following dataframe in R

我在R中有以下数据框

 truck_no     start_time         end_time           ctr_no    time     type
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT09      1.67      D
  ABC123      20-05-2016 06:53   20-05-2016 08:53   TRT12      1.67      R
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT34      1.67      R
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT33      1.67      D
  ERT123      21-05-2016 06:53   21-05-2016 08:53   QRT34      2.67      R
  ERT123      21-05-2016 06:53   21-05-2016 08:53   PRT33      2.67      D

Now My desired data frame is

现在我想要的数据框是

 truck_no     start_time         end_time           ctr_no   time    type
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT09    1.67    2D2R
  ABC123      20-05-2016 06:53   20-05-2016 08:53   TRT12    2.67    1R1D

I want to count D's and R's and paste it in above manner time is taken as a average. How can I do it in dplyr ?

我想算上D和R并以上面的方式粘贴时间作为平均值。我怎么能在dplyr中做到这一点?

1 个解决方案

#1


4  

Here's a dplyr approach:

这是一个dplyr方法:

foo <- function(x) {y <- table(x); paste(rbind(y, names(y)), collapse = "")}

df %>% 
  group_by(truck_no) %>% 
  mutate(type = foo(type)) %>% 
  summarise_all(first)

## A tibble: 2 x 5
#  truck_no       start_time         end_time ctr_no  type
#    <fctr>           <fctr>           <fctr> <fctr> <chr>
#1   ABC123 20-05-2016 06:53 20-05-2016 08:53  ERT09  2D2R
#2   ERT123 21-05-2016 06:53 21-05-2016 08:53  QRT34  1D1R

In case you want the start and end time to be the mean per truck_no you could use the following extension:

如果您希望开始和结束时间是每辆truck_no的平均值,您可以使用以下扩展名:

df %>% 
  group_by(truck_no) %>% 
  mutate_at(vars(ends_with("_time")), 
            ~mean(as.POSIXct(as.character(.), format="%d-%m-%Y %H:%M"))) %>% 
  mutate(type = foo(type)) %>% 
  summarise_all(first)
## A tibble: 2 x 5
#  truck_no          start_time            end_time ctr_no  type
#    <fctr>              <dttm>              <dttm> <fctr> <chr>
#1   ABC123 2016-05-20 06:53:00 2016-05-20 08:23:00  ERT09  2D2R
#2   ERT123 2016-05-21 06:53:00 2016-05-21 08:53:00  QRT34  1D1R

#1


4  

Here's a dplyr approach:

这是一个dplyr方法:

foo <- function(x) {y <- table(x); paste(rbind(y, names(y)), collapse = "")}

df %>% 
  group_by(truck_no) %>% 
  mutate(type = foo(type)) %>% 
  summarise_all(first)

## A tibble: 2 x 5
#  truck_no       start_time         end_time ctr_no  type
#    <fctr>           <fctr>           <fctr> <fctr> <chr>
#1   ABC123 20-05-2016 06:53 20-05-2016 08:53  ERT09  2D2R
#2   ERT123 21-05-2016 06:53 21-05-2016 08:53  QRT34  1D1R

In case you want the start and end time to be the mean per truck_no you could use the following extension:

如果您希望开始和结束时间是每辆truck_no的平均值,您可以使用以下扩展名:

df %>% 
  group_by(truck_no) %>% 
  mutate_at(vars(ends_with("_time")), 
            ~mean(as.POSIXct(as.character(.), format="%d-%m-%Y %H:%M"))) %>% 
  mutate(type = foo(type)) %>% 
  summarise_all(first)
## A tibble: 2 x 5
#  truck_no          start_time            end_time ctr_no  type
#    <fctr>              <dttm>              <dttm> <fctr> <chr>
#1   ABC123 2016-05-20 06:53:00 2016-05-20 08:23:00  ERT09  2D2R
#2   ERT123 2016-05-21 06:53:00 2016-05-21 08:53:00  QRT34  1D1R