Group and count to get a closerate

Time: 2020-12-14 07:35:03

I want to count per country the number of times the status is open and the number of times the status is closed. Then calculate the closerate per country.

Data:

customer <- c(1,2,3,4,5,6,7,8,9)
country <- c('BE', 'NL', 'NL','NL','BE','NL','BE','BE','NL')
closeday <- c('2017-08-23', '2017-08-05', '2017-08-22', '2017-08-26', 
'2017-08-25', '2017-08-13', '2017-08-30', '2017-08-05', '2017-08-23')
closeday <- as.Date(closeday)

df <- data.frame(customer,country,closeday)

Adding status:

df$status <- ifelse(df$closeday < '2017-08-20', 'open', 'closed') 

  customer country   closeday status
1        1      BE 2017-08-23 closed
2        2      NL 2017-08-05   open
3        3      NL 2017-08-22 closed
4        4      NL 2017-08-26 closed
5        5      BE 2017-08-25 closed
6        6      NL 2017-08-13   open
7        7      BE 2017-08-30 closed
8        8      BE 2017-08-05   open
9        9      NL 2017-08-23 closed

Calculating the closerate:

closerate <- length(which(df$status == 'closed')) / 
(length(which(df$status == 'closed')) + length(which(df$status == 'open')))

[1] 0.6666667

Obviously, this is the closerate for the total. The challenge is to get the closerate per country. I tried adding the closerate calculation to df by:

df$closerate <- length(which(df$status == 'closed')) / 
(length(which(df$status == 'closed')) + length(which(df$status == 'open')))

But it gives all lines a closerate of 0.66 because I'm not grouping. I believe I should not use the length function because counting can be done by grouping. I read some information about using dplyr to count logical outputs per group but this didn't work out.

This is the desired output:

5 solutions

#1


8  

aggregate(list(output = df$status == "closed"),
          list(country = df$country),
          function(x)
              c(close = sum(x),
                open = length(x) - sum(x),
                rate = mean(x)))
#  country output.close output.open output.rate
#1      BE         3.00        1.00        0.75
#2      NL         3.00        2.00        0.60

There was a solution using table in the comments, which appears to have been deleted. In any case, you could also use table:

output = as.data.frame.matrix(table(df$country, df$status))
output$closerate = output$closed/(output$closed + output$open)
output
#   closed open closerate
#BE      3    1      0.75
#NL      3    2      0.60
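
As a variation on the table approach, prop.table can divide each row by its row total, so the rate does not have to be computed by hand. This is only a minimal sketch: it rebuilds the question's data so it runs on its own, and the names tab, shares and closerate are illustrative, not part of the original answer.

```r
# Rebuild the question's data so the snippet is self-contained
country <- c('BE', 'NL', 'NL', 'NL', 'BE', 'NL', 'BE', 'BE', 'NL')
closeday <- as.Date(c('2017-08-23', '2017-08-05', '2017-08-22', '2017-08-26',
                      '2017-08-25', '2017-08-13', '2017-08-30', '2017-08-05',
                      '2017-08-23'))
df <- data.frame(country, closeday)
df$status <- ifelse(df$closeday < as.Date('2017-08-20'), 'open', 'closed')

# Cross-tabulate country by status, then turn counts into row-wise shares
tab <- table(df$country, df$status)
shares <- prop.table(tab, margin = 1)

# The "closed" column holds the closerate per country
closerate <- shares[, "closed"]
closerate
```

For the sample data this gives 0.75 for BE and 0.60 for NL, matching the output above.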

#2


4  

You can use tapply:

# one row per country: counts of open and closed, plus the share closed
data.frame(open=tapply(df$status=="open", df$country, sum),
           closed=tapply(df$status=="closed", df$country, sum),
           closerate=tapply(df$status=="closed", df$country, mean))

#3


4  

A data.table method would be:

library(data.table)
setDT(df)[, {temp <- status=="closed"; # store temporary logical variable
            .(closed=sum(temp), open=sum(!temp), closeRate=mean(temp))}, # calculate stuff
          by=country] # by country

which returns

   country closed open closeRate
1:      BE      3    1      0.75
2:      NL      3    2      0.60

#4


2  

Here is a dplyr solution.

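library(dplyr)

output <- df %>%
  count(country, status) %>%      # one row per country/status pair
  group_by(country) %>%
  mutate(total = sum(n)) %>%      # total cases per country
  mutate(percent = n/total)       # share of each status within the country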

Returns...

output
country status   n total percent
BE      closed   3     4    0.75
BE      open     1     4    0.25
NL      closed   3     5    0.60
NL      open     2     5    0.40
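
If one row per country is preferred over one row per country and status, a summarise-based variant is possible. The following is a sketch under the assumption that dplyr is installed; it rebuilds the question's data so it runs on its own, and the name per_country is made up for illustration.

```r
library(dplyr)

# Rebuild the question's data so the snippet is self-contained
country <- c('BE', 'NL', 'NL', 'NL', 'BE', 'NL', 'BE', 'BE', 'NL')
closeday <- as.Date(c('2017-08-23', '2017-08-05', '2017-08-22', '2017-08-26',
                      '2017-08-25', '2017-08-13', '2017-08-30', '2017-08-05',
                      '2017-08-23'))
df <- data.frame(country, closeday)
df$status <- ifelse(df$closeday < as.Date('2017-08-20'), 'open', 'closed')

# Collapse each country to a single row with both counts and the rate
per_country <- df %>%
  group_by(country) %>%
  summarise(closed = sum(status == "closed"),
            open = sum(status == "open"),
            closerate = mean(status == "closed"))

per_country
```

For the sample data the closerate column is 0.75 for BE and 0.60 for NL, the same values as the closed rows above.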

#5


1  

Here's a quick solution with tidyverse:

library(dplyr)
df %>% group_by(country) %>% 
  mutate(status =ifelse(closeday < '2017-08-20', 'open', 'closed'),
         closerate=mean(status=="closed"))

Returning:

# A tibble: 9 x 5
# Groups:   country [2]
  customer country   closeday status closerate
     <dbl>  <fctr>     <date>  <chr>     <dbl>
1        1      BE 2017-08-23 closed      0.75
2        2      NL 2017-08-05   open      0.60
3        3      NL 2017-08-22 closed      0.60
4        4      NL 2017-08-26 closed      0.60
5        5      BE 2017-08-25 closed      0.75
6        6      NL 2017-08-13   open      0.60
7        7      BE 2017-08-30 closed      0.75
8        8      BE 2017-08-05   open      0.75
9        9      NL 2017-08-23 closed      0.60

Here I am relying on logicals being coerced to numbers (TRUE to 1, FALSE to 0) when a TRUE/FALSE vector is passed to mean(), so the result is the share of TRUE values.
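
A tiny illustration of that coercion, using a made-up vector (is_closed is a hypothetical name, not taken from the data above):

```r
# TRUE marks a closed case, FALSE an open one
is_closed <- c(TRUE, FALSE, TRUE, TRUE)

n_closed <- sum(is_closed)    # TRUE counts as 1, so this is 3
rate     <- mean(is_closed)   # 3 closed out of 4, so this is 0.75
```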

Alternatively, with data.table:

library(data.table)
setDT(df)[,status:=ifelse(closeday < '2017-08-20', 'open', 'closed')]
df[, .(closerate=mean(status=="closed")), by=country]
