Counting the number of occurrences of a factor in R, with zero counts reported

Time: 2021-02-20 07:39:14

I want to count the number of occurrences of a factor in a data frame. For example, to count the number of events of a given type in the code below:

library(plyr)
events <- data.frame(type = c('A', 'A', 'B'),
                       quantity = c(1, 2, 1))
ddply(events, .(type), summarise, quantity = sum(quantity))

The output is the following:

     type quantity
1    A        3
2    B        1

However, what if I know that there are three types of events A, B and C, and I also want to see the count for C which is 0? In other words, I want the output to be:

     type quantity
1    A        3
2    B        1
3    C        0

How do I do this? It feels like there should be a function defined to do this somewhere.

The following are my two not-so-good ideas about how to go about this.

Idea #1: I could do this with a for loop, but it is widely said that if you are using a for loop in R, you are doing something wrong and there must be a better way.

Idea #2: Add dummy entries to the original data frame. This solution works but it feels like there should be a more elegant solution.

events <- data.frame(type = c('A', 'A', 'B'),
                       quantity = c(1, 2, 1))
events <- rbind(events, data.frame(type = 'C', quantity = 0))
ddply(events, .(type), summarise, quantity = sum(quantity))

4 Answers

#1


20  

You get this for free if you define your events variable correctly as a factor with the desired three levels:

R> events <- data.frame(type = factor(c('A', 'A', 'B'), c('A','B','C')), 
+                       quantity = c(1, 2, 1))
R> events
  type quantity
1    A        1
2    A        2
3    B        1
R> table(events$type)

A B C 
2 1 0 
R> 

Simply calling table() on the factor already does the right thing, and ddply() can too if you tell it not to drop:

R> ddply(events, .(type), summarise, quantity = sum(quantity), .drop=FALSE)
  type quantity
1    A        3
2    B        1
3    C        0
R> 
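
If the data frame already exists and `type` is a plain character column, the same effect can be had by converting the column afterwards. A minimal sketch, assuming the full set of event types is known in advance (the name `all_types` is only illustrative):

```r
# Hypothetical starting point: 'type' is a plain character column.
events <- data.frame(type = c("A", "A", "B"),
                     quantity = c(1, 2, 1),
                     stringsAsFactors = FALSE)

# All event types known in advance, including ones with no rows (assumed).
all_types <- c("A", "B", "C")
events$type <- factor(events$type, levels = all_types)

# table() counts rows per level and keeps levels that have no rows.
counts <- table(events$type)
print(counts)
```

The key point is that the levels are stated explicitly, so the result does not depend on which types happen to appear in the data.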

#2


4  

> xtabs(quantity~type, events)
type
A B C 
3 1 0 
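
The xtabs() result is a table, not a data frame. If a data frame like the one in the question is wanted, it can be converted. A minimal sketch, assuming `type` is a factor with all three levels as in answer #1:

```r
# 'type' is a factor whose levels include the unobserved type "C".
events <- data.frame(type = factor(c("A", "A", "B"), levels = c("A", "B", "C")),
                     quantity = c(1, 2, 1))

# Sum 'quantity' within each level of 'type'; empty levels are kept.
sums <- xtabs(quantity ~ type, events)

# Convert the table to a data frame, naming the value column 'quantity'
# instead of the default 'Freq'.
sums_df <- as.data.frame(sums, responseName = "quantity")
print(sums_df)
```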

#3


2  

Using the dplyr library:

library(dplyr)
data <- data.frame(level = c('A', 'A', 'B', 'B', 'B', 'C'),
                   value = c(1:6))

data %>%
  group_by(level) %>%
  summarize(count = n()) %>%
  View

If you also want summary statistics such as the maximum and minimum, try this:

data %>%
  group_by(level) %>%
  summarise(count = n(), Max_val = max(value), Min_val = min(value)) %>%
  View
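
The pipelines above count rows per group, but a level with no rows will not appear in the result. A minimal sketch of one way to keep it, assuming dplyr 0.8.0 or later (where group_by() and count() accept a `.drop` argument) and using the question's `events` data rather than the `data` frame above:

```r
library(dplyr)

# 'type' must be a factor whose levels include the unobserved type "C".
events <- data.frame(type = factor(c("A", "A", "B"), levels = c("A", "B", "C")),
                     quantity = c(1, 2, 1))

# Row counts per level; .drop = FALSE keeps levels with no rows.
counts <- events %>%
  count(type, .drop = FALSE)

# Sums per level; the empty group sums to 0.
sums <- events %>%
  group_by(type, .drop = FALSE) %>%
  summarise(quantity = sum(quantity))

print(counts)
print(sums)
```

This only works when `type` is a factor; with a character column there is no record of the missing type for dplyr to keep.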

#4


0  

Quite similar to @DWin's answer:

> aggregate(quantity~type, events, FUN=sum)
  type quantity
1    A        3
2    B        1
3    C        0
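
Depending on the R version, aggregate() may leave out factor levels that have no rows (in recent versions its `drop` argument defaults to TRUE). A sketch of a different technique that does not rely on that behaviour: merge the aggregated result with the full list of levels and fill the gaps with 0. The names `all_types` and `result` are only illustrative.

```r
# 'type' is a factor whose levels include the unobserved type "C".
events <- data.frame(type = factor(c("A", "A", "B"), levels = c("A", "B", "C")),
                     quantity = c(1, 2, 1))

# Sum per observed type; the row for "C" may be missing here.
sums <- aggregate(quantity ~ type, events, FUN = sum)

# One row per known level, left-joined with the sums.
all_types <- data.frame(type = levels(events$type))
result <- merge(all_types, sums, by = "type", all.x = TRUE)

# Levels with no matching rows come back as NA; report them as 0.
result$quantity[is.na(result$quantity)] <- 0
print(result)
```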
