This question already has an answer here:
- Fastest way to count occurrences of each unique element (2 answers)
- Counting the number of elements with the values of x in a vector (12 answers)
I have a vector, say:
c(1,1,1,1,1,1,2,3,4,5,7,7,5,7,7,7)
How do I find the counts of each element and return the 3 most often occurring elements, i.e. 1, 7, 5?
I think this should be really simple, but I am having trouble with it.
4 answers
#1 (57 votes)
I'm sure this is a duplicate, but the answer is simple:
sort(table(variable),decreasing=TRUE)[1:3]
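A minimal sketch of this answer applied to the vector from the question. The names x, counts, top3_counts and top3_values are chosen only for illustration; because table() stores the values as character names, as.numeric() is used to turn them back into numbers.

```r
# Vector from the question
x <- c(1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 7, 7, 5, 7, 7, 7)

# Count each unique value, then sort the counts from largest to smallest
counts <- sort(table(x), decreasing = TRUE)

# Counts of the three most frequent values (still a named table)
top3_counts <- counts[1:3]

# The values themselves: table names are character, so convert back to numeric
top3_values <- as.numeric(names(top3_counts))
print(top3_values)
# [1] 1 7 5
```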
#2 (8 votes)
I don't know whether this is better than the table approach, but if your vector is a factor (or you convert it with as.factor()), its summary method will give you the frequency counts:
> summary(as.factor(c(1,1,1,1,1,1,2,3,4,5,7,7,5,7,7,7)))
1 2 3 4 5 7
6 1 1 1 2 5
You can then get the three most frequent values like so:
> names(sort(summary(as.factor(c(1,1,1,1,1,1,2,3,4,5,7,7,5,7,7,7))), decreasing=TRUE)[1:3])
[1] "1" "7" "5"
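One caveat with this approach: summary() on a factor has a maxsum argument (default 100), and when the factor has more levels than that, the least frequent ones are collapsed into a single "(Other)" entry. The sketch below uses a made-up factor f with 150 levels to show the effect, and passes maxsum = nlevels(f) to keep every level.

```r
# Made-up example: a factor with 150 distinct levels, each occurring once
f <- factor(1:150)

# With the default maxsum = 100, 99 levels are listed and the remaining 51
# are collapsed into a single "(Other)" entry
s_default <- summary(f)
length(s_default)          # 100
s_default[["(Other)"]]     # 51

# Setting maxsum to the number of levels keeps a count for every level
s_all <- summary(f, maxsum = nlevels(f))
length(s_all)              # 150
```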
#3 (8 votes)
If your vector contains only integers, tabulate will typically be much faster than table. There are a couple of catches to be aware of:
- By default, it returns counts only for the integers 1 to max(x); zero and negative values are silently ignored.
- It returns an unnamed vector.
That means that if x = c(1,1,1,3), then tabulate(x) will return (3, 0, 1). Note that the counts are for 1 to max(x) by default.
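A short sketch of both catches, using the x from the text plus one made-up vector containing zero and a negative value. The nbins argument is part of tabulate()'s signature and sets how many bins (1 to nbins) are counted.

```r
x <- c(1, 1, 1, 3)

# Bins run from 1 to max(x); the result has no names
tabulate(x)               # [1] 3 0 1

# nbins changes how many bins are counted
tabulate(x, nbins = 5)    # [1] 3 0 1 0 0

# Zero and negative values fall outside 1..nbins and are dropped silently
tabulate(c(-1, 0, 2))     # [1] 0 1
```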
How can you use tabulate so that it also works for zero and negative integers?
set.seed(45)
x <- sample(-5:5, 25, TRUE)
# [1] 1 -2 -3 -1 -2 -2 -3 1 -3 -5 -1 4 -2 0 -1 -1 5 -4 -1 -3 -4 -2 1 2 4
Just subtract min(x) - 1 so that the smallest value maps to 1 (when min(x) <= 0 this is the same as adding abs(min(x))+1). The same shift also works when min(x) > 0, and it keeps the bins aligned with the names seq(min(x), max(x)):
# Shift so that min(x) lands in bin 1; bin i then holds the count for min(x) + i - 1
sort(setNames(tabulate(x - min(x) + 1),
              seq(min(x), max(x))), decreasing=TRUE)[1:3]
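The same shift can be wrapped in a helper that also drops values that never occur and copes with vectors that have fewer than three distinct values. top_n_tabulate() is a hypothetical name used only for this sketch, and it assumes x holds whole numbers with no NA.

```r
# Hypothetical helper: the n most frequent values of a whole-number vector,
# counted with tabulate(). Assumes x has no NA values.
top_n_tabulate <- function(x, n = 3) {
  lo <- min(x)
  # Shift so the smallest value maps to bin 1, and count every bin up to max(x)
  counts <- tabulate(x - lo + 1, nbins = max(x) - lo + 1)
  # Label each bin with the original value it represents
  names(counts) <- seq(lo, max(x))
  # Keep only values that occur, sort by count, and return at most n of them
  counts <- sort(counts[counts > 0], decreasing = TRUE)
  head(counts, n)
}

v <- c(1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 7, 7, 5, 7, 7, 7)
top_n_tabulate(v)
# 1 7 5
# 6 5 2
```

Because head() returns at most n entries, a vector with only two distinct values yields two counts rather than an NA-padded result.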
If your vector does contain NA, min(x) and max(x) return NA and the tabulate approach fails; use table with the useNA="always" (or useNA="ifany") parameter instead.
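A sketch of the table() route with missing values, using a made-up vector y. NA is counted as a value of its own, so it can show up among the top three. useNA = "ifany" adds the NA entry only when something is missing, whereas "always" adds it unconditionally.

```r
# Made-up vector with missing values
y <- c(1, 1, NA, NA, NA, 2)

# Count NA as its own category (present here, so "ifany" includes it)
counts <- table(y, useNA = "ifany")

# Three most frequent entries, NA included
top3 <- sort(counts, decreasing = TRUE)[1:3]
names(top3)          # NA "1" "2"

# "always" adds an NA entry even when nothing is missing (its count is 0)
no_missing <- table(c(1, 2), useNA = "always")
length(no_missing)   # 3
```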
#4 (2 votes)
You can use the table() function to get the frequency of each value in an array or vector, and then sort that table (sort() orders it from smallest to largest count by default):
x = c(1, 1, 1, 2, 2)
sort(table(x))
2 1
2 3
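This example also shows a pitfall when it is combined with [1:3] as in the other answers: x has only two distinct values, so indexing three positions pads the result with NA. A sketch using head(), which returns at most n entries, avoids that padding; the names counts and top are illustrative.

```r
x <- c(1, 1, 1, 2, 2)
counts <- sort(table(x), decreasing = TRUE)

# Indexing past the end of the table yields an extra NA entry
counts[1:3]

# head() returns at most 3 entries, so there is no NA padding
top <- head(counts, 3)
names(top)   # "1" "2"
```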