Return the count of unique instances in an aggregate

Time: 2022-02-24 07:36:02

Apologies for not inserting code fragments; I'm still too new on this site, so it blocks me from doing so.

Long story short, I have a large dataset of over 60000 entries.

I'm aggregating over a variety of different factors (14 different aggregates, over three different sections of the report each).

I'm doing the aggregates based on mean score.

For example, one sample would be:

rurageeth3 <- aggregate(rural$Q8, by=list(Age = rural$Age, Ethnicity= rural$Ethnicity), mean, na.rm=TRUE)

rurageeth3 <- rurageeth3[order(rurageeth3$x, decreasing=T),]

rurageeth3
        Age Ethnicity         x
6    Eleven     Black 10.000000
11  Fifteen     Mixed  9.500000
10   Eleven     Mixed  9.375000
1    Eleven     Asian  9.000000
2  Fourteen     Asian  9.000000
7   Fifteen     Black  9.000000
8  Fourteen     Black  9.000000
16   Eleven     Other  9.000000
17 Fourteen     Other  9.000000
21   Eleven     White  8.978799
26   Twelve     White  8.860465
25 Thirteen     White  8.841026
12 Fourteen     Mixed  8.666667
19 Thirteen     Other  8.666667
24  Sixteen     White  8.644444
23 Fourteen     White  8.623288
5    Twelve     Asian  8.600000
15   Twelve     Mixed  8.583333
22  Fifteen     White  8.576087
9  Thirteen     Black  8.500000
14 Thirteen     Mixed  8.300000
13  Sixteen     Mixed  8.000000
18  Sixteen     Other  8.000000
20   Twelve     Other  8.000000
3   Sixteen     Asian  7.000000
4  Thirteen     Asian  6.000000

Now that I have rurageeth3 computed, I want to know how many, for instance, fourteen-year-old mixed-race children were included in the sample.

Any idea of how I can see this data, without having to recreate all 72 aggregates from scratch?

1 solution

#1


Assuming your data has one row per subject, you would need to count the number of rows for each combination of categories. You can do it separately or at the same time you calculate the means.
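
As a rough sketch of the "separately" route in base R, the snippet below counts rows per Age/Ethnicity combination and merges those counts onto an existing table of means. The small `rural` data frame is made up for illustration; only the column names `Age`, `Ethnicity` and `Q8` come from the question, and the names `means`, `counts`, `n` and `result` are placeholders.

```r
# Hypothetical stand-in for the real `rural` data (values are invented)
rural <- data.frame(
  Age       = c("Eleven", "Eleven", "Fourteen", "Fourteen", "Fourteen"),
  Ethnicity = c("Black", "Black", "Mixed", "Mixed", "Mixed"),
  Q8        = c(10, 9, 8, NA, 9)
)

# Means per group, computed the same way as in the question
means <- aggregate(rural$Q8,
                   by = list(Age = rural$Age, Ethnicity = rural$Ethnicity),
                   mean, na.rm = TRUE)

# Row counts per group; length() counts every row, including missing scores
counts <- aggregate(rural$Q8,
                    by = list(Age = rural$Age, Ethnicity = rural$Ethnicity),
                    FUN = length)
names(counts)[3] <- "n"

# Attach the counts to the existing table of means
result <- merge(means, counts, by = c("Age", "Ethnicity"))
result
```

If only the counts are needed, `table(rural$Age, rural$Ethnicity)` should give the same numbers as a cross-tabulation, without touching the aggregates that already exist.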

Using aggregate:

aggregate(rural$Q8, by=list(Age = rural$Age, Ethnicity= rural$Ethnicity), 
          FUN = function(x) c("Mean"=mean(x, na.rm=TRUE), "Count"=sum(!is.na(x))))

sum(!is.na(x)) counts the number of non-missing values. If you want the total number of values, use length(x).
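
One thing to watch for with this approach: when FUN returns more than one value, aggregate() stores the results in a single matrix column named `x`, so something like `res$Count` will not work directly. A possible way to flatten it is sketched below on made-up data (the `rural` values, the extra `Rows` entry and the names `res` and `flat` are assumptions for illustration). It also shows how `sum(!is.na(x))` and `length(x)` differ when a score is missing.

```r
# Hypothetical stand-in for the real `rural` data (values are invented)
rural <- data.frame(
  Age       = c("Eleven", "Eleven", "Fourteen", "Fourteen", "Fourteen"),
  Ethnicity = c("Black", "Black", "Mixed", "Mixed", "Mixed"),
  Q8        = c(10, 9, 8, NA, 9)
)

res <- aggregate(rural$Q8,
                 by = list(Age = rural$Age, Ethnicity = rural$Ethnicity),
                 FUN = function(x) c(Mean  = mean(x, na.rm = TRUE),
                                     Count = sum(!is.na(x)),  # non-missing scores
                                     Rows  = length(x)))      # all rows in the group

# res$x is a matrix; turn its columns into ordinary data frame columns
flat <- do.call(data.frame, res)
names(flat)  # Age, Ethnicity, x.Mean, x.Count, x.Rows

# Look up a single combination
flat[flat$Age == "Fourteen" & flat$Ethnicity == "Mixed", ]
```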

If you're willing to try other options, both dplyr and data.table are very fast. Here's a dplyr example:

library(dplyr)

# This will count the number of rows for each combination of Age and Ethnicity
rural %>% group_by(Age, Ethnicity) %>% tally()
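
If the mean and the count are wanted in one table, the same pipeline can probably be extended with summarise(). This is a minimal sketch on made-up data, not the original poster's dataset; the column names `Mean`, `Count` and `Rows` and the name `summary_tbl` are chosen here for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the real `rural` data (values are invented)
rural <- data.frame(
  Age       = c("Eleven", "Eleven", "Fourteen", "Fourteen", "Fourteen"),
  Ethnicity = c("Black", "Black", "Mixed", "Mixed", "Mixed"),
  Q8        = c(10, 9, 8, NA, 9)
)

summary_tbl <- rural %>%
  group_by(Age, Ethnicity) %>%
  summarise(Mean  = mean(Q8, na.rm = TRUE),
            Count = sum(!is.na(Q8)),  # non-missing scores only
            Rows  = n()) %>%          # every row in the group
  ungroup()

# Look up a single combination
summary_tbl %>% filter(Age == "Fourteen", Ethnicity == "Mixed")
```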
