Apologies for not formatting the code as code fragments; I'm still too new on this site, so it blocks me from doing so.
Long story short, I have a large dataset of over 60000 entries.
I'm aggregating over a variety of different factors (14 different aggregates, over three different sections of the report each).
I'm doing the aggregates based on mean score.
For example, one sample would be:
rurageeth3 <- aggregate(rural$Q8, by=list(Age = rural$Age, Ethnicity= rural$Ethnicity), mean, na.rm=TRUE)
rurageeth3 <- rurageeth3[order(rurageeth3$x, decreasing=TRUE),]
rurageeth3
        Age Ethnicity         x
6    Eleven     Black 10.000000
11  Fifteen     Mixed  9.500000
10   Eleven     Mixed  9.375000
1    Eleven     Asian  9.000000
2  Fourteen     Asian  9.000000
7   Fifteen     Black  9.000000
8  Fourteen     Black  9.000000
16   Eleven     Other  9.000000
17 Fourteen     Other  9.000000
21   Eleven     White  8.978799
26   Twelve     White  8.860465
25 Thirteen     White  8.841026
12 Fourteen     Mixed  8.666667
19 Thirteen     Other  8.666667
24  Sixteen     White  8.644444
23 Fourteen     White  8.623288
5    Twelve     Asian  8.600000
15   Twelve     Mixed  8.583333
22  Fifteen     White  8.576087
9  Thirteen     Black  8.500000
14 Thirteen     Mixed  8.300000
13  Sixteen     Mixed  8.000000
18  Sixteen     Other  8.000000
20   Twelve     Other  8.000000
3   Sixteen     Asian  7.000000
4  Thirteen     Asian  6.000000
Now that I have rurageeth3 initialized, I want to know how many, for instance, fourteen-year-old mixed-race children were included in the sample.
Any idea of how I can see this data, without having to recreate all 72 aggregates from scratch?
1 solution
#1
Assuming your data has one row per subject, you would need to count the number of rows for each combination of categories. You can do it separately or at the same time you calculate the means.
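For the "separately" route, here is a minimal base R sketch that counts rows once and attaches the counts to a means table that already exists, so the aggregate itself is not recomputed. The toy `rural` data frame and the names `means`, `counts`, `with_n` and `n` are made up for illustration; only the column names Age, Ethnicity and Q8 come from the question.

```r
# Toy stand-in for the real `rural` data frame (values are invented).
rural <- data.frame(
  Age       = c("Eleven", "Eleven", "Eleven", "Fourteen", "Fourteen", "Fourteen"),
  Ethnicity = c("Mixed", "Mixed", "White", "Mixed", "Mixed", "White"),
  Q8        = c(9, 10, 8, 7, NA, 9)
)

# The existing aggregate of means, built the same way as in the question.
means <- aggregate(rural$Q8,
                   by = list(Age = rural$Age, Ethnicity = rural$Ethnicity),
                   mean, na.rm = TRUE)

# Count rows per Age/Ethnicity combination in a single pass.
# table() counts every row, including rows where Q8 is missing.
counts <- as.data.frame(table(Age = rural$Age, Ethnicity = rural$Ethnicity),
                        responseName = "n", stringsAsFactors = FALSE)

# Attach the counts to the existing means by matching on the grouping columns.
with_n <- merge(means, counts, by = c("Age", "Ethnicity"))
with_n
```

Because the counts depend only on the grouping columns, one `counts` table per grouping can be merged onto every aggregate that uses that grouping, as long as the number of rows (rather than the number of non-missing answers to a particular question) is what you want.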
Using aggregate:
aggregate(rural$Q8, by=list(Age = rural$Age, Ethnicity= rural$Ethnicity),
FUN = function(x) c("Mean"=mean(x, na.rm=TRUE), "Count"=sum(!is.na(x))))
sum(!is.na(x)) counts the number of non-missing values. If you want the total number of values, use length(x).
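One thing to watch for: when FUN returns a vector, aggregate stores the results in a single matrix column (x), so sorting or subsetting by the mean is slightly awkward. A minimal sketch of flattening that column, using an invented toy `rural` data frame and the illustrative names `res` and `flat`:

```r
# Toy stand-in for the real `rural` data frame (values are invented).
rural <- data.frame(
  Age       = c("Eleven", "Eleven", "Eleven", "Fourteen", "Fourteen", "Fourteen"),
  Ethnicity = c("Mixed", "Mixed", "White", "Mixed", "Mixed", "White"),
  Q8        = c(9, 10, 8, 7, NA, 9)
)

res <- aggregate(rural$Q8,
                 by = list(Age = rural$Age, Ethnicity = rural$Ethnicity),
                 FUN = function(x) c(Mean = mean(x, na.rm = TRUE),
                                     Count = sum(!is.na(x))))

# res has three columns; the third (x) is a matrix with columns Mean and Count.
str(res)

# Flatten the matrix column into ordinary columns named x.Mean and x.Count.
flat <- do.call(data.frame, res)

# Sort by mean score, highest first, as in the question.
flat[order(flat$x.Mean, decreasing = TRUE), ]
```

After flattening, the counts sit next to the means as regular columns, so the ordering step from the question works with x.Mean in place of x.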
If you're willing to try other options, both dplyr and data.table are very fast. Here's a dplyr example:
library(dplyr)
# This will count the number of rows for each combination of Age and Ethnicity
rural %>% group_by(Age, Ethnicity) %>% tally()
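If you want the mean and the counts in one table with dplyr, something along these lines should work. It is only a sketch: the toy `rural` data frame and the output names Mean, Count, Rows and `summary_tbl` are invented for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy stand-in for the real `rural` data frame (values are invented).
rural <- data.frame(
  Age       = c("Eleven", "Eleven", "Eleven", "Fourteen", "Fourteen", "Fourteen"),
  Ethnicity = c("Mixed", "Mixed", "White", "Mixed", "Mixed", "White"),
  Q8        = c(9, 10, 8, 7, NA, 9)
)

summary_tbl <- rural %>%
  group_by(Age, Ethnicity) %>%
  summarise(
    Mean  = mean(Q8, na.rm = TRUE),  # mean score, ignoring missing answers
    Count = sum(!is.na(Q8)),         # number of non-missing answers
    Rows  = n(),                     # number of rows, the same count tally() gives
    .groups = "drop"
  ) %>%
  arrange(desc(Mean))                # highest mean first

summary_tbl
```

Count and Rows differ only for groups where some Q8 values are missing, which makes it easy to see how many children were in a group versus how many actually answered.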