Creating a frequency table for multiple factor columns in R

I am a novice in R. I am compiling a separate manual on the syntax of the common functions/features I use for my work. My sample data frame is as follows:

x.sample <-
structure(list(Q9_A = structure(c(5L, 3L, 5L, 3L, 5L, 3L, 1L, 
5L, 5L, 5L), .Label = c("Impt", "Neutral", "Not Impt at all", 
"Somewhat Impt", "Very Impt"), class = "factor"), Q9_B = structure(c(5L, 
5L, 5L, 3L, 5L, 5L, 3L, 5L, 3L, 3L), .Label = c("Impt", "Neutral", 
"Not Impt at all", "Somewhat Impt", "Very Impt"), class = "factor"), 
Q9_C = structure(c(3L, 5L, 5L, 3L, 5L, 5L, 3L, 5L, 5L, 3L
), .Label = c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", 
"Very Impt"), class = "factor")), .Names = c("Q9_A", "Q9_B", 
"Q9_C"), row.names = c(NA, 10L), class = "data.frame")

> x.sample
          Q9_A            Q9_B            Q9_C
1        Very Impt       Very Impt Not Impt at all
2  Not Impt at all       Very Impt       Very Impt
3        Very Impt       Very Impt       Very Impt
4  Not Impt at all Not Impt at all Not Impt at all
5        Very Impt       Very Impt       Very Impt
6  Not Impt at all       Very Impt       Very Impt
7             Impt Not Impt at all Not Impt at all
8        Very Impt       Very Impt       Very Impt
9        Very Impt Not Impt at all       Very Impt
10       Very Impt Not Impt at all Not Impt at all

My original dataframe has 21 columns.

If I want to find the mean (treating this as an ordinal variable):

> sapply(x.sample,function(x) mean(as.numeric(x), na.rm=TRUE))
Q9_A Q9_B Q9_C 
 4.0  4.2  4.2
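
One caveat with the means above: as.numeric() on a factor returns the internal level codes, and here the levels are stored alphabetically ("Impt" is 1, "Not Impt at all" is 3), so those figures do not follow the importance scale. Below is a minimal sketch, assuming the intended order runs from "Not Impt at all" (lowest) to "Very Impt" (highest). The names labs, mk, levs and ord.means are helpers introduced only for this illustration, and the data frame is rebuilt from the dput output so the snippet runs on its own.

```r
# Rebuild the sample data with the same alphabetical level order as the dput output
labs <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
mk <- function(codes) factor(labs[codes], levels = labs)
x.sample <- data.frame(
  Q9_A = mk(c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)),
  Q9_B = mk(c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)),
  Q9_C = mk(c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3))
)

# Assumed ordinal scale, lowest to highest
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Re-level each column first, then average the resulting codes
ord.means <- sapply(x.sample, function(x)
  mean(as.numeric(factor(x, levels = levs, ordered = TRUE)), na.rm = TRUE))
ord.means
```

Under that assumed ordering the means come out as 3.7, 3.4 and 3.4 rather than 4.0, 4.2 and 4.2.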

I would like to produce a frequency table for ALL the variables in my data frame. I searched the internet and many forums and saw that the closest command for this is sapply. But when I tried it, it gave all 0s.

> sapply(x.sample,function(x) table(factor(x.sample, levels=c("Not Impt at all", "Somewhat Impt",            "Neutral", "Impt", "Very Impt"), ordered=TRUE)))
                Q9_A Q9_B Q9_C
Not Impt at all    0    0    0
Somewhat Impt      0    0    0
Neutral            0    0    0
Impt               0    0    0
Very Impt          0    0    0

QUESTION: How can I use sapply to tabulate a frequency table like the one above for all the columns (that are factors) in a data frame?

PS: So sorry if this seems trivial, but I have searched for 2 days without finding an answer, trying all possible combinations. Maybe I didn't search hard enough =(

Thanks very much.

3 Solutions

#1

You were nearly there. Just one small change in your function would have got you there. The x in function(x) ... needs to be passed through to the table() call:

levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
sapply(x.sample, function(x) table(factor(x, levels=levs, ordered=TRUE)))
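
To see why the original call returned only zeros, it may help to look at what factor() does when handed the whole data frame instead of a single column. This is a small sketch, not part of the original answer: bad, good, labs and mk are names made up for the illustration, and the data frame is rebuilt from the dput output in the question so the snippet is self-contained.

```r
# Rebuild the sample data from the question
labs <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
mk <- function(codes) factor(labs[codes], levels = labs)
x.sample <- data.frame(
  Q9_A = mk(c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)),
  Q9_B = mk(c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)),
  Q9_C = mk(c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3))
)
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Buggy version: the whole data frame is passed to factor(), not the column x.
# The data frame is coerced to one string per column, and none of those
# strings is a valid level.
bad <- factor(x.sample, levels = levs, ordered = TRUE)
length(bad)       # one element per column, not one per row
all(is.na(bad))   # no element matches a level, so all are NA
table(bad)        # NA values are not counted, hence every count is 0

# Fixed version: use the function argument x inside the anonymous function
good <- sapply(x.sample, function(x) table(factor(x, levels = levs, ordered = TRUE)))
good
```

Because the anonymous function ignored its argument, sapply simply repeated that same all-zero table once per column.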

A little re-jig of the code might make it a bit easier to read too:

sapply(lapply(x.sample,factor,levels=levs,ordered=TRUE), table)

#                Q9_A Q9_B Q9_C
#Not Impt at all    3    4    4
#Somewhat Impt      0    0    0
#Neutral            0    0    0
#Impt               1    0    0
#Very Impt          6    6    6
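
The question asks about the columns "that are factors", and the real data frame has 21 columns, so it may contain non-factor columns too. One possible way to restrict the tabulation is to filter the columns first. This is only a sketch: the score column, x.mixed, fac.cols and freq are hypothetical names added for the example, and the sample data is rebuilt from the dput output.

```r
# Rebuild the sample data from the question
labs <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
mk <- function(codes) factor(labs[codes], levels = labs)
x.sample <- data.frame(
  Q9_A = mk(c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)),
  Q9_B = mk(c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)),
  Q9_C = mk(c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3))
)
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Hypothetical extra column that is not a factor
x.mixed <- cbind(x.sample, score = 1:10)

# Keep only the factor columns, then tabulate each against the common levels
fac.cols <- Filter(is.factor, x.mixed)
freq <- sapply(fac.cols, function(x) table(factor(x, levels = levs)))
freq
```

This assumes every factor column shares the same five levels; columns with different level sets would need their own levels vector.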

#2

Coming a bit late, but here's a possible reshape2 solution. It could have been very straightforward with recast, but we need to handle empty factor levels here, so we need to specify both factorsAsStrings = FALSE within melt and drop = FALSE within dcast. Since recast can't pass arguments to melt (only to dcast), here goes:

library(reshape2)
x.sample$indx <- 1 
dcast(melt(x.sample, "indx", factorsAsStrings = FALSE), value ~ variable, drop = FALSE)
#             value Q9_A Q9_B Q9_C
# 1            Impt    1    0    0
# 2         Neutral    0    0    0
# 3 Not Impt at all    3    4    4
# 4   Somewhat Impt    0    0    0
# 5       Very Impt    6    6    6

If we didn't care about empty levels, a quick solution would be just:

recast(x.sample, value ~ variable, id.var = "indx")
#             value Q9_A Q9_B Q9_C
# 1            Impt    1    0    0
# 2 Not Impt at all    3    4    4
# 3       Very Impt    6    6    6

Alternatively, if speed is a concern, we can do the same using data.table:

library(data.table)
dcast(melt(setDT(x.sample), measure.vars = names(x.sample), value.factor = TRUE), 
           value ~ variable, drop = FALSE)
#              value Q9_A Q9_B Q9_C
# 1:            Impt    1    0    0
# 2:         Neutral    0    0    0
# 3: Not Impt at all    3    4    4
# 4:   Somewhat Impt    0    0    0
# 5:       Very Impt    6    6    6
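
For comparison, the same melt-then-cast idea can be sketched in base R without extra packages: stack all columns into a long value/variable layout, then cross-tabulate. This is an illustration rather than a drop-in replacement for the code above. The names long and tab are made up here, and the sample data is rebuilt from the dput output so the snippet runs on its own.

```r
# Rebuild the sample data from the question
labs <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
mk <- function(codes) factor(labs[codes], levels = labs)
x.sample <- data.frame(
  Q9_A = mk(c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)),
  Q9_B = mk(c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)),
  Q9_C = mk(c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3))
)
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# "Melt": one row per (variable, value) pair, columns stacked one after another
long <- data.frame(
  variable = rep(names(x.sample), each = nrow(x.sample)),
  value = factor(unlist(lapply(x.sample, as.character), use.names = FALSE),
                 levels = levs)
)

# "Cast": cross-tabulate; table() keeps unused factor levels as zero rows
tab <- table(long$value, long$variable)
tab
```

Since table() does not drop unused levels of a factor, "Somewhat Impt" and "Neutral" still show up as rows of zeros, which plays the same role as drop = FALSE in dcast.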

#3

Why not just:

> sapply(x.sample, table)
                Q9_A Q9_B Q9_C
Impt               1    0    0
Neutral            0    0    0
Not Impt at all    3    4    4
Somewhat Impt      0    0    0
Very Impt          6    6    6

Let's call it 'tbl' and reorder its rows to match the scale:

tbl[ order(match(rownames(tbl), c("Not Impt at all", "Somewhat Impt", 
                                  "Neutral", "Impt", "Very Impt")) )   , ]
                Q9_A Q9_B Q9_C
Not Impt at all    3    4    4
Somewhat Impt      0    0    0
Neutral            0    0    0
Impt               1    0    0
Very Impt          6    6    6
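
Since the row names of tbl are the factor levels, indexing the matrix by name is another way to get the same ordering. The sketch below assumes every entry of levs is present as a row name (otherwise the subscript fails); levs, tbl.ordered, labs and mk are names introduced for the illustration, with the sample data rebuilt from the dput output.

```r
# Rebuild the sample data from the question
labs <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
mk <- function(codes) factor(labs[codes], levels = labs)
x.sample <- data.frame(
  Q9_A = mk(c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)),
  Q9_B = mk(c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)),
  Q9_C = mk(c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3))
)
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# One table per column, simplified by sapply into a matrix (rows in alphabetical level order)
tbl <- sapply(x.sample, table)

# Reorder rows by name instead of via order(match(...))
tbl.ordered <- tbl[levs, ]
tbl.ordered
```

The order(match(...)) form is more forgiving when a level is missing from the row names, so which one to use depends on how uniform the columns are.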
