I am a novice in R. I am compiling a separate manual on the syntax for the common functions/features I use in my work. My sample data frame is as follows:
x.sample <-
structure(list(Q9_A = structure(c(5L, 3L, 5L, 3L, 5L, 3L, 1L,
5L, 5L, 5L), .Label = c("Impt", "Neutral", "Not Impt at all",
"Somewhat Impt", "Very Impt"), class = "factor"), Q9_B = structure(c(5L,
5L, 5L, 3L, 5L, 5L, 3L, 5L, 3L, 3L), .Label = c("Impt", "Neutral",
"Not Impt at all", "Somewhat Impt", "Very Impt"), class = "factor"),
Q9_C = structure(c(3L, 5L, 5L, 3L, 5L, 5L, 3L, 5L, 5L, 3L
), .Label = c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt",
"Very Impt"), class = "factor")), .Names = c("Q9_A", "Q9_B",
"Q9_C"), row.names = c(NA, 10L), class = "data.frame")
> x.sample
Q9_A Q9_B Q9_C
1 Very Impt Very Impt Not Impt at all
2 Not Impt at all Very Impt Very Impt
3 Very Impt Very Impt Very Impt
4 Not Impt at all Not Impt at all Not Impt at all
5 Very Impt Very Impt Very Impt
6 Not Impt at all Very Impt Very Impt
7 Impt Not Impt at all Not Impt at all
8 Very Impt Very Impt Very Impt
9 Very Impt Not Impt at all Very Impt
10 Very Impt Not Impt at all Not Impt at all
My original dataframe has 21 columns.
If I want to find the mean (treating this as an ordinal variable):
> sapply(x.sample,function(x) mean(as.numeric(x), na.rm=TRUE))
Q9_A Q9_B Q9_C
4.0 4.2 4.2
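A note on the means above: as.numeric() on a factor returns its internal integer codes, which follow the order of levels(). Here the levels are alphabetical (Impt, Neutral, Not Impt at all, Somewhat Impt, Very Impt), so "Not Impt at all" is coded 3 and "Impt" is coded 1, and the 4.0/4.2 values are not on the intended importance scale. A minimal sketch, assuming the scale runs from Not Impt at all = 1 up to Very Impt = 5 (the order used in the answers below), that re-levels each column before converting:

```r
# Rebuild the sample data compactly (same values and level order as x.sample above)
lab <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
x.sample <- data.frame(
  Q9_A = factor(lab[c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)], levels = lab),
  Q9_B = factor(lab[c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)], levels = lab),
  Q9_C = factor(lab[c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3)], levels = lab)
)

# Codes follow the alphabetical levels, so this reproduces 4.0, 4.2, 4.2
alpha.means <- sapply(x.sample, function(x) mean(as.numeric(x), na.rm = TRUE))

# Assumed ordinal scale: a response's position in levs becomes its numeric score
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
ord.means <- sapply(x.sample, function(x)
  mean(as.numeric(factor(x, levels = levs, ordered = TRUE)), na.rm = TRUE))
ord.means
# Q9_A 3.7, Q9_B 3.4, Q9_C 3.4
```

Whether a mean is a sensible summary for an ordinal item is a separate question; the point here is only that the numeric codes must match the intended order.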
I would like to tabulate a frequency table for ALL the variables in my data frame. I searched the internet and many forums and saw that the closest approach is to use sapply. But when I tried it, every count came out as 0:
> sapply(x.sample,function(x) table(factor(x.sample, levels=c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt"), ordered=TRUE)))
Q9_A Q9_B Q9_C
Not Impt at all 0 0 0
Somewhat Impt 0 0 0
Neutral 0 0 0
Impt 0 0 0
Very Impt 0 0 0
QUESTION: How can I use sapply to tabulate a frequency table like the one above for all the columns (that are factors) in a data frame?
PS: Sorry if this seems trivial, but I have searched for 2 days without an answer and tried every combination I could think of. Maybe I didn't search hard enough =(
Thanks very much.
3 answers
#1 (8 votes)
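You were nearly there. Just one small change to your function would have got you there: the x in function(x) ... needs to be passed through to the table() call. Your version passed x.sample (the whole data frame) to factor(), so none of the values matched the levels and every count came out as 0.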
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
sapply(x.sample, function(x) table(factor(x, levels=levs, ordered=TRUE)))
A little re-jig of the code might make it a bit easier to read too:
sapply(lapply(x.sample,factor,levels=levs,ordered=TRUE), table)
# Q9_A Q9_B Q9_C
#Not Impt at all 3 4 4
#Somewhat Impt 0 0 0
#Neutral 0 0 0
#Impt 1 0 0
#Very Impt 6 6 6
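The question asks specifically about the columns that are factors. If the real 21-column data frame also contains non-factor columns (an ID, say), one option is to select the factor columns first with sapply(..., is.factor) and then apply the same tabulation. A sketch, in which resp_id is a hypothetical numeric column added only for illustration:

```r
lab <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
x.sample <- data.frame(
  Q9_A = factor(lab[c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)], levels = lab),
  Q9_B = factor(lab[c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)], levels = lab),
  Q9_C = factor(lab[c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3)], levels = lab)
)
x.sample$resp_id <- 1:10  # hypothetical non-factor column, not in the original data

levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Logical vector: TRUE for the factor columns only
is.fac <- sapply(x.sample, is.factor)

# Same tabulation as above, restricted to the factor columns
freq <- sapply(x.sample[is.fac],
               function(x) table(factor(x, levels = levs, ordered = TRUE)))
freq

# Sanity check: each column should account for all 10 respondents
colSums(freq)
```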
#2 (8 votes)
Coming a bit late, but here's a possible reshape2 solution. It could have been very straightforward with recast, but we need to handle empty factor levels here, so we need to specify both factorsAsStrings = FALSE within melt and drop = FALSE within dcast, while recast can't pass arguments to melt (only to dcast). So here goes:
library(reshape2)
x.sample$indx <- 1
dcast(melt(x.sample, "indx", factorsAsStrings = FALSE), value ~ variable, drop = FALSE)
# value Q9_A Q9_B Q9_C
# 1 Impt 1 0 0
# 2 Neutral 0 0 0
# 3 Not Impt at all 3 4 4
# 4 Somewhat Impt 0 0 0
# 5 Very Impt 6 6 6
If we didn't care about empty levels, a quick solution would be just:
recast(x.sample, value ~ variable, id.var = "indx")
# value Q9_A Q9_B Q9_C
# 1 Impt 1 0 0
# 2 Not Impt at all 3 4 4
# 3 Very Impt 6 6 6
Alternatively, if speed is a concern, we can do the same using data.table:
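x.sample$indx <- NULL  # drop the helper column added for the reshape2 version, otherwise it is melted too
library(data.table)
dcast(melt(setDT(x.sample), measure.vars = names(x.sample), value.factor = TRUE),
      value ~ variable, drop = FALSE)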
# value Q9_A Q9_B Q9_C
# 1: Impt 1 0 0
# 2: Neutral 0 0 0
# 3: Not Impt at all 3 4 4
# 4: Somewhat Impt 0 0 0
# 5: Very Impt 6 6 6
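The reshape2 and data.table versions share one idea: stack the columns into long format (one row per question/answer pair) and then cross-tabulate value against variable. For comparison, a base-R sketch of the same idea with no extra packages; the names long and wide are just illustrative:

```r
lab <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
x.sample <- data.frame(
  Q9_A = factor(lab[c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)], levels = lab),
  Q9_B = factor(lab[c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)], levels = lab),
  Q9_C = factor(lab[c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3)], levels = lab)
)
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

# Long format, similar to what melt() produces: column name + answer per row
long <- data.frame(
  variable = rep(names(x.sample), each = nrow(x.sample)),
  value = factor(unlist(lapply(x.sample, as.character), use.names = FALSE),
                 levels = levs)
)

# Cross-tabulate; table() keeps unused factor levels, similar to drop = FALSE
wide <- table(value = long$value, variable = long$variable)
wide
```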
#3 (5 votes)
Why not just:
> sapply(x.sample, table)
Q9_A Q9_B Q9_C
Impt 1 0 0
Neutral 0 0 0
Not Impt at all 3 4 4
Somewhat Impt 0 0 0
Very Impt 6 6 6
Let's call it tbl and reorder its rows into the ordinal sequence:
tbl <- sapply(x.sample, table)
tbl[ order(match(rownames(tbl), c("Not Impt at all", "Somewhat Impt",
                                  "Neutral", "Impt", "Very Impt")) ) , ]
Q9_A Q9_B Q9_C
Not Impt at all 3 4 4
Somewhat Impt 0 0 0
Neutral 0 0 0
Impt 1 0 0
Very Impt 6 6 6
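sapply(x.sample, table) simplifies to a neat matrix here only because every column carries the same five levels, and table() on a factor also counts levels with zero occurrences. If the columns had different levels (for example after droplevels(), or when a column was read in with only the responses that actually occur as its levels), the per-column tables would have different lengths and sapply() would return a list instead. A small sketch using a hypothetical modified copy, x.bad:

```r
lab <- c("Impt", "Neutral", "Not Impt at all", "Somewhat Impt", "Very Impt")
x.sample <- data.frame(
  Q9_A = factor(lab[c(5, 3, 5, 3, 5, 3, 1, 5, 5, 5)], levels = lab),
  Q9_B = factor(lab[c(5, 5, 5, 3, 5, 5, 3, 5, 3, 3)], levels = lab),
  Q9_C = factor(lab[c(3, 5, 5, 3, 5, 5, 3, 5, 5, 3)], levels = lab)
)
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")

tbl <- sapply(x.sample, table)
is.matrix(tbl)  # TRUE: every column has the same five levels

# Hypothetical copy where Q9_A keeps only the levels it actually uses
x.bad <- x.sample
x.bad$Q9_A <- droplevels(x.bad$Q9_A)  # levels now: Impt, Not Impt at all, Very Impt
res <- sapply(x.bad, table)
is.list(res)    # TRUE: tables of length 3 and 5 cannot form one matrix

# Setting the levels explicitly restores a 5 x 3 matrix in the ordinal order
fixed <- sapply(x.bad, function(x) table(factor(x, levels = levs)))
fixed
```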