I'm trying to use dplyr to group_by the stud_ID variable in the following data frame, as in this SO question:
> str(df)
'data.frame': 4136 obs. of 4 variables:
$ stud_ID : chr "ABB112292" "ABB112292" "ABB112292" "ABB112292" ...
$ behavioral_scale: num 3.5 4 3.5 3 3.5 2 NA NA 1 2 ...
$ cognitive_scale : num 3.5 3 3 3 3.5 2 NA NA 1 1 ...
$ affective_scale : num 2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...
I tried the following to obtain scale scores by student (rather than scale scores for observations across all students):
scaled_data <-
  df %>%
  group_by(stud_ID) %>%
  mutate(behavioral_scale_ind = scale(behavioral_scale),
         cognitive_scale_ind = scale(cognitive_scale),
         affective_scale_ind = scale(affective_scale))
Here is the result:
> str(scaled_data)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 4136 obs. of 7 variables:
$ stud_ID : chr "ABB112292" "ABB112292" "ABB112292" "ABB112292" ...
$ behavioral_scale : num 3.5 4 3.5 3 3.5 2 NA NA 1 2 ...
$ cognitive_scale : num 3.5 3 3 3 3.5 2 NA NA 1 1 ...
$ affective_scale : num 2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...
$ behavioral_scale_ind: num [1:12, 1] 0.64 1.174 0.64 0.107 0.64 ...
..- attr(*, "scaled:center")= num 2.9
..- attr(*, "scaled:scale")= num 0.937
$ cognitive_scale_ind : num [1:12, 1] 1.17 0.64 0.64 0.64 1.17 ...
..- attr(*, "scaled:center")= num 2.4
..- attr(*, "scaled:scale")= num 0.937
$ affective_scale_ind : num [1:12, 1] 0 1.28 0.64 0.64 0 ...
..- attr(*, "scaled:center")= num 2.5
..- attr(*, "scaled:scale")= num 0.782
The three scaled variables (behavioral_scale_ind, cognitive_scale_ind, and affective_scale_ind) have only 12 observations, the same number of observations as the first student, ABB112292.
What's going on here? How can I obtain scaled scores by individual?
2 Answers
#1
25
The problem seems to be in the base scale() function, which is designed for matrices: it returns a one-column matrix carrying scaled:center and scaled:scale attributes rather than a plain vector. Try writing your own.
scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
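To see the difference in return types, here is a minimal sketch. It assumes only base R; the vector x reuses the first behavioral_scale values printed in the question, and the names m and v are illustrative.

```r
# Helper from this answer, repeated so the snippet runs on its own.
scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# First behavioral_scale values shown in the question's str() output.
x <- c(3.5, 4, 3.5, 3, 3.5, 2, NA, NA, 1, 2)

m <- scale(x)        # base R: one-column matrix plus attributes
v <- scale_this(x)   # helper: plain numeric vector

is.matrix(m)             # TRUE
dim(m)                   # 10 1
names(attributes(m))     # "dim", "scaled:center", "scaled:scale"
is.matrix(v)             # FALSE
is.null(attributes(v))   # TRUE

# The numeric values themselves agree.
all.equal(as.vector(m), v)
```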
With that helper, the grouped mutate() works:
library("dplyr")

# reproducible sample data
set.seed(123)
n <- 1000
df <- data.frame(stud_ID = sample(LETTERS, size = n, replace = TRUE),
                 behavioral_scale = runif(n, 0, 10),
                 cognitive_scale = runif(n, 1, 20),
                 affective_scale = runif(n, 0, 1))

# standardize each scale within each student
scaled_data <-
  df %>%
  group_by(stud_ID) %>%
  mutate(behavioral_scale_ind = scale_this(behavioral_scale),
         cognitive_scale_ind = scale_this(cognitive_scale),
         affective_scale_ind = scale_this(affective_scale))
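As a cross-check that does not depend on dplyr, the same per-student standardization can be sketched in base R with ave(). This is an alternative route, not part of the original answer; it rebuilds a one-column version of the sample data, and the names group_means and group_sds are illustrative.

```r
# Helper from this answer, repeated so the snippet runs on its own.
scale_this <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Same kind of simulated data as above, reduced to one scale for brevity.
set.seed(123)
n <- 1000
df <- data.frame(stud_ID = sample(LETTERS, size = n, replace = TRUE),
                 behavioral_scale = runif(n, 0, 10))

# ave() applies FUN within each level of the grouping variable and
# returns a vector aligned with the original rows.
df$behavioral_scale_ind <- ave(df$behavioral_scale, df$stud_ID, FUN = scale_this)

# Within each student, the scaled scores should have mean ~0 and sd ~1.
group_means <- tapply(df$behavioral_scale_ind, df$stud_ID, mean)
group_sds   <- tapply(df$behavioral_scale_ind, df$stud_ID, sd)
range(group_means)
range(group_sds)
```

If the scaling had been applied across all students instead of within each one, the per-student means and standard deviations would generally differ from 0 and 1.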
Or, if you're open to a data.table solution:
library("data.table")
setDT(df)
cols_to_scale <- c("behavioral_scale","cognitive_scale","affective_scale")
df[, lapply(.SD, scale_this), .SDcols = cols_to_scale, keyby = factor(stud_ID)]
#2
7
This was a known problem in dplyr; a fix has been merged into the development version, which you can install via:
# install.packages("devtools")
devtools::install_github("hadley/dplyr")
In the stable version, the following should work too:
scale_this <- function(x) as.vector(scale(x))
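A quick way to convince yourself that this wrapper behaves like a hand-written version is sketched below. It assumes only base R; scale_manual and scale_wrapped are hypothetical names used here to keep the two definitions apart, and x reuses the affective_scale values printed in the question.

```r
# Two candidate definitions of the helper (names are illustrative).
scale_manual  <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
scale_wrapped <- function(x) as.vector(scale(x))

# affective_scale values shown in the question's str() output.
x <- c(2.5, 3.5, 3, 3, 2.5, 2, NA, NA, 1, 1.5)

# as.vector() drops the matrix dimensions and the scaled:* attributes.
is.null(dim(scale_wrapped(x)))

# Both versions give the same numbers and keep NA in the same positions.
all.equal(scale_manual(x), scale_wrapped(x))
```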