How to aggregate a data.table on multiple dimensions

Date: 2021-08-16 00:14:39

I have a data table for which I want to aggregate the data based on multiple fields. Here is a simplified example of my data:

# each record is the number of pages read
# by a student in a given day
pages_per_day <- data.table(
  student_id = c(1,1,1,2,2,2),
  week_of_semester = c(1,1,2,1,2,2),
  pages_read = c(8,6,4,7,8,7)
)

I would like to aggregate this data by both student_id and week to show the average number of pages each student read during a given week of the semester. I tried the following:

avg_weekly_pages_read <- pages_per_day[,list(
  avg_pages = sum(pages_read) / .N,
  by = c('student_id','week_of_semester')
)]

This gives me a two-column data table with the columns avg_pages and by.

I was hoping to have a table more like:

student_id, week, avg_pages
1,1,7
1,2,4
2,1,7
2,2,7.5

Any guidance is greatly appreciated.

2 Answers

#1


You are looking for

pages_per_day[, .(avg_pages = mean(pages_read)), by = .(student_id, week_of_semester)]
#    student_id week_of_semester avg_pages
# 1:          1                1       7.0
# 2:          1                2       4.0
# 3:          2                1       7.0
# 4:          2                2       7.5

By the way, there is no need to reinvent the wheel: R has a built-in mean() function.
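
To make the difference from the attempt in the question explicit, here is a minimal sketch, assuming the data.table package is installed and using the sample data from the question. The names wrong and right are only illustrative. When by sits inside list(), data.table treats it as one more column to compute; when it is passed as its own argument to [, it defines the groups.

```r
library(data.table)

pages_per_day <- data.table(
  student_id = c(1, 1, 1, 2, 2, 2),
  week_of_semester = c(1, 1, 2, 1, 2, 2),
  pages_read = c(8, 6, 4, 7, 8, 7)
)

# `by` inside list(): it becomes an ordinary result column,
# so no grouping happens and the mean is taken over all rows
wrong <- pages_per_day[, list(
  avg_pages = sum(pages_read) / .N,
  by = c("student_id", "week_of_semester")
)]
names(wrong)  # "avg_pages" "by"

# `by` as a separate argument of `[`: one row per (student, week) group
right <- pages_per_day[, list(avg_pages = sum(pages_read) / .N),
                       by = c("student_id", "week_of_semester")]
names(right)  # "student_id" "week_of_semester" "avg_pages"
```

Both the character-vector form by = c("student_id", "week_of_semester") and the .(student_id, week_of_semester) form used above select the same grouping columns.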

#2


aggregate(pages_read ~ student_id + week_of_semester, pages_per_day, mean)
#   student_id week_of_semester pages_read
# 1          1                1        7.0
# 2          2                1        7.0
# 3          1                2        4.0
# 4          2                2        7.5
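
Note that the base R result keeps the column name pages_read and orders the rows by week first. If the layout requested in the question is wanted, one possible follow-up is sketched below; the name res is only illustrative, and a plain data.frame is used here, which aggregate() handles the same way.

```r
pages_per_day <- data.frame(
  student_id = c(1, 1, 1, 2, 2, 2),
  week_of_semester = c(1, 1, 2, 1, 2, 2),
  pages_read = c(8, 6, 4, 7, 8, 7)
)

res <- aggregate(pages_read ~ student_id + week_of_semester, pages_per_day, mean)

# rename the aggregated column to match the requested output
names(res)[names(res) == "pages_read"] <- "avg_pages"

# sort by student first, then by week
res <- res[order(res$student_id, res$week_of_semester), ]
```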
