R: Using a sort function on a data frame based on multiple columns

Time: 2021-03-13 22:34:24

I am a cardiologist and love coding in R. I am having a real issue with sorting a data frame, and I suspect the solution is really easy!

I have a data frame with summary values from multiple studies (df$study). Most studies have only one summary value (df$summary). However, as you can see, study A has three summary values, numbered in df$no.of.estimate. See below:

study <- c("E", "A", "F", "A", "B", "A", "C", "D")
no.of.estimate <- c(1, 2, 1, 3, 1, 1, 1, 1)
summary <- c(1, 2, 3, 5, 6 ,7 ,8 ,9)
df <- data.frame(study, no.of.estimate, summary)

So I want to sort the data frame by df$summary, which is easy. However, if a study has more than one estimate, I want its rows grouped together and ordered within the group by the "no.of.estimate" column.

So essentially the desired output is:

study <- c("E", "A", "A", "A", "F", "B", "C", "D")
no.of.estimate <- c(1, 1, 2, 3, 1, 1, 1, 1)
summary <- c(1, 7, 2, 5, 3 ,6 ,8 ,9)
df <- data.frame(study, no.of.estimate, summary)

2 answers

#1


You could try

library(dplyr)
# Fix the study order to order of first appearance, then sort within each study
df %>%
  mutate(study = factor(study, levels = unique(study))) %>%
  arrange(study, no.of.estimate)
  #  study no.of.estimate summary
  #1     E              1       1
  #2     A              1       7
  #3     A              2       2
  #4     A              3       5
  #5     F              1       3
  #6     B              1       6
  #7     C              1       8
  #8     D              1       9
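
To see why this works, here is a minimal sketch using only the study vector from the question (the names first_seen and f are mine, for illustration). unique() keeps values in order of first appearance, and sorting a factor goes by its level codes, so all rows of a study end up next to each other in that order:

```r
study <- c("E", "A", "F", "A", "B", "A", "C", "D")

# unique() keeps the order of first appearance
first_seen <- unique(study)
first_seen      # "E" "A" "F" "B" "C" "D"

# A factor sorts by its integer level codes, not alphabetically
f <- factor(study, levels = first_seen)
as.integer(f)   # 1 2 3 2 4 2 5 6

# order() is stable, so the three "A" rows (positions 2, 4, 6) become adjacent
order(f)        # 1 2 4 6 3 5 7 8
```

Note that this relies on the rows already being in the order in which you want the studies to appear, which is the case for the example data.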

Or a base R approach

df$study <- factor(df$study, levels=unique(df$study))
df[with(df, order(study, no.of.estimate)), ]

Data

df <- structure(list(study = structure(c(5L, 1L, 6L, 1L, 2L, 1L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1), summary = c(1, 
2, 3, 5, 6, 7, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")

The expected dataset is

df1 <- structure(list(study = structure(c(5L, 1L, 1L, 1L, 6L, 2L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 1, 2, 3, 1, 1, 1, 1), summary = c(1, 
7, 2, 5, 3, 6, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")
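
If the rows do not already arrive in the order in which the studies should appear, one possible variation is to compute an explicit sort key per study. The sketch below assumes that a study's position should be decided by its smallest summary value. That rule is my reading of the desired output and is not stated in the question, and grp_key and sorted are illustrative names:

```r
df <- data.frame(
  study          = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary        = c(1, 2, 3, 5, 6, 7, 8, 9)
)

# Assumption: each study is positioned by its smallest summary value.
# ave() returns that per-study minimum repeated on every row of the study.
df$grp_key <- ave(df$summary, df$study, FUN = min)

# Sort by the group key, keep each study's rows together,
# then order within the study by no.of.estimate
sorted <- df[order(df$grp_key, df$study, df$no.of.estimate), ]

# Drop the helper column and reset the row names
sorted$grp_key <- NULL
rownames(sorted) <- NULL
sorted
```

For the example data this gives the same row order as the expected dataset, and it does not depend on the input row order.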

#2


Here's my data.table attempt, which leaves your columns as they are and creates a new index (though see my comment first). Its main advantage is that it updates your data set by reference rather than creating new copies.

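library(data.table)
# indx := .GRP numbers each study by order of first appearance (added by reference);
# setorder() then sorts df in place by that index and by no.of.estimate
setorder(setDT(df)[, indx := .GRP, by = study], indx, no.of.estimate)[]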
#    study no.of.estimate summary indx
# 1:     E              1       1    1
# 2:     A              1       7    2
# 3:     A              2       2    2
# 4:     A              3       5    2
# 5:     F              1       3    3
# 6:     B              1       6    4
# 7:     C              1       8    5
# 8:     D              1       9    6
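
Because setDT() and setorder() modify df itself, here is a hedged variation for the case where you would rather keep the original data frame untouched. It assumes the data.table package is installed; dt is an illustrative name. as.data.table() makes a copy, so the by-reference steps only affect that copy, and the helper index is dropped at the end:

```r
library(data.table)

df <- data.frame(
  study          = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary        = c(1, 2, 3, 5, 6, 7, 8, 9)
)

# as.data.table() copies the data, so df stays an unsorted data.frame
dt <- as.data.table(df)

# Number each study by order of first appearance (by = keeps that order)
dt[, indx := .GRP, by = study]

# Sort the copy in place, then remove the helper column
setorder(dt, indx, no.of.estimate)
dt[, indx := NULL]
dt
```

The trade-off is one extra copy of the data, which gives up the memory advantage described above in exchange for leaving df as it was.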
