R: How to subset all data frames in a list?

Time: 2020-12-03 18:39:15

I have a list of data frames called WaFramesCosts. I simply want to subset each one to specific columns so that I can then export them. I have tried:

    for (i in names(WaFramesCosts)) {
      WaFramesCosts[[i]][, c("Cost_Center", "Domestic_Anytime_Min_Used", "Department",
                             "Domestic_Anytime_Min_Used")]
    }

but it returns this error:

Error in `[.data.frame`(WaFramesCosts[[i]], , c("Cost_Center", "Department",  : 
  undefined columns selected

I also tried:

for (i in seq_along(WaFramesCosts)) {
  WaFramesCosts[[i]][, -which(names(WaFramesCosts[[i]]) %in%
    c("Cost_Center", "Domestic_Anytime_Min_Used", "Department", "Domestic_Anytime_Min_Used"))]
}

but I get the same error. Can anyone see what I am doing wrong?

Side Note: For reference, I used this:

for (i in seq_along(WaFramesCosts)) {
    t <- WaFramesCosts[[i]][ , grepl( "Domestic" , names( WaFramesCosts[[i]] ) )] 
    q <- subset(WaFramesCosts[[i]], select = c("Cost_Center","Domestic_Anytime_Min_Used","Department","Domestic_Anytime_Min_Used"))  

    WaFramesCosts[[i]] <- merge(q,t)
  }

while attempting the same goal with a different approach, and it seemed to get closer.

2 solutions

#1


Welcome back, Kootseeahknee. You are still assuming that the value of the last expression in a for loop is implicitly returned at the end; it is not. If you want that behavior, perhaps you want lapply:

myoutput <- lapply(names(WaFramesCosts), function(i) {
  WaFramesCosts[[i]][,c("Cost_Center","Domestic_Anytime_Min_Used","Department","Domestic_Anytime_Min_Used")]
})
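
To see why the loops in the question appear to do nothing, here is a minimal sketch on toy data. The list `frames` and its columns are made up for illustration and are not the asker's data. A for loop evaluates the subsetting expression and discards the result unless you assign it back, while lapply collects each function's return value into a new list.

```r
# Toy stand-in for WaFramesCosts (hypothetical data)
frames <- list(
  a = data.frame(x = 1:2, y = 3:4, z = 5:6),
  b = data.frame(x = 7:8, y = 9:10, z = 11:12)
)

# The subset is computed and thrown away: frames is unchanged
for (i in names(frames)) {
  frames[[i]][, c("x", "y")]
}
ncol(frames$a)  # still 3

# Option 1: assign back inside the loop (done on a copy here)
kept <- frames
for (i in names(kept)) {
  kept[[i]] <- kept[[i]][, c("x", "y")]
}

# Option 2: lapply returns a new list; iterating over the list itself
# (rather than over its names) also preserves the element names
kept2 <- lapply(frames, function(df) df[, c("x", "y")])
names(kept2)
```

Either option works; the lapply form leaves the original list untouched, which is convenient when you still need the full data frames afterwards.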

The "undefined columns selected" error tells me that your assumptions about the datasets are not correct: at least one data frame is missing at least one of the columns. From your previous question (How to do a complex edit of columns of all data frames in a list?), I'm inferring that you want whichever columns match, without assuming that every column is present in every data frame. For that, you could/should be using grep or some variant:

myoutput <- lapply(names(WaFramesCosts), function(i) {
  # match against the columns of the i-th data frame, not of the list
  WaFramesCosts[[i]][,grep("(Cost_Center|Domestic_Anytime_Min_Used|Department)",
                           colnames(WaFramesCosts[[i]])),drop=FALSE]
})

This will match column names that contain any of those strings. You can be a lot more precise by anchoring the regular expression so that whole strings, or only the start or end, must match. For instance, changing from (Cost|Dom) (anything that contains "Cost" or "Dom") to (^Cost|Dom) means anything that starts with "Cost" or contains "Dom"; similarly, (Cost|ment$) matches anything that contains "Cost" or ends with "ment". If, however, you always want exact matches and just need the columns that exist, then something like this will work:

myoutput <- lapply(names(WaFramesCosts), function(i) {
  # keep only the wanted columns that exist in the i-th data frame
  WaFramesCosts[[i]][,intersect(c("Cost_Center","Domestic_Anytime_Min_Used","Department"),
                                colnames(WaFramesCosts[[i]])),drop=FALSE]
})
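
The difference between the unanchored, partly anchored, and exact-match approaches described above is easiest to see on a plain vector of names. The following is only a sketch: the column names in `cols`, including "Total_Cost" and "Non_Domestic_Min", are hypothetical and chosen to make the patterns disagree.

```r
# Hypothetical column names, chosen to show how the patterns differ
cols <- c("Cost_Center", "Total_Cost", "Department",
          "Domestic_Anytime_Min_Used", "Non_Domestic_Min")

# Unanchored: "Cost" or "Dom" may appear anywhere in the name
loose <- grep("(Cost|Dom)", cols, value = TRUE)

# "^Cost" must start the name; "Dom" may still appear anywhere
start <- grep("(^Cost|Dom)", cols, value = TRUE)

# Anchoring both ends makes the regex behave like exact matching
exact <- grep("^(Cost_Center|Department)$", cols, value = TRUE)

# intersect() keeps only the wanted names that actually exist,
# in the order of its first argument; "Not_A_Column" is dropped
wanted <- c("Cost_Center", "Department", "Not_A_Column")
present <- intersect(wanted, cols)

loose
start
exact
present
```

Here `loose` picks up "Total_Cost" and "Non_Domestic_Min" as well, `start` drops "Total_Cost" but keeps "Non_Domestic_Min", and `exact` and `present` both return only "Cost_Center" and "Department".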

Note the drop=FALSE in the grep and intersect examples: mtcars[,2] returns a vector, whereas mtcars[,2,drop=FALSE] returns a data.frame with one column. As defensive programming, if you think it at all possible that your filtering will return a single column, append ,drop=FALSE to your bracket-subsetting so that the result is not inadvertently converted to a vector.
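
A short check of that behavior, using the mtcars dataset that ships with R (the variable names `v` and `d` are just for illustration):

```r
v <- mtcars[, 2]                 # single column collapses to a plain vector
d <- mtcars[, 2, drop = FALSE]   # stays a one-column data.frame

is.data.frame(v)
is.data.frame(d)
names(d)

# Why it matters: code written for data frames can misbehave on a vector,
# because a vector has no dim attribute and therefore no column count
ncol(v)
ncol(d)
```

Anything downstream that calls names(), ncol(), or write.csv() on the result will behave as expected with `d`, but may fail or produce surprising output with `v`.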

#2


Based on your description, here is an example that uses the dplyr library to combine a list of data frames and keep a given set of columns. It does not require all data frames to have identical columns. (Providing your data in a reproducible example would be better.)

# test data

df1 = read.table(text = "
c1 c2 c3
a 1 101
b 2 102
", header = TRUE, stringsAsFactors = FALSE)

df2 = read.table(text = "
c1 c2 c3
w 11 201
x 12 202
", header = TRUE, stringsAsFactors = FALSE)

# dfs is a list of data frames
dfs <- list(df1, df2)

# use dplyr::bind_rows
library(dplyr)

cols <- c("c1", "c3")
result <- bind_rows(dfs)[cols]

result

#   c1  c3
# 1  a 101
# 2  b 102
# 3  w 201
# 4  x 202
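
The test data above happens to have identical columns in both frames. As a sketch of the non-identical case, assuming dplyr 1.0.0 or later is installed (for `any_of()`), the frames `dfa` and `dfb` below are hypothetical and only partly overlap. `bind_rows` matches columns by name and fills the gaps with NA, and `select(any_of(...))` skips requested names that do not exist instead of raising an error.

```r
library(dplyr)

# Hypothetical frames whose columns only partly overlap
dfa <- data.frame(c1 = c("a", "b"), c2 = 1:2, stringsAsFactors = FALSE)
dfb <- data.frame(c1 = c("w", "x"), c3 = 201:202, stringsAsFactors = FALSE)

# bind_rows matches columns by name and fills missing cells with NA
combined <- bind_rows(list(dfa, dfb))

# any_of() silently skips names that are absent ("c9" here), whereas
# combined[c("c1", "c3", "c9")] would fail with "undefined columns selected"
wanted <- c("c1", "c3", "c9")
result2 <- select(combined, any_of(wanted))
result2

# To keep the frames separate (as in the question) rather than combined:
per_frame <- lapply(list(dfa, dfb), function(df) select(df, any_of(wanted)))
```

Combining the frames suits a single export file; the `per_frame` variant is closer to the original goal of subsetting each data frame in the list.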
