R - Using subset and get() in a loop

Time: 2022-07-05 21:21:37

Suppose I have several data frames whose names are stored in a character vector, as in this reproducible example:

df1 <- data.frame(
    'Country' = sample(c("United States", "Canada"), 10, replace = TRUE),
    'Region' = sample(c("Unknown"), 10, replace = TRUE)
)
df2 <- data.frame(
    'Country' = sample(c("United States", "Canada"), 10, replace = TRUE),
    'Region' = sample(c("Unknown"), 10, replace = TRUE)
)
df3 <- data.frame(
    'Country' = sample(c("United States", "Canada"), 10, replace = TRUE),
    'Region' = sample(c("Unknown"), 10, replace = TRUE)
)

dflist <- c('df1', 'df2', 'df3')

When I loop over the data frames as shown below, the subset line throws an error.

for (i in unique(dflist)) {
  print(paste(i, nrow(get(i)), sep = ','))
  subset(get(i), site_country_code == 'United States')$Region <<- 'NA'
}

I get this:

[1] "df1,10"

Error in subset(get(i), site_country_code == "United States")$Region <<- "North America" : 
  object 'i' not found

The print line works: it returns the name of the data frame and its row count. The subset line, however, fails with the "object 'i' not found" error. Doesn't subset understand get(i)? Is there a way around this?

1 solution

#1


Try

  lst1 <- lapply(mget(dflist), function(x) {
           x$Region <- as.character(x$Region)
           x$Region[x$Country == "United States"] <- "NA"
           x
         })

In the code above, mget() returns the objects named in the character vector dflist as a named list, and lapply() processes each element of that list. Region is converted to character (in case it is a factor) before the string "NA" is assigned to the rows where Country is "United States". list2env() then copies the modified data frames back over the originals:

  list2env(lst1, envir=.GlobalEnv)
  #<environment: R_GlobalEnv>

  head(df1, 4)
  #        Country  Region
  #1        Canada Unknown
  #2        Canada Unknown
  #3        Canada Unknown
  #4 United States      NA
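
If an explicit loop is preferred over lapply(), a minimal sketch along the following lines should also work. It assumes the loop runs in the same environment where the data frames were created and that they have the Country and Region columns from the question; the name nm and the two-data-frame setup are only illustrative. As far as I can tell, the original loop could not work as written because subset() returns a modified copy, and R has no replacement form of subset() or get() that would write the change back to the object named by i. Fetching a copy with get(), changing it, and storing it with assign() avoids that:

```r
# Recreate a small version of the example data (column names from the question).
set.seed(1)
df1 <- data.frame(
  Country = sample(c("United States", "Canada"), 10, replace = TRUE),
  Region  = rep("Unknown", 10),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Country = sample(c("United States", "Canada"), 10, replace = TRUE),
  Region  = rep("Unknown", 10),
  stringsAsFactors = FALSE
)
dflist <- c("df1", "df2")

for (nm in dflist) {
  d <- get(nm)                                    # copy of the data frame named by nm
  d$Region[d$Country == "United States"] <- "NA"  # the string "NA", not a missing value
  assign(nm, d)                                   # store the modified copy under the same name
}

head(df1, 4)
```

Logical indexing is used here instead of subset() because it can appear on the left-hand side of an assignment.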

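If you would rather not convert the column to character in the lapply() approach, you can first add "NA" as a factor level of Region and then do the assignment. This only matters when Region is a factor, which is what data.frame() produced by default before R 4.0.0 (stringsAsFactors = TRUE).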

 lst1 <- lapply(mget(dflist), function(x) {
      levels(x$Region) <- c(levels(x$Region), "NA")
      x$Region[x$Country == "United States"] <- "NA"
      x
    })

and then use list2env() as above.
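
To see why the extra level matters, here is a small sketch on a standalone factor (the names region, bad and good are made up for illustration). Assigning a value that is not an existing level produces a real missing value with a warning, while adding the level first stores the string "NA" as an ordinary value:

```r
# A factor with a single level, like the Region column when it is a factor.
region <- factor(rep("Unknown", 3))

# Without the extra level: the assignment warns and yields a missing value.
bad <- region
suppressWarnings(bad[1] <- "NA")
is.na(bad[1])            # TRUE

# With the level added first: "NA" is stored as a regular factor value.
good <- region
levels(good) <- c(levels(good), "NA")
good[1] <- "NA"
is.na(good[1])           # FALSE
as.character(good[1])    # "NA"
```

Because the string "NA" is easy to confuse with a missing value when printed, a more descriptive label may be safer if the data allows it.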
