Renaming columns in multiple data frames in R

Date: 2022-08-07 10:47:17

I am trying to rename columns of multiple data.frames.

To give an example, let's say I have a list of data.frames dfA, dfB and dfC. I wrote a function ChangeNames to set the names accordingly and then used lapply as follows:

dfs <- list(dfA, dfB, dfC)
ChangeNames <- function(x) {
    names(x) <- c("A", "B", "C" )  
}
lapply(dfs, ChangeNames)

However, this doesn't work as expected. It seems that I am not assigning the new names to the data.frame, rather only creating the new names. What am I doing wrong here?

Thank you in advance!

3 answers

#1


12  

There are two things here:

  • 1) Your function should return the value you want. Otherwise, it returns the value of the last evaluated expression, which in your case is the assignment names(x) <- ..., i.e. the vector of new names rather than the data.frame. So add return(x), or simply x, as the final line. Your function would then look like:

    ChangeNames <- function(x) {
        names(x) <- c("A", "B", "C" )
        return(x)
    }
    
  • 2) lapply does not modify your input objects by reference; it works on a copy. So you have to assign the result back. Alternatively, use a for-loop instead of lapply:

    # option 1
    dfs <- lapply(dfs, ChangeNames)
    
    # option 2
    for (i in seq_along(dfs)) {
        names(dfs[[i]]) <- c("A", "B", "C")
    }
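
To make both points concrete, here is a small self-contained sketch. The three data.frames are made-up stand-ins for dfA, dfB and dfC, and ChangeNamesOld is just a hypothetical name for the function from the question:

```r
# Hypothetical example data standing in for dfA, dfB and dfC
dfs <- list(
  data.frame(x = 1:2, y = 3:4, z = 5:6),
  data.frame(p = 1:2, q = 3:4, r = 5:6),
  data.frame(u = 1:2, v = 3:4, w = 5:6)
)

# Version from the question: the last expression is the assignment, so the
# function returns the character vector of names, not the data.frame
ChangeNamesOld <- function(x) {
  names(x) <- c("A", "B", "C")
}

# Fixed version: return the modified data.frame
ChangeNames <- function(x) {
  names(x) <- c("A", "B", "C")
  x
}

old_result <- lapply(dfs, ChangeNamesOld)
class(old_result[[1]])   # "character"

# Assign the result back, otherwise dfs itself stays unchanged
dfs <- lapply(dfs, ChangeNames)
class(dfs[[1]])          # "data.frame"
names(dfs[[1]])          # "A" "B" "C"
```

If all you need is a fixed set of names, lapply(dfs, setNames, c("A", "B", "C")) should do the same job without a helper function.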
    

Even using the for-loop, you'll still make a copy (because names(.) <- . does). You can verify this by using tracemem.

df <- data.frame(x=1:5, y=6:10, z=11:15)
tracemem(df)
# [1] "<0x7f98ec24a480>"
names(df) <- c("A", "B", "C")
tracemem(df)
# [1] "<0x7f98e7f9e318>"

If you want to modify by reference, you can use data.table package's setnames function:

df <- data.frame(x=1:5, y=6:10, z=11:15)
require(data.table)
tracemem(df)
# [1] "<0x7f98ec76d7b0>"
setnames(df, c("A", "B", "C"))
tracemem(df)
# [1] "<0x7f98ec76d7b0>"

You see that the memory location df is mapped to hasn't changed. The names have been modified by reference.

#2


7  

If the dataframes were not in a list but just in the global environment, you could refer to them using a vector of string names.

dfs <- c("dfA", "dfB", "dfC")

for(df in dfs) {
  df.tmp <- get(df)
  names(df.tmp) <- c("A", "B", "C" ) 
  assign(df, df.tmp)
}

There is probably a way to simplify this without having to resort to creating a temporary dataset, but I haven't worked it out!
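
One possible simplification, as a sketch in base R only: mget() collects the objects into a named list, setNames() renames each one, and list2env() writes them back under their original names. The example data and the names df_names, env and renamed are assumptions made up for illustration.

```r
# Hypothetical example data; dfA, dfB, dfC stand in for the real data.frames
dfA <- data.frame(x = 1:2, y = 3:4, z = 5:6)
dfB <- data.frame(p = 1:2, q = 3:4, r = 5:6)
dfC <- data.frame(u = 1:2, v = 3:4, w = 5:6)

df_names <- c("dfA", "dfB", "dfC")
env <- environment()  # the current environment (the global one at top level)

# mget() returns a named list of the objects; lapply() keeps those names
renamed <- lapply(mget(df_names, envir = env), setNames, c("A", "B", "C"))

# list2env() assigns each list element back under its own name
invisible(list2env(renamed, envir = env))

names(dfB)  # "A" "B" "C"
```

This still copies the data.frames internally; it only removes the explicit temporary variable and the loop.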

#3


-1  

I had the problem of importing a public data set and having to rename each dataframe and each column in each dataframe, so as to trim whitespace, convert to lowercase, and replace internal spaces with periods.

Combining the above methods got me:

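library(stringr)  # provides str_trim, str_to_lower, str_replace_all

# dfs is a character vector of data.frame names, as in answer #2
for (eachdf in dfs) {
  df.tmp <- get(eachdf)
  # trim first, so that only internal spaces are replaced with periods
  colnames(df.tmp) <-
    str_replace_all(str_to_lower(str_trim(colnames(df.tmp))), " ", ".")
  assign(eachdf, df.tmp)
}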
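
If the data.frames are kept in a list rather than looked up by name, the same cleanup can be sketched with base R functions only (trimws, tolower and gsub instead of the stringr calls). The helper name clean_names and the example data below are hypothetical:

```r
# Hypothetical helper: trim, lowercase, then replace internal spaces with periods
clean_names <- function(df) {
  names(df) <- gsub(" ", ".", tolower(trimws(names(df))), fixed = TRUE)
  df
}

# Made-up data with messy column names; check.names = FALSE keeps them as typed
dfs_list <- list(
  first  = data.frame(" Patient ID " = 1:2, "Visit Date" = c("a", "b"),
                      check.names = FALSE),
  second = data.frame("Total Cost " = c(1.5, 2.5), check.names = FALSE)
)

# Apply the helper to every data.frame and assign the result back
dfs_list <- lapply(dfs_list, clean_names)

names(dfs_list$first)   # "patient.id" "visit.date"
names(dfs_list$second)  # "total.cost"
```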
