R data.table: changing column names

Time: 2021-05-27 23:23:18

I created a small data.table: DT = data.table(a=1:2, a=1:2).

If I use names(DT) <- c("b","b"), I get a warning:

In `names<-.data.table`(`*tmp*`, value = c("b", "b")) :
  The names(x)<-value syntax copies the whole table. This is due to <- in R itself. Please change to setnames(x,old,new) which does not copy and is faster. See help('setnames'). You can safely ignore this warning if it is inconvenient to change right now. Setting options(warn=2) turns this warning into an error, so you can then use traceback() to find and change your names<- calls.

But if I use setnames(DT, names(DT), c("b","b")), I get an error:

Error in setnames(DT, names(DT), c("b", "b")) : 
  Some duplicates exist in 'old': a

If I do the same with a data.frame, i.e. DT = data.frame(a=1:2, a=1:2) followed by names(DT) <- c("b","b"), I get no error.

2 answers

#1


Don't pass both old and new and you won't have a problem. However, that's not really the issue. By default, base::data.frame() doesn't let you create columns with the same name (its check.names = TRUE argument makes the names unique), so...

#  What you actually get...
DT = data.frame(a=1:2, a=1:2); names(DT)
#[1] "a"   "a.1"

But it seems that in data.table you can have columns of the same name...

library(data.table)
DT = data.table(a=1:2, a=1:2); names(DT)
#[1] "a" "a"
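To see the data.frame/data.table difference in one place, here is a small sketch, assuming a recent data.table release; df1, df2 and DT are illustrative names. It relies on the check.names argument of both constructors, which defaults to TRUE for data.frame() and to FALSE for data.table().

```r
library(data.table)

# data.frame() makes names unique by default (check.names = TRUE)
df1 <- data.frame(a = 1:2, a = 1:2)
names(df1)   # "a"   "a.1"

# With check.names = FALSE, base R keeps the duplicated names as given
df2 <- data.frame(a = 1:2, a = 1:2, check.names = FALSE)
names(df2)   # "a" "a"

# names<- on a data.frame does no duplicate checking at all
names(df1) <- c("b", "b")
names(df1)   # "b" "b"

# data.table() keeps duplicates by default (check.names = FALSE)
DT <- data.table(a = 1:2, a = 1:2)
names(DT)    # "a" "a"
```

So base R can also end up with duplicated names; it just avoids creating them in the constructor unless asked to.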

But setnames throws an error, presumably because it can't tell which column a refers to when both columns are called a. You get no error when going from data.frame to data.table because the data.frame never had duplicated column names (the second column was already renamed to a.1).
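The sketch below reproduces the two ambiguous calls and one unambiguous alternative. It assumes a recent data.table release; because the exact error wording has changed across versions, the code only records whether an error occurred (err1 and err2 are illustrative names). Passing a column number as old is documented in ?setnames, but see the caveats about column numbers further down.

```r
library(data.table)

DT <- data.table(a = 1:2, a = 1:2)

# old + new: "a" appears twice in old, so setnames() refuses
err1 <- tryCatch(setnames(DT, names(DT), c("b", "b")),
                 error = function(e) "error")

# Even a single "a" in old is ambiguous: two columns match it
err2 <- tryCatch(setnames(DT, "a", "z"),
                 error = function(e) "error")

# A column number is unambiguous, so renaming by position works
setnames(DT, 2L, "a2")
names(DT)   # "a" "a2"
```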

First, I'd say don't make columns with the same name; this is a really bad idea if you plan to use your data.table programmatically (although, as @MatthewDowle points out in the comments, allowing it is a deliberate design choice to give the user maximum freedom in data.table).

If you do need duplicate names, call setnames with only the old argument; when new is not given, old is treated as the complete vector of new names. If you pass both old names and a vector of new names, each old name is looked up and changed to the corresponding new name (so old and new must be the same length when setnames is called with three arguments). setnames catches any ambiguities with checks like these:

if (any(duplicated(old)))
    stop("Some duplicates exist in 'old': ",
         paste(old[duplicated(old)], collapse = ","))
if (any(duplicated(names(x))))
    stop("'old' is character but there are duplicate column names: ",
         paste(names(x)[duplicated(names(x))], collapse = ","))
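A short sketch of the two calling forms and their length rules, assuming a recent data.table release. The column names p, q, r and the variables e1, e2 are made up for the example, and the failing calls are wrapped in tryCatch() because their exact messages differ between versions.

```r
library(data.table)

DT <- data.table(p = 1:2, q = 3:4, r = 5:6)

# old only: treated as the complete vector of new names, by position
setnames(DT, c("x", "y", "z"))
names(DT)   # "x" "y" "z"

# old + new: each name in old is looked up and replaced by its partner in new
setnames(DT, old = c("z", "x"), new = c("third", "first"))
names(DT)   # "first" "y" "third"

# The old-only form needs exactly one name per column
e1 <- tryCatch(setnames(DT, c("only", "two")), error = function(e) "error")

# With both arguments, old and new must be the same length
e2 <- tryCatch(setnames(DT, c("first", "y"), "one"), error = function(e) "error")
```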

When only old is supplied, setnames assigns the names in old to the columns of DT by position, from first to last, using .Call(Csetcharvec, names(x), seq_along(names(x)), old):

DT = data.table(a=1:2, a=1:2)
setnames(DT, c("b","b") )
DT
#   b b
#1: 1 1
#2: 2 2
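Because the old-only form is positional, it is also a convenient way back out of duplicated names. A minimal sketch, assuming a recent data.table release; using base R's make.unique() to generate the replacement names is just one option, not something setnames does for you.

```r
library(data.table)

DT <- data.table(a = 1:2, a = 1:2)

# Old-only form: positional, so duplicated names are allowed here
setnames(DT, c("b", "b"))
names(DT)   # "b" "b"

# Restore unambiguous names with base R's make.unique()
setnames(DT, make.unique(names(DT)))
names(DT)   # "b" "b.1"

# Once the names are unique, renaming by name works again
setnames(DT, "b.1", "c")
names(DT)   # "b" "c"
```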

Addition from Matthew, as requested. In ?setnames there's some background:

It isn't good programming practice, in general, to use column numbers rather than names. This is why setkey and setkeyv only accept column names, and why old in setnames() is recommended to be names. If you use column numbers then bugs (possibly silent) can more easily creep into your code as time progresses if changes are made elsewhere in your code; e.g., if you add, remove or reorder columns in a few months time, a setkey by column number will then refer to a different column, possibly returning incorrect results with no warning. (A similar concept exists in SQL, where "select * from ..." is considered poor programming style when a robust, maintainable system is required.) If you really wish to use column numbers, it's possible but deliberately a little harder; e.g., setkeyv(DT,colnames(DT)[1:2]).

[As of July 2017, the note above no longer appears in ?setnames, but the issue is discussed near the top of the FAQ, vignette('datatable-faq').]
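To make the fragility argument concrete, here is a small sketch assuming a recent data.table release (the table and its columns id, grp, val are hypothetical). It keys by name, then by positions converted to names as the quoted help text suggests, and finally shows that a column reorder changes what position 1 means.

```r
library(data.table)

DT <- data.table(id = c(2L, 1L), grp = c("b", "a"), val = c(10, 20))

# Preferred: key by name, which stays correct if columns are reordered later
setkey(DT, id)
key(DT)   # "id"

# Possible but deliberately more verbose: turn positions into names first
setkeyv(DT, colnames(DT)[1:2])
key(DT)   # "id" "grp"

# Why positions are fragile: after a reorder, position 1 is a different column
setcolorder(DT, c("val", "id", "grp"))
colnames(DT)[1]   # "val", no longer "id"
```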

So the idea of setnames is to change one column name really easily, by name.

setnames(DT, "oldname", "newname")

If "oldname" is not a column name, or there's any ambiguity about what you intend (either in the data now, or in a few months' time after your colleagues have changed the source database or other upstream code, or have passed their own data to your module), then data.table will catch it for you. That's actually quite hard to do in base R as easily and as well as setnames does it (including the safety checks).
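A minimal sketch of that safety check next to a common base R idiom, assuming a recent data.table release; the table contents are invented for the example. The base R version silently does nothing when the column is missing, while setnames() raises an error (only the fact of the error is checked, since its wording varies by version).

```r
library(data.table)

DT <- data.table(oldname = 1:3, other = 4:6)

# Rename a single column by name
setnames(DT, "oldname", "newname")
names(DT)   # "newname" "other"

# "oldname" no longer exists, so setnames() stops instead of guessing
res <- tryCatch(setnames(DT, "oldname", "newname"),
                error = function(e) "error")

# A common base R idiom silently does nothing when the name is missing
df <- data.frame(other = 4:6)
names(df)[names(df) == "oldname"] <- "newname"
names(df)   # "other", unchanged and no warning
```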

#2


setnames can also be used to change multiple column names at once:

setnames(DT, old = c("oldname1", "oldname2", "oldname3"), new = c("newname1", "newname2", "newname3"))
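A small sketch of the multi-column form, assuming a recent data.table release; the extra column keep and the grep()/toupper() step are illustrative additions showing that old and new are ordinary character vectors, so they can also be computed rather than typed out.

```r
library(data.table)

DT <- data.table(oldname1 = 1, oldname2 = 2, oldname3 = 3, keep = 4)

# Explicit vectors, as in the answer; unlisted columns are left alone
setnames(DT,
         old = c("oldname1", "oldname2", "oldname3"),
         new = c("newname1", "newname2", "newname3"))
names(DT)   # "newname1" "newname2" "newname3" "keep"

# The same vectors can be built programmatically, e.g. by pattern
old <- grep("^newname", names(DT), value = TRUE)
setnames(DT, old = old, new = toupper(old))
names(DT)   # "NEWNAME1" "NEWNAME2" "NEWNAME3" "keep"
```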
