This just came up in an answer to another question here. When you rbind two data frames, columns are matched by name rather than by index, which can lead to unexpected behavior:
> df<-data.frame(x=1:2,y=3:4)
> df
x y
1 1 3
2 2 4
> rbind(df,df[,2:1])
x y
1 1 3
2 2 4
3 1 3
4 2 4
Of course, there are workarounds. For example:
rbind(df,rename(df[,2:1],names(df)))
data.frame(rbind(as.matrix(df),as.matrix(df[,2:1])))
Edit: rename from the plyr package doesn't actually work this way (although I thought I had it working when I originally wrote this). The renaming approach that does work is SimonO101's solution:
rbind(df,setNames(df[,2:1],names(df)))
Also, maybe surprisingly,
data.frame(rbindlist(list(df,df[,2:1])))
works by index (and if we don't mind a data table, then it's pretty concise), so this is a difference between rbindlist (from the data.table package) and do.call(rbind).
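Newer data.table releases document a use.names argument for rbindlist, so binding by position can be requested explicitly instead of relying on the default. A minimal sketch, assuming data.table is installed and reusing the df from the question (the variable name by_position is only for illustration):

```r
library(data.table)

df <- data.frame(x = 1:2, y = 3:4)

# use.names = FALSE asks rbindlist to bind columns by position,
# so the swapped columns of df[, 2:1] are stacked under x and y as-is.
by_position <- as.data.frame(rbindlist(list(df, df[, 2:1]), use.names = FALSE))
by_position
```

Spelling out use.names = FALSE also documents the intent for whoever reads the code later.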
The question is: what is the most concise way to rbind two data frames where the names don't match? I know this seems trivial, but this kind of thing can end up cluttering code, and I don't want to have to write a new function called rbindByIndex. Ideally it would be something like rbind(df,df[,2:1],byIndex=T).
2 Answers
#1
37
You might find setNames handy here...
rbind(df, setNames(rev(df), names(df)))
# x y
#1 1 3
#2 2 4
#3 3 1
#4 4 2
I suspect your real use case is somewhat more complex. You can of course reorder the columns in the first argument of setNames as you wish; just pass names(df) as the second argument, so that the names of the reordered columns match the original.
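The same pattern also covers the case where the second data frame has entirely different column names. A minimal sketch with hypothetical data (the frame other and its columns id and label are made up for illustration):

```r
# Hypothetical example data: same layout, different column names.
df    <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)
other <- data.frame(id = 3:4, label = c("c", "d"), stringsAsFactors = FALSE)

# rbind(df, other) would stop with an error because the names do not match.
# Renaming the second frame on the fly lets rbind line the columns up by position.
combined <- rbind(df, setNames(other, names(df)))
combined
```

Because rbind still works on data frames here, each column keeps its own type (integer x, character y).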
#2
7
This seems pretty easy:
mapply(c,df,df[,2:1])
x y
[1,] 1 3
[2,] 2 4
[3,] 3 1
[4,] 4 2
For this simple case, though, you have to turn the result back into a data frame (because mapply simplifies it to a matrix):
as.data.frame(mapply(c,df,df[,2:1]))
x y
1 1 3
2 2 4
3 3 1
4 4 2
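Important note 1: Type coercion is a downside when your data frame contains columns of different types. Below, the z column comes back as integer codes rather than as 'a' and 'b' (this output is from an R version in which data.frame() converted strings to factors by default):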
df<-data.frame(x=1:2,y=3:4,z=c('a','b'))
mapply(c,df,df[,c(2:1,3)])
x y z
[1,] 1 3 1
[2,] 2 4 2
[3,] 3 1 1
[4,] 4 2 2
Important note 2: It is also terrible if you have factors; the result below contains the integer codes, not the original values:
df<-data.frame(x=factor(1:2),y=factor(3:4))
mapply(c,df[,1:2],df[,2:1])
x y
[1,] 1 1
[2,] 2 2
[3,] 1 1
[4,] 2 2
So, as long as all of your data is numeric, it's okay.
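One way to sidestep the matrix simplification is to use Map, which returns a list with one element per column instead of a matrix. This is a sketch rather than part of the original answer; it assumes character (not factor) columns, and the name combined is only for illustration:

```r
# Keep z as character so that c() simply concatenates the strings.
df <- data.frame(x = 1:2, y = 3:4, z = c("a", "b"), stringsAsFactors = FALSE)

# Map(c, ...) pairs the columns by position and concatenates each pair,
# returning a named list (names taken from df) without collapsing to a matrix.
combined <- as.data.frame(Map(c, df, df[, c(2:1, 3)]), stringsAsFactors = FALSE)
str(combined)
```

Each column is combined on its own, so the integer and character columns keep their types. How factor columns behave under c() depends on the R version, so they may still need converting first.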