I have two data.tables, X and Y.
columns in X: area, id, value
columns in Y: ID, price, sales
Create the two data.tables:
library(data.table)

X = data.table(area=c('US', 'UK', 'EU'),
               id=c('c001', 'c002', 'c003'),
               value=c(100, 200, 300)
)
Y = data.table(ID=c('c001', 'c002', 'c003'),
               price=c(500, 200, 400),
               sales=c(20, 30, 15)
)
And I set keys for X and Y:
setkey(X, id)
setkey(Y, ID)
Now I try to join X and Y by id in X and ID in Y:
merge(X, Y)
merge(X, Y, by=c('id', 'ID'))
merge(X, Y, by.x='id', by.y='ID')
All three calls raised an error saying that the column names in the by argument are invalid.
I referred to the data.table manual and found that the merge function does not support the by.x and by.y arguments.
How could I join two data.tables by different column names without changing the column names?
Append:
I managed to join the two tables with X[Y], but why does the merge function fail in data.table?
3 Solutions
#1
12
Use this operation:
X[Y]
# area id value price sales
# 1: US c001 100 500 20
# 2: UK c002 200 200 30
# 3: EU c003 300 400 15
or this operation:
Y[X]
# ID price sales area value
# 1: c001 500 20 US 100
# 2: c002 200 30 UK 200
# 3: c003 400 15 EU 300
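Both forms rely on the keys set with setkey(): a keyed join pairs up the key columns of the two tables by position rather than by name, which is why id and ID can be matched even though they are spelled differently. A minimal sketch, assuming the data.table package is installed and reusing the X and Y from the question (res_xy and res_yx are names chosen here for illustration only):

```r
library(data.table)

# Same tables as in the question
X <- data.table(area  = c("US", "UK", "EU"),
                id    = c("c001", "c002", "c003"),
                value = c(100, 200, 300))
Y <- data.table(ID    = c("c001", "c002", "c003"),
                price = c(500, 200, 400),
                sales = c(20, 30, 15))

# Keys define which columns take part in the join
setkey(X, id)
setkey(Y, ID)

res_xy <- X[Y]  # look up each row of Y in X; join column keeps X's name (id)
res_yx <- Y[X]  # look up each row of X in Y; join column keeps Y's name (ID)

print(names(res_xy))
print(names(res_yx))
```

The outer table (the one before the bracket) decides the name and position of the join column in the result, so choosing between X[Y] and Y[X] is partly a matter of which column layout is wanted.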
Edit: After you edited your question, I read Section 1.12 of the FAQ, "What is the difference between X[Y] and merge(X,Y)?", which led me to check out ?merge. I discovered that there are two different merge functions, depending on which package you are using. The default is merge.data.frame, but data.table uses merge.data.table. Compare
merge(X, Y, by.x = "id", by.y = "ID") # which is merge.data.table
# Error in merge.data.table(X, Y, by.x = "id", by.y = "ID") :
# A non-empty vector of column names for `by` is required.
with
merge.data.frame(X, Y, by.x = "id", by.y = "ID")
# id area value price sales
# 1 c001 US 100 500 20
# 2 c002 UK 200 200 30
# 3 c003 EU 300 400 15
Edit for completeness: Based on a comment by @Michael Bernsteiner, it looks like the data.table team is planning to implement by.x and by.y in the merge.data.table function, but hasn't done so yet.
#2
18
As of data.table version 1.9.6 (on CRAN in September 2015), you can specify the by.x and by.y arguments in data.table's merge method (merge.data.table):
merge(x=X, y=Y, by.x="id", by.y="ID")[]
# id area value price sales
#1: c001 US 100 500 20
#2: c002 UK 200 200 30
#3: c003 EU 300 400 15
However, in data.table 1.9.6 you can also specify the on argument in the X[Y] notation:
X[Y] syntax can now join without having to set keys by using the new on argument. For example: DT1[DT2, on=c(x = "y")] would join column "y" of DT2 with "x" of DT1. DT1[DT2, on="y"] would join column "y" of both data.tables.
X[Y, on=c(id = "ID")]
# area id value price sales
#1: US c001 100 500 20
#2: UK c002 200 200 30
#3: EU c003 300 400 15
This answer by the data.table author has more details.
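One practical difference between the two forms shows up when some rows have no match: X[Y, on = ...] keeps every row of Y and fills the X columns with NA where nothing matches, whereas merge defaults to all = FALSE and keeps only the matching rows. A small sketch, assuming data.table 1.9.6 or later is installed; the extra row c004 and its values are made up purely for illustration:

```r
library(data.table)

X <- data.table(area  = c("US", "UK", "EU"),
                id    = c("c001", "c002", "c003"),
                value = c(100, 200, 300))

# Y has one extra ID (c004, hypothetical) with no counterpart in X
Y <- data.table(ID    = c("c001", "c002", "c003", "c004"),
                price = c(500, 200, 400, 250),
                sales = c(20, 30, 15, 10))

# No setkey() needed: on= names the columns to match (X's id against Y's ID)
joined <- X[Y, on = c(id = "ID")]

# merge with differently named columns; inner join by default
merged <- merge(X, Y, by.x = "id", by.y = "ID")

# all.y = TRUE keeps the unmatched row of Y as well
merged_all <- merge(X, Y, by.x = "id", by.y = "ID", all.y = TRUE)

print(joined)
print(merged)
print(merged_all)
```

Neither call renames anything in X or Y, so both satisfy the requirement of joining on different column names without changing them.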
#3
3
merge fails when you use by.x and by.y with data.table (in versions before 1.9.6; see the answer above). Taking your data:
> merge(X,Y, by.x='id', by.y='ID')
Error in merge.data.table(X, Y, by.x = "id", by.y = "ID")
You can use merge with data.table, but you need to use the by argument for joining (so rename the columns so that they have the same column names):
Y = setNames(Y,c('id','price','sales'))
This will still not work:
merge(X,Y, by.x='id', by.y='id')
Error in merge.data.table(X, Y, by.x = "id", by.y = "id") :
But this will work:
> merge(X,Y, by='id')
# id area value price sales
#1: c001 US 100 500 20
#2: c002 UK 200 200 30
#3: c003 EU 300 400 15
Alternatively, you would need to convert the data.tables to data.frames in order to use merge with the by.x and by.y arguments:
merge(data.frame(X), data.frame(Y), by.x='id', by.y='ID')
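If renaming is acceptable for the join but the original Y should keep its column names, one option is to rename on a copy. This is only a sketch under that assumption, not something the answer above proposes; Y_renamed is an illustrative name. Note that data.table's setnames() changes names by reference, so copy() is called first to leave Y untouched:

```r
library(data.table)

X <- data.table(area  = c("US", "UK", "EU"),
                id    = c("c001", "c002", "c003"),
                value = c(100, 200, 300))
Y <- data.table(ID    = c("c001", "c002", "c003"),
                price = c(500, 200, 400),
                sales = c(20, 30, 15))

# Work on a copy so that Y itself is not modified by reference
Y_renamed <- copy(Y)
setnames(Y_renamed, "ID", "id")

# Both tables now share the column name, so by= is enough
res <- merge(X, Y_renamed, by = "id")

print(res)
print(names(Y))  # the original Y still has the ID column
```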