How to create a Python-style dictionary with data.table in R?

I'm looking for a Python-like dictionary structure in R to replace values in a large dataset (>100 MB), and I think the data.table package can help me do this. However, I can't find an easy way to solve the problem.

For example, I have two data.tables:

Table A:

   V1 V2
1:  A  B
2:  C  D
3:  C  D
4:  B  C
5:  D  A

Table B:

   V3 V4
1:  A  1
2:  B  2
3:  C  3
4:  D  4

I want to use B as a dictionary to replace the values in A. So the result I want to get is:

Table R:

   V1 V2
1:  1  2
2:  3  4
3:  3  4
4:  2  3
5:  4  1

What I did is:

c2=tB[tA[,list(V2)],list(V4)]
c1=tB[tA[,list(V1)],list(V4)]

Although I specified j=list(V4), the result still included the V3 column. I don't know why.

c2:

   V3 V4
1:  B  2
2:  D  4
3:  D  4
4:  C  3
5:  A  1

c1:

   V3 V4
1:  A  1
2:  C  3
3:  C  3
4:  B  2
5:  D  4

Finally, I combined the two V4 columns and got the result I want.

But I think there should be a much easier way to do this. Any ideas?

2 solutions

#1

Here's an alternative way:

setkey(B, V3)
for (i in seq_len(length(A))) {
    thisA = A[[i]]
    set(A, j=i, value=B[thisA]$V4)
}
#    V1 V2
# 1:  1  2
# 2:  3  4
# 3:  3  4
# 4:  2  3
# 5:  4  1

Since thisA is a character column, we don't need J() (this is just a convenience). Here, A's columns are replaced by reference, so the approach is also memory efficient. If you don't want to modify A, you can use cA <- copy(A) and replace cA's columns instead.
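
A minimal sketch of the copy-then-replace variant, with the question's two tables rebuilt by hand. The name cA comes from the text above; the rest mirrors the loop. It assumes a data.table version in which a character vector in i is joined against B's key.

```r
library(data.table)

# Tables from the question, rebuilt by hand
A <- data.table(V1 = c("A", "C", "C", "B", "D"),
                V2 = c("B", "D", "D", "C", "A"))
B <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)

setkey(B, V3)   # the character lookup below needs a key on B
cA <- copy(A)   # deep copy, so A itself stays untouched

for (i in seq_along(cA)) {
    # look up each value of column i in B, then overwrite the column by reference
    set(cA, j = i, value = B[cA[[i]]]$V4)
}

print(cA)   # mapped integer codes
print(A)    # still the original letters
```

Without the copy(), a plain cA <- A would only create another name for the same table, and the loop would change A as well.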

Alternatively, using :=:

A[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]
# or
ans = copy(A)[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]

(Following user2923419's comment): You can drop the J() if the lookup is a single column of type character (just for convenience).
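
As a small illustration of that equivalence, the sketch below runs the same lookup with and without J() on a keyed table. The vector name keys and the result names are made up for this example.

```r
library(data.table)

B <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)
setkey(B, V3)

keys <- c("C", "A", "D")   # hypothetical lookup values

with_J    <- B[J(keys)]$V4   # explicit join on the key
without_J <- B[keys]$V4      # character i is matched against the key

print(identical(with_J, without_J))   # TRUE
```

The shortcut only applies to character lookups: a numeric vector in i is read as row numbers, so a numeric key still needs J().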

In 1.9.3, when j is a single column, the join returns a vector (this was changed based on user requests), which allows slightly more natural data.table syntax:

setkey(B, V3)
for (i in seq_len(length(A))) {
    thisA = A[[i]]
    set(A, j=i, value=B[thisA, V4])
}
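
To see what the j = V4 form returns on its own, here is a short sketch. It assumes a data.table release that includes the 1.9.3 change described above; the name looked_up is made up for the example.

```r
library(data.table)

B <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)
setkey(B, V3)

# Join on the key and pull out V4 in a single step
looked_up <- B[c("C", "A"), V4]

print(is.data.table(looked_up))   # FALSE: a plain vector, not a table
print(looked_up)                  # values follow the order of the lookup keys
```

Because the result is already a vector, it can be passed straight to set() as the value argument without the $V4 extraction.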

#2

I am not sure how fast this is with big data, but chmatch is supposed to be fast.

tA[ , lapply(.SD,function(x) tB$V4[chmatch(x,tB$V3)])]

   V1 V2
1:  1  2
2:  3  4
3:  3  4
4:  2  3
5:  4  1
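
To make the chmatch step explicit, the sketch below splits it in two: chmatch gives the position of each value in the dictionary's key column, and those positions then index the value column. The vector x, including the unmatched key "Z", is made up for illustration.

```r
library(data.table)

tB <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)

x <- c("C", "A", "Z")        # "Z" is deliberately missing from tB$V3

pos <- chmatch(x, tB$V3)     # positions in tB$V3; NA where there is no match
mapped <- tB$V4[pos]         # indexing with NA keeps the NA

print(pos)      # 3 1 NA
print(mapped)   # 3 1 NA
```

Values with no entry in the dictionary come back as NA rather than raising an error, so it may be worth checking anyNA(mapped) after the replacement.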
