I have two large vectors:
A: https://dl.dropbox.com/u/22681355/A.csv
B: https://dl.dropbox.com/u/22681355/B.csv
A has over 20000 entries but only 1350 unique values. B is a vector of 1350 random integers drawn from 1 to 9.
I would like to assign values from B to A such that equal values in A get the same value from B. E.g. if there are multiple 1s in A, each 1 should get the same number from B.
I have been using A[B], but after the 18000th entry I get NAs.
What is the proper way of doing this?
code:
A<-read.csv("A.csv")
B<-read.csv("B.csv")
A[B]
#1
- read.csv() creates a data frame, not a vector.
- You probably mean B[A], which, for each element of A, returns the value of B at the index given by that element's value. Since A's values range from 1 to 1899, they exceed B's length of 1349, and for every element that falls outside B's bounds, NA is returned.
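The out-of-bounds behaviour described above can be seen with a pair of tiny vectors. This is a minimal sketch; the values in A and B here are hypothetical and chosen only so that one index exceeds B's length.

```r
# Hypothetical toy vectors: B has only 3 elements,
# while A contains a value (5) larger than length(B).
B <- c(10, 20, 30)
A <- c(1, 3, 3, 5)

# Indexing B by A's values: positions 1 and 3 exist, position 5 does not,
# so that element comes back as NA.
res <- B[A]
print(res)
# [1] 10 30 30 NA
```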
The correct way to do what you want is:
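# read.table() defaults to header = FALSE, so the single column is named V1
A = read.table("http://dl.dropbox.com/u/22681355/A.csv")
B = read.table("http://dl.dropbox.com/u/22681355/B.csv")
# extract the columns as plain vectors
A = A$V1
B = B$V1
# the 1350 distinct values of A become the factor's levels
A = as.factor(A)
# look up B by each element's position among the levels of A
B[match(A,levels(A))]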
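Since the Dropbox links may no longer resolve, here is a self-contained sketch of the same approach on simulated data. The data are made up under the assumption that A holds 1350 distinct values between 1 and 1899 repeated to 20000 entries, and that B holds 1350 integers from 1 to 9; the check at the end confirms that equal values in A receive equal values from B.

```r
set.seed(1)  # reproducible simulated data

# Hypothetical stand-ins for A.csv and B.csv: 1350 distinct IDs from 1..1899.
ids <- sample(1:1899, 1350)
# Include every ID once, then pad with repeats up to 20000 entries.
A <- c(ids, sample(ids, 20000 - 1350, replace = TRUE))
B <- sample(1:9, 1350, replace = TRUE)

A <- as.factor(A)                  # 1350 levels, one per distinct value
mapped <- B[match(A, levels(A))]   # positions 1..1350, all within B

# Every occurrence of the same value in A should get a single value from B.
per_value <- tapply(mapped, A, function(v) length(unique(v)))
print(all(per_value == 1))  # TRUE
print(anyNA(mapped))        # FALSE
```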
match(A,levels(A)) returns a vector of the same length as A in which each element is the position of the corresponding element of A among its factor's levels, i.e. a number between 1 and 1350 (there are 1350 distinct values). If A was as.factor(c(1,1,3,5,5,7)), levels(A) would be c("1","3","5","7") and match(A,levels(A)) would be c(1,1,2,3,3,4), i.e. the position of each element in its levels.
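The small example above can be run directly. The B used here is hypothetical (four values, one per level) and only illustrates that repeated values of A pick up the same element of B. The last check reflects the fact that, for a factor without NAs, match(A, levels(A)) gives the same result as the factor's integer codes, as.integer(A).

```r
A <- as.factor(c(1, 1, 3, 5, 5, 7))
print(levels(A))          # "1" "3" "5" "7"

pos <- match(A, levels(A))
print(pos)                # 1 1 2 3 3 4

# Hypothetical B with one value per level of A
B <- c(10, 20, 30, 40)
print(B[pos])             # 10 10 20 30 30 40

# For a factor without NAs, this equals its integer codes
stopifnot(identical(pos, as.integer(A)))
```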