I have a dataset x with two columns, x$x0 and x$x1; its values are shown below. There are more than 1234876 observations in the dataset because of many duplicate values.
x0 x1
----------------
0 1
0 2
1 0
1 3
2 1
2 3
. .
. .
. .
1234876 1230000
I want to create a matrix using the unique values in column 1 (x$x0) and the unique values in column 2 (x$x1). The values in x$x0 will be the row names and the values in x$x1 will be the column names.
Then assign the value 1 to each cell where a relation exists between x$x0 and x$x1. The final result should look something like this:
        |  0  1  2  3 ....... 1230000
--------------------------------------
0       |     1  1
1       |  1        1
2       |     1     1
3       |
.       |
.       |
.       |
1234876 |
--------------------------------------
Hope this makes sense :( Any advice on how to do this would be very helpful.
1 solution
#1
It's a little hard to tell what you are asking, but does this work? It should create a data frame with x0 values as rows and x1 values as columns. All the observations become NAs, but you could put other things in there.
Edit: I've updated this based on your changes, using your dput output. This now creates a matrix whose row names correspond to X0 and whose column names correspond to X1.
df <- structure(list(X0 = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 4L),
X1 = c(2L, 3L, 4L, 5L, 0L, 2L, 4L, 5L, 15L, 0L, 11L, 12L,
13L, 14L, 63L, 64L, 65L, 66L, 67L, 7L)),
.Names = c("X0", "X1"), row.names = c(NA, 20L),
class = "data.frame")
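library(reshape2)
# Cast to wide form: one row per X0 value, one column per X1 value.
# The aggregate returns 1 when an (X0, X1) pair occurs at least once, else 0.
df_new <- dcast(df, X0 ~ X1, value.var = "X1",
                fun.aggregate = function(x) as.integer(length(x) > 0))
rownames(df_new) <- df_new$X0
# Drop the X0 column (now the row names) and convert to a matrix.
as.matrix(df_new[-1])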
#   0 2 3 4 5 7 11 12 13 14 15 63 64 65 66 67
# 0 0 1 1 1 1 0  0  0  0  0  0  0  0  0  0  0
# 1 1 1 0 1 1 0  0  0  0  0  1  0  0  0  0  0
# 2 1 0 0 0 0 0  1  1  1  1  0  0  0  0  0  0
# 3 0 0 0 0 0 0  0  0  0  0  0  1  1  1  1  1
# 4 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0
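One caveat for the full dataset: if the distinct values on each axis really do run into the millions, a dense matrix would have on the order of 10^12 cells and is unlikely to fit in memory. A possible alternative is a sparse matrix that stores only the cells equal to 1. The sketch below is not part of the approach above; it assumes the Matrix package is available and that a sparse result is acceptable. The small data frame is a stand-in built from the six pairs shown in the question, and the names pairs, row_vals, col_vals and m are made up for illustration.

```r
library(Matrix)

# Stand-in data: the six (x0, x1) pairs shown in the question.
df <- data.frame(
  X0 = c(0L, 0L, 1L, 1L, 2L, 2L),
  X1 = c(1L, 2L, 0L, 3L, 1L, 3L)
)

# Drop duplicate pairs first; sparseMatrix() would otherwise add up
# the values of repeated (i, j) entries instead of leaving them at 1.
pairs <- unique(df)

# Map the observed values to consecutive row and column positions.
row_vals <- sort(unique(pairs$X0))
col_vals <- sort(unique(pairs$X1))

m <- sparseMatrix(
  i = match(pairs$X0, row_vals),
  j = match(pairs$X1, col_vals),
  x = 1,
  dims = c(length(row_vals), length(col_vals)),
  dimnames = list(as.character(row_vals), as.character(col_vals))
)

m["0", "1"]  # 1: the pair (0, 1) occurs
m["0", "0"]  # 0: the pair (0, 0) does not occur
```

Only the observed pairs take up memory here, so the cost grows with the number of unique pairs rather than with rows times columns. Calling as.matrix(m) would convert it back to a dense matrix, which is only sensible for small inputs.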