R - Binding sparse matrices of different sizes on rows

Time: 2022-06-03 12:00:13

I am attempting to use the Matrix package to bind two sparse matrices of different sizes together. The binding is on rows, using the column names for matching.

Table A:

ID     | AAAA   | BBBB   |
------ | ------ | ------ |
XXXX   | 1      | 2      |

Table B:

ID     | BBBB   | CCCC   |
------ | ------ | ------ |
YYYY   | 3      | 4      |

Binding tables A and B:

ID     | AAAA   | BBBB   | CCCC   |
------ | ------ | ------ | ------ |
XXXX   | 1      | 2      |        |
YYYY   |        | 3      | 4      |

The intention is to insert a large number of small matrices into a single large matrix, to enable continuous querying and updates/inserts.

I find that neither the Matrix nor the slam package has functionality to handle this.

Similar questions have been asked in the past, but it seems no solution has been found:

Post 1: in-r-when-using-named-rows-can-a-sparse-matrix-column-be-added-concatenated

Post 2: bind-together-sparse-model-matrices-by-row-names

Ideas on how to solve it will be highly appreciated.

Best regards,

Frederik

3 solutions

#1


2  

It looks like empty columns (columns of 0s) need to be added to the matrices to make them compatible with rbind, which requires matrices with the same column names in the same order. The following code does that:

library(Matrix)

# dummy data
set.seed(3344)
A = Matrix(matrix(rbinom(16, 2, 0.2), 4), sparse = TRUE)
colnames(A) = letters[1:4]
B = Matrix(matrix(rbinom(9, 2, 0.2), 3), sparse = TRUE)
colnames(B) = letters[3:5]

# finding what's missing: columns of B absent from A, and vice versa
misA = setdiff(colnames(B), colnames(A))
misB = setdiff(colnames(A), colnames(B))

# all-zero blocks that carry the missing column names
zeroA = Matrix(0, nrow(A), length(misA), dimnames = list(NULL, misA))
zeroB = Matrix(0, nrow(B), length(misB), dimnames = list(NULL, misB))

## adding missing columns to initial matrices (Bn reordered to match An)
An = cbind(A, zeroA)
Bn = cbind(B, zeroB)[, colnames(An)]

# final bind
rbind(An, Bn)
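
The padding steps above can also be wrapped in a small helper and applied to the two tables from the question. This is only a sketch: `pad_cols` is a name made up here, not a function from the Matrix package, and the zero block is built with `sparseMatrix()` from empty index vectors.

```r
library(Matrix)

# Hypothetical helper: return `m` with exactly the columns `all_cols`,
# in that order, filling columns that `m` lacks with zeros.
pad_cols <- function(m, all_cols) {
  missing_cols <- setdiff(all_cols, colnames(m))
  if (length(missing_cols) == 0) {
    return(m[, all_cols, drop = FALSE])
  }
  # empty i/j/x vectors give an all-zero sparse block
  zeros <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                        dims = c(nrow(m), length(missing_cols)),
                        dimnames = list(rownames(m), missing_cols))
  cbind(m, zeros)[, all_cols, drop = FALSE]
}

# the two tables from the question
A <- Matrix(c(1, 2), 1, sparse = TRUE,
            dimnames = list("XXXX", c("AAAA", "BBBB")))
B <- Matrix(c(3, 4), 1, sparse = TRUE,
            dimnames = list("YYYY", c("BBBB", "CCCC")))

all_cols <- union(colnames(A), colnames(B))
C <- rbind(pad_cols(A, all_cols), pad_cols(B, all_cols))
```

Every matrix is padded against the same `all_cols` vector, so the same helper could be reused for more than two matrices before a single `rbind`.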

#2


0  

We can create an empty sparse Matrix that has all the rows and columns, then insert the values into it using subset assignment:

my.bind = function(A, B){
  C = Matrix(0, nrow = NROW(A) + NROW(B), ncol = length(union(colnames(A), colnames(B))), 
             dimnames = list(c(rownames(A), rownames(B)), union(colnames(A), colnames(B))))
  C[rownames(A), colnames(A)] = A
  C[rownames(B), colnames(B)] = B
  return(C)
}

my.bind(A,B)
# 2 x 3 sparse Matrix of class "dgCMatrix"
#      AAAA BBBB CCCC
# XXXX    1    2    .
# YYYY    .    3    4

Note that the above assumes that A and B do not share row names. If there are shared row names, then you should use row numbers instead of names for the assignment.

The data:

library(Matrix)
A = Matrix(c(1,2), 1, dimnames = list('XXXX', c('AAAA','BBBB')))
B = Matrix(c(3,4), 1, dimnames = list('YYYY', c('BBBB','CCCC')))
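
For the shared-row-name case mentioned in the note, a positional variant might look like the sketch below. `my.bind.pos` is a made-up name, and the example deliberately gives both matrices the same row name to show that rows are addressed by number rather than by name.

```r
library(Matrix)

# Variant of my.bind() that assigns rows by position, so identical row
# names in A and B do not collide (my.bind.pos is a hypothetical name).
my.bind.pos <- function(A, B) {
  cols <- union(colnames(A), colnames(B))
  C <- Matrix(0, nrow = nrow(A) + nrow(B), ncol = length(cols),
              dimnames = list(c(rownames(A), rownames(B)), cols),
              sparse = TRUE)
  # rows of A go first, rows of B follow; columns are still matched by name
  C[seq_len(nrow(A)), colnames(A)] <- A
  C[nrow(A) + seq_len(nrow(B)), colnames(B)] <- B
  C
}

# both inputs share the row name "XXXX"
A <- Matrix(c(1, 2), 1, dimnames = list("XXXX", c("AAAA", "BBBB")))
B <- Matrix(c(3, 4), 1, dimnames = list("XXXX", c("BBBB", "CCCC")))
C <- my.bind.pos(A, B)
```

The result keeps both rows, each still labelled "XXXX", so any later lookup by row name would be ambiguous and should also go by position.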

#3


0  

If one needs to combine/concatenate many small sparse matrices into one large sparse matrix, it's much better and more efficient to use a mapping of global and local row and column indices to construct a large sparse matrix. E.g.,

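# localPairRowColInds, localPairVals, globalRowInds, globalColInds and dataFname
# are assumed to be defined already for the current small matrix
globalInds <- matrix(NA, nrow = nrow(localPairRowColInds), ncol = 2)

# map the local row and column indices to the corresponding global indices
globalInds[, 1] <- globalRowInds[localPairRowColInds[, 1]]
globalInds[, 2] <- globalColInds[localPairRowColInds[, 2]]

# append the (global row, global column, value) triplets to a file
write.table(cbind(globalInds, localPairVals), file = dataFname, append = TRUE, sep = " ", row.names = FALSE, col.names = FALSE)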
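
The snippet above writes the triplets to a file. As a rough in-memory illustration of the same index-mapping idea, the sketch below collects (row, column, value) triplets from a list of small matrices and hands them to `sparseMatrix()` in one call. The toy inputs and the names `mats`, `offsets`, `trip` and `big` are assumptions made for this example, and it presumes every small matrix is a numeric sparse matrix with column names.

```r
library(Matrix)

# toy inputs standing in for the many small matrices
mats <- list(
  Matrix(c(1, 2), 1, sparse = TRUE,
         dimnames = list("XXXX", c("AAAA", "BBBB"))),
  Matrix(c(3, 4), 1, sparse = TRUE,
         dimnames = list("YYYY", c("BBBB", "CCCC")))
)

# global row and column labels
all_rows <- unlist(lapply(mats, rownames))
all_cols <- unique(unlist(lapply(mats, colnames)))

# number of global rows that precede each small matrix
offsets <- cumsum(c(0, head(sapply(mats, nrow), -1)))

# convert each small matrix to triplets expressed in global indices
triplets <- Map(function(m, off) {
  s <- summary(m)  # 1-based i, j and x for the non-zero entries
  data.frame(i = s$i + off,
             j = match(colnames(m)[s$j], all_cols),
             x = s$x)
}, mats, offsets)
trip <- do.call(rbind, triplets)

# build the large matrix once from all triplets
big <- sparseMatrix(i = trip$i, j = trip$j, x = trip$x,
                    dims = c(length(all_rows), length(all_cols)),
                    dimnames = list(all_rows, all_cols))
```

Building the matrix once from accumulated triplets avoids repeated `cbind`/`rbind` calls or sub-assignments on a growing sparse matrix.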
