Transferring a sparse matrix from Python to R

I am doing some text analysis work in Python. Unfortunately, I need to switch to R in order to use a particular package that cannot easily be replicated in Python.

Currently the text is parsed into bigram counts, reduced to a vocabulary of about 11,000 bigrams, and then stored as a dictionary:

{id1: {'bigrams': [(bigram1, count), (bigram2, count), ...]},
 id2: {'bigrams': ...}}

I need to get this into a dgCMatrix in R, where the rows are id1, id2, ... and the columns are the different bigrams such that a cell represents the 'count' for that id-bigram.

Any suggestions? I thought about expanding it into one massive dense CSV, but that seems super inefficient and is probably infeasible due to memory constraints.

1 solution

#1

Could you write out the matrix in MatrixMarket format using scipy's mmwrite and then read it into R using readMM from the Matrix package?
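A minimal sketch of the Python side, assuming the dictionary has the shape shown in the question. The helper name `dict_to_sparse`, the toy data, and the output file names are made up for illustration; only `coo_matrix` and `mmwrite` come from scipy. MatrixMarket files do not carry row or column names, so the sketch writes those to separate text files.

```python
import os
import tempfile

from scipy.io import mmread, mmwrite
from scipy.sparse import coo_matrix


def dict_to_sparse(docs):
    """Build a (documents x bigrams) count matrix from the nested dict."""
    row_ids = sorted(docs)
    vocab = sorted({bg for d in docs.values() for bg, _ in d["bigrams"]})
    col_of = {bg: j for j, bg in enumerate(vocab)}

    # Collect (row, column, value) triplets; only nonzero cells are stored.
    rows, cols, vals = [], [], []
    for i, doc_id in enumerate(row_ids):
        for bg, count in docs[doc_id]["bigrams"]:
            rows.append(i)
            cols.append(col_of[bg])
            vals.append(count)

    # Store counts as floats, since a dgCMatrix holds doubles.
    mat = coo_matrix(
        (vals, (rows, cols)),
        shape=(len(row_ids), len(vocab)),
        dtype=float,
    )
    return mat, row_ids, vocab


# Toy data standing in for the real dictionary (hypothetical values).
docs = {
    "id1": {"bigrams": [("new york", 3), ("york city", 1)]},
    "id2": {"bigrams": [("new york", 2), ("big apple", 5)]},
}

matrix, row_ids, vocab = dict_to_sparse(docs)

# Write the matrix plus its row/column labels (file names are arbitrary).
out_dir = tempfile.mkdtemp()
mtx_path = os.path.join(out_dir, "bigrams.mtx")
mmwrite(mtx_path, matrix)
with open(os.path.join(out_dir, "rows.txt"), "w") as fh:
    fh.write("\n".join(row_ids))
with open(os.path.join(out_dir, "cols.txt"), "w") as fh:
    fh.write("\n".join(vocab))
```

On the R side, `readMM("bigrams.mtx")` from the Matrix package should give a triplet-form sparse matrix, which can then be coerced with `as(m, "CsparseMatrix")` to get a dgCMatrix. The row and column names can be reattached from the two text files with `readLines`.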
