I'd like to convert my org.apache.spark.mllib.linalg.Matrix to an org.apache.spark.mllib.linalg.distributed.RowMatrix.
I can do it like this:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val xx = X.computeGramianMatrix() // xx has type org.apache.spark.mllib.linalg.Matrix
val xxs = xx.toString()
// Split the printed matrix into lines, then each line into whitespace-separated cells.
val xxr = xxs.split("\n").map(row => row.trim.split("\\s+"))
val xxp = sc.parallelize(xxr)
val xxd = xxp.map(ar => Vectors.dense(ar.map(elm => elm.toDouble)))
val xxrm: RowMatrix = new RowMatrix(xxd)
However, that is really gross and a total hack. Can someone show me a better way?
Note: I am using Spark version 1.3.0.
2 solutions
#1
I suggest converting your Matrix to an RDD[Vector], which you can then pass straight to the RowMatrix constructor.
Consider the following example:
import org.apache.spark.rdd._
import org.apache.spark.mllib.linalg._
val denseData = Seq(
  Vectors.dense(0.0, 1.0, 2.0),
  Vectors.dense(3.0, 4.0, 5.0),
  Vectors.dense(6.0, 7.0, 8.0),
  Vectors.dense(9.0, 0.0, 1.0)
)
// 3 x 2 local matrix; Matrices.dense takes its values in column-major order.
val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
You'll need to define a method that converts your Matrix to an RDD[Vector]:
def matrixToRDD(m: Matrix): RDD[Vector] = {
  val columns = m.toArray.grouped(m.numRows)
  val rows = columns.toSeq.transpose // Skip this if you want a column-major RDD.
  val vectors = rows.map(row => new DenseVector(row.toArray))
  sc.parallelize(vectors)
}
Now you can apply the conversion to your Matrix:
import org.apache.spark.mllib.linalg.distributed.RowMatrix
val rows = matrixToRDD(dm)
val mat = new RowMatrix(rows)
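The reshaping step inside matrixToRDD can be checked without Spark. The sketch below uses plain Scala collections and the same six values as dm; the object ColumnMajorDemo and the helper columnMajorToRows are hypothetical names introduced only for illustration, not part of MLlib. It assumes the input array is column-major, which is how Matrix.toArray lays out its values.

```scala
// Plain-Scala check of the group-then-transpose step; no Spark required.
object ColumnMajorDemo {
  // Hypothetical helper (not an MLlib API): turn a column-major array into a list of rows.
  def columnMajorToRows(values: Array[Double], numRows: Int): List[List[Double]] = {
    // Each chunk of numRows consecutive values is one column of the matrix.
    val columns = values.grouped(numRows).map(_.toList).toList
    // Transposing the list of columns yields the list of rows.
    columns.transpose
  }

  def main(args: Array[String]): Unit = {
    // Same values as dm: a 3 x 2 matrix stored column by column.
    val values = Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0)
    val rows = columnMajorToRows(values, numRows = 3)
    rows.foreach(r => println(r.mkString(" ")))
    // prints:
    // 1.0 2.0
    // 3.0 4.0
    // 5.0 6.0
  }
}
```

Each inner list corresponds to one Vector in the RDD, so the resulting RowMatrix for dm should have 3 rows and 2 columns.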
I hope this helps!
#2
A small correction to the code above: use Vectors.dense instead of new DenseVector, so that each element is typed as a Vector:
val vectors = rows.map(row => Vectors.dense(row.toArray))