Field "item" does not exist when using ALS with a Spark MLlib Pipeline

Time: 2021-07-05 20:31:04

I am training a recommender system with ALS (Spark version: 1.3.1). Now I want to use a Pipeline for model selection via cross-validation. As a first step, I tried to adapt the example code and came up with this:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.recommendation.ALS

val conf = new SparkConf().setAppName("ALS").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val ratings: RDD[org.apache.spark.mllib.recommendation.Rating] = // ...
val als = new ALS().setMaxIter(10).setRank(10).setRegParam(0.01)
val pipeline = new Pipeline().setStages(Array(als))
val model = pipeline.fit(ratings.toDF)

When I run it, the last line fails with an exception:

Exception in thread "main" java.lang.IllegalArgumentException: Field "item" does not exist.
at org.apache.spark.sql.types.StructType$$anonfun$apply$25.apply(dataTypes.scala:1032)
at org.apache.spark.sql.types.StructType$$anonfun$apply$25.apply(dataTypes.scala:1032)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:58)
at org.apache.spark.sql.types.StructType.apply(dataTypes.scala:1031)
at org.apache.spark.ml.recommendation.ALSParams$class.validateAndTransformSchema(ALS.scala:148)
at org.apache.spark.ml.recommendation.ALS.validateAndTransformSchema(ALS.scala:229)
at org.apache.spark.ml.recommendation.ALS.transformSchema(ALS.scala:304)
at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:142)
at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:142)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:108)
at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:142)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:58)
at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:100)
at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:79)
at org.apache.spark.ml.Estimator.fit(Estimator.scala:44)
...

I do not use the string "item" anywhere in my code, so I assume it is a default of some kind. When I add .setItemCol("itemId") to als, the exception message changes accordingly.

What is the meaning of "item"? How can I make the pipeline work?

1 solution

#1


Okay, the solution was actually quite simple: use org.apache.spark.ml.recommendation.ALS.Rating instead of org.apache.spark.mllib.recommendation.Rating and it will just work.
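Putting that together with the code from the question, a corrected sketch might look like the following. This is an illustration only: it requires a Spark 1.3 runtime on the classpath, and the three ratings are hypothetical sample data standing in for the elided RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.recommendation.ALS

val conf = new SparkConf().setAppName("ALS").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// ALS.Rating's fields are named user/item/rating, which match the
// default column names the ml ALS estimator looks for.
val ratings = sc.parallelize(Seq(
  ALS.Rating(0, 0, 4.0f),  // hypothetical sample data
  ALS.Rating(0, 1, 2.0f),
  ALS.Rating(1, 1, 3.0f)
))

val als = new ALS().setMaxIter(10).setRank(10).setRegParam(0.01)
val pipeline = new Pipeline().setStages(Array(als))
// toDF now produces columns user/item/rating, so transformSchema succeeds
val model = pipeline.fit(ratings.toDF)
```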

Otherwise .setItemCol("product") does the trick, because org.apache.spark.mllib.recommendation.Rating has a field called "product" whereas org.apache.spark.ml.recommendation.ALS.Rating names the corresponding field "item". Some magic must be going on that, given a string, accesses a field of the case class (reflection?).
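That "magic" can be sketched in plain Scala: toDF derives DataFrame column names from the case-class field names via reflection. The two classes below are hypothetical stand-ins (not the real Spark classes) that mirror the field names of the two Rating types, and productElementNames (Scala 2.13+) recovers field names much like Spark's schema inference does.

```scala
// Stand-ins mirroring the field names of the two Rating types:
case class MllibRating(user: Int, product: Int, rating: Double) // mllib.recommendation.Rating
case class MlRating(user: Int, item: Int, rating: Float)        // ml.recommendation.ALS.Rating

// Recover field names from any case class, in declaration order,
// roughly the way toDF derives column names.
def columnNames(p: Product): Seq[String] = p.productElementNames.toSeq

// columnNames(MllibRating(1, 2, 3.0)) == Seq("user", "product", "rating")
// columnNames(MlRating(1, 2, 3.0f))   == Seq("user", "item", "rating")
```

So the mllib Rating yields a "product" column while the ml ALS estimator looks up a column literally named "item" by default, hence the IllegalArgumentException.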
