Finding the min and max labels in an RDD[LabeledPoint] in Spark / Scala

Date: 2022-02-01 23:13:03

I have an RDD[LabeledPoint] and I want to find the min and max of the labels, and also apply some transformations to them, such as subtracting 5 from every label. The problem is that I have tried various ways to get at the labels, but none of them works correctly.

How can I access only the labels and only the features of the RDD? Is there a way to get them as a List[Double] and List[Vector] for example?

I cannot switch to DataFrames.

2 solutions

#1

You can create a DataFrame from an existing RDD with a SparkSession. Once the data is in a DataFrame, you can manipulate it however you like.

#2

OK, so after playing around with the map function, I came up with this solution:

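// Extract only the labels; the result is an RDD[Double]
val labels = rdd.map(x => x.label)
// min and max are actions, so each one runs its own Spark job
val min = labels.min
val max = labels.max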
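Calling `min` and `max` separately scans the data twice. As a minimal sketch of a single-pass alternative, `stats()` on an `RDD[Double]` returns a `StatCounter` that carries both values. This assumes the `org.apache.spark.mllib` version of `LabeledPoint` and that spark-core and spark-mllib are on the classpath; the object name `LabelStats`, the helper `labelRange`, and the sample data are hypothetical and only there for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object LabelStats {
  // Hypothetical helper: returns (min, max) of the labels in one pass over the data.
  def labelRange(rdd: RDD[LabeledPoint]): (Double, Double) = {
    val stats = rdd.map(_.label).stats() // StatCounter with count, mean, min, max, ...
    (stats.min, stats.max)
  }

  def main(args: Array[String]): Unit = {
    // Local context for illustration only; in a real job the context already exists.
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("label-stats"))
    try {
      // Made-up sample data
      val rdd = sc.parallelize(Seq(
        LabeledPoint(7.0, Vectors.dense(1.0, 0.5)),
        LabeledPoint(2.0, Vectors.dense(0.0, 3.0)),
        LabeledPoint(4.0, Vectors.dense(2.0, 2.0))
      ))
      val (lo, hi) = labelRange(rdd)
      println(s"min = $lo, max = $hi")
    } finally {
      sc.stop()
    }
  }
}
```

For a small RDD the difference is negligible; the single pass mainly helps when the RDD is large or not cached.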

If you want to change the labels, you can use the map function again:

// Result is an RDD[Double]: the shifted labels only, without the features
rdd.map(x => x.label - 5)

This way you can work with the label part of an RDD[LabeledPoint].

Following Cyril's comments below, I decided to also add the command that lets you keep your RDD and change only the labels however you want:

val newRdd = rdd.map(x => x.copy(label = x.label - 5))
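To tie this back to the original question, here is a rough sketch of pulling the labels and the features out separately, collecting them as a List[Double] and a List[Vector], and shifting the labels while keeping the features. It assumes the `org.apache.spark.mllib` classes; the object name `LabelAccess`, its helper functions, and the sample data are hypothetical names introduced only for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object LabelAccess {
  // Only the labels
  def labels(rdd: RDD[LabeledPoint]): RDD[Double] = rdd.map(_.label)

  // Only the feature vectors
  def features(rdd: RDD[LabeledPoint]): RDD[Vector] = rdd.map(_.features)

  // Add delta to every label and leave the features untouched
  def shiftLabels(rdd: RDD[LabeledPoint], delta: Double): RDD[LabeledPoint] =
    rdd.map(p => p.copy(label = p.label + delta))

  def main(args: Array[String]): Unit = {
    // Local context for illustration only
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("label-access"))
    try {
      // Made-up sample data
      val rdd = sc.parallelize(Seq(
        LabeledPoint(7.0, Vectors.dense(1.0, 0.5)),
        LabeledPoint(2.0, Vectors.dense(0.0, 3.0))
      ))

      // collect() brings everything to the driver, so this only suits data that fits in memory
      val labelList: List[Double] = labels(rdd).collect().toList
      val featureList: List[Vector] = features(rdd).collect().toList

      val shifted = shiftLabels(rdd, -5.0)
      println(labelList)
      println(featureList)
      shifted.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the result as an RDD[LabeledPoint] rather than an RDD[Double] means it can still be passed to anything that expects labeled points.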
