I have an RDD[LabeledPoint] and I want to find the min and max of the labels, and also apply some transformations, such as subtracting 5 from every label. I have tried various ways of getting at the labels, but none of them works correctly.
How can I access only the labels and only the features of the RDD? Is there a way to get them as a List[Double] and List[Vector] for example?
I cannot switch to DataFrames.
2 solutions
#1
0
You can create a DataFrame from an existing RDD with a SparkSession. Once you have a DataFrame, you can operate on it however you like.
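For reference, here is a minimal sketch of what that conversion could look like. It assumes the `LabeledPoint` in question is `org.apache.spark.mllib.regression.LabeledPoint`; the object name, the helper names (`toDataFrame`, `labelRange`, `shiftLabels`) and the sample data are made up for illustration and do not come from the question.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, min}

// Hypothetical helper object, for illustration only.
object LabeledPointDataFrameSketch {

  // LabeledPoint is a case class, so createDataFrame can infer the
  // schema; the columns come out as "label" and "features".
  def toDataFrame(spark: SparkSession, rdd: RDD[LabeledPoint]): DataFrame =
    spark.createDataFrame(rdd)

  // Min and max of the label column, computed in one aggregation.
  def labelRange(df: DataFrame): (Double, Double) = {
    val row = df.agg(min("label"), max("label")).first()
    (row.getDouble(0), row.getDouble(1))
  }

  // Subtract `delta` from every label, leaving the features untouched.
  def shiftLabels(df: DataFrame, delta: Double): DataFrame =
    df.withColumn("label", col("label") - delta)

  def main(args: Array[String]): Unit = {
    // Local session purely for the demo (assumption).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("labeled-point-df-sketch")
      .getOrCreate()

    // Made-up sample data.
    val rdd = spark.sparkContext.parallelize(Seq(
      LabeledPoint(7.0, Vectors.dense(1.0, 2.0)),
      LabeledPoint(12.0, Vectors.dense(3.0, 4.0))
    ))

    val df = toDataFrame(spark, rdd)
    println(labelRange(df))
    shiftLabels(df, 5.0).show()

    spark.stop()
  }
}
```

If DataFrames really are off the table, the same results are available directly on the RDD with `map`.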
#2
0
OK, so after playing around with the map function, I came up with this solution:
val labels = rdd.map(x => x.label)
val min = labels.min
val max = labels.max
If you want to change the labels, you can use the map function again:
rdd.map(x => x.label - 5)
This way you can work with the label part of an RDD[LabeledPoint]. Note that this map returns an RDD[Double], so the features are dropped.
Following Cyril's comments below, I am also adding the command that keeps the rest of the RDD intact and changes only the label:
val newRdd = rdd.map(x => x.copy(label = x.label - 5))
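Putting the pieces together, here is a rough end-to-end sketch that also covers the List[Double] / List[Vector] part of the question. It assumes the mllib `LabeledPoint`, a local SparkContext, and a dataset small enough to `collect()` onto the driver; the object name, helper names and sample data are invented for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object LabeledPointSketch {

  // Min and max of the labels, computed on the cluster.
  def labelRange(rdd: RDD[LabeledPoint]): (Double, Double) = {
    val labels = rdd.map(_.label)
    (labels.min(), labels.max())
  }

  // Subtract `delta` from every label; copy keeps the features as they are.
  def shiftLabels(rdd: RDD[LabeledPoint], delta: Double): RDD[LabeledPoint] =
    rdd.map(p => p.copy(label = p.label - delta))

  // Pull labels and features back to the driver as plain Scala lists.
  // Only sensible when the data fits in driver memory.
  def toLists(rdd: RDD[LabeledPoint]): (List[Double], List[Vector]) =
    (rdd.map(_.label).collect().toList, rdd.map(_.features).collect().toList)

  def main(args: Array[String]): Unit = {
    // Local context purely for the demo (assumption).
    val conf = new SparkConf().setMaster("local[*]").setAppName("labeled-point-sketch")
    val sc = new SparkContext(conf)

    // Made-up sample data.
    val rdd = sc.parallelize(Seq(
      LabeledPoint(7.0, Vectors.dense(1.0, 2.0)),
      LabeledPoint(9.0, Vectors.dense(3.0, 4.0)),
      LabeledPoint(12.0, Vectors.dense(5.0, 6.0))
    ))

    val (minLabel, maxLabel) = labelRange(rdd)
    println(s"min=$minLabel max=$maxLabel")

    val (labelList, featureList) = toLists(shiftLabels(rdd, 5.0))
    println(labelList)
    println(featureList)

    sc.stop()
  }
}
```

Using the named argument in `copy(label = ...)` makes it explicit that only the label changes.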