Adding/selecting fields to/from an RDD

Time: 2020-12-06 23:11:25

I have an RDD, say dataRdd, with fields like timestamp, url, ...

I want to create a new RDD containing only a few of the fields from dataRdd.

The following code segment creates the new RDD, but timestamp and URL are treated as values rather than as field/column names:

val fewfieldsRDD = dataRdd.map(r => ("timestamp" -> r.timestamp, "URL" -> r.url))

However, with the code segment below, one, two, three, arrival, and SFO are treated as column names:

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
val numairRdd = sc.makeRDD(Seq(numbers, airports))

Can anyone tell me what I am doing wrong, and how I can create a new RDD with field names mapped to values from another RDD?

1 solution

#1


You are creating an RDD of tuples, not Map objects. Try:

val fewfieldsRDD = dataRdd.map(r => Map("timestamp" -> r.timestamp, "URL" -> r.url))
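
To see why the two versions differ, here is a minimal sketch that uses a plain Scala Seq in place of the RDD, so it runs without a Spark context. The Record case class, its sample values, and the TupleVsMap object are assumptions made up for illustration; the question does not show the real element type of dataRdd. The function passed to map produces the same element type whether it is applied to a Seq or to an RDD, so the explicit type annotations below show what each variant builds.

```scala
// Hypothetical record type standing in for the elements of dataRdd.
final case class Record(timestamp: Long, url: String)

object TupleVsMap {
  // Made-up sample data; a Seq is used here instead of an RDD.
  val records: Seq[Record] = Seq(
    Record(1L, "http://example.com/a"),
    Record(2L, "http://example.com/b")
  )

  // Plain parentheses build a Tuple2 whose two members are themselves
  // (key, value) pairs, so "timestamp" and "URL" are just data.
  val asTuples: Seq[((String, Long), (String, String))] =
    records.map(r => ("timestamp" -> r.timestamp, "URL" -> r.url))

  // Map(...) builds a keyed collection, so "timestamp" and "URL"
  // become keys that values can be looked up by.
  val asMaps: Seq[Map[String, Any]] =
    records.map(r => Map("timestamp" -> r.timestamp, "URL" -> r.url))
}
```

With the Map version, TupleVsMap.asMaps.head("URL") looks the value up by field name, whereas the tuple version only offers positional access such as TupleVsMap.asTuples.head._2. Because the values have different types (Long and String), the Map's value type widens to Any; a case class with just the wanted fields would keep the types, if that matters for later processing.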
