I have an RDD, say dataRdd, with fields such as timestamp, url, and so on.
I want to create a new RDD that contains only a few of the fields from dataRdd.
The following code segment creates the new RDD, but timestamp and URL are treated as values rather than as field/column names:
var fewfieldsRDD= dataRdd.map(r=> ( "timestamp" -> r.timestamp , "URL" -> r.url))
However, with the code segment below, one, two, three, arrival, and SFO are treated as column names:
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
val numairRdd= sc.makeRDD(Seq(numbers, airports))
Can anyone tell me what I am doing wrong, and how I can create a new RDD with field names mapped to values from another RDD?
1 solution
#1
0
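You are creating an RDD of tuples, not Map objects. Wrapping the pairs in plain parentheses builds a tuple whose elements are themselves (key, value) pairs, whereas Map(...) builds a collection keyed by those names. Try: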
var fewfieldsRDD= dataRdd.map(r=> Map( "timestamp" -> r.timestamp , "URL" -> r.url))
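To make the difference visible, here is a minimal sketch that compares the element types of the two expressions. It assumes a hypothetical Record case class with timestamp and url fields, and uses a local Seq in place of dataRdd so that it runs without a Spark context; map on an RDD should yield the same element types.

```scala
// Hypothetical record type standing in for the elements of dataRdd.
final case class Record(timestamp: Long, url: String)

object TupleVsMap {
  // Assumption: a local Seq stands in for the RDD so no SparkContext is needed.
  val data: Seq[Record] = Seq(Record(1L, "http://example.com/a"))

  // Plain parentheses around two pairs build a Tuple2 whose elements are pairs,
  // so "timestamp" and "URL" end up as ordinary values inside the tuple.
  val asTuples: Seq[((String, Long), (String, String))] =
    data.map(r => ("timestamp" -> r.timestamp, "URL" -> r.url))

  // Map(...) builds a key/value collection, so "timestamp" and "URL" become keys.
  val asMaps: Seq[Map[String, Any]] =
    data.map(r => Map("timestamp" -> r.timestamp, "URL" -> r.url))

  def main(args: Array[String]): Unit = {
    println(asTuples.head)    // a tuple of two pairs
    println(asMaps.head.keys) // the field names
  }
}
```

The explicit type annotations are only there to show what the compiler infers in each case; they can be dropped in real code.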