Spark Streaming:Rdd.Count()没有返回有效数字

时间:2020-12-25 20:48:37

In my application I have two JavaDStreams which contain some data. I am attempting to count the number of rows in each JavaDStream however the result I am receiving in the log isn't a number but rather a completely different object that its outputting to the log. What am I doing wrong here?

在我的应用程序中,我有两个包含一些数据的JavaDStream。我试图计算每个JavaDStream中的行数,但是我在日志中收到的结果不是数字,而是输出到日志的完全不同的对象。我在这做错了什么?

Code:

      //map score result set to tweets
    JavaDStream<Tuple5<Long, String, Float, Float, String>> result =
            scoredTweets.map(new ScoreTweetsFunction());

    //get extra elements
    JavaDStream<Tuple7<Long, String, String, String, String, String, String>> extra_elements =
            json.map(new GetExtraElements());

     //join elements with score result
    System.out.println("Number of Rows in extra elements RDD: " + extra_elements.count());
    System.out.println("Number of Rows in result RDD: " + result.count());

Output from Log:

日志输出:

Number of Rows in extra elements RDD: org.apache.spark.streaming.api.java.JavaDStream@73358a55
Number of Rows in result RDD: org.apache.spark.streaming.api.java.JavaDStream@242aa3b2

1 个解决方案

#1


3  

DStream is not a RDD but a continuous and potentially infinite sequence of RDDs. Because of that it cannot be counted and it is not how count method is intended to work.

DStream不是RDD,而是连续且可能无限的RDD序列。因为它不能被计算,并且计数方法不是如何工作的。

Instead it transforms existing stream into another stream where each RDD

相反,它将现有流转换为每个RDD的另一个流

has a single element generated by counting each RDD of this DStream

通过计算此DStream的每个RDD生成单个元素

If you want to perform some action on individual RDDs you should use foreachRDD.

如果要对单个RDD执行某些操作,则应使用foreachRDD。

#1


3  

DStream is not a RDD but a continuous and potentially infinite sequence of RDDs. Because of that it cannot be counted and it is not how count method is intended to work.

DStream不是RDD,而是连续且可能无限的RDD序列。因为它不能被计算,并且计数方法不是如何工作的。

Instead it transforms existing stream into another stream where each RDD

相反,它将现有流转换为每个RDD的另一个流

has a single element generated by counting each RDD of this DStream

通过计算此DStream的每个RDD生成单个元素

If you want to perform some action on individual RDDs you should use foreachRDD.

如果要对单个RDD执行某些操作,则应使用foreachRDD。