How to save a Spark Java DStream RDD

Date: 2022-07-31 20:51:53

The Spark Scala API's DStream provides a saveAsTextFiles method for storing a DStream's RDDs on HDFS.

However, the corresponding method is not available on the Spark Java API's DStream.

How can I store a DStream's RDDs in HDFS using the Spark Java API?

5 answers

#1


The Time parameter passed to the callback can be used as a prefix or suffix of the output path, so that each batch is written to its own location.

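import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Time;

myrdd.foreachRDD(new Function2<JavaPairRDD<Integer, String>, Time, Void>() {
    // The batch time must be declared as the second argument of call().
    public Void call(JavaPairRDD<Integer, String> rdd, Time time) {
        // Time.toString() looks like "1659300713000 ms"; keep only the number.
        rdd.saveAsTextFile(path + "-" + time.toString().split(" ")[0]);
        return null;
    }
});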
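
The path naming used above can be factored into a small helper. The following is a minimal sketch in plain Java, not Spark's own code: the class name BatchPaths and the method batchPath are hypothetical, and the layout is modeled on the "prefix-TIME_IN_MS.suffix" naming that the Scala saveAsTextFiles documentation describes. A distinct path per batch matters because saveAsTextFile normally refuses to write into a directory that already exists.

```java
// Hypothetical helper (not part of Spark): builds one output path per batch.
final class BatchPaths {
    private BatchPaths() {}

    // Returns "prefix-batchTimeMs.suffix"; an empty or null prefix or suffix
    // is simply left out of the result.
    static String batchPath(String prefix, long batchTimeMs, String suffix) {
        StringBuilder sb = new StringBuilder();
        if (prefix != null && !prefix.isEmpty()) {
            sb.append(prefix).append('-');
        }
        sb.append(batchTimeMs);
        if (suffix != null && !suffix.isEmpty()) {
            sb.append('.').append(suffix);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example values only; the HDFS location is an assumption.
        System.out.println(batchPath("hdfs:///tmp/counts", 1659300713000L, "txt"));
    }
}
```

Inside the callback it could then be used as rdd.saveAsTextFile(BatchPaths.batchPath(path, time.milliseconds(), "txt")), assuming path holds the base output location.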

#2


Using the DStream's foreach method, you can access each RDD in the DStream and save it with the saveAsTextFile method.

Here is the sample code:

sortedCounts.foreach(new Function<JavaPairRDD<Integer, String>, Void>() {
    public Void call(JavaPairRDD<Integer, String> rdd) {
        rdd.saveAsTextFile(path);
        return null;
    }
});

#3


Try using the dstream() method to convert the JavaDStream to a DStream. For example:

lines.dstream().saveAsObjectFiles("pre", "suf");

#4


If the JavaDStream object is named dstream and the target directory is path, you can save it as follows:

dstream.foreachRDD(rdd -> {
    rdd.saveAsTextFile(path);
});

#5


Use the foreachRDD API of the JavaDStream class.
