I 'm trying the technologies that i will be using to build a real-time data pipeline, and i have run into some issues exporting my contents to a file.
我正在尝试我将要使用的技术来构建一个实时的数据管道,并且我遇到了一些将我的内容导出到文件中的问题。
I have setup a local kafka cluster, and a node.js producer that sends a simple text message just to test functionality and get a rough estimate of complexity of implementation.
我设置了一个本地kafka集群和一个节点。js生成器只发送一个简单的文本消息以测试功能,并对实现的复杂性进行粗略估计。
This is the spark streaming job that is reading from kafka and i am trying to get it to write to a file.
这就是从kafka中读取的spark流媒体作业,我试图让它写入一个文件。
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
# Create a local StreamingContext with two working thread and batch interval of 1 second
sc = SparkContext("local[2]", "KafkaStreamingConsumer")
ssc = StreamingContext(sc, 10)
kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "consumer-group", {"test": 1})
kafkaStream.saveAsTextFile('out.txt')
print 'Event recieved in window: ', kafkaStream.pprint()
ssc.start()
ssc.awaitTermination()
The error i am seeing when submitting the spark job is:
提交spark作业时,我看到的错误是:
kafkaStream.saveAsTextFile('out.txt')
AttributeError: 'TransformedDStream' object has no attribute 'saveAsTextFile'
No computations or transformations are performed on the data, i just want to build the flow. What do I need to change/add to be able to export the data in a file?
没有对数据执行计算或转换,我只想构建流。我需要更改/添加什么才能将数据导出到文件中?
1 个解决方案
#1
9
http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html
http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html
saveAsTextFiles (note the plural)
saveAsTextFiles(注意复数)
saveAsTextFile (singular) is a method on an RDD, not a DStream.
saveAsTextFile(单数)是一个RDD方法,而不是DStream。
#1
9
http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html
http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html
saveAsTextFiles (note the plural)
saveAsTextFiles(注意复数)
saveAsTextFile (singular) is a method on an RDD, not a DStream.
saveAsTextFile(单数)是一个RDD方法,而不是DStream。