I have a Spark 1.6.2 cluster with Hadoop YARN and Oozie. I have installed Zeppelin 0.6.1 (the binary package with all interpreters: zeppelin-0.6.1-bin-all.tgz). When I try to use a SparkR script with the %spark.r interpreter,
%spark.r
# Create the SparkContext and connect to the Cloudant DB
sc <- sparkR.init(sparkEnv = list("cloudant.host"="host_name", "cloudant.username"="user_name", "cloudant.password"="password", "jsonstore.rdd.schemaSampleSize"="-1"))
# Database to be connected to extract the data
database <- "sensordata"
# Creating Spark SQL Context
sqlContext <- sparkRSQL.init(sc)
# Creating DataFrame for the "sensordata" Cloudant DB
sensorDataDF <- read.df(sqlContext, database, header='true', source = "com.cloudant.spark",inferSchema='true')
# Get basic information about the DataFrame(sensorDataDF)
printSchema(sensorDataDF)
I am getting the following error (log):
ERROR [2016-08-25 03:28:37,336] ({Thread-77} JobProgressPoller.java[run]:54) - Can not get or update progress
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:373)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:111)
at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:237)
at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:51)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpreterService.java:296)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterService.java:281)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:370)
... 3 more
Help would be much appreciated.
2 Answers
#1
I faced a similar issue after migrating to 0.6.1. The problem is that Zeppelin 0.6.1 is built with Scala 2.11, while Apache Spark 1.6.2 is built with Scala 2.10. You need to either build Spark 1.6.x with Scala 2.11 (see the sketch below) or migrate your Spark code to 2.0.0.
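For reference, a rough sketch of rebuilding Spark 1.6.x against Scala 2.11, following the Spark build documentation; the Hadoop profile (-Phadoop-2.6 here) is an assumption and should match your cluster:

# run from the Spark 1.6.x source tree
./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package

Afterwards, point SPARK_HOME in Zeppelin at the resulting build.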
#2
Setting local[2] in the interpreter settings fixed my issue. This was originally suggested by vgunnu:
"Try setting spark master as local[2], if that works, you might be missing few environmental variables in env file – vgunnu Aug 25 at 4:37"
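A minimal sketch of the two usual places to set this (the paths below are assumptions for illustration): either set the master property to local[2] for the spark interpreter on Zeppelin's Interpreter page, or export it in conf/zeppelin-env.sh and restart Zeppelin:

# conf/zeppelin-env.sh
export MASTER=local[2]                    # Spark master used by the %spark interpreters
export SPARK_HOME=/opt/spark-1.6.2        # assumed path; point to your Spark installation
export HADOOP_CONF_DIR=/etc/hadoop/conf   # assumed path; needed again when switching back to YARN

If local[2] works but the YARN master does not, the missing environment variables mentioned in the comment are typically SPARK_HOME and HADOOP_CONF_DIR.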