Zeppelin0.5.6使用spark解释器

时间:2023-02-06 17:09:55

Zeppelin为0.5.6

Zeppelin默认自带本地spark,可以不依赖任何集群,下载bin包,解压安装就可以使用。

使用其他的spark集群在yarn模式下。

配置:

vi zeppelin-env.sh

Zeppelin0.5.6使用spark解释器

添加:

export SPARK_HOME=/usr/crh/current/spark-client
export SPARK_SUBMIT_OPTIONS="--driver-memory 512M --executor-memory 1G"export HADOOP_CONF_DIR=/etc/hadoop/conf

 

Zeppelin Interpreter配置

Zeppelin0.5.6使用spark解释器

 

注意:设置完重启解释器。

Properties的master属性如下:

Zeppelin0.5.6使用spark解释器

新建Notebook

Tips:几个月前zeppelin还是0.5.6,现在最新0.6.2,zeppelin 0.5.6写notebook时前面必须加%spark,而0.6.2若什么也不加就默认是scala语言。

zeppelin 0.5.6不加就报如下错:

Connect to 'databank:4300' failed
%spark.sqlselect count(*) from tc.gjl_test0

报错:

Zeppelin0.5.6使用spark解释器

com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: {"id":"2","name":"ConvertToSafe"}; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:85)
at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:136)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.ConvertToSafe.doExecute(rowFormatConverters.scala:56)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:187)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:297)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:144)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Zeppelin0.5.6使用spark解释器

原因:

进入/opt/zeppelin-0.5.6-incubating-bin-all目录下:

# ls lib |grep jackson
jackson-annotations-2.5.0.jar
jackson-core-2.5.3.jar
jackson-databind-2.5.3.jar

将里面的版本换成如下版本:

# ls lib |grep jackson
jackson-annotations-2.4.4.jar
jackson-core-2.4.4.jar
jackson-databind-2.4.4.jar

测试成功!

Zeppelin0.5.6使用spark解释器

参考网站

 

Sparksql也可直接通过hive jdbc连接,只需换端口,如下图:

Zeppelin0.5.6使用spark解释器

Zeppelin0.5.6使用spark解释器

 

zeppelin主要有以下功能

  1. 数据提取

  2. 数据发现

  3. 数据分析

  4. 数据可视化

Zeppelin0.5.6使用spark解释器

目前版本(0.5-0.6)之前支持的数据搜索引擎有如下

Zeppelin0.5.6使用spark解释器

安装

环境 
centOS 6.6

编译准备工作

sudo yum update
sudo yum install openjdk-7-jdk
sudo yum install git
sudo yum install npm

下载源码

git clone https://github.com/apache/incubator-zeppelin.git

编译,打包

cd incubator-zeppelin

#build for spark 1.4.x ,hadoop 2.4.x
mvn clean package -Pspark-1.4 -Dhadoop.version=2.4.0 -Phadoop-2.4 -DskipTests -P build-distr

Zeppelin0.5.6使用spark解释器

结果会生成在zeppelin-distribution/target

解压

tar -zxvf zeppelin-0.6.0-incubating-SNAPSHOT.tar.gz

修改配置,在zeppelin-site.xml中可以修改端口号等信息,zeppelin-env.sh中修改一些启动环境变量。

cp zeppelin-site.xml.template zeppelin-site.xml
cp zeppelin-env.sh.template zeppelin-env.sh

启动zeppelin

./bin/zeppelin-daemon.sh start

关闭zeppelin(记得要用命令关闭,不然你很可能再也起不来,别问我怎么知道的。)

./bin/zeppelin-daemon.sh stop

web ui

Zeppelin0.5.6使用spark解释器

安装环节至此结束,后续使用篇主要是hive与spark-sql的可视化使用,有时间将慢慢添加。


1.首先我们要下载zeppelin的压缩包,当我们解压之后(这一台主机上面已经安装过了java的环境)

  2.修改配置环境

   进入conf/

   将zeppelin-env.sh.template修改为zeppelin-env.sh

   将zeppelin-site.xml.template修改为zeppelin-site.xml

  Zeppelin0.5.6使用spark解释器

   然后我们接下来修改conf/zeppelin-env.sh新增

      export SPARK_MASTER_IP=192.168.109.136

      export SPARK_LOCAL_IP=192.168.109.136

  3.启动zeppelin

    进入zeppelin:进入bin目录下执行./zeppelin-daemon.sh start

    然后浏览器访问192.168.109.136:8080进入界面

  Zeppelin0.5.6使用spark解释器

      此时就启动成功

  4.zeppelin简单实用

    1.text

    Zeppelin0.5.6使用spark解释器

    2.html

    Zeppelin0.5.6使用spark解释器

    3.table

    Zeppelin0.5.6使用spark解释器

    Zeppelin0.5.6使用spark解释器

    5.可以对数据进行分析

    对于我做的最多的分析,就是基于学校的那个资料,我有学校里面的信息,这个里面的每一行的信息是以","

    进行分隔,这个其中里面的民族,此时我们对这个民族进行分析

    Zeppelin0.5.6使用spark解释器

    由于我们这个zeppelin是在linux里面的启动,所以我们必须把原有的数据放到linux的里面,此时zeppelin读的文件目录是linux里面的目录

    Zeppelin0.5.6使用spark解释器

    Zeppelin0.5.6使用spark解释器

Zeppelin0.5.6使用spark解释器

Zeppelin0.5.6使用spark解释器

    则此时我们就可以对数据库里面的东西进行视图分析,我们通过这个数据,我们发现通过读取数据

    ,以分组的方式,然后在查询数据有多少个,这样就可以对数据进行显示

    a.

Zeppelin0.5.6使用spark解释器

val text = sc.textFile("/tmp/xjdx.txt")case class Person(college:String,time:Integer)
val rdd1 = text.map(line =>{
   val fields = line.split(",")    if(fields.length >=10){
     val mz = fields(10)
     Person(mz,1)
   }else{
       Person("1",1)
   }
})

Zeppelin0.5.6使用spark解释器

    b.

rdd1.toDF().registerTempTable("rdd1")

    c.

%sql select college,count(1) from rdd1 group by college