Configuring and Using SparkSQL in Hue

Posted: 2024-01-23 19:44:36

1. Environment:

  HDP 2.4 V3 sandbox

  hue 4.0.0

2. Building and installing Hue 4.0.0

  Source: https://github.com/cloudera/hue/releases/tag/release-4.1.0 (the release is probably mislabeled: the link says 4.1.0, but the contents are version 4.0.0)

  2.1 Update the versions in %HUE_CODE_HOME%/hue/maven/pom.xml as follows:

<hadoop-mr1.version>2.7.1</hadoop-mr1.version>
<hadoop.version>2.7.1</hadoop.version>
<spark.version>1.6.0</spark.version>

  2.2 Change the hadoop-core artifact to hadoop-common (the build fails because hadoop-core cannot be resolved):

<artifactId>hadoop-common</artifactId>

  2.3 Change the hadoop-test version to 1.2.1:

<artifactId>hadoop-test</artifactId>
<version>1.2.1</version>

  2.4 Delete the leftover files below, otherwise the build fails

Delete both ThriftJobTrackerPlugin.java files, located at the following two paths (removal commands follow the list):

%HUE_CODE_HOME%/hue/desktop/libs/hadoop/java/src/main/java/org/apache/hadoop/thriftfs/ThriftJobTrackerPlugin.java

%HUE_CODE_HOME%/hue/desktop/libs/hadoop/java/src/main/java/org/apache/hadoop/mapred/ThriftJobTrackerPlugin.java
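
A minimal sketch of the removal, assuming %HUE_CODE_HOME% is exported as the environment variable HUE_CODE_HOME:

# Delete the two plugin sources that break the build
rm -f $HUE_CODE_HOME/hue/desktop/libs/hadoop/java/src/main/java/org/apache/hadoop/thriftfs/ThriftJobTrackerPlugin.java
rm -f $HUE_CODE_HOME/hue/desktop/libs/hadoop/java/src/main/java/org/apache/hadoop/mapred/ThriftJobTrackerPlugin.java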

  2.5 Build and install

  PREFIX=/usr/local/hue-4.0.0-release/ make clean    # PREFIX sets the install directory

  rm -rf /usr/local/hue-4.0.0-release/*

  PREFIX=/usr/local/hue-4.0.0-release/ make install
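
  After make install finishes, the runnable tree lives under the PREFIX; a quick sanity check is to confirm the supervisor script used later in section 5.2 is present:

  ls /usr/local/hue-4.0.0-release/hue/build/env/bin/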

3. Configuring and starting the Spark Thrift Server

On HDP 2.4 V3 the Spark Thrift Server defaults to port 10015. We put this into /usr/hdp/current/spark-thriftserver/conf/hive-site.xml as follows. (I could not find a way to start the Spark Thrift Server from Ambari, so it has to be started by hand.)

<configuration>

	<property>
		<name>hive.metastore.uris</name>
		<value>thrift://sandbox.hortonworks.com:9083</value>
	</property>
	<property>
		<name>hive.server2.thrift.port</name>
		<value>10015</value>
		<description>
			Port number of HiveServer2 Thrift interface.  Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT
		</description>
	</property>
<!--	<property>
		<name>hive.server2.thrift.bind.host</name>
		<value>localhost</value>
		<description>
			Bind host on which to run the HiveServer2 Thrift interface.  Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST
		</description>
	</property>
-->
</configuration>

  Once the configuration is in place, start the Thrift Server:

cd /usr/hdp/current/spark-thriftserver/
sbin/start-thriftserver.sh --master yarn --deploy-mode client
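
To confirm the server is reachable, you can connect with Beeline, which ships under the same Spark home; a minimal sketch, assuming no Kerberos on the sandbox:

# Open a JDBC session against the Thrift Server and list databases
bin/beeline -u jdbc:hive2://localhost:10015 -e "show databases;"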

4. Configuring hue.ini (/usr/local/hue-4.0.0-release/hue/desktop/conf/hue.ini)

 4.1 Uncomment the sparksql entry under [[interpreters]], as follows:

[[interpreters]]
    # Define the name and how to connect and execute the language.

    [[[hive]]]
      # The name of the snippet.
      name=Hive
      # The backend connection to use to communicate with the server.
      interface=hiveserver2

    [[[impala]]]
      name=Impala
      interface=hiveserver2

    [[[sparksql]]]
      name=SparkSql
      interface=hiveserver2


    [[[spark]]]
      name=Scala
      interface=livy

    [[[pyspark]]]
      name=PySpark
      interface=livy

    [[[r]]]
      name=R
      interface=livy

    [[[jar]]]
      name=Spark Submit Jar
      interface=livy-batch

  4.2 Configure the Livy server settings under [spark] as follows:

###########################################################################
# Settings to configure the Spark application.
###########################################################################

[spark]
  # Host address of the Livy Server.
  livy_server_host=localhost

  # Port of the Livy Server.
  livy_server_port=8998

  # Configure Livy to start in local 'process' mode, or 'yarn' workers.
  livy_server_session_kind=yarn

  # Whether Livy requires client to perform Kerberos authentication.
  security_enabled=false

  # Host of the Sql Server
  sql_server_host=localhost

  # Port of the Sql Server
  sql_server_port=10015

  Note: sql_server_port must be set to the Spark Thrift Server port, 10015.

   5. Verifying the setup

  5.1 Make sure the Spark Thrift Server is running

cd /usr/hdp/current/spark-thriftserver/
sbin/start-thriftserver.sh --master yarn --deploy-mode client
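
If you are not sure whether it is already running, a quick port check helps; a sketch, assuming netstat is available on the sandbox:

# The Thrift Server should be listening on port 10015
netstat -tlnp | grep 10015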

   5.2 Start Hue

cd /usr/local/hue-4.0.0-release/hue/
build/env/bin/supervisor
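
Hue should now be serving its web UI; a minimal check, assuming the default Hue port of 8888:

# Expect an HTTP response once the supervisor has spawned the web server
curl -I http://localhost:8888/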

   5.3 Log in to Hue, open Notebook > Editor > SparkSql, and enter a SQL statement, for example a simple select against one of the sandbox's sample tables (a cross-check follows below).
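
As a cross-check, the same statement can be run through Beeline against port 10015; the result should match what the Hue editor returns. (sample_07 is one of the demo tables shipped with the sandbox; substitute your own table and query.)

cd /usr/hdp/current/spark-thriftserver/
bin/beeline -u jdbc:hive2://localhost:10015 -e "select * from sample_07 limit 10;"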


 5.4 Open the YARN UI; you should see one Spark Thrift Server job currently running.

    5.5 Run the SQL from step 5.3, then click the ApplicationMaster link to the right of the job from step 5.4 to enter the Spark UI, where the corresponding Spark job appears. On the Stages page, you can see the SQL being executed.

5.6 Once execution completes, go back to the Hue page; the query results are displayed.

This confirms that requests issued from Hue are received by the Spark Thrift Server and executed correctly.

6. Additional notes

Attentive readers may have noticed that we configured the Livy server settings but never started a livy-server.

To clarify: when SQL runs through the sparksql interpreter, the Livy server is not used. The SQL is submitted directly to the Spark Thrift Server; Hue only reads the sql_server_host and sql_server_port values from the [spark] section to find it.

The livy-server is only needed by the Livy-backed interpreters such as spark (Scala), pyspark, and r; a startup sketch follows.
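
If you do want those interpreters, Livy must be running on livy_server_host:livy_server_port (localhost:8998 above); a minimal sketch, assuming an Apache Livy distribution unpacked at a hypothetical /usr/local/livy:

# Livy needs to know where Spark lives before it can create sessions
export SPARK_HOME=/usr/hdp/current/spark-client
/usr/local/livy/bin/livy-server start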