Calling Spark from Oozie: A Worked Example

Date: 2021-12-11 20:50:03

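Preface

I have been using Oozie a lot at work lately, and I have found it convenient and efficient as a job-scheduling framework. This post walks through a working example, written up as a way of learning together with friends.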

Oozie job configuration file

At work, I keep the job settings in a file named

job.properties

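nameNode=<HDFS NameNode URI>
jobTracker=<YARN ResourceManager address>
# queue name
queueName=default
fkOozieRoot=<path holding the Oozie configuration files>
# default libpath
oozie.libpath=<path holding the JARs of the program being called>
oozie.use.system.libpath=true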

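# Application path for a scheduled (coordinator) job
oozie.coord.application.path=${nameNode}${top10Path}
workflowAppUri=${nameNode}/user/${user.name}/${fkOozieRoot}
# Application path for a one-off (workflow) run; a single submission should set only one of the two application paths, so comment out the one not in use
oozie.wf.application.path=${nameNode}/user/hdfs/${fkOozieRoot}
# Coordinator frequency (in minutes)
freq=120
# Job start time
start=2016-12-27T12:00Z
# Job end time
end=2115-06-16T16:00Z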
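
The file above lists both a coordinator path and a workflow path, while Oozie rejects a submission that names more than one application path. A minimal sketch of a pre-submission check follows; `count_app_paths` and `check_app_paths` are hypothetical helper names, not part of the Oozie CLI, and the check assumes one `key=value` pair per line.

```shell
#!/bin/sh
# Count the uncommented lines that assign an Oozie application-path key.
# count_app_paths is a hypothetical helper, not an Oozie command.
count_app_paths() {
    grep -c -E '^[[:space:]]*oozie\.(wf|coord|bundle)\.application\.path[[:space:]]*[=:]' "$1"
}

# Succeed only when exactly one application path is set.
check_app_paths() {
    # grep -c exits non-zero on zero matches, so ignore its exit status.
    n=$(count_app_paths "$1" || true)
    if [ "$n" -eq 1 ]; then
        echo "ok: one application path set"
    else
        echo "error: expected 1 application path, found $n" >&2
        return 1
    fi
}
```

Running `check_app_paths job.properties` before `oozie job ... -run` would flag the file if both lines are left active.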

Oozie configuration file for a single run (in this example, Oozie calls Spark)

workflow.xml

<?xml version="1.0"?>
<!-- The name attribute is the name the workflow is shown under in Oozie; choose whatever suits you. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="wf-ajsk-2h">
    <start to="shell-date"/>
    <action name="shell-date">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.output.compression.codec</name>
                    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
                </property>
            </configuration>
            <master>yarn</master>
            <mode>cluster</mode>
            <!-- Name of the Spark application -->
            <name>ajsk</name>
            <!-- Main class of the program -->
            <class>com.dxshs.ajsk.sql.ReadDataSource</class>
            <!-- Name of the JAR the program is packaged into -->
            <jar>dxshs-ajsk-1.0-SNAPSHOT.jar</jar>
            <!-- Spark options -->
            <spark-opts> --driver-memory 4G --executor-memory 18G --executor-cores 4 --num-executors 50 --conf "spark.yarn.executor.memoryOverhead=2048" --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" --conf "spark.storage.memoryFraction=0.6" </spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
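
To get a feel for what the `<spark-opts>` line asks of YARN, the arithmetic can be sketched as below. This is a rough estimate only: it assumes the overhead value is in MB, and it ignores the driver's own memory overhead and any rounding YARN applies to container sizes.

```shell
#!/bin/sh
# Values copied from the <spark-opts> line in workflow.xml.
driver_mem_mb=$((4 * 1024))      # --driver-memory 4G
executor_mem_mb=$((18 * 1024))   # --executor-memory 18G
overhead_mb=2048                 # spark.yarn.executor.memoryOverhead (assumed MB)
executor_cores=4                 # --executor-cores 4
num_executors=50                 # --num-executors 50

# Each executor container needs its heap plus the off-heap overhead.
per_executor_mb=$((executor_mem_mb + overhead_mb))
# Estimate: all executor containers plus the driver heap.
total_mb=$((num_executors * per_executor_mb + driver_mem_mb))
total_cores=$((num_executors * executor_cores))

echo "per executor container: ${per_executor_mb} MB"
echo "total memory (executors + driver heap): ${total_mb} MB"
echo "total executor cores: ${total_cores}"
```

With these settings the estimate comes to about 1004 GB of memory and 200 executor cores, so the target queue needs that much headroom for the job to get all of its executors.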

Scheduled job (coordinator) configuration

coordinator.xml

<?xml version="1.0"?>

<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

<!-- end is the stop time, start is the start time, frequency is the run interval, and name is the name of the scheduled job -->

<coordinator-app xmlns="uri:oozie:coordinator:0.2" timezone="UTC" end="${end}" start="${start}" frequency="${freq}" name="cd-ajsk-2h">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
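
The coordinator reads `start`, `end`, and `freq` from job.properties: the two timestamps are UTC in the form `yyyy-MM-ddTHH:mmZ`, and the frequency is in minutes. A small sketch for producing a timestamp in that form and for working out how often the job fires; `oozie_utc_now` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the current UTC time in the form used by start/end in job.properties.
# oozie_utc_now is a hypothetical helper, not an Oozie command.
oozie_utc_now() {
    date -u +%Y-%m-%dT%H:%MZ
}

# freq is copied from job.properties (minutes between runs).
freq=120
runs_per_day=$((24 * 60 / freq))

echo "start candidate: $(oozie_utc_now)"
echo "runs per day at freq=${freq}: ${runs_per_day}"
```

With `freq=120` the coordinator materializes one workflow run every two hours, or 12 runs per day.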

Common Oozie commands

# Start a job
oozie job -oozie http://oozie-server:11000/oozie -config job.properties -run
# Stop (kill) a job
oozie job -oozie http://oozie-server:11000/oozie -kill <job-id>
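
On a successful `-run`, the Oozie CLI prints a line of the form `job: <job-id>`, and the kill command needs that ID. A minimal sketch for capturing it follows; `extract_job_id` is a hypothetical helper, and the ID in the example is made up rather than real CLI output.

```shell
#!/bin/sh
# Read "oozie job ... -run" output on stdin and print the ID from the
# first "job: <id>" line. extract_job_id is a hypothetical helper.
extract_job_id() {
    sed -n 's/^job:[[:space:]]*//p' | head -n 1
}

# Demonstration with a made-up ID standing in for real CLI output.
printf 'job: 0000001-161227120000000-oozie-oozi-C\n' | extract_job_id
```

In practice the start command's output would be piped into `extract_job_id` and the result saved in a variable, which can then be passed to `-kill`, or to `-info` to check the job's status.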

This is also my last post of 2016. I hope it helps anyone who needs it. Helping others makes me happy too.

References
http://oozie.apache.org/docs/4.1.0/index.html