
Oozie Quick Start Series (2) | A Quick, Detailed Guide to Using Oozie

Author: 不温卜火
Published 2020-10-28 16:41:43
Included in the column: 不温卜火

I. Scheduling a Shell Script with Oozie

Goal: use Oozie to schedule a shell script. The overall procedure is as follows:

  • 1. Create the working directory
[bigdata@hadoop002 oozie-4.0.0-cdh5.3.6]$ mkdir oozie-apps/
[bigdata@hadoop002 oozie-4.0.0-cdh5.3.6]$ cd oozie-apps/
[bigdata@hadoop002 oozie-apps]$ mkdir shell
[bigdata@hadoop002 oozie-apps]$ cd shell/
  • 2. Create the two required files, job.properties and workflow.xml
// Defines the workflow
[bigdata@hadoop002 shell]$ touch workflow.xml
// Holds the job parameters
[bigdata@hadoop002 shell]$ touch job.properties
  • 3. Edit job.properties and workflow.xml
// 1. job.properties
# HDFS (NameNode) address
nameNode=hdfs://hadoop002:8020
# ResourceManager address
jobTracker=hadoop003:8032
# Queue name
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell


// 2. workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<!-- Start node -->
<start to="shell-node"/>
<!-- Action node -->
<action name="shell-node">
    <!-- Shell action -->
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <!-- Command to execute -->
        <exec>mkdir</exec>
        <argument>/opt/module/d</argument>
        <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
<!-- Kill node -->
<kill name="fail">
    <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<!-- End node -->
<end name="end"/>
</workflow-app>
  • 4. Upload the job configuration to HDFS
[bigdata@hadoop002 hadoop-2.5.0-cdh5.3.6]$ bin/hadoop fs -put /opt/module/oozie-4.0.0-cdh5.3.6/oozie-apps/ /user/bigdata
  • 5. Run the job
[bigdata@hadoop002 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop002:11000/oozie -config oozie-apps/shell/job.properties -run
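
One thing to watch before running this workflow a second time: a shell action only follows its ok transition when the command exits with status 0, and plain mkdir exits non-zero once the directory already exists, so a rerun would end up in the fail node. The sketch below reproduces that behaviour locally. It uses a temporary directory as a stand-in for /opt/module/d (an assumption for illustration, not the cluster path) and shows that mkdir -p can be repeated safely.

```shell
#!/bin/sh
# Local illustration only: a temp directory stands in for /opt/module/d.
workdir=$(mktemp -d)

# First run: the directory does not exist yet, so mkdir succeeds.
if mkdir "$workdir/d"; then first_status=0; else first_status=$?; fi

# Second run: the directory already exists, so plain mkdir fails.
if mkdir "$workdir/d" 2>/dev/null; then second_status=0; else second_status=$?; fi

# With -p an existing directory is not an error.
if mkdir -p "$workdir/d"; then third_status=0; else third_status=$?; fi

echo "first=$first_status second=$second_status third=$third_status"
rm -rf "$workdir"
```

If the workflow should be rerunnable, one option is to keep mkdir in the exec element and pass -p and the path as two separate argument elements. Also keep in mind that the directory is created on whichever node ends up running the action, which is not necessarily hadoop002.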
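
View the job in the web console:

[Figure: the job in the web console]

Comparison with the program flow chart:

[Figure: flow chart of the shell workflow]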
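
A common reason for a submission to fail is that oozie.wf.application.path does not point at the directory that was actually uploaded to HDFS. The sketch below mimics the placeholder expansion locally so the resolved path can be compared with the upload target (/user/bigdata/oozie-apps/shell in this section). resolve_property is a hypothetical helper written for this article, not part of the Oozie CLI, and it assumes the submitting user is bigdata, as the shell prompts suggest.

```shell
#!/bin/sh
# resolve_property FILE KEY USER
# Hypothetical helper (not part of Oozie): prints the value of KEY from FILE
# with ${name} placeholders expanded; USER stands in for ${user.name}.
resolve_property() {
  awk -v key="$2" -v user="$3" '
    BEGIN { props["user.name"] = user }
    /^[ \t]*(#|$)/ { next }   # skip comments and blank lines
    {
      eq = index($0, "=")
      if (eq > 0) props[substr($0, 1, eq - 1)] = substr($0, eq + 1)
    }
    END {
      value = props[key]
      # expand one placeholder per pass; the bound guards against cycles
      for (i = 0; i < 20 && match(value, /[$][{][A-Za-z0-9_.]+[}]/); i++) {
        name = substr(value, RSTART + 2, RLENGTH - 3)
        value = substr(value, 1, RSTART - 1) props[name] substr(value, RSTART + RLENGTH)
      }
      print value
    }
  ' "$1"
}

# A copy of the job.properties from this section, written to a temp file.
props_file=$(mktemp)
cat > "$props_file" <<'EOF'
# HDFS (NameNode) address
nameNode=hdfs://hadoop002:8020
jobTracker=hadoop003:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
EOF

app_path=$(resolve_property "$props_file" oozie.wf.application.path bigdata)
echo "$app_path"
rm -f "$props_file"
```

The printed path should match the location created by the hadoop fs -put command in step 4; if it does not, either the properties file or the upload target needs adjusting.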

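II. Scheduling Multiple Jobs with Oozie Control Flow

  Oozie can schedule several jobs within one workflow. The process is shown in the figure below.

[Figure: overall process for scheduling multiple jobs]
  • 1. Create the directory and files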
[bigdata@hadoop002 oozie-apps]$ mkdir xshell
[bigdata@hadoop002 oozie-apps]$ cd xshell/
[bigdata@hadoop002 xshell]$ touch workflow.xml
[bigdata@hadoop002 xshell]$ touch job.properties
  • 2. Edit job.properties and workflow.xml
// 1. job.properties

nameNode=hdfs://hadoop002:8020
jobTracker=hadoop003:8032
queueName=default
examplesRoot=oozie-apps
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/xshell

// 2. workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="p1-shell-node"/>
    <action name="p1-shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d1</argument>
            <capture-output/>
        </shell>
        <ok to="forking"/>
        <error to="fail"/>
    </action>

    <action name="p2-shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d2</argument>
            <capture-output/>
        </shell>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    
    <action name="p3-shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d3</argument>
            <capture-output/>
        </shell>
        <ok to="joining"/>
        <error to="fail"/>
    </action>

    <action name="p4-shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>mkdir</exec>
            <argument>/opt/module/d4</argument>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <fork name="forking">
        <path start="p2-shell-node"/>
        <path start="p3-shell-node"/>
    </fork>

    <join name="joining" to="p4-shell-node"/>

    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

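  The figure below shows the flow chart of this workflow.

[Figure: flow chart of the fork/join workflow]
  • 3. Upload the job configuration to HDFS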
[bigdata@hadoop002 hadoop-2.5.0-cdh5.3.6]$ bin/hadoop fs -rm -r -f  /user/bigdata/oozie-apps/
[bigdata@hadoop002 hadoop-2.5.0-cdh5.3.6]$ bin/hadoop fs -put /opt/module/oozie-4.0.0-cdh5.3.6/oozie-apps/ /user/bigdata/
  • 4. Run the job
[bigdata@hadoop002 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop002:11000/oozie -config oozie-apps/xshell/job.properties -run
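
Because the fork and join nodes are declared after the four actions, the execution order is not obvious from reading workflow.xml top to bottom. The sketch below lists the success transitions so the shape p1, then p2 and p3 in parallel, then p4 becomes visible. It runs against a trimmed skeleton of the workflow above (the shell bodies are left out, so the skeleton is for illustration only and is not deployable), and list_transitions is a hypothetical helper, not an Oozie command. It assumes one tag per line and ignores the error transitions to the fail node.

```shell
#!/bin/sh
# list_transitions FILE
# Hypothetical helper: prints "from -> to" for every success transition.
list_transitions() {
  awk '
    # attr(line, name): value of the given attribute on this line, or ""
    function attr(line, name,    re, s) {
      re = name "[ \t]*=[ \t]*\"[^\"]*\""
      if (!match(line, re)) return ""
      s = substr(line, RSTART, RLENGTH)
      sub(/^[^"]*"/, "", s)   # drop everything up to the opening quote
      sub(/"$/, "", s)        # drop the closing quote
      return s
    }
    /<start /  { print "start -> " attr($0, "to") }
    /<action / { node = attr($0, "name") }
    /<fork /   { node = attr($0, "name") }
    /<ok /     { print node " -> " attr($0, "to") }
    /<path /   { print node " -> " attr($0, "start") }
    /<join /   { print attr($0, "name") " -> " attr($0, "to") }
  ' "$1"
}

# Skeleton of the workflow above: node names and transitions only.
wf_file=$(mktemp)
cat > "$wf_file" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="p1-shell-node"/>
    <action name="p1-shell-node">
        <ok to="forking"/>
        <error to="fail"/>
    </action>
    <action name="p2-shell-node">
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <action name="p3-shell-node">
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <action name="p4-shell-node">
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <fork name="forking">
        <path start="p2-shell-node"/>
        <path start="p3-shell-node"/>
    </fork>
    <join name="joining" to="p4-shell-node"/>
    <end name="end"/>
</workflow-app>
EOF

edges=$(list_transitions "$wf_file")
printf '%s\n' "$edges"
rm -f "$wf_file"
```

Reading the edges: p1-shell-node hands over to the fork, the fork starts p2-shell-node and p3-shell-node, both of them lead to the join, and only then does p4-shell-node run.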

III. Scheduling a MapReduce Job with Oozie

Goal: use Oozie to schedule a MapReduce job.

  • 1. Extract the official Oozie examples into the Oozie root directory
[bigdata@hadoop002 oozie-4.0.0-cdh5.3.6]$ tar -zxvf oozie-examples.tar.gz 
  • 2. Change into the extracted directory
[bigdata@hadoop002 oozie-4.0.0-cdh5.3.6]$ cd examples/
[bigdata@hadoop002 examples]$ cd apps/
  • 3. Copy the official template into oozie-apps
[bigdata@hadoop002 apps]$ cp -r map-reduce/ ../../oozie-apps/
[bigdata@hadoop002 apps]$ cd ../../oozie-apps/map-reduce/

// Delete these two extra files; they are not needed for now
[bigdata@hadoop002 map-reduce]$ rm job-with-config-class.properties workflow-with-config-class.xml

// Copy the official Hadoop examples jar into lib/
[bigdata@hadoop002 map-reduce]$ cp /opt/module/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar lib/
  • 4. Configure job.properties and workflow.xml for the map-reduce job
// 1. job.properties
nameNode=hdfs://hadoop002:8020
jobTracker=hadoop003:8032
queueName=default
examplesRoot=oozie-apps
# For the bigdata user this resolves to hdfs://hadoop002:8020/user/bigdata/oozie-apps/map-reduce/workflow.xml
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
outputDir=map-reduce

// 2. workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/output/"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <!-- Use the new MapReduce API when scheduling the MR job -->
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>

                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>

                <!-- Output key class of the job -->
                <property>
                    <name>mapreduce.job.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>

                <!-- Output value class of the job -->
                <property>
                    <name>mapreduce.job.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>

                <!-- Input path -->
                <property>
                    <name>mapred.input.dir</name>
                    <value>/input/</value>
                </property>

                <!-- Output path -->
                <property>
                    <name>mapred.output.dir</name>
                    <value>/output/</value>
                </property>

                <!-- Mapper class -->
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
                </property>

                <!-- Reducer class -->
                <property>
                    <name>mapreduce.job.reduce.class</name>
                    <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
                </property>

                <property>
                    <name>mapred.map.tasks</name>
                    <value>1</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
  • 5. Upload the configured app directory to HDFS
[bigdata@hadoop002 hadoop-2.5.0-cdh5.3.6]$ bin/hdfs dfs -put /opt/module/oozie-4.0.0-cdh5.3.6/oozie-apps/map-reduce/ /user/bigdata/oozie-apps
  • 6. Run the job
[bigdata@hadoop002 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop002:11000/oozie -config oozie-apps/map-reduce/job.properties -run
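
The -run call prints a single line of the form job: followed by the workflow ID. Capturing that ID makes it possible to check progress from the command line with the -info option instead of switching to the web console. The sketch below shows the parsing step only: extract_job_id is a hypothetical helper name, and the sample ID is made up so that the snippet runs without a cluster.

```shell
#!/bin/sh
# extract_job_id: hypothetical helper that pulls the ID out of the
# "job: <id>" line printed by `oozie job ... -run`.
extract_job_id() {
  sed -n 's/^job:[[:space:]]*//p'
}

# Made-up sample output, used so the sketch runs without a cluster.
sample_output='job: 0000000-200613000000000-oozie-bigd-W'
job_id=$(printf '%s\n' "$sample_output" | extract_job_id)
echo "$job_id"

# On the cluster the same helper could wrap the real submission:
#   job_id=$(bin/oozie job -oozie http://hadoop002:11000/oozie \
#            -config oozie-apps/map-reduce/job.properties -run | extract_job_id)
#   bin/oozie job -oozie http://hadoop002:11000/oozie -info "$job_id"
```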
// The figure below shows the job while it is running
[Figure: the running job]
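
The workflow wires the WordCount example's TokenizerMapper and IntSumReducer to /input and /output, so /input has to contain some text files in HDFS before the job is submitted (for example created with hadoop fs -mkdir -p /input and filled with hadoop fs -put). To get a feel for what should land in /output, the sketch below approximates the same computation locally with standard tools: split on whitespace, then count each word. It is not the Hadoop job itself, and the sample input is made up.

```shell
#!/bin/sh
# Local approximation of the WordCount job: one "word<TAB>count" line per
# distinct word, sorted by word. Not the Hadoop job itself.
wordcount() {
  tr -s '[:space:]' '\n' | sed '/^$/d' | LC_ALL=C sort | uniq -c |
    awk '{ print $2 "\t" $1 }'
}

# Made-up sample input standing in for the files under /input.
result=$(printf 'hello oozie\nhello hadoop\n' | wordcount)
printf '%s\n' "$result"
```

Comparing a small local run like this with the files written under /output is a quick sanity check that the mapper and reducer classes in workflow.xml were picked up as intended.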

  That is all for this post.

This article is part of the Tencent Cloud self-media sharing program and was shared from the author's personal site/blog. Originally published: 2020-06-13.
