【Oozie】Oozie Enterprise-Level Case Study: Analysis and Implementation

Copyright notice: this is an original article by the blogger; please cite the source when reposting. https://blog.csdn.net/gongxifacai_believe/article/details/81124717

1. Requirements Analysis

Workflow: analyze the data in a Hive table and write the analysis result to HDFS, then use Sqoop to export the result from HDFS into a MySQL table; the front end queries the MySQL table to display the data.

Oozie composition: two actions are needed, a Hive action and a Sqoop action. The workflow's start node transitions to the Hive action; if the Hive action succeeds, control passes to the Sqoop action; if the Sqoop action succeeds, the workflow reaches end; an error at any point routes to the kill (fail) node.

2. Test SQL Statement

hive (default)> select id, account, password, INPUT__FILE__NAME from user_hive;

OK
id      account password        input__file__name
1       admin   admin   hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
2       pu      12345   hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
3       system  system  hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
4       zxh     zxh     hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
5       test    test    hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
6       pudong  pudong  hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
7       qiqi    qiqi    hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
8       beifeng beifeng hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
9       xuanyu  xuanyu  hdfs://hadoop-senior.ibeifeng.com:8020/user/hive/warehouse/user_hive/part-m-00000
Time taken: 30.244 seconds, Fetched: 9 row(s)

3. SQL Script select-user.sql

-- drop table tmp_user
drop table if exists default.tmp_user ;
-- create table tmp_user
create external table default.tmp_user like user_hive location '${OUTPUT}';
-- create table default.tmp_user like user_hive location '/user/beifeng/oozie/datas/wf-user-select/output';
-- load data to tmp_user 
insert into table tmp_user 
select 
	id, account, password 
from 
	user_hive 
where 
		id >= 15 
	and 
		account like 'beif%' ;
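
Before wiring the script into the workflow, it can be smoke-tested straight from the Hive CLI. The command below is my own sketch, not part of the original post: it assumes the Oozie Hive action hands each <param> to the script as a Hive variable, which --hivevar emulates locally, and it reuses the output path from the commented line in the script. Note that against the sample rows shown above (id 1 to 9) the filter id >= 15 returns no rows, so the predicate may need adjusting for a meaningful test.

# run select-user.sql outside Oozie, supplying the OUTPUT variable by hand
hive --hivevar OUTPUT=/user/beifeng/oozie/datas/wf-user-select/output -f select-user.sql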

4. Configuration File hive-site.xml

This file also needs to be placed in the application directory on HDFS; the workflow below references it through its <job-xml> element.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://hadoop-senior.ibeifeng.com:3306/metadata?createDatabaseIfNotExist=true</value>
		<description>JDBC connect string for a JDBC metastore</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
		<description>Driver class name for a JDBC metastore</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>root</value>
		<description>username to use against metastore database</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>123456</value>
		<description>password to use against metastore database</description>
	</property>

	<property>
		<name>hive.cli.print.header</name>
		<value>true</value>
		<description>Whether to print the names of the columns in query output.</description>
	</property>

	<property>
		<name>hive.cli.print.current.db</name>
		<value>true</value>
		<description>Whether to include the current database in the Hive prompt.</description>
	</property>
	
	<property>
		<name>hive.fetch.task.conversion</name>
		<value>minimal</value>
		<description>
		Some select queries can be converted to single FETCH task minimizing latency.
		Currently the query should be single sourced not having any subquery and should not have
		any aggregations or distincts (which incurs RS), lateral views and joins.
		1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
		2. more    : SELECT, FILTER, LIMIT only (TABLESAMPLE, virtual columns)
		</description>
	</property>
	
</configuration>
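
As a rough sketch of how the file can be staged (the local directory name oozie-apps/wf-user-select and the HDFS paths are assumptions taken from job.properties in the next section, and $HIVE_HOME/conf is simply the usual location of hive-site.xml):

# copy hive-site.xml into the local application directory next to workflow.xml,
# coordinator.xml and select-user.sql, then upload the whole app to HDFS
cp $HIVE_HOME/conf/hive-site.xml oozie-apps/wf-user-select/
hdfs dfs -put oozie-apps/wf-user-select /user/beifeng/oozie-apps/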

5. Configuration File job.properties

nameNode=hdfs://hadoop-senior.ibeifeng.com:8020
jobTracker=hadoop-senior.ibeifeng.com:8032
queueName=default
oozieAppsRoot=user/beifeng/oozie-apps
oozieDataRoot=user/beifeng/oozie/datas

oozie.use.system.libpath=true

oozie.coord.application.path=${nameNode}/${oozieAppsRoot}/wf-user-select
start=2015-10-15T00:00+0800
end=2015-10-26T00:00+0800
workflowAppUri=${nameNode}/${oozieAppsRoot}/wf-user-select

outputDir=wf-user-select/output
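
Both *Root properties are resolved against ${nameNode}, so the application is expected to live at /user/beifeng/oozie-apps/wf-user-select on HDFS. A quick sanity check (my own suggestion, not from the original post) is to list that directory and confirm the files described in sections 3, 4, 6 and 7 are present:

# the application directory should contain workflow.xml, coordinator.xml,
# select-user.sql and hive-site.xml
hdfs dfs -ls /user/beifeng/oozie-apps/wf-user-select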

6. Configuration File workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="wf-user-select">
    <start to="hive-node"/>

    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/${oozieDataRoot}/${outputDir}"/>
            </prepare>
            <job-xml>${nameNode}/${oozieAppsRoot}/wf-user-select/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>select-user.sql</script>
            <param>OUTPUT=${nameNode}/${oozieDataRoot}/${outputDir}</param>
        </hive>
        <ok to="sqoop-node"/>
        <error to="fail"/>
    </action>
	
    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>export --connect jdbc:mysql://hadoop-senior.ibeifeng.com:3306/test --username root --password 123456 --table my_user --num-mappers 1 --fields-terminated-by "\t" --export-dir /user/beifeng/oozie/datas/wf-user-select/output</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>	

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
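
The Sqoop export above assumes a MySQL table named my_user already exists in the test database. The statement below is only an illustrative sketch; the column types are my assumption, chosen to mirror the three exported Hive columns. The MySQL JDBC driver jar usually also has to be visible to the Sqoop action, for example by placing it in the workflow application's lib/ directory, unless the Oozie sharelib already provides it.

# create the export target table (types are illustrative)
mysql -uroot -p123456 -e "CREATE TABLE IF NOT EXISTS test.my_user (id INT, account VARCHAR(255), password VARCHAR(255));"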

7. Configuration File coordinator.xml

<coordinator-app name="cron-coord" frequency="30 1 * * *"
                 start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
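
Once the application directory has been uploaded to HDFS, the coordinator can be submitted with the standard Oozie CLI. This is a minimal run sketch that assumes the Oozie server listens on the default port 11000 of hadoop-senior.ibeifeng.com; <coord-job-id> is a placeholder for the id printed by the submit command.

# point the CLI at the Oozie server, then submit and start the coordinator
export OOZIE_URL=http://hadoop-senior.ibeifeng.com:11000/oozie
oozie job -config oozie-apps/wf-user-select/job.properties -run
# inspect the coordinator and its materialized workflow runs
oozie job -info <coord-job-id>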
