Hadoop基础教程-第12章 Hive:进阶(12.1 内置函数)(草稿)

第12章 Hive:进阶

12.1 内置函数

为了方便测试Hive的内置函数,需要构造一个类似于Oracle的dual虚表

hive> create table dual(value string);
OK
Time taken: 0.117 seconds
hive>
hive> insert into dual values("test");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170820093018_106fdbe1-3d77-4fbb-b200-3b3d56007858
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1503220733636_0016, Tracking URL = http://node1:8088/proxy/application_1503220733636_0016/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1503220733636_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-20 09:30:34,522 Stage-1 map = 0%,  reduce = 0%
2017-08-20 09:30:44,205 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.76 sec
MapReduce Total cumulative CPU time: 1 seconds 760 msec
Ended Job = job_1503220733636_0016
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://cetc/user/hive/warehouse/dual/.hive-staging_hive_2017-08-20_09-30-18_395_4589036656384871958-1/-ext-10000
Loading data to table default.dual
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.76 sec   HDFS Read: 3674 HDFS Write: 73 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 760 msec
OK
Time taken: 28.658 seconds
hive>
hive> select 1+1 from dual;
OK
2
Time taken: 0.176 seconds, Fetched: 1 row(s)
hive> select 7%3 from dual;
OK
1
Time taken: 0.212 seconds, Fetched: 1 row(s)
hive>

12.1.1 标准函数

(1)日期函数

hive> select to_date('2016-08-31 08:30:00') from dual;
OK
2016-08-31
Time taken: 0.129 seconds, Fetched: 1 row(s) 
hive> select year('2016-08-31 08:30:00') from dual;
OK
2016
Time taken: 0.207 seconds, Fetched: 1 row(s)
hive> select month('2016-08-31 08:30:00') from dual;
OK
8
Time taken: 0.156 seconds, Fetched: 1 row(s)
hive> 

(2)数学函数

hive> select sqrt(2) from dual;
OK
1.4142135623730951
Time taken: 0.17 seconds, Fetched: 1 row(s) 
hive> select abs(-11) from dual;
OK
11
Time taken: 0.167 seconds, Fetched: 1 row(s)
hive> select floor(3.56) from dual;
OK
3
Time taken: 0.134 seconds, Fetched: 1 row(s)
hive> select ceil(3.123) from dual;
OK
4
Time taken: 0.131 seconds, Fetched: 1 row(s)
hive> select round(3.23456) from dual;
OK
3.0
Time taken: 0.138 seconds, Fetched: 1 row(s)
hive> select round(3.23456,3) from dual;
OK
3.235
Time taken: 0.291 seconds, Fetched: 1 row(s)
hive>

(3)字符串函数

hive> select length('hadoop') from dual;
OK
6
Time taken: 0.28 seconds, Fetched: 1 row(s)
hive> select reverse('hadoop') from dual;
OK
poodah
Time taken: 0.257 seconds, Fetched: 1 row(s)
hive> select substr('hadoop',2) from dual;
OK
adoop
Time taken: 0.221 seconds, Fetched: 1 row(s)
hive> select trim(' hadoop  ') from dual;
OK
hadoop
Time taken: 0.267 seconds, Fetched: 1 row(s)
hive>

12.1.2 聚合函数

hive> select count(1) from emp;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170824100247_a5b82db6-3a76-41bb-9f33-1c23c06209da
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1503582553611_0003, Tracking URL = http://node1:8088/proxy/application_1503582553611_0003/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1503582553611_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-08-24 10:03:07,501 Stage-1 map = 0%,  reduce = 0%
2017-08-24 10:03:21,721 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.49 sec
2017-08-24 10:03:33,482 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.88 sec
MapReduce Total cumulative CPU time: 5 seconds 880 msec
Ended Job = job_1503582553611_0003
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.88 sec   HDFS Read: 9088 HDFS Write: 102 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 880 msec
OK
13
Time taken: 47.277 seconds, Fetched: 1 row(s)
hive>
hive> select avg(sal) from emp;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170824095107_1f447d4c-f008-491d-8537-00fc4a0d45ea
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1503582553611_0001, Tracking URL = http://node1:8088/proxy/application_1503582553611_0001/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1503582553611_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-08-24 09:51:36,596 Stage-1 map = 0%,  reduce = 0%
2017-08-24 09:51:55,721 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.95 sec
2017-08-24 09:52:10,207 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.01 sec
MapReduce Total cumulative CPU time: 6 seconds 10 msec
Ended Job = job_1503582553611_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.01 sec   HDFS Read: 9818 HDFS Write: 118 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 10 msec
OK
2077.0833333333335
Time taken: 65.158 seconds, Fetched: 1 row(s)
hive>
hive> select max(sal) from emp where did=30;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170824095233_ba070012-65fb-42da-89de-f748d33fc9b9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1503582553611_0002, Tracking URL = http://node1:8088/proxy/application_1503582553611_0002/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1503582553611_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-08-24 09:52:52,526 Stage-1 map = 0%,  reduce = 0%
2017-08-24 09:53:07,359 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.0 sec
2017-08-24 09:53:18,303 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.7 sec
MapReduce Total cumulative CPU time: 6 seconds 700 msec
Ended Job = job_1503582553611_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.7 sec   HDFS Read: 10207 HDFS Write: 106 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 700 msec
OK
2850.0
Time taken: 46.863 seconds, Fetched: 1 row(s)
hive>

12.1.3 表生成函数

hive> select array(1,2,3) from dual;
OK
[1,2,3]
Time taken: 0.371 seconds, Fetched: 1 row(s)
hive> select explode(array(1,2,3)) from dual;
OK
1
2
3
Time taken: 0.265 seconds, Fetched: 3 row(s)
hive>

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏Java学习123

tomcat 性能调优

2779
来自专栏Hadoop实操

如何使用R连接Hive与Impala

继之前的文章讲述如何在Redhat中配置R环境和如何在Redhat中安装R的包及搭建R的私有源后,那我们如何使用R连接CDH集群中的Hive和Impala进行数...

4606
来自专栏Hadoop实操

Hive2.2.0如何与CDH集群中的Spark1.6集成

1452
来自专栏搜云库

大数据平台搭建 Hadoop-2.7.4 + Spark-2.2.0 快速搭建

Apache Spark 简介 Apache Spark 是专为大规模数据处理而设计的快速通用的计算引擎。Spark是UC Berkeley AMP lab (...

4018
来自专栏腾讯云容器服务团队的专栏

kubernetes 中 kafka 和 zookeeper 有状态集群服务部署实践 (一)

本文将尝试根据社区提供的 StatefulSet 方案,对 kafka 和 zookeeper 服务进行部署。具体的部署过程包括以下几个部署: Persiste...

7.2K3
来自专栏Java编程技术

Dubbo剖析-监控平台的搭建与使用

dubbo-monitor主要用来统计服务的调用次调和调用时间的监控中心,服务消费者和提供者,在内存中累计调用次数和调用时间,定时每分钟发送一次统计数据到监控中...

1232
来自专栏运维

k8s1.5.4挂载volume之nfs

/usr/local/kubernetes/examples/volumes/nfs

692
来自专栏大数据学习笔记

Hadoop基础教程-第10章 HBase:Hadoop数据库(10.3 HBase安装与配置)(草稿)

第10章 HBase:Hadoop数据库 10.3 HBase安装与配置 HBase服务规划 机器名 HMaster HRegionServer Zook...

1786
来自专栏个人分享

hadoop-spark-hive-hbase配置相关说明

或者 ./runRemoteCmd.sh '~/och200/zookeeper/bin/zkServer-initialize.sh --myid=1' zo...

1062
来自专栏闵开慧

hive安装配置(0.11版本)

1 假设Hadoop已经安装并配置正确,MySQL已经正确安装 2 为支持Hive的多用户多会话需求,需要使用一个独立的数据库存储元数据。   这里选择MyS...

3555

扫码关注云+社区