Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.7 HQL:数据查询)(草稿)

第11章 Hive:SQL on Hadoop

11.7 HQL:数据查询

本节操作使用默认default数据库。

hive> use default;
OK
Time taken: 0.035 seconds
hive>

11.7.1 limit 子句

hive> select * from emp limit 5;
OK
7782    CLARK   MANAGER 7839    1981-06-09  2450.0  0.0 10
7839    KING    PRESIDENT   0   1981-11-17  5000.0  0.0 10
7934    MILLER  CLERK   7782    1982-01-23  1300.0  0.0 10
7369    SMITH   CLERK   7902    1980-12-17  800.0   0.0 20
7566    JONES   MANAGER 7839    1981-04-02  2975.0  0.0 20
Time taken: 2.18 seconds, Fetched: 5 row(s)
hive> 
hive> select * from emp limit 1,3;
OK
7839    KING    PRESIDENT   0   1981-11-17  5000.0  0.0 10
7934    MILLER  CLERK   7782    1982-01-23  1300.0  0.0 10
7369    SMITH   CLERK   7902    1980-12-17  800.0   0.0 20
Time taken: 0.303 seconds, Fetched: 3 row(s)
hive> 

11.7.2 case when then 语句

hive> select eid,ename,
    > case
    > when did=10 then 'd10'
    > when did=20 then 'd20'
    > when did=30 then 'd30'
    > end
    > from emp;
OK
7782    CLARK   d10
7839    KING    d10
7934    MILLER  d10
7369    SMITH   d20
7566    JONES   d20
7902    FORD    d20
7499    ALLEN   d30
7521    WARD    d30
7654    MARTIN  d30
7698    BLAKE   d30
7844    TURNER  d30
7900    JAMES   d30
Time taken: 0.469 seconds, Fetched: 12 row(s)
hive> 

11.7.3 where 子句

可以用orand连接多个表达式。

操作符

数据类型

说明

A=B

基本类型

A等于B返回true,否则返回false

A<>B,A!=B

基本类型

A或B为null,返回null; A和B都不空时,A不等于B返回true,否则返回false

A<B

基本类型

A或B为null,返回null; A小于B返回true,否则返回false

A<=B

基本类型

A或B为null,返回null; A小于等于B返回true,否则返回false

A>B

基本类型

A或B为null,返回null; A大于B返回true,否则返回false

A>=B

基本类型

A或B为null,返回null; A大于等于B返回true,否则返回false

A Between B And C

基本类型

筛选 A 的值处于 B 和 C 之间

A not between B and C

基本类型

筛选 A 的值不处于 B 和 C 之间

A is null

所有类型

如果A等于null,返回true,否则返回false。注意,不能使用A=null

A is not null

所有类型

筛选 A 值不是null的

A like B

string

B是一个SQL正则表达式,%表示一个或者多个字符,_表示一个字符

A not like B

string

与正则表达式B相反结果

A rlike B

string

正则匹配

hive> select * from emp
    > where did=30 and comm<>0.0;
OK
7499    ALLEN   SALESMAN    7698    1981-02-20  1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-02-22  1250.0  500.0   30
7654    MARTIN  SALESMAN    7698    1981-09-28  1250.0  1400.0  30
Time taken: 0.667 seconds, Fetched: 3 row(s)
hive> 
hive> select * from emp
    > where hiredate between '1981-01-01' and '1981-12-30';
OK
7782    CLARK   MANAGER 7839    1981-06-09  2450.0  0.0 10
7839    KING    PRESIDENT   0   1981-11-17  5000.0  0.0 10
7566    JONES   MANAGER 7839    1981-04-02  2975.0  0.0 20
7902    FORD    ANALYST 7566    1981-12-03  3000.0  0.0 20
7499    ALLEN   SALESMAN    7698    1981-02-20  1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-02-22  1250.0  500.0   30
7654    MARTIN  SALESMAN    7698    1981-09-28  1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-05-01  2850.0  0.0 30
7844    TURNER  SALESMAN    7698    1981-09-08  1500.0  0.0 30
7900    JAMES   CLERK   7698    1981-12-03  950.0   0.0 30
Time taken: 0.311 seconds, Fetched: 10 row(s)
hive>

11.7.4 group by 子句

【例】查询每个部门的人数

hive> select count(*) from emp
    > group by did;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170814101839_fe4cccbd-62c4-48cb-986b-a70f93c19535
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1502717288397_0003, Tracking URL = http://node1:8088/proxy/application_1502717288397_0003/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502717288397_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-08-14 10:19:10,549 Stage-1 map = 0%,  reduce = 0%
2017-08-14 10:19:27,526 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.86 sec
2017-08-14 10:19:42,984 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.84 sec
MapReduce Total cumulative CPU time: 6 seconds 840 msec
Ended Job = job_1502717288397_0003
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.84 sec   HDFS Read: 9667 HDFS Write: 129 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 840 msec
OK
3
3
6
Time taken: 66.588 seconds, Fetched: 3 row(s)
hive> 

与Oracle数据库一样,如果使用group by子句,那么查询的字段如果没有出现在group by子句的后面,则必须使用聚合函数。

hive> select ename,avg(sal) from emp group by did;
FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'ename'
hive>

如果想对分组结果进行过滤,可以使用having子句

hive> select avg(sal) from emp 
    > group by did
    > having avg(sal)>2000;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170814103035_2106518a-aab4-4cdd-9b39-7b08fb1a44b6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1502717288397_0007, Tracking URL = http://node1:8088/proxy/application_1502717288397_0007/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502717288397_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-08-14 10:31:08,601 Stage-1 map = 0%,  reduce = 0%
2017-08-14 10:31:25,888 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.88 sec
2017-08-14 10:31:43,122 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.34 sec
MapReduce Total cumulative CPU time: 8 seconds 340 msec
Ended Job = job_1502717288397_0007
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 8.34 sec   HDFS Read: 10807 HDFS Write: 149 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 340 msec
OK
2916.6666666666665
2258.3333333333335
Time taken: 70.202 seconds, Fetched: 2 row(s)
hive> 

11.7.5 内连接

(1)准备阶段 部门数据

[root@node3 data]# vi dept
[root@node3 data]# cat dept
10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON
[root@node3 data]# 

创建部门表dept,并导入数据

hive> create table if not exists dept(
    > did    int,
    > dname string,
    > dloc string)
    > row format delimited fields terminated by ',';
OK
Time taken: 0.185 seconds
hive> load data local inpath '/root/data/dept' into table dept;
Loading data to table default.dept
OK
Time taken: 0.705 seconds
hive> select * from dept;
OK
10  ACCOUNTING  NEW YORK
20  RESEARCH    DALLAS
30  SALES   CHICAGO
40  OPERATIONS  BOSTON
Time taken: 0.155 seconds, Fetched: 4 row(s)
hive> 

(2)等值连接 连接的两个表中,只有同时满足连接条件的记录才会放入结果表中。执行查询命令select eid,ename,dname from emp,dept where emp.did=dept.did;

hive> select eid,ename,dname
    > from emp,dept
    > where emp.did=dept.did;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170815101902_18fb066b-11f2-438d-8a92-9d4490b80ec2
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2017-08-15 10:19:13 Starting to launch local task to process map join;  maximum memory = 518979584
2017-08-15 10:19:17 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-19-02_412_1947825427115470836-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
2017-08-15 10:19:17 Uploaded 1 File to: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-19-02_412_1947825427115470836-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (373 bytes)
2017-08-15 10:19:17 End of local task; Time Taken: 3.36 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502805552881_0002, Tracking URL = http://node1:8088/proxy/application_1502805552881_0002/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502805552881_0002
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2017-08-15 10:19:38,347 Stage-3 map = 0%,  reduce = 0%
2017-08-15 10:19:52,530 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.69 sec
MapReduce Total cumulative CPU time: 2 seconds 690 msec
Ended Job = job_1502805552881_0002
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 2.69 sec   HDFS Read: 7746 HDFS Write: 459 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 690 msec
OK
7782    CLARK   ACCOUNTING
7839    KING    ACCOUNTING
7934    MILLER  ACCOUNTING
7369    SMITH   RESEARCH
7566    JONES   RESEARCH
7902    FORD    RESEARCH
7499    ALLEN   SALES
7521    WARD    SALES
7654    MARTIN  SALES
7698    BLAKE   SALES
7844    TURNER  SALES
7900    JAMES   SALES
Time taken: 52.488 seconds, Fetched: 12 row(s)
hive> 

内连接另一种形式:select e.eid,e.ename,d.dname from emp e join dept d on e.did=d.did; 其中,e和d分别是表emp和dept的别名。

hive> select e.eid,e.ename,d.dname
    > from emp e join dept d
    > on e.did=d.did;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170815102454_0ca5cc0f-1802-4b4f-8156-d055482971b7
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2017-08-15 10:25:06 Starting to launch local task to process map join;  maximum memory = 518979584
2017-08-15 10:25:10 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-24-54_978_8533839352082414207-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable
2017-08-15 10:25:10 Uploaded 1 File to: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-24-54_978_8533839352082414207-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable (373 bytes)
2017-08-15 10:25:10 End of local task; Time Taken: 3.864 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502805552881_0003, Tracking URL = http://node1:8088/proxy/application_1502805552881_0003/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502805552881_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2017-08-15 10:25:27,803 Stage-3 map = 0%,  reduce = 0%
2017-08-15 10:25:38,323 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.24 sec
MapReduce Total cumulative CPU time: 2 seconds 240 msec
Ended Job = job_1502805552881_0003
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 2.24 sec   HDFS Read: 7746 HDFS Write: 459 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 240 msec
OK
7782    CLARK   ACCOUNTING
7839    KING    ACCOUNTING
7934    MILLER  ACCOUNTING
7369    SMITH   RESEARCH
7566    JONES   RESEARCH
7902    FORD    RESEARCH
7499    ALLEN   SALES
7521    WARD    SALES
7654    MARTIN  SALES
7698    BLAKE   SALES
7844    TURNER  SALES
7900    JAMES   SALES
Time taken: 44.558 seconds, Fetched: 12 row(s)
hive> 

11.7.6 外连接

内连接只显示了俩张表都存在的数据,而外连接则是显示出所有的数据 。外连接又分为左外连接、右外连接以及全外连接,可以简称左连接、右连接和全连接。其中左连接是左边的表输出完整数据。右连接是右边的表输出完整数据,全连接就是两张表的所有行全部输出。

(1)补充数据 为了测试外连接,需要在雇员表中添加一个没有部门的新员工信息

hive> insert into emp(eid,ename,hiredate) values(8888,'HADRON','2016-08-31'); 
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170816091805_3f28cfe1-278a-486e-9ed4-34ee791f5322
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502887685471_0005, Tracking URL = http://node1:8088/proxy/application_1502887685471_0005/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-16 09:18:20,468 Stage-1 map = 0%,  reduce = 0%
2017-08-16 09:18:32,863 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.48 sec
MapReduce Total cumulative CPU time: 2 seconds 480 msec
Ended Job = job_1502887685471_0005
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://cetc/user/hive/warehouse/emp/.hive-staging_hive_2017-08-16_09-18-05_560_4497324549163312419-1/-ext-10000
Loading data to table default.emp
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.48 sec   HDFS Read: 4861 HDFS Write: 105 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 480 msec
OK
Time taken: 30.102 seconds
hive>

(2)左连接

hive> select e.eid,e.ename,d.dname from emp e left join dept d on e.did=d.did;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170816091950_1483c055-0461-4293-9387-3221eb15b448
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2017-08-16 09:20:01 Starting to launch local task to process map join;  maximum memory = 518979584
2017-08-16 09:20:04 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-19-50_831_2141891134326428737-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable
2017-08-16 09:20:04 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-19-50_831_2141891134326428737-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable (373 bytes)
2017-08-16 09:20:04 End of local task; Time Taken: 2.606 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502887685471_0006, Tracking URL = http://node1:8088/proxy/application_1502887685471_0006/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0006
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2017-08-16 09:20:20,596 Stage-3 map = 0%,  reduce = 0%
2017-08-16 09:20:30,466 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.82 sec
MapReduce Total cumulative CPU time: 1 seconds 820 msec
Ended Job = job_1502887685471_0006
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 1.82 sec   HDFS Read: 7158 HDFS Write: 486 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 820 msec
OK
7782    CLARK   ACCOUNTING
7839    KING    ACCOUNTING
7934    MILLER  ACCOUNTING
7369    SMITH   RESEARCH
7566    JONES   RESEARCH
7902    FORD    RESEARCH
7499    ALLEN   SALES
7521    WARD    SALES
7654    MARTIN  SALES
7698    BLAKE   SALES
7844    TURNER  SALES
7900    JAMES   SALES
8888    HADRON  NULL
Time taken: 41.877 seconds, Fetched: 13 row(s)
hive> 

(3)右连接

hive> select e.eid,e.ename,d.dname from emp e right join dept d on e.did=d.did;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170816092117_f97eea90-4039-45f9-8097-900c457f50f3
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2017-08-16 09:21:28 Starting to launch local task to process map join;  maximum memory = 518979584
2017-08-16 09:21:31 Dump the side-table for tag: 0 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-21-17_901_4210650042244478897-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile30--.hashtable
2017-08-16 09:21:31 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-21-17_901_4210650042244478897-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile30--.hashtable (498 bytes)
2017-08-16 09:21:31 End of local task; Time Taken: 2.599 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502887685471_0007, Tracking URL = http://node1:8088/proxy/application_1502887685471_0007/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0007
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2017-08-16 09:21:48,072 Stage-3 map = 0%,  reduce = 0%
2017-08-16 09:21:58,404 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.95 sec
MapReduce Total cumulative CPU time: 1 seconds 950 msec
Ended Job = job_1502887685471_0007
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 1.95 sec   HDFS Read: 6169 HDFS Write: 488 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 950 msec
OK
7782    CLARK   ACCOUNTING
7839    KING    ACCOUNTING
7934    MILLER  ACCOUNTING
7369    SMITH   RESEARCH
7566    JONES   RESEARCH
7902    FORD    RESEARCH
7499    ALLEN   SALES
7521    WARD    SALES
7654    MARTIN  SALES
7698    BLAKE   SALES
7844    TURNER  SALES
7900    JAMES   SALES
NULL    NULL    OPERATIONS
Time taken: 42.768 seconds, Fetched: 13 row(s)
hive> 

(4)全连接

hive> select e.eid,e.ename,d.dname from emp e full join dept d on e.did=d.did;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170816092230_3b3f51e8-2753-41a2-9f41-af98770847fa
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1502887685471_0008, Tracking URL = http://node1:8088/proxy/application_1502887685471_0008/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0008
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2017-08-16 09:22:45,280 Stage-1 map = 0%,  reduce = 0%
2017-08-16 09:23:06,469 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.28 sec
2017-08-16 09:23:16,137 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.52 sec
MapReduce Total cumulative CPU time: 8 seconds 520 msec
Ended Job = job_1502887685471_0008
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2  Reduce: 1   Cumulative CPU: 8.52 sec   HDFS Read: 16627 HDFS Write: 515 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 520 msec
OK
8888    HADRON  NULL
7782    CLARK   ACCOUNTING
7934    MILLER  ACCOUNTING
7839    KING    ACCOUNTING
7902    FORD    RESEARCH
7566    JONES   RESEARCH
7369    SMITH   RESEARCH
7654    MARTIN  SALES
7521    WARD    SALES
7900    JAMES   SALES
7499    ALLEN   SALES
7844    TURNER  SALES
7698    BLAKE   SALES
NULL    NULL    OPERATIONS
Time taken: 47.921 seconds, Fetched: 14 row(s)
hive> 

(5)左半连接

hive> select e.eid,e.ename from emp e 
    > left semi join dept d
    > on e.did=d.did;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170816093310_97649892-9cb8-4b26-954a-1a301976184d
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2017-08-16 09:33:22 Starting to launch local task to process map join;  maximum memory = 518979584
2017-08-16 09:33:25 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-33-10_356_3322488615725018370-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile51--.hashtable
2017-08-16 09:33:25 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-33-10_356_3322488615725018370-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile51--.hashtable (332 bytes)
2017-08-16 09:33:25 End of local task; Time Taken: 2.642 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502887685471_0010, Tracking URL = http://node1:8088/proxy/application_1502887685471_0010/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0010
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2017-08-16 09:33:43,441 Stage-3 map = 0%,  reduce = 0%
2017-08-16 09:33:55,350 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.93 sec
MapReduce Total cumulative CPU time: 2 seconds 930 msec
Ended Job = job_1502887685471_0010
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 2.93 sec   HDFS Read: 7152 HDFS Write: 363 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 930 msec
OK
7782    CLARK
7839    KING
7934    MILLER
7369    SMITH
7566    JONES
7902    FORD
7499    ALLEN
7521    WARD
7654    MARTIN
7698    BLAKE
7844    TURNER
7900    JAMES
Time taken: 46.179 seconds, Fetched: 12 row(s)
hive> 

左半连接和内连接的结果一样,只不过使用IN查询。

hive> select e.eid,e.ename from emp e 
    > where e.did in(select d.did from dept d);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170816092716_b479bf9e-fd03-48a1-be72-fb64d90f2efc
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2017-08-16 09:27:27 Starting to launch local task to process map join;  maximum memory = 518979584
2017-08-16 09:27:30 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-27-16_222_8300160340562225012-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile41--.hashtable
2017-08-16 09:27:30 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-27-16_222_8300160340562225012-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile41--.hashtable (332 bytes)
2017-08-16 09:27:30 End of local task; Time Taken: 2.987 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502887685471_0009, Tracking URL = http://node1:8088/proxy/application_1502887685471_0009/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0009
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2017-08-16 09:27:46,861 Stage-3 map = 0%,  reduce = 0%
2017-08-16 09:27:59,352 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.91 sec
MapReduce Total cumulative CPU time: 2 seconds 910 msec
Ended Job = job_1502887685471_0009
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   Cumulative CPU: 2.91 sec   HDFS Read: 7152 HDFS Write: 363 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 910 msec
OK
7782    CLARK
7839    KING
7934    MILLER
7369    SMITH
7566    JONES
7902    FORD
7499    ALLEN
7521    WARD
7654    MARTIN
7698    BLAKE
7844    TURNER
7900    JAMES
Time taken: 44.323 seconds, Fetched: 12 row(s)
hive> 

注意,左半连接通常比内连接要高效,因为对于左表的一条 指定的记录,在右表一旦找到匹配的记录,Hive就会停止扫描。

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏大数据学习笔记

Hadoop基础教程-第12章 Hive:进阶(12.1 内置函数)(草稿)

第12章 Hive:进阶 12.1 内置函数 为了方便测试Hive的内置函数,需要构造一个类似于Oracle的dual虚表 hive> create table...

1978
来自专栏码匠的流水账

聊聊reactive streams的parallel flux

本文主要研究下reactive streams的flux的parallel运行方式.

561
来自专栏Albert陈凯

MapReduce设计模式

MapReduce设计模式(MapReduce Design Pattern) 整个MapReduce作业的阶段主要可以分为以下四种:   1、Input-M...

3026
来自专栏大数据学习笔记

Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.8 HQL:排序)(草稿)

第11章 Hive:SQL on Hadoop 11.8 HQL:排序 11.8.1 order by Hive 中的 order by与SQL 中的order...

2087
来自专栏分布式系统进阶

Librdkafka对Kafka Message的封装和相关操作

1283
来自专栏HansBug's Lab

3299: [USACO2011 Open]Corn Maze玉米迷宫

3299: [USACO2011 Open]Corn Maze玉米迷宫 Time Limit: 10 Sec  Memory Limit: 128 MB Sub...

2574
来自专栏名山丶深处

springboot集成rabbitmq(实战)

982
来自专栏ml

HDUOJ-----1541 Stars

Stars Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65536/32768 K (Jav...

2678
来自专栏JAVA同学会

Zookeeper应用之——选举(Election)

请注意,此篇文章并不是介绍Zookeeper集群内部Leader的选举机制,而是应用程序使用Zookeeper作为选举。

542
来自专栏闵开慧

hive启动后相关操作

hadoop@ubuntu118:~$ $HIVE_HOME/bin/hive WARNING: org.apache.hadoop.metri...

3338

扫码关注云+社区