Hadoop Fundamentals Tutorial - Chapter 11 Hive: SQL on Hadoop (11.3 Hive Quick Start)

Chapter 11 Hive: SQL on Hadoop

11.3 Hive Quick Start

11.3.1 Introduction to HQL

The syntax of the Hive Query Language (HiveQL, often abbreviated HQL) is very similar to SQL: HQL implements most of the SQL-92 standard and adds some extensions of its own. Some differences remain, however; for example, row-level updates and deletes are not supported by default, nor are transactions. Of the common SQL dialects, HQL is closest to MySQL's, so developers who already know SQL will find HQL easy to pick up.
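As a small illustration of the MySQL-flavored dialect, the query below (the table name sales is hypothetical) combines standard SQL-92 grouping and aggregation with the LIMIT clause, a MySQL-style extension that Hive also supports:

```sql
-- Standard SQL-92 grouping and aggregation...
SELECT item, COUNT(*) AS cnt
FROM sales
GROUP BY item
ORDER BY cnt DESC
-- ...plus LIMIT, a MySQL-style extension supported by HQL.
LIMIT 10;
```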

11.3.2 Entering the Hive Shell

The Hive shell is Hive's interactive CLI. Type hive and press Enter to start it.

[root@node3 ~]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive-2.1.1/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>

Type "quit;" to exit the Hive shell.

hive> quit;
[root@node3 ~]# 

11.3.3 Hive Databases

(1) Listing databases. A Hive database is essentially a catalog, or namespace, of tables. If you create a table without specifying a database, the default database, named default, is used. As in MySQL, the show databases command lists the existing databases.

hive> show databases;
OK
default
Time taken: 1.451 seconds, Fetched: 1 row(s)
hive> 

(2) Creating a database. Use the create database statement to create a new database.

hive> create database test;
OK
Time taken: 0.414 seconds
hive> show databases;
OK
default
test
Time taken: 0.021 seconds, Fetched: 2 row(s)
hive> 

(3) Viewing database metadata. The describe command (abbreviated desc) shows a database's definition, including its name, its directory on HDFS, and its owner.

hive> desc database test;
OK
test        hdfs://cetc/user/hive/warehouse/test.db root    USER    
Time taken: 0.223 seconds, Fetched: 1 row(s)
hive> 

Hive creates a directory on HDFS for every database it creates; each table in that database is stored as a subdirectory, and the table's data as files under the table's directory. The default database is the exception: it has no directory of its own, and its tables are stored directly under /user/hive/warehouse.
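You can confirm a table's storage directory with desc formatted, which reports the table's Location along with other metadata (the exact value depends on your warehouse configuration; on the cluster in this section, test.tb1 should resolve to a test.db subdirectory of the warehouse):

```sql
-- The "Location:" field in the output shows the table's HDFS directory,
-- e.g. .../user/hive/warehouse/test.db/tb1 for a non-default database.
DESC FORMATTED test.tb1;
```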

(4) Switching the current database

hive> use test;
OK
Time taken: 1.217 seconds
hive>

11.3.4 Creating Tables

Create the users table in the default database:

hive> create table users(id int,name char(20));
OK
Time taken: 0.481 seconds
hive>

Create the tb1 table in the test database:

hive> create table test.tb1(
    > id int,
    > name string,
    > price double
    > );
OK
Time taken: 2.044 seconds
hive> 

11.3.5 Listing Tables

(1) Show all the tables in the default database:

hive> show tables;
OK
users
Time taken: 0.231 seconds, Fetched: 1 row(s)
hive> 

(2) View the table schema:

hive> desc users;
OK
id                      int                                         
name                    char(20)                                    
Time taken: 0.171 seconds, Fetched: 2 row(s)
hive> 

(3) List the tables in the test database:

hive> use test;
OK
Time taken: 0.037 seconds
hive> show tables;
OK
tb1
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive>

The same can be done in a single command:

hive> show tables in test;
OK
tb1
Time taken: 0.04 seconds, Fetched: 1 row(s)
hive>  

View the schema of tb1:

hive> desc tb1;
OK
id                      int                                         
name                    string                                      
price                   double                                      
Time taken: 0.138 seconds, Fetched: 3 row(s)
hive> 

11.3.6 Inserting Data

(1) Insert with an explicit column list

hive> insert into users(id,name) values(1,'aa');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170810094706_d75cb718-9742-4b25-864c-44a63c27d064
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502371586068_0001, Tracking URL = http://node1:8088/proxy/application_1502371586068_0001/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502371586068_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-10 09:47:34,602 Stage-1 map = 0%,  reduce = 0%
2017-08-10 09:47:50,211 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.36 sec
MapReduce Total cumulative CPU time: 2 seconds 360 msec
Ended Job = job_1502371586068_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://cetc/user/hive/warehouse/users/.hive-staging_hive_2017-08-10_09-47-06_012_378591815767107640-1/-ext-10000
Loading data to table default.users
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.36 sec   HDFS Read: 4225 HDFS Write: 92 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 360 msec
OK
Time taken: 46.216 seconds
hive> 

(2) Insert without a column list

hive> insert into users values(2,'bb');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170810095011_cf3ab969-c1bb-459b-a1eb-f9b411ec0614
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502371586068_0002, Tracking URL = http://node1:8088/proxy/application_1502371586068_0002/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502371586068_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-10 09:50:28,186 Stage-1 map = 0%,  reduce = 0%
2017-08-10 09:50:41,103 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.34 sec
MapReduce Total cumulative CPU time: 2 seconds 340 msec
Ended Job = job_1502371586068_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://cetc/user/hive/warehouse/users/.hive-staging_hive_2017-08-10_09-50-11_441_8940366110116441398-1/-ext-10000
Loading data to table default.users
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.34 sec   HDFS Read: 4228 HDFS Write: 92 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 340 msec
OK
Time taken: 32.393 seconds
hive> 

(3) Insert multiple rows

hive> insert into users values(3,'cc'),(4,'dd'),(5,'ee');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170810095507_48fc0ce3-7789-4427-b35c-3809586f208f
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502371586068_0003, Tracking URL = http://node1:8088/proxy/application_1502371586068_0003/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502371586068_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-10 09:55:24,363 Stage-1 map = 0%,  reduce = 0%
2017-08-10 09:55:36,294 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.35 sec
MapReduce Total cumulative CPU time: 2 seconds 350 msec
Ended Job = job_1502371586068_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://cetc/user/hive/warehouse/users/.hive-staging_hive_2017-08-10_09-55-07_839_6570192849374755881-1/-ext-10000
Loading data to table default.users
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.35 sec   HDFS Read: 4238 HDFS Write: 138 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 350 msec
OK
Time taken: 31.315 seconds
hive>

As the output shows, Hive compiles every INSERT statement into a MapReduce job, so inserting records one at a time is very inefficient.
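The usual remedy is bulk loading: Hive's LOAD DATA statement moves an existing file into the table's directory without launching any MapReduce job at all (the local path below is hypothetical, and the file's contents must match the table's row format):

```sql
-- Copies the local file into the table's HDFS directory; no MapReduce
-- job is launched, so this is far faster than row-by-row INSERTs.
LOAD DATA LOCAL INPATH '/tmp/users.txt' INTO TABLE users;
```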

11.3.7 Simple Queries

(1) Selecting columns

hive> select id,name from users;
OK
1   aa                  
2   bb                  
3   cc                  
4   dd                  
5   ee                  
Time taken: 0.224 seconds, Fetched: 5 row(s)
hive> 

(2) Conditional queries

hive> select * from users where id=1;
OK
1   aa                  
Time taken: 0.399 seconds, Fetched: 1 row(s)
hive> select name from users where name like '%a';
OK
aa                  
Time taken: 0.214 seconds, Fetched: 1 row(s)
hive>

(3) Aggregate queries

hive> select count(*) from users;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170810095858_fa356df4-8b18-4a5b-835a-643427dd99f6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1502371586068_0004, Tracking URL = http://node1:8088/proxy/application_1502371586068_0004/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502371586068_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-08-10 09:59:21,494 Stage-1 map = 0%,  reduce = 0%
2017-08-10 09:59:31,484 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.02 sec
2017-08-10 09:59:43,424 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.11 sec
MapReduce Total cumulative CPU time: 4 seconds 110 msec
Ended Job = job_1502371586068_0004
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.11 sec   HDFS Read: 8068 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 110 msec
OK
5
Time taken: 46.061 seconds, Fetched: 1 row(s)
hive> 

11.3.8 Deleting and Updating Records Are Not Supported

hive> delete from users where id=1;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive> update users set name='xx' where id=1;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive>

Row-level delete and update operations on table data are not supported (with the default transaction manager), but you can truncate a table or drop it entirely.

hive> truncate table users;
OK
Time taken: 0.273 seconds
hive> select * from users;
OK
Time taken: 0.243 seconds
hive> 
hive> drop table users;
OK
Time taken: 2.025 seconds
hive>
