《快学BigData》-Hive总结

Hive 总结 - - - - - - - - - - - - - - - - - - - - - - - - - - - - 262

概述- - - - - - - - - - - - - - - - - - - - - - - - - - - - 263

外部表与内部表的区别- - - - - - - - - - - - - - - - - - - - - 263

特性- - - - - - - - - - - - - - - - - - - - - - - - - - - - 263

安装Hive - - - - - - - - - - - - - - - - - - - - - - - - - - - 264

Hive的初级应用 - - - - - - - - - - - - - - - - - - - - - - - - 271

Hive 高级应用 - - - - - - - - - - - - - - - - - - - - - - - - 283

保存select查询结果的几种方式 - - - - - - - - - - - - - - - - - 302

JOIN的使用- - - - - - - - - - - - - - - - - - - - - - - - - - - 304

Hive 常用函数的使用 - - - - - - - - - - - - - - - - - - - - - - 306

Hive 参数优化 - - - - - - - - - - - - - - - - - - - - - - - - - 307

Hive Shell 参数的使用 - - - - - - - - - - - - - - - - - - - - - 308

Hive Web图形界面安装 - - - - - - - - - - - - - - - - - - - - - 309

Hive与Tez结合 - - - - - - - - - - - - - - - - - - - - - - - - - 307

Beeline 链接Hive - - - - - - - - - - - - - - - - - - - - - - - 314

DBeaver 4.0 链接Hive 工具 - - - - - - - - - - - - - - - - - - 317

Hive 常见错误 - - - - - - - - - - - - - - - - - - - - - - - - - 320

Hive数据导出与导入方案 - - - - - - - - - - - - - - - - - - - - 321

Hive的初级应用

1-1)、插入普通的数据

hive> create table t_test(id int ,name string, password string) row format delimited fields terminated by ',';

OK

Time taken: 0.196 seconds

hive> insert into t_test(id,name,password) values('1','323','sfdf');

Query ID = root_20160924191209_07686e5a-177e-4194-88a5-58536f15698a

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Job running in-process (local Hadoop)

2016-09-24 19:12:13,483 Stage-1 map = 100%, reduce = 0%

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Loading data to table hivetest.t_test

Table hivetest.t_test stats: [numFiles=1, numRows=1, totalSize=11, rawDataSize=10]

MapReduce Jobs Launched:

Stage-Stage-1: HDFS Read: 104 HDFS Write: 186 SUCCESS

Total MapReduce CPU Time Spent: 0 msec

OK

Time taken: 4.957 seconds

hive> select * from t_test;

OK

1323sfdf

Time taken: 0.14 seconds, Fetched: 1 row(s)

[root@hadoop1 bin]# hadoop fs -cat /user/hive/warehouse/hivetest.db/t_test/000000_0

1,323,sfdf

注意:在默认的情况下会把表,数据库看做一个文件夹处理,只有数据才会看做文件,默认的事000000——0如果有多个文件则会一次累加。

查看保存本地的数据库:

1-2)、SQL执行方式

A)、查看帮助

# hive -help

log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.

usage: hive

-d,--define Variable subsitution to apply to hive

commands. e.g. -d A=B or --define A=B

--database Specify the database to use

-e SQL from command line

-f SQL from files

-H,--help Print help information

--hiveconf

Use value for given property

--hivevar Variable subsitution to apply to hive

commands. e.g. --hivevar A=B

-i Initialization SQL file

-S,--silent Silent mode in interactive shell

-v,--verbose Verbose mode (echo executed SQL to the

console)

B)、hive -e命令的使用

[root@hadoop3 hive]# hive -e "show databases"

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/spark-1.3.1-bin-hadoop2.6/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/spark-1.3.1-bin-hadoop2.6/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]

Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties

OK

default

hivetest

Time taken: 2.623 seconds, Fetched: 2 row(s)

C)、静音模式

[root@hadoop3 hive]# hive-S-e "show databases"

在静音的模式下不会显示mapreduce的操作过程

D)、附加文件的形式

# vi show-databases.sql

show databases;

# hive -S -f show-databases.sql

log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.

Default

E)、env命名空间可作为向Hive传递变量的一个可选的方式

# export TRAINSVAR=trains_var

# hive -S -e "select * from $ limit 10"

****************************

1-3)、外部表数据(使用hdfs上传数据)

准备数据:

[root@hadoop1 /]# vi /test.bat

1,123,ffgg

2,sfgf,eferg

3,dvf,dvrfg

A)、查看表的字段信息

hive> desc t_test;

OK

idint

namestring

passwordstring

Time taken: 0.21 seconds, Fetched: 3 row(s)

B)、查看执行结果

[root@hadoop1/]# hadoop fs -put test.bat /user/hive/warehouse/hivetest.db/t_test

hive> select * from t_test;

OK

1323sfdf

1123ffgg

2sfgfeferg

3dvfdvrfg

Time taken: 0.125 seconds, Fetched: 4 row(s)

[root@hadoop1/]# hadoop fs -cat /user/hive/warehouse/hivetest.db/t_test/test.bat

1,123,ffgg

2,sfgf,eferg

3,dvf,dvrfg

注意:要上传的数据后缀必须是任意格式的,如果字段的类型与数据库的类型不一致,则会当做NULL来处理,以及多的字段会自动去除,不足的字段也会用NULL来补齐

1-4)、内部表数据(使用hive的命令导入与导出数据)

A)、查看表的结构

hive> desc t_test;

OK

idint

namestring

passwordstring

Time taken: 0.21 seconds, Fetched: 3 row(s)

B)、准备数据

[root@hadoop1 /]# vi t_test.bat

4,dsfef,fgrg

8,drfg,fgrg

9,dgr,rgr

10,dgfrg,rgr

11,dgr,rgrgr

12,dsfe,sfe

C)、本地和HDFS把数据导入到Hive中1-1)、上传之前

[root@hadoop1testData]# ls

test.bat t_test.bat

hive> load data local inpath '/usr/local/hive/testData/t_test.bat' into table t_test;

Loading data to table hivetest.t_test

Table hivetest.t_test stats: [numFiles=3, numRows=0, totalSize=120, rawDataSize=0]

OK

Time taken: 1.059 seconds

1-2)、上传之后

[root@hadoop3 testData]# ls

test.bat t_test.bat

1-3)、HDFS上传

[root@hadoop3 testData]# ls

test.bat t_test.bat t_hadoop.bat

[root@hadoop3 testData]# hadoop fs -put t_hadoop.bat /user

HDFS上传到表:

hive> load data inpath '/user/t_hadoop.bat' into table t_test;

Loading data to table hivetest.t_test

Table hivetest.t_test stats: [numFiles=4, numRows=0, totalSize=182, rawDataSize=0]

OK

Time taken: 0.728 seconds

查看HDFS上的文件:

[root@hadoop3 testData]# hadoop fs -ls /user

Found 2 items

drwxr-xr-x - root supergroup 0 2016-09-24 00:46 /user/hive

drwxr-xr-x - root supergroup 0 2016-09-02 05:25 /user/root

注意:在使用本地上传数据时,会先把数据临时复制一份到HDFS上,目录是有上传的目录一致,等到上传完再把数据移动到指定的目录。在HDFS中则是直接移动文件。其中本地上传需要在inpath前加local关键字

D)、把数据导出的本地

[root@hadoop1 testData]# ls

[root@hadoop1 testData]#

hive>insert overwrite local directory '/usr/local/hive/testData/t_test.text' row format delimited fields terminated by '\t' select * from t_test;

Query ID = root_20160924201517_fda8eced-7f74-4aa1-9b1f-cc6fdc153064

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Job running in-process (local Hadoop)

2016-09-24 20:15:19,324 Stage-1 map = 100%, reduce = 0%

Copying data to local directory /usr/local/hive/testData/t_test.text

Copying data to local directory /usr/local/hive/testData/t_test.text

MapReduce Jobs Launched:

Stage-Stage-1: HDFS Read: 1274 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 0 msec

OK

Time taken: 1.692 seconds

[root@hadoop1 testData]# cd t_test.text/

[root@hadoop1 t_test.text]# ls

000000_0

[root@hadoop3 t_test.text]# cat 000000_0

1323sfdf

345dffefer

78efefeefefeg

1323fegfeefegeg

3545dfdgdgfg

4dsfeffgrg

8drfgfgrg

9dgrrgr

10dgfrgrgr

11dgrrgrgr

12dsfesfe

1123ffgg

2sfgfeferg

3dvfdvrfg

E)、把数据导出到HDFS

[root@hadoop1 testData]# ls

[root@hadoop1 testData]#

hive>insert overwrite directory '/usr/local/hive/testData/t_test.text' row format delimited fields terminated by '\t' select * from t_test;

F)、导出到另外一张表中

[root@hadoop1 testData]# insert table t_test_table select * from t_test;

1-5)、HIVE加载SQL文件的形式执行脚本

A)、写配置文件

[root@hadoop1 testDate]# vi common.property

set hivevar:db.default=default;

set hivevar:db.allinfo=allinfo;

set hivevar:db.xiaoxu=xiaoxu;

set hivevar:hdfs.url=hdfs://hadoop1:9000;

Hivevar:是在内部文件中传参的设置

B)、写SQL文件

[root@hadoop1 testDate]# vitextTable.sql

--设置MR的task的个数

--设置HDFS的路径

set hdfs.url=$;

--设置获取配置文件的初始化数据库的信息

--创建数据库

CREATE DATABASE IF NOT EXISTS $;

CREATE DATABASE IF NOT EXISTS $;

--创建表结构

CREATE TABLE IF NOT EXISTS $.textTable(

idbigint,

namestring,

ageint

)

--执行逻辑

CREATE TABLE IF NOT EXISTS $.textTable AS

SELECT ps.id,ps.name,ps.age FROM $.person ps;

--使用内存保存数据

WITH memoryData1 AS ( SELECT ps.id,ps.name,ps.age FROM $.textTable ps WHERE age='23'),

memoryData2 AS ( SELECT ps.id,ps.name,ps.age FROM $.textTable ps WHERE age='24' )

SELECT * FROM memoryData1 UNION ALL SELECT * FROM memoryData2;

Hiveconf:是接受内部参数的,与hivevar相配合使用,其中WITH关键字是使用是吧加载的数据放到内存中,建议加载到内存中的数据不要过大。

C)、测试结果

[root@hadoop1 testDate]# hive -icommon.property -f textTable .sql

Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties

OK

Time taken: 1.642 seconds

OK

Time taken: 0.045 seconds

OK

Time taken: 0.247 seconds

Query ID = root_20170310190113_12f2d1a9-6f30-4991-b09a-11f2cb82b043

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Job running in-process (local Hadoop)

2017-03-10 19:01:19,218 Stage-1 map = 100%, reduce = 0%

MapReduce Jobs Launched:

Stage-Stage-1: HDFS Read: 130 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 0 msec

OK

1xiaozhang23

2xiaowang24

9daye24

Time taken: 6.281 seconds, Fetched: 3 row(s)

以上可以看出执行的过程,map阶段以及reduce阶段的执行情况,其中hive -i是加载配置文件-f是加载SQL文件,详细的请使用hive -help命令查看

[root@hadoop1 testDate]# hive -help

usage: hive

-d,--define Variable subsitution to apply to hive

commands. e.g. -d A=B or --define A=B

--database Specify the database to use

-e SQL from command line

-f SQL from files

-H,--help Print help information

--hiveconf

Use value for given property

--hivevar Variable subsitution to apply to hive

commands. e.g. --hivevar A=B

-i Initialization SQL file

-S,--silent Silent mode in interactive shell

-v,--verbose Verbose mode (echo executed SQL to the

console)

1-6)、查看创建表的属性信息

hive> show create table result;

OK

CREATE TABLE `result`(

`id` bigint,

`name` string,

`age` int)

ROW FORMAT SERDE

STORED AS INPUTFORMAT

OUTPUTFORMAT

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/xiaoxu.db/result'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='10',

'rawDataSize'='120',

'totalSize'='130',

'transient_lastDdlTime'='1489208970')

Time taken: 0.287 seconds, Fetched: 19 row(s)

在以上可以看出result表的结构信息,以及HDFS保存的路径信息。

1-7)、Hive命令外部传参

A)、传一个参数实例

[root@hadoop1 testDate]# cat hiveVar.sql

set ouput.date=$;

select * from $.person where age=$

[root@hadoop1 testDate]# hive -i common.property --hivevar age='23' -f hiveVar.sql

****************************

OK

Time taken: 9.516 seconds

OK

1xiaozhang23

Time taken: 1.028 seconds, Fetched: 1 row(s)

B)、带多个参数的实例

[root@hadoop1 testDate]# vi hiveVar.sql

set ouput.date=$;

select * from $.person WHERE AGE BETWEEN $ AND age

[root@hadoop1 testDate]# hive -i common.property --hivevar age='10' --hivevar age1='25' -f hiveVar.sql

Time taken: 7.524 seconds

OK

1xiaozhang23

2xiaowang24

3xiaoli25

4xiaoxiao26

5xiaoxiao27

6xiaolizi39

7xiaodaye10

8dageda12

9daye24

10dada25

1-8)、Insert into与Insert overwrite的区别

A)、查看元数据

hive> select * from default.person;

OK

1xiaozhang23

2xiaowang24

3xiaoli25

4xiaoxiao26

5xiaoxiao27

6xiaolizi39

7xiaodaye10

8dageda12

9daye24

10dada25

B)、overwrite插入数据

hive> insert overwrite table default.hiveVar select * from default.person;

hive> select * from default.hiveVar;

OK

1xiaozhang23

2xiaowang24

3xiaoli25

4xiaoxiao26

5xiaoxiao27

6xiaolizi39

7xiaodaye10

8dageda12

9daye24

10dada25

执行上面的insert多次后,查询数据还是保持原来的数据个数,可以看出overwrite是把重复的数据给覆盖掉了。

C)、into插入数据

hive> insert into table default.hiveVar select * from default.person;

hive> select * from default.hiveVar;

OK

1xiaozhang23

2xiaowang24

3xiaoli25

4xiaoxiao26

5xiaoxiao27

6xiaolizi39

7xiaodaye10

8dageda12

9daye24

10dada25

1xiaozhang23

2xiaowang24

3xiaoli25

4xiaoxiao26

5xiaoxiao27

6xiaolizi39

7xiaodaye10

8dageda12

9daye24

10dada25

执行上面的into多次后,查询数据是原来的数据个数的插入的次数倍。可以看出into是在尾部追加数据之前的数据还再保留。

D)、两者的区别

insert overwrite会覆盖已经存在的数据,我们假设要插入的数据和已经存在的N条数据一样,那么插入后只会保留一条数据;

insert into只是简单的copy插入,不做重复性校验,如果插入前有N条数据和要插入的数据一样,那么插入后会有N+1条数据;

1-9)、在当前Hive模式下查看HDFS中文件的信息

hive> dfs -ls /;

Found 14 items

drwx-wx-wx - root supergroup 0 2016-11-05 02:58 /tmp

drwxr-xr-x - root supergroup 0 2016-11-05 03:14 /user

drwxr-xr-x - root supergroup 0 2016-11-06 08:02 /usr

hive> dfs -ls /user/;

Found 1 items

drwxr-xr-x - root supergroup 0 2016-11-05 03:14 /user/hive

  • 发表于:
  • 原文链接http://kuaibao.qq.com/s/20180329G1F2KV00?refer=cp_1026
  • 腾讯「云+社区」是腾讯内容开放平台帐号(企鹅号)传播渠道之一,根据《腾讯内容开放平台服务协议》转载发布内容。
  • 如有侵权,请联系 yunjia_community@tencent.com 删除。

扫码关注云+社区

领取腾讯云代金券