Hadoop Fundamentals Tutorial - Chapter 11 Hive: SQL on Hadoop (11.6 HQL: DML Data Manipulation) (Draft)

Chapter 11 Hive: SQL on Hadoop

11.6 HQL: DML Data Manipulation

11.6.1 Loading Data into Regular Tables

In earlier sections we used the load data command to bulk-load data into a Hive table in one step. Its syntax is:

load data [local] inpath 'path' [overwrite] into table tableName

Notes:

  • The optional local keyword loads a file from the local filesystem into the Hive table; without local, the file is taken from HDFS (in which case the HDFS file is moved into the table's directory rather than copied).
  • The overwrite keyword replaces any data already in the table. Without overwrite, the new data is appended.
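The examples below load into a table named users, whose DDL is not shown in the transcript. A minimal sketch of what it might look like, assuming the two-column data files are tab-separated (the column names uid and uname are assumptions):

```sql
-- Hypothetical DDL for the users table used in Example 1.
-- Assumes the data files separate the two fields with a tab.
CREATE TABLE users (
  uid   INT,
  uname STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
```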

[Example 1] Loading data

(1) Prepare the data

[root@node3 ~]# cat data/user1.txt 
1   qiang
2   root
3   test
[root@node3 ~]# cat data/user2.txt 
4   aa
5   bb
6   cc
[root@node3 ~]# 

Upload user2.txt to HDFS:

[root@node3 ~]# hdfs dfs -put data/user2.txt input

Confirm that user2.txt is now on HDFS:

[root@node3 ~]# hdfs dfs -ls input
Found 2 items
-rw-r--r--   3 root supergroup    2287354 2017-08-05 05:30 input/mydata.txt
-rw-r--r--   3 root supergroup         15 2017-08-13 05:25 input/user2.txt
[root@node3 ~]# 

(2) Load a local file

hive> load data local inpath '/root/data/user1.txt' into table users;
Loading data to table default.users
OK
Time taken: 1.461 seconds
hive> select * from users;
OK
1   qiang
2   root
3   test
Time taken: 0.271 seconds, Fetched: 3 row(s)
hive> 

(3) Load with overwrite

hive> load data inpath 'input/user2.txt' overwrite into table users;
Loading data to table default.users
OK
Time taken: 0.822 seconds
hive> select * from users;
OK
4   aa
5   bb
6   cc
Time taken: 0.212 seconds, Fetched: 3 row(s)
hive> 

Checking HDFS, we see that user2.txt is gone: loading from HDFS moves the file into the table's directory rather than copying it.

hive> dfs -ls input;
Found 1 items
-rw-r--r--   3 root supergroup    2287354 2017-08-05 05:30 input/mydata.txt
hive> 

11.6.2 Loading Data into Partitioned Tables

Prepare three data files with the contents shown below. Note that date values use the format yyyy-mm-dd.

[root@node1 ~]# cat data/emp10
7782,CLARK,MANAGER,7839,1981-06-09,2450,0,10
7839,KING,PRESIDENT,0,1981-11-17,5000,0,10
7934,MILLER,CLERK,7782,1982-01-23,1300,0,10
[root@node1 ~]# cat data/emp20
7369,SMITH,CLERK,7902,1980-12-17,800,0,20
7566,JONES,MANAGER,7839,1981-04-02,2975,0,20
7902,FORD,ANALYST,7566,1981-12-03,3000,0,20
[root@node1 ~]# cat data/emp30
7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30
7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01,2850,0,30
7844,TURNER,SALESMAN,7698,1981-09-08,1500,0,30
7900,JAMES,CLERK,7698,1981-12-03,950,0,30
[root@node1 ~]# 

Upload them to HDFS:

[root@node1 ~]# hdfs dfs -put data/emp10 input
[root@node1 ~]# hdfs dfs -put data/emp20 input
[root@node1 ~]# hdfs dfs -put data/emp30 input

You will sometimes need to inspect HDFS while working in Hive. There is no need to exit the session: the dfs command can be run directly at the hive> prompt.

hive> dfs -ls input;
Found 4 items
-rw-r--r--   3 root supergroup        133 2017-08-12 09:03 input/emp10
-rw-r--r--   3 root supergroup        133 2017-08-12 09:03 input/emp20
-rw-r--r--   3 root supergroup        280 2017-08-12 09:04 input/emp30
-rw-r--r--   3 root supergroup    2287354 2017-08-05 05:30 input/mydata.txt
hive> 
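The load commands below assume a partitioned table test.emp already exists. Its DDL is not shown in the transcript; a sketch reconstructed from the desc output later in this section, assuming comma-separated source files:

```sql
-- Hypothetical DDL for test.emp, reconstructed from the desc output.
-- did is the partition column; it is not stored in the data files.
CREATE TABLE test.emp (
  eid      INT,
  ename    STRING,
  job      STRING,
  mgr      INT,
  hiredate DATE,
  sal      DOUBLE,
  comm     DOUBLE
)
PARTITIONED BY (did INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
```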

Load the data into each partition:

hive> load data inpath 'input/emp10' into table test.emp partition(did=10);
Loading data to table test.emp partition (did=10)
OK
Time taken: 5.912 seconds
hive> load data inpath 'input/emp20' into table test.emp partition(did=20);
Loading data to table test.emp partition (did=20)
OK
Time taken: 3.536 seconds
hive> load data inpath 'input/emp30' into table test.emp partition(did=30);
Loading data to table test.emp partition (did=30)
OK
Time taken: 4.176 seconds
hive> 
hive> select * from test.emp;
OK
7782    CLARK   MANAGER 7839    1981-06-09  2450.0  0.0 10
7839    KING    PRESIDENT   0   1981-11-17  5000.0  0.0 10
7934    MILLER  CLERK   7782    1982-01-23  1300.0  0.0 10
7369    SMITH   CLERK   7902    1980-12-17  800.0   0.0 20
7566    JONES   MANAGER 7839    1981-04-02  2975.0  0.0 20
7902    FORD    ANALYST 7566    1981-12-03  3000.0  0.0 20
7499    ALLEN   SALESMAN    7698    1981-02-20  1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-02-22  1250.0  500.0   30
7654    MARTIN  SALESMAN    7698    1981-09-28  1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-05-01  2850.0  0.0 30
7844    TURNER  SALESMAN    7698    1981-09-08  1500.0  0.0 30
7900    JAMES   CLERK   7698    1981-12-03  950.0   0.0 30
Time taken: 1.881 seconds, Fetched: 12 row(s)
hive> 
hive> dfs -ls /user/hive/warehouse/test.db;
Found 2 items
drwxr-xr-x   - root supergroup          0 2017-08-12 09:08 /user/hive/warehouse/test.db/emp
drwxr-xr-x   - root supergroup          0 2017-08-10 09:39 /user/hive/warehouse/test.db/tb1
hive> dfs -ls /user/hive/warehouse/test.db/emp;
Found 3 items
drwxr-xr-x   - root supergroup          0 2017-08-12 09:08 /user/hive/warehouse/test.db/emp/did=10
drwxr-xr-x   - root supergroup          0 2017-08-12 09:08 /user/hive/warehouse/test.db/emp/did=20
drwxr-xr-x   - root supergroup          0 2017-08-12 09:08 /user/hive/warehouse/test.db/emp/did=30
hive> 
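Besides load data, rows can also be written into partitions with INSERT ... SELECT. A sketch under assumptions (the source table staging_emp is made up here): a static partition insert targets one partition explicitly, while dynamic partitioning lets Hive route each row by the value of the partition column.

```sql
-- Static partition: every inserted row goes into did=40.
INSERT INTO TABLE test.emp PARTITION (did = 40)
SELECT eid, ename, job, mgr, hiredate, sal, comm
FROM staging_emp WHERE did = 40;

-- Dynamic partitioning: Hive derives did from the last SELECT column.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE test.emp PARTITION (did)
SELECT eid, ename, job, mgr, hiredate, sal, comm, did
FROM staging_emp;
```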

11.6.3 Copying Tables

(1) Copy the table data, producing a regular (non-partitioned) table

hive> use default;
OK
Time taken: 0.035 seconds
hive> create table emp as select * from test.emp;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170816085512_041f0653-7e3c-4fd2-894c-a364f56a7462
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502887685471_0002, Tracking URL = http://node1:8088/proxy/application_1502887685471_0002/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-16 08:55:28,942 Stage-1 map = 0%,  reduce = 0%
2017-08-16 08:55:42,681 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.83 sec
MapReduce Total cumulative CPU time: 1 seconds 830 msec
Ended Job = job_1502887685471_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://cetc/user/hive/warehouse/.hive-staging_hive_2017-08-16_08-55-12_391_2156611965186432614-1/-ext-10002
Moving data to directory hdfs://cetc/user/hive/warehouse/emp
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.83 sec   HDFS Read: 5548 HDFS Write: 658 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 830 msec
OK
Time taken: 32.073 seconds
hive>

Examine the structure of the copy emp: it is a regular table, not a partitioned one. CREATE TABLE ... AS SELECT does not preserve the partition definition; the partition column did becomes an ordinary column.

hive> desc emp;
OK
eid                     int                                         
ename                   string                                      
job                     string                                      
mgr                     int                                         
hiredate                date                                        
sal                     double                                      
comm                    double                                      
did                     int                                         
Time taken: 0.113 seconds, Fetched: 8 row(s)
hive> select * from emp;
OK
7782    CLARK   MANAGER 7839    1981-06-09  2450.0  0.0 10
7839    KING    PRESIDENT   0   1981-11-17  5000.0  0.0 10
7934    MILLER  CLERK   7782    1982-01-23  1300.0  0.0 10
7369    SMITH   CLERK   7902    1980-12-17  800.0   0.0 20
7566    JONES   MANAGER 7839    1981-04-02  2975.0  0.0 20
7902    FORD    ANALYST 7566    1981-12-03  3000.0  0.0 20
7499    ALLEN   SALESMAN    7698    1981-02-20  1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-02-22  1250.0  500.0   30
7654    MARTIN  SALESMAN    7698    1981-09-28  1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-05-01  2850.0  0.0 30
7844    TURNER  SALESMAN    7698    1981-09-08  1500.0  0.0 30
7900    JAMES   CLERK   7698    1981-12-03  950.0   0.0 30
Time taken: 0.226 seconds, Fetched: 12 row(s)
hive> 
hive> 

(2) Copy only the table structure. CREATE TABLE ... LIKE copies the full definition, including partition information, but no data.

hive> create table empBak like test.emp;
OK
Time taken: 0.161 seconds
hive> desc test.emp;
OK
eid                     int                                         
ename                   string                                      
job                     string                                      
mgr                     int                                         
hiredate                date                                        
sal                     double                                      
comm                    double                                      
did                     int                                         

# Partition Information      
# col_name              data_type               comment             

did                     int                                         
Time taken: 0.077 seconds, Fetched: 13 row(s)
hive> select * from empBak;
OK
Time taken: 0.203 seconds
hive> 
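CREATE TABLE ... AS SELECT is not limited to copying a whole table: the SELECT can project and filter. For example, a sketch that keeps only department 10 and a few columns (the target name emp10 is made up here):

```sql
-- Create a new, non-partitioned table from a filtered projection.
CREATE TABLE emp10 AS
SELECT eid, ename, sal
FROM test.emp
WHERE did = 10;
```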

11.6.4 Exporting Data

(1) Export to HDFS

hive> insert overwrite directory 'input/emp.bak'
    > select * from test.emp;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170813093308_2bc4a189-000f-490b-8f9c-2451cfc86355
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502614100973_0002, Tracking URL = http://node1:8088/proxy/application_1502614100973_0002/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502614100973_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-13 09:33:31,834 Stage-1 map = 0%,  reduce = 0%
2017-08-13 09:33:45,724 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.14 sec
MapReduce Total cumulative CPU time: 2 seconds 140 msec
Ended Job = job_1502614100973_0002
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to directory hdfs://cetc/user/root/input/emp.bak/.hive-staging_hive_2017-08-13_09-33-08_728_3968927259689334332-1/-ext-10000
Moving data to directory input/emp.bak
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.14 sec   HDFS Read: 5515 HDFS Write: 590 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 140 msec
OK
Time taken: 39.39 seconds
hive>
hive> dfs -ls input;
Found 2 items
drwxr-xr-x   - root supergroup          0 2017-08-13 09:33 input/emp.bak
-rw-r--r--   3 root supergroup    2287354 2017-08-05 05:30 input/mydata.txt
hive> dfs -ls input/emp.bak;
Found 1 items
-rwxr-xr-x   3 root supergroup        590 2017-08-13 09:33 input/emp.bak/000000_0
hive> dfs -cat input/emp.bak/000000_0;
7782CLARKMANAGER78391981-06-092450.00.010
7839KINGPRESIDENT01981-11-175000.00.010
7934MILLERCLERK77821982-01-231300.00.010
7369SMITHCLERK79021980-12-17800.00.020
7566JONESMANAGER78391981-04-022975.00.020
7902FORDANALYST75661981-12-033000.00.020
7499ALLENSALESMAN76981981-02-201600.0300.030
7521WARDSALESMAN76981981-02-221250.0500.030
7654MARTINSALESMAN76981981-09-281250.01400.030
7698BLAKEMANAGER78391981-05-012850.00.030
7844TURNERSALESMAN76981981-09-081500.00.030
7900JAMESCLERK76981981-12-03950.00.030
hive> 

The fields in the exported file above appear run together because Hive writes them separated by its default field delimiter \001 (Ctrl-A), which is a non-printing character.

(2) Export to a local directory

hive> insert overwrite local directory '/root/data/emp.bak'
    > select * from test.emp;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170813094410_5d7bbe80-9cc3-430a-b1bc-f22d9c8bd990
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502614100973_0003, Tracking URL = http://node1:8088/proxy/application_1502614100973_0003/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502614100973_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-13 09:44:25,516 Stage-1 map = 0%,  reduce = 0%
2017-08-13 09:44:36,939 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.01 sec
MapReduce Total cumulative CPU time: 2 seconds 10 msec
Ended Job = job_1502614100973_0003
Moving data to local directory /root/data/emp.bak
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.01 sec   HDFS Read: 5632 HDFS Write: 590 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 10 msec
OK
Time taken: 28.015 seconds
hive> quit;
[root@node3 ~]#
[root@node3 ~]# ls /root/data/emp.bak/
000000_0
[root@node3 ~]# cat /root/data/emp.bak/000000_0
7782CLARKMANAGER78391981-06-092450.00.010
7839KINGPRESIDENT01981-11-175000.00.010
7934MILLERCLERK77821982-01-231300.00.010
7369SMITHCLERK79021980-12-17800.00.020
7566JONESMANAGER78391981-04-022975.00.020
7902FORDANALYST75661981-12-033000.00.020
7499ALLENSALESMAN76981981-02-201600.0300.030
7521WARDSALESMAN76981981-02-221250.0500.030
7654MARTINSALESMAN76981981-09-281250.01400.030
7698BLAKEMANAGER78391981-05-012850.00.030
7844TURNERSALESMAN76981981-09-081500.00.030
7900JAMESCLERK76981981-12-03950.00.030
[root@node3 ~]# 
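Since Hive 0.11, INSERT OVERWRITE DIRECTORY accepts a ROW FORMAT clause, so an export can use a readable delimiter instead of the default \001. A sketch (the target path is made up here):

```sql
-- Export with a comma delimiter so the resulting file is human-readable.
INSERT OVERWRITE LOCAL DIRECTORY '/root/data/emp.csv'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM test.emp;
```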
