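All of Xinchen's original articles (with accompanying source code) are categorized and summarized here: https://github.com/zq2599/blog_demos
This is the fourth article in the "Hive Study Notes" series, and its topic is Hive partitioned tables. Put simply, a Hive partition is a way of creating a directory hierarchy: records in the same partition are data stored in the same subdirectory. There are two kinds of partitioning, static and dynamic, and we try each in turn.
Start with partitioning on a single field. Table t9 has three fields: name, age, and city, with city used as the partition field: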
create table t9 (name string, age int)
partitioned by (city string)
row format delimited
fields terminated by ',';
Check the table definition; city is listed both as a column and under the partition information:
hive> desc t9;
OK
name string
age int
city string
# Partition Information
# col_name data_type comment
city string
Time taken: 0.159 seconds, Fetched: 8 row(s)
Create a text file /home/hadoop/temp/202010/25/009.txt with the following content:
tom,11
jerry,12
Load the file into t9 twice, naming a different city partition each time:
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t9
partition(city='shenzhen');
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t9
partition(city='guangzhou');
Query the table; the partition field city appears as an ordinary column:
hive> select * from t9;
OK
t9.name t9.age t9.city
tom 11 guangzhou
jerry 12 guangzhou
tom 11 shenzhen
jerry 12 shenzhen
Time taken: 0.104 seconds, Fetched: 4 row(s)
In HDFS, each partition is a subdirectory under the table's directory, and the data file holds only the name and age fields:
[hadoop@node0 bin]$ ./hadoop fs -ls /user/hive/warehouse/t9/city=guangzhou
Found 1 items
-rwxr-xr-x 3 hadoop supergroup 16 2020-10-31 16:47 /user/hive/warehouse/t9/city=guangzhou/009.txt
[hadoop@node0 bin]$ ./hadoop fs -cat /user/hive/warehouse/t9/city=guangzhou/009.txt
tom,11
jerry,12
[hadoop@node0 bin]$
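Because each partition is its own subdirectory, listing the partitions and filtering on the partition field are natural follow-up checks. The statements below are a minimal sketch, assuming the t9 table and the two loads above; a filter on city should let Hive read only the matching subdirectory rather than the whole table:

```sql
-- list the partitions currently registered for t9
show partitions t9;

-- filter on the partition field; only the city=shenzhen directory needs to be read
select name, age from t9 where city = 'shenzhen';
```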
That completes static partitioning on a single field; next, try partitioning on multiple fields:
create table t10 (name string, age int)
partitioned by (province string, city string)
row format delimited
fields terminated by ',';
Load the same file into four partitions, giving both province and city each time:
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t10
partition(province='shanxi', city='xian');
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t10
partition(province='shanxi', city='hanzhong');
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t10
partition(province='guangdong', city='guangzhou');
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t10
partition(province='guangdong', city='shenzhen');
Query the table; both partition fields appear as columns:
hive> select * from t10;
OK
t10.name t10.age t10.province t10.city
tom 11 guangdong guangzhou
jerry 12 guangdong guangzhou
tom 11 guangdong shenzhen
jerry 12 guangdong shenzhen
tom 11 shanxi hanzhong
jerry 12 shanxi hanzhong
tom 11 shanxi xian
jerry 12 shanxi xian
Time taken: 0.129 seconds, Fetched: 8 row(s)
In HDFS the partition directories are now two levels deep, province first and then city:
[hadoop@node0 bin]$ ./hadoop fs -cat /user/hive/warehouse/t10/province=shanxi/city=hanzhong/009.txt
tom,11
jerry,12
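With two partition fields, a partial partition spec addresses one whole branch of the directory tree. The following is a small sketch, assuming t10 has been loaded as above; it only reads metadata and data, so it does not change the table:

```sql
-- list every partition of t10
show partitions t10;

-- list only the partitions under province=guangdong
show partitions t10 partition(province='guangdong');

-- filter on the first-level partition field; only province=shanxi directories are needed
select name, age, city from t10 where province = 'shanxi';
```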
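Next comes dynamic partitioning, where Hive decides each record's partition from the values of its fields instead of from a partition named in the statement. First enable dynamic partitioning and switch to nonstrict mode:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;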
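In strict mode, at least one partition field in an insert must be given a static value; nonstrict mode lifts that requirement so that every partition field can be dynamic. Hive also limits how many partitions a single statement may create. The sketch below shows the two related limits; the values are, as far as I know, the usual defaults, and they appear here only as an illustration rather than as recommended settings:

```sql
-- upper bound on dynamic partitions created by one statement (assumed default: 1000)
set hive.exec.max.dynamic.partitions=1000;

-- upper bound on dynamic partitions created per mapper/reducer node (assumed default: 100)
set hive.exec.max.dynamic.partitions.pernode=100;
```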
Create an external table t11 with four fields to serve as the data source:
create external table t11 (name string, age int, province string, city string)
row format delimited
fields terminated by ','
location '/data/external_t11';
Create a text file /home/hadoop/temp/202010/25/011.txt with the following content:
tom,11,guangdong,guangzhou
jerry,12,guangdong,shenzhen
tony,13,shanxi,xian
john,14,shanxi,hanzhong
Load the file into t11:
load data
local inpath '/home/hadoop/temp/202010/25/011.txt'
into table t11;
Create the partitioned table t12, with province and city as partition fields:
create table t12 (name string, age int)
partitioned by (province string, city string)
row format delimited
fields terminated by ',';
Write the rows of t11 into t12. No partition values are given, so each row's partition is taken from the last two columns of the select, province and city:
insert overwrite table t12
partition(province, city)
select name, age, province, city from t11;
Check HDFS; the partition directories have been created automatically, and each holds only its own records:
[hadoop@node0 bin]$ ./hadoop fs -cat /user/hive/warehouse/t12/province=guangdong/city=guangzhou/000000_0
tom,11
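Static and dynamic partition fields can also be mixed in one insert, as long as the static fields come before the dynamic ones. The statements below are a sketch, assuming the t11 and t12 tables above: province is fixed to a hypothetical choice of 'guangdong', while city is still taken from the last column of the select:

```sql
-- list the partitions that the dynamic insert created
show partitions t12;

-- province is static, city is dynamic; the select supplies only the dynamic field last
insert overwrite table t12
partition(province='guangdong', city)
select name, age, city from t11 where province = 'guangdong';
```

Because the leading partition field is static, a statement of this shape is also accepted when hive.exec.dynamic.partition.mode is strict.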
That completes this look at partitioned tables; I hope it serves as a useful reference.