All of Xinchen's original articles (with companion source code) are categorized and collected here: https://github.com/zq2599/blog_demos
This is the fourth article in the "Hive study notes" series, and the topic is Hive partitioned tables. Simply put, a Hive partition is a way of building a directory hierarchy: records in the same partition are stored in the same subdirectory. There are two kinds of partitions, static and dynamic; we will try each in turn.
First, try partitioning on a single field. Table t9 has three fields: name, age, and city, with city as the partition field:
create table t9 (name string, age int)
partitioned by (city string)
row format delimited
fields terminated by ',';
hive> desc t9;
OK
name string
age int
city string
# Partition Information
# col_name data_type comment
city string
Time taken: 0.159 seconds, Fetched: 8 row(s)
Prepare a data file /home/hadoop/temp/202010/25/009.txt with two records:
tom,11
jerry,12
Load this file twice, into two different partitions, city='shenzhen' and city='guangzhou':
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t9
partition(city='shenzhen');
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t9
partition(city='guangzhou');
hive> select * from t9;
OK
t9.name t9.age t9.city
tom 11 guangzhou
jerry 12 guangzhou
tom 11 shenzhen
jerry 12 shenzhen
Time taken: 0.104 seconds, Fetched: 4 row(s)
Since the file was loaded into two partitions, the query returns four records. Note that the partition column city comes back like an ordinary column, even though it is stored in the directory name rather than in the data file.
Checking HDFS confirms that each partition is a subdirectory under the table's warehouse directory, with the data file inside it:
[hadoop@node0 bin]$ ./hadoop fs -ls /user/hive/warehouse/t9/city=guangzhou
Found 1 items
-rwxr-xr-x 3 hadoop supergroup 16 2020-10-31 16:47 /user/hive/warehouse/t9/city=guangzhou/009.txt
[hadoop@node0 bin]$ ./hadoop fs -cat /user/hive/warehouse/t9/city=guangzhou/009.txt
tom,11
jerry,12
[hadoop@node0 bin]$
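The directory layout shown above is mechanical: the table directory sits under the warehouse root, followed by one key=value segment per partition column. A small sketch of that mapping (the helper is hypothetical; the default warehouse root /user/hive/warehouse is taken from the listing above):

```python
import posixpath

# Hypothetical helper mirroring how Hive maps a partition spec to a
# storage directory: each partition column becomes one "key=value"
# path segment appended to the table directory.
def partition_path(warehouse, table, partition_spec):
    segments = [f"{k}={v}" for k, v in partition_spec.items()]
    return posixpath.join(warehouse, table, *segments)

print(partition_path("/user/hive/warehouse", "t9", {"city": "guangzhou"}))
# -> /user/hive/warehouse/t9/city=guangzhou
```

The same rule extends to multiple partition columns: each additional column simply adds one more directory level.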
That completes static partitioning on a single field; next, try partitioning on multiple fields.
create table t10 (name string, age int)
partitioned by (province string, city string)
row format delimited
fields terminated by ',';
Load the same data file into four different province/city partitions:
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t10
partition(province='shanxi', city='xian');
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t10
partition(province='shanxi', city='hanzhong');
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t10
partition(province='guangdong', city='guangzhou');
load data
local inpath '/home/hadoop/temp/202010/25/009.txt'
into table t10
partition(province='guangdong', city='shenzhen');
hive> select * from t10;
OK
t10.name t10.age t10.province t10.city
tom 11 guangdong guangzhou
jerry 12 guangdong guangzhou
tom 11 guangdong shenzhen
jerry 12 guangdong shenzhen
tom 11 shanxi hanzhong
jerry 12 shanxi hanzhong
tom 11 shanxi xian
jerry 12 shanxi xian
Time taken: 0.129 seconds, Fetched: 8 row(s)
On HDFS, the two partition fields form a two-level directory hierarchy:
[hadoop@node0 bin]$ ./hadoop fs -cat /user/hive/warehouse/t10/province=shanxi/city=hanzhong/009.txt
tom,11
jerry,12
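This layered layout is what makes partition pruning cheap: a query that filters on partition columns only has to scan the matching subdirectories and never touches the rest. A rough simulation of that selection logic, using the four t10 partition directories shown above (the `prune` helper is hypothetical):

```python
# Simulated HDFS paths for table t10, one entry per partition directory.
paths = [
    "/user/hive/warehouse/t10/province=guangdong/city=guangzhou",
    "/user/hive/warehouse/t10/province=guangdong/city=shenzhen",
    "/user/hive/warehouse/t10/province=shanxi/city=xian",
    "/user/hive/warehouse/t10/province=shanxi/city=hanzhong",
]

def prune(paths, **filters):
    """Keep only the partition directories whose path contains every
    requested key=value segment, like a WHERE clause on partition columns."""
    wanted = {f"{k}={v}" for k, v in filters.items()}
    return [p for p in paths if wanted <= set(p.split("/"))]

# WHERE province = 'shanxi' touches only two of the four directories.
print(prune(paths, province="shanxi"))
```

A filter on both columns (province and city) narrows the scan down to a single directory.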
With static partitioning, every partition value must be spelled out by hand; dynamic partitioning lets Hive derive the partition values from the query results instead. Enable it first (note the mode value is nonstrict, not nostrict):
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Create an external table t11, without partitions, to serve as the data source:
create external table t11 (name string, age int, province string, city string)
row format delimited
fields terminated by ','
location '/data/external_t11';
Prepare a data file /home/hadoop/temp/202010/25/011.txt with four records spanning two provinces:
tom,11,guangdong,guangzhou
jerry,12,guangdong,shenzhen
tony,13,shanxi,xian
john,14,shanxi,hanzhong
load data
local inpath '/home/hadoop/temp/202010/25/011.txt'
into table t11;
Create a partitioned table t12 as the target:
create table t12 (name string, age int)
partitioned by (province string, city string)
row format delimited
fields terminated by ',';
Insert the data from t11 into t12, letting Hive derive the province and city partition values from the last two columns of the select list:
insert overwrite table t12
partition(province, city)
select name, age, province, city from t11;
Check HDFS: the partition directories were created automatically, and the province=guangdong/city=guangzhou partition holds only tom's record:
[hadoop@node0 bin]$ ./hadoop fs -cat /user/hive/warehouse/t12/province=guangdong/city=guangzhou/000000_0
tom,11
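What the INSERT ... SELECT did can be mimicked in a few lines: the trailing columns of the select list (province and city here) choose the target directory, and only the remaining columns end up in the data file. A toy version of that routing, using the four rows from 011.txt:

```python
from collections import defaultdict

# The four records loaded into t11 from 011.txt.
rows = [
    ("tom", 11, "guangdong", "guangzhou"),
    ("jerry", 12, "guangdong", "shenzhen"),
    ("tony", 13, "shanxi", "xian"),
    ("john", 14, "shanxi", "hanzhong"),
]

# Route each row to a partition keyed by its last two columns; only the
# non-partition columns (name, age) become file content.
partitions = defaultdict(list)
for name, age, province, city in rows:
    partitions[f"province={province}/city={city}"].append(f"{name},{age}")

print(partitions["province=guangdong/city=guangzhou"])
# -> ['tom,11']
```

Each distinct (province, city) pair yields one partition, which is why four directories appeared under t12 without being declared anywhere.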
That completes the study of partitioned tables; I hope it serves as a useful reference.
Search for「程序员欣宸」on WeChat. I'm Xinchen, and I look forward to exploring the Java world with you...
Originality statement: this article was published on the Tencent Cloud Developer Community with the author's authorization; reproduction without permission is prohibited.
For infringement concerns, please contact cloudcommunity@tencent.com for removal.