这里分类和汇总了欣宸的全部原创(含配套源码):https://github.com/zq2599/blog_demos
create table address (addressid int, province string, city string)
row format delimited
fields terminated by ',';
1,guangdong,guangzhou
2,guangdong,shenzhen
3,shanxi,xian
4,shanxi,hanzhong
6,jiangshu,nanjing
load data
local inpath '/home/hadoop/temp/202010/25/address.txt'
into table address;
create table student (name string, age int, addressid int)
row format delimited
fields terminated by ',';
tom,11,1
jerry,12,2
mike,13,3
john,14,4
mary,15,5
load data
local inpath '/home/hadoop/temp/202010/25/student.txt'
into table student;
hive> select * from address;
OK
1 guangdong guangzhou
2 guangdong shenzhen
3 shanxi xian
4 shanxi hanzhong
6 jiangshu nanjing
Time taken: 0.043 seconds, Fetched: 5 row(s)
hive> select * from student;
OK
tom 11 1
jerry 12 2
mike 13 3
john 14 4
mary 15 5
Time taken: 0.068 seconds, Fetched: 5 row(s)
最普通的带条件查询:
hive> select * from address where city like '%a%';
OK
1 guangdong guangzhou
3 shanxi xian
4 shanxi hanzhong
6 jiangshu nanjing
Time taken: 0.128 seconds, Fetched: 4 row(s)
select province, count(*) from address group by province;
该查询会触发MR计算,结果如下:
...
Total MapReduce CPU Time Spent: 1 seconds 910 msec
OK
guangdong 2
jiangshu 1
shanxi 2
Time taken: 17.847 seconds, Fetched: 3 row(s)
select t.province, count(*) from (
select * from address where city like '%a%'
) t
group by t.province;
结果如下:
Total MapReduce CPU Time Spent: 1 seconds 760 msec
OK
guangdong 1
jiangshu 1
shanxi 2
Time taken: 18.036 seconds, Fetched: 3 row(s)
select t1.* from (
select t.province, count(*) as cnt from (
select * from address where city like '%a%'
) t
group by t.province) t1
where t1.cnt>1;
结果如下,可见只有shanxi被显示了:
Total MapReduce CPU Time Spent: 2 seconds 250 msec
OK
shanxi 2
Time taken: 20.067 seconds, Fetched: 1 row(s)
select t.province, count(*) as cnt from (
select * from address where city like '%a%'
) t
group by t.province having cnt>1;
select t.province, count(*) as cnt from (
select * from address where city like '%a%'
) t
group by t.province order by cnt;
会触发MR,结果如下:
Total MapReduce CPU Time Spent: 3 seconds 50 msec
OK
jiangshu 1
guangdong 1
shanxi 2
Time taken: 40.315 seconds, Fetched: 3 row(s)
select
s.name, s.age,
a.province, a.city
from
student s
inner join
address a
on
s.addressid=a.addressid;
结果如下:
Total MapReduce CPU Time Spent: 1 seconds 20 msec
OK
tom 11 guangdong guangzhou
jerry 12 guangdong shenzhen
mike 13 shanxi xian
john 14 shanxi hanzhong
Time taken: 17.294 seconds, Fetched: 4 row(s)
select name, age, province, city from student natural join address;
结果如下,可见不会根据student表的addressid字段值去address查找记录,而是将addrerss的记录全部连接一次:
Total MapReduce CPU Time Spent: 940 msec
OK
tom 11 guangdong guangzhou
jerry 12 guangdong guangzhou
mike 13 guangdong guangzhou
john 14 guangdong guangzhou
mary 15 guangdong guangzhou
tom 11 guangdong shenzhen
jerry 12 guangdong shenzhen
mike 13 guangdong shenzhen
john 14 guangdong shenzhen
mary 15 guangdong shenzhen
tom 11 shanxi xian
jerry 12 shanxi xian
mike 13 shanxi xian
john 14 shanxi xian
mary 15 shanxi xian
tom 11 shanxi hanzhong
jerry 12 shanxi hanzhong
mike 13 shanxi hanzhong
john 14 shanxi hanzhong
mary 15 shanxi hanzhong
tom 11 jiangshu nanjing
jerry 12 jiangshu nanjing
mike 13 jiangshu nanjing
john 14 jiangshu nanjing
mary 15 jiangshu nanjing
Time taken: 18.525 seconds, Fetched: 25 row(s)
select
s.name, s.age, s.addressid,
a.province, a.city
from
student s
left outer join
address a
on
s.addressid=a.addressid;
结果如下,可见name=mary的记录,addressid等于5,在address中不存在addressid等于5的记录,因此province和city字段都展示了NULL,而在前面使用inner join时,结果中没有这条记录:
Total MapReduce CPU Time Spent: 950 msec
OK
tom 11 1 guangdong guangzhou
jerry 12 2 guangdong shenzhen
mike 13 3 shanxi xian
john 14 4 shanxi hanzhong
mary 15 5 NULL NULL
Time taken: 18.442 seconds, Fetched: 5 row(s)
和左连接类似,只不过是以右表为主,语法是right outer join:
select
s.name, s.age, s.addressid,
a.province, a.city
from
student s
right outer join
address a
on
s.addressid=a.addressid;
结果如下,可见city=nanjing的记录,在student表中没有一条记录与之关联,因此结果中展示了address的字段,而student的字段为NULL:
Total MapReduce CPU Time Spent: 970 msec
OK
tom 11 1 guangdong guangzhou
jerry 12 2 guangdong shenzhen
mike 13 3 shanxi xian
john 14 4 shanxi hanzhong
NULL NULL NULL jiangshu nanjing
Time taken: 18.294 seconds, Fetched: 5 row(s)
查询结果等于左外连接和右外连接之和,语法是full outer join:
select
s.name, s.age, s.addressid,
a.province, a.city
from
student s
full outer join
address a
on
s.addressid=a.addressid;
结果如下:
Total MapReduce CPU Time Spent: 2 seconds 630 msec
OK
tom 11 1 guangdong guangzhou
jerry 12 2 guangdong shenzhen
mike 13 3 shanxi xian
john 14 4 shanxi hanzhong
mary 15 5 NULL NULL
NULL NULL NULL jiangshu nanjing
Time taken: 22.189 seconds, Fetched: 6 row(s)
如果您不想自己搭建kubernetes环境,推荐使用腾讯云容器服务TKE:无需自建,即可在腾讯云上使用稳定, 安全,高效,灵活扩展的 Kubernetes 容器平台;
如果您希望自己的镜像可以通过外网上传和下载,推荐腾讯云容器镜像服务TCR:像数据加密存储,大镜像多节点快速分发,跨地域镜像同步
微信搜索「程序员欣宸」,我是欣宸,期待与您一同畅游Java世界...
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。