前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >hive的order by操作

hive的order by操作

作者头像
bboy枫亭
发布2020-09-22 10:52:38
1.4K0
发布2020-09-22 10:52:38
举报
文章被收录于专栏:csdn_blog

Hive中常见的高级查询包括:group byOrder byjoindistribute bysort bycluster byUnion all。今天我们来看看order by操作,Order by表示按照某些字段排序,语法如下:

代码语言:javascript
复制
select col,col2...
from tableName
where condition
order by col1,col2 [asc|desc]

注意: (1) order by后面可以有多列进行排序,默认按字典排序。

(2) order by为全局排序。

(3) order by需要reduce操作,且只有一个reduce,无法配置(因为多个reduce无法完成全局排序)。

order by操作会受到如下属性的制约:

代码语言:javascript
复制
set hive.mapred.mode=nonstrict; (default value / 默认值)
 
set hive.mapred.mode=strict;

注:如果在strict模式下使用order by语句,那么必须要在语句中加上limit关键字,因为执行order by的时候只能启动单个reduce,如果排序的结果集过大,那么执行时间会非常漫长。

下面我们通过一个示例来深入体会order by的用法:

数据库有一个employees表,数据如下:

代码语言:javascript
复制
hive> select * from employees;
OK
lavimer	15000.0	["li","lu","wang"]	{"k1":1.0,"k2":2.0,"k3":3.0}	{"street":"dingnan","city":"ganzhou","num":101}	2015-01-24	love
liao	18000.0	["liu","li","huang"]	{"k4":2.0,"k5":3.0,"k6":6.0}	{"street":"dingnan","city":"ganzhou","num":102}	2015-01-24	love
zhang	19000.0	["xiao","wen","tian"]	{"k7":7.0,"k8":8.0,"k8":8.0}	{"street":"dingnan","city":"ganzhou","num":103}	2015-01-24	love

现在我要按第二列(salary)降序排列:

代码语言:javascript
复制
hive> select * from employees order by salary desc;
//执行MapReduce的过程
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 2.62 sec   HDFS Read: 415 HDFS Write: 245 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 620 msec
OK
zhang	19000.0	["xiao","wen","tian"]	{"k7":7.0,"k8":8.0}	{"street":"dingnan","city":"ganzhou","num":103}	2015-01-24	love
liao	18000.0	["liu","li","huang"]	{"k4":2.0,"k5":3.0,"k6":6.0}	{"street":"dingnan","city":"ganzhou","num":102}	2015-01-24	love
lavimer	15000.0	["li","lu","wang"]	{"k1":1.0,"k2":2.0,"k3":3.0}	{"street":"dingnan","city":"ganzhou","num":101}	2015-01-24	love
Time taken: 20.484 seconds
hive> 

此时的hive.mapred.mode属性为:

代码语言:javascript
复制
hive> set hive.mapred.mode;
hive.mapred.mode=nonstrict
hive> 

现在我们将它改为strict,然后再使用order by进行查询:

代码语言:javascript
复制
hive> set hive.mapred.mode=strict;
hive> select * from employees order by salary desc;
FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'salary'
hive> 

注:在strict模式下查询必须加上limit关键字。

代码语言:javascript
复制
hive> select * from employees order by salary desc limit 3;
FAILED: Error in semantic analysis: No partition predicate found for Alias "employees" Table "employees"

注:另外还有一个要注意的是strict模式也会限制分区表的查询,解决方案是必须指定分区

先来看看分区:

代码语言:javascript
复制
hive> show partitions employees;
OK
date_time=2015-01-24/type=love
Time taken: 0.096 seconds

在strict模式先使用order by查询:

代码语言:javascript
复制
hive> select * from employees where partition(date_time='2015-01-24',type='love') order by salary desc limit 3;
FAILED: Parse Error: line 1:30 cannot recognize input near 'partition' '(' 'date_time' in expression specification
 
hive                                                                                                              
    > select * from employees where date_time='2015-01-24' and type='love' order by salary desc limit 3;          
 
//执行MapReduce程序
 
Total MapReduce CPU Time Spent: 3 seconds 510 msec
OK
zhang	19000.0	["xiao","wen","tian"]	{"k7":7.0,"k8":8.0}	{"street":"dingnan","city":"ganzhou","num":103}	2015-01-24	love
liao	18000.0	["liu","li","huang"]	{"k4":2.0,"k5":3.0,"k6":6.0}	{"street":"dingnan","city":"ganzhou","num":102}	2015-01-24	love
lavimer	15000.0	["li","lu","wang"]	{"k1":1.0,"k2":2.0,"k3":3.0}	{"street":"dingnan","city":"ganzhou","num":101}	2015-01-24	love
Time taken: 19.861 seconds
hive> 
本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2020/07/29 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档