前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >HiveQL:查询

HiveQL:查询

作者头像
Michael阿明
发布2021-09-06 10:45:25
2660
发布2021-09-06 10:45:25
举报
文章被收录于专栏:Michael阿明学习之路

文章目录

学自《hive编程指南》

1. select from

代码语言:javascript
复制
hive (default)> create table employees(
              > name string,
              > salary float,
              > subordinates array<string>,
              > deductions map<string, float>,
              > address struct<street:string, city:string, state:string, zip:int>)
              > partitioned by(country string, state string);

hive (default)> load data local inpath "/home/hadoop/workspace/employees.txt"
              > overwrite into table employees
              > partition(country='US', state='CA');
Loading data to table default.employees partition (country=US, state=CA)

hive (default)> select * from employees;
John Doe	100000.0	["Mary Smith","Todd Jones"]	{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}	{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}	US	CA
Mary Smith	80000.0	["Bill King"]	{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}	{"street":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}	US	CA
Todd Jones	70000.0	[]	{"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}	{"street":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}	US	CA
Bill King	60000.0	[]	{"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}	{"street":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}	US	CA
Boss Man	200000.0	["John Doe","Fred Finance"]	{"Federal Taxes":0.3,"State Taxes":0.07,"Insurance":0.05}	{"street":"1 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500}	US	CA
Fred Finance	150000.0	["Stacy Accountant"]	{"Federal Taxes":0.3,"State Taxes":0.07,"Insurance":0.05}	{"street":"2 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500}	US	CA
Stacy Accountant	60000.0	[]	{"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}	{"street":"300 Main St.","city":"Naperville","state":"IL","zip":60563}	US	CA              
  • 可以对表起别名
代码语言:javascript
复制
hive (default)> select name, salary from employees;
hive (default)> select e.name, e.salary from employees e;

John Doe	100000.0
Mary Smith	80000.0
Todd Jones	70000.0
Bill King	60000.0
Boss Man	200000.0
Fred Finance	150000.0
Stacy Accountant	60000.0
  • 提取数组元素 [idx],不存在为NULL,提取出的字符串也没有引号
代码语言:javascript
复制
hive (default)> select e.name, e.subordinates[0] from employees e;

John Doe	Mary Smith
Mary Smith	Bill King
Todd Jones	NULL
Bill King	NULL
Boss Man	John Doe
Fred Finance	Stacy Accountant
Stacy Accountant	NULL
  • 提取 map 元素 [key]
代码语言:javascript
复制
hive (default)> select e.name, e.deductions['State Taxes'] from employees e;

John Doe	0.05
Mary Smith	0.05
Todd Jones	0.03
Bill King	0.03
Boss Man	0.07
Fred Finance	0.07
Stacy Accountant	0.03
  • 提取 struct 中的元素,使用 .
代码语言:javascript
复制
hive (default)> select e.name, e.address.city from employees e;

John Doe	Chicago
Mary Smith	Chicago
Todd Jones	Oak Park
Bill King	Obscuria
Boss Man	Chicago
Fred Finance	Chicago
Stacy Accountant	Naperville

1.1 正则表达式指定列

代码语言:javascript
复制
select `price.*` from stocks;

以 price为前缀的列

1.2 使用列值计算

  • 计算税后薪资
代码语言:javascript
复制
hive (default)> select upper(name), salary, deductions['Federal Taxes'],
              > round(salary*(1-deductions['Federal Taxes'])) from employees;

JOHN DOE	100000.0	0.2	80000.0
MARY SMITH	80000.0	0.2	64000.0
TODD JONES	70000.0	0.15	59500.0
BILL KING	60000.0	0.15	51000.0
BOSS MAN	200000.0	0.3	140000.0
FRED FINANCE	150000.0	0.3	105000.0
STACY ACCOUNTANT	60000.0	0.15	51000.0

1.3 使用函数

  • 聚合函数
代码语言:javascript
复制
select count(*), avg(salary) from employees;
代码语言:javascript
复制
set hive.map.aggr=true; # 可以提高聚合性能,但需要更多内存
代码语言:javascript
复制
select distinct address.city from employees;
# distinct 去重
  • 表生成函数,将单列扩展为多行或者多列
代码语言:javascript
复制
hive (default)> select explode(subordinates) as sub from employees;

Mary Smith
Todd Jones
Bill King
John Doe
Fred Finance
Stacy Accountant
  • 内置函数

1.4 limit 限制返回行数

limit n 返回 n 行

1.5 别名 as name

1.6 case when then 语句

代码语言:javascript
复制
hive (default)> select name, salary,
              > case when salary < 50000 then 'low'
              > 	else 'high'
              > 	end as bracket from employees;

John Doe	100000.0	high
Mary Smith	80000.0	high
Todd Jones	70000.0	high
Bill King	60000.0	high
Boss Man	200000.0	high
Fred Finance	150000.0	high
Stacy Accountant	60000.0	high

2. where 语句

  • 过滤条件
  • like, rlike(正则)
代码语言:javascript
复制
hive (default)> select name, address.street from employees where address.street like "%Ave.";
OK
John Doe	1 Michigan Ave.
Todd Jones	200 Chicago Ave.

hive (default)> select name, address.street from employees where address.street like "%Chi%";
OK
Todd Jones	200 Chicago Ave.

hive (default)> select name, address.street from employees where address.street rlike ".*(Chicago|Ontario).*";
OK
Mary Smith	100 Ontario St.
Todd Jones	200 Chicago Ave.

3. JOIN 优化

多个表 join 把小的表放在左边

4. 抽样查询

  • 分桶抽样
代码语言:javascript
复制
hive> select name from employees tablesample(bucket 3 out of 4 on rand());
John Doe

hive> select name from employees tablesample(bucket 3 out of 4 on rand());
Boss Man
Fred Finance
  • 不使用 rand(), 每次结果是一样的
代码语言:javascript
复制
hive> select name from employees tablesample(bucket 3 out of 4 on name);
Mary Smith
Todd Jones

hive> select name from employees tablesample(bucket 3 out of 4 on name);
Mary Smith
Todd Jones
  • 百分比抽样
代码语言:javascript
复制
hive> select name from employees tablesample(70 percent);

John Doe
Mary Smith
Todd Jones
Bill King
Boss Man

5. union all

将多个表进行合并,每个表必须有相同的列,且字段类型一致

代码语言:javascript
复制
hive> select name from(
    > select e1.name from employees e1 where e1.name like "Mary%"
    > union all
    > select e2.name from employees e2 where e2.name like "Bill%"
    > ) name_tab
    > sort by name;
    
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20210411221203_b3dde291-8596-4b91-95e0-707eeaa873f6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2021-04-11 22:12:04,856 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1468526053_0003
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 31360 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec

Bill King
Mary Smith
本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2021/04/11 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 文章目录
  • 1. select from
    • 1.1 正则表达式指定列
      • 1.2 使用列值计算
        • 1.3 使用函数
          • 1.4 limit 限制返回行数
            • 1.5 别名 as name
              • 1.6 case when then 语句
              • 2. where 语句
              • 3. JOIN 优化
              • 4. 抽样查询
              • 5. union all
              领券
              问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档