前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >Hive与HBase实现数据互导

Hive与HBase实现数据互导

作者头像
汤高
发布2018-01-11 16:12:18
1K0
发布2018-01-11 16:12:18
举报
文章被收录于专栏:积累沉淀积累沉淀

建立与HBase的识别表

hive> create table hive_hbase_1(key int,value string)

    > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:info")

    > TBLPROPERTIES ("hbase.table.name" = "userinfo");

OK

Time taken: 8.896 seconds

hive>

查看Hbase中的表

hbase(main):001:0> list

TABLE                                                                                                                                          

blog                                                                                                                                           

friend                                                                                                                                         

friend02                                                                                                                                       

heroes                                                                                                                                         

heroesIndex                                                                                                                                    

myt                                                                                                                                            

stu                                                                                                                                            

table                                                                                                                                          

tanggao                                                                                                                                        

tanggao11                                                                                                                                      

tanggao111                                                                                                                                     

tanggaozhou                                                                                                                                    

test

userinfo

word                                                    

word2

16 row(s) in 0.7760 seconds

=> ["blog", "friend", "friend02", "heroes", "heroesIndex", "myt", "stu", "table", "tanggao", "tanggao11", "tanggao111", "tanggaozhou", "test", "userinfo", "word", "word2"]

hbase(main):002:0>

2.使用sql导入数据

i.预先准备数据 在hdfs的user/tg目录下放一个a.txt

1tanggao

2zhousiyuan

3mother

4father

a)新建hive的数据表

hive> create table  famaly(id int,name string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

OK

Time taken: 0.483 seconds

2.0.0版本会默认到你的hdfs根目录下的user/hadoop用户下找 ,比如我的是user/tg

hive> load data inpath 'a.txt' overwrite into table famaly;

Loading data to table default.famaly

OK

Time taken: 2.139 seconds

查看信息

hive> select * from famaly;

OK

1tanggao

2zhousiyuan

3mother

4father

Time taken: 2.912 seconds, Fetched: 4 row(s)

查看表结构

hive> desc famaly;

OK

id                  int                 

name                string              

Time taken: 0.549 seconds, Fetched: 2 row(s)

第二种方法查看表结构  同上

hive> describe famaly;

OK

id                  int                 

name                string              

Time taken: 0.087 seconds, Fetched: 2 row(s)

hive>

使用sql导入数据到hive_hbase_1

hive> insert  overwrite table hive_hbase_1 select * from famaly where id=1;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1464498775870_0001, Tracking URL = http://master:8088/proxy/application_1464498775870_0001/

Kill Command = /software/hadoop-2.6.4/bin/hadoop job  -kill job_1464498775870_0001

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2016-05-28 22:57:58,093 Stage-0 map = 0%,  reduce = 0%

2016-05-28 22:58:28,439 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 3.81 sec

MapReduce Total cumulative CPU time: 3 seconds 810 msec

Ended Job = job_1464498775870_0001

MapReduce Jobs Launched:

Stage-Stage-0: Map: 1   Cumulative CPU: 3.91 sec   HDFS Read: 10964 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 910 msec

OK

Time taken: 89.327 seconds

查看数据

会显示刚刚插入的数据

hive> select * from hive_hbase_1;

OK

1tanggao

Time taken: 0.916 seconds, Fetched: 1 row(s)

hive>

登录HBase 查看HBase数据

hbase(main):002:0> scan 'userinfo'

ROW                   COLUMN+CELL                                               

 1                    column=cf1:info, timestamp=1464501508097, value=tanggao   

1 row(s) in 0.7990 seconds

hbase(main):003:0>

是不是很神奇,在hive中添加的数据已经在Hbase中了

下面再看看在Hbase中插入数据,看看hive中是不是也有了?试试看

hbase(main):003:0> put 'userinfo','fid','cf1:info','tangshaoyan'

0 row(s) in 0.2270 seconds

hbase(main):004:0>

查看hive

hive> select * from hive_hbase_1;

OK

1tanggao

NULLtangshaoyan

Time taken: 0.235 seconds, Fetched: 2 row(s)

hive>

没错,刚刚在hbase中插入的数据,已经在hive里了

hive访问Hbase中已经存在的Hbase表

HBase表报备

:已经存在了heroes表

hbase(main):007:0> scan 'heroes'

ROW                                  COLUMN+CELL                                                                                               

 0                                   column=info:email, timestamp=1463743975381, value=0@qq.com                                                

 0                                   column=info:name, timestamp=1463743975381, value=peter                                                    

 0                                   column=info:power, timestamp=1463743975381, value=Idotknow                                                

 1                                   column=info:email, timestamp=1463743975391, value=1@qq.com                                                

 1                                   column=info:name, timestamp=1463743975391, value=hiro                                                     

 1                                   column=info:power, timestamp=1463743975391, value=Idotknow                                                

 2                                   column=info:email, timestamp=1463743975396, value=2@qq.com                                                

 2                                   column=info:name, timestamp=1463743975396, value=sylar                                                    

 2                                   column=info:power, timestamp=1463743975396, value=Idotknow                                                

 3                                   column=info:email, timestamp=1463743975399, value=3@qq.com                                                

 3                                   column=info:name, timestamp=1463743975399, value=claire                                                   

 3                                   column=info:power, timestamp=1463743975399, value=Idotknow                                                

 4                                   column=info:email, timestamp=1463743975403, value=4@qq.com                                                

 4                                   column=info:name, timestamp=1463743975403, value=noah                                                     

 4                                   column=info:power, timestamp=1463743975403, value=Idotknow                                                

5 row(s) in 0.2140 seconds

hbase(main):008:0>

使用CREATE EXTERNAL TABLE:

hive> create external table  hbase_hive_1(key int,value string)

    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "info:name")

    > TBLPROPERTIES("hbase.table.name" = "heroes");

OK

Time taken: 0.424 seconds

hive> select * from hbase_hive_1;

OK

0peter

1hiro

2sylar

3claire

4noah

Time taken: 0.222 seconds, Fetched: 5 row(s)

hive>

从上面的操作后,hive已经可以访问HBase中已经存在的原有数据了

三、多列和多列族(Multiple Columns and Families)

hive> CREATE TABLE hive_hbase_add1(key int, value1 string, value2 int, value3 int)

    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:col1,info:col2,city:area")

    > TBLPROPERTIES("hbase.table.name" = "student_info");

OK

Time taken: 2.624 seconds

hive> select * from hbase_hive_1;

OK

0peter

1hiro

2sylar

3claire

4noah

Time taken: 0.225 seconds, Fetched: 5 row(s)

hive> set hive.cli.print.header=true;

hive> select * from hbase_hive_1;

OK

hbase_hive_1.keyhbase_hive_1.value

0peter

1hiro

2sylar

3claire

4noah

Time taken: 0.203 seconds, Fetched: 5 row(s)

hive> desc hbase_hive_1;

OK

col_namedata_typecomment

key                 int                 

value               string              

Time taken: 0.198 seconds, Fetched: 2 row(s)

hive> insert overwrite table hive_hbase_add1 select key ,value,key+1,value from hbase_hive_1;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1464498775870_0002, Tracking URL = http://master:8088/proxy/application_1464498775870_0002/

Kill Command = /software/hadoop-2.6.4/bin/hadoop job  -kill job_1464498775870_0002

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2016-05-28 23:48:40,536 Stage-0 map = 0%,  reduce = 0%

2016-05-28 23:49:06,565 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 3.52 sec

MapReduce Total cumulative CPU time: 3 seconds 520 msec

Ended Job = job_1464498775870_0002

MapReduce Jobs Launched:

Stage-Stage-0: Map: 1   Cumulative CPU: 3.52 sec   HDFS Read: 5082 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 520 msec

OK

_col0_col1_col2_col3

Time taken: 86.323 seconds

hive> insert overwrite table hive_hbase_add1 select key ,value,key+1,key+1000 from hbase_hive_1;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1464498775870_0003, Tracking URL = http://master:8088/proxy/application_1464498775870_0003/

Kill Command = /software/hadoop-2.6.4/bin/hadoop job  -kill job_1464498775870_0003

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2016-05-28 23:54:01,682 Stage-0 map = 0%,  reduce = 0%

2016-05-28 23:54:24,725 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 4.23 sec

MapReduce Total cumulative CPU time: 4 seconds 230 msec

Ended Job = job_1464498775870_0003

MapReduce Jobs Launched:

Stage-Stage-0: Map: 1   Cumulative CPU: 4.23 sec   HDFS Read: 5166 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 4 seconds 230 msec

OK

keyvaluec2c3

Time taken: 72.543 seconds

hive> select * from hive_hbase_add1;

OK

hive_hbase_add1.keyhive_hbase_add1.value1hive_hbase_add1.value2hive_hbase_add1.value3

0peter1NULL

1hiro2NULL

2sylar3NULL

3claire4NULL

4noah5NULL

Time taken: 0.454 seconds, Fetched: 5 row(s)

第四列类型不匹配,没有插进,都是null

下面换为正确的类型

hive> select * from hive_hbase_add1;

OK

hive_hbase_add1.keyhive_hbase_add1.value1hive_hbase_add1.value2hive_hbase_add1.value3

0peter11000

1hiro21001

2sylar31002

3claire41003

4noah51004

Time taken: 0.746 seconds, Fetched: 5 row(s)

hive>

登录HBase查看数据

hbase(main):008:0> list

TABLE                                            

blog                                             

friend                                           

friend02                                         

heroes                                           

heroesIndex                                      

myt                                              

stu                                              

student_info                                     

table                                            

tanggao                                          

tanggao11                                        

tanggao111                                       

tanggaozhou                                      

test                                             

userinfo                                         

17 row(s) in 0.1140 seconds

=> ["blog", "friend", "friend02", "heroes", "heroesIndex", "myt", "stu", "student_info", "table", "tanggao", "tanggao11", "tanggao111", "tanggaozhou", "test", "userinfo", "word", "word2"]

hbase(main):009:0> scan 'student_info'

ROW           COLUMN+CELL                        

 0            column=city:area, timestamp=1464504

              863521, value=1000                 

 0            column=info:col1, timestamp=1464504

              863521, value=peter                

 0            column=info:col2, timestamp=1464504

              863521, value=1                    

 1            column=city:area, timestamp=1464504

              863521, value=1001                 

 1            column=info:col1, timestamp=1464504

              863521, value=hiro                 

 1            column=info:col2, timestamp=1464504

              863521, value=2                    

 2            column=city:area, timestamp=1464504

              863521, value=1002                 

 2            column=info:col1, timestamp=1464504

              863521, value=sylar                

 2            column=info:col2, timestamp=1464504

              863521, value=3                    

 3            column=city:area, timestamp=1464504

              863521, value=1003                 

 3            column=info:col1, timestamp=1464504

              863521, value=claire               

 3            column=info:col2, timestamp=1464504

              863521, value=4                    

 4            column=city:area, timestamp=1464504

              863521, value=1004                 

 4            column=info:col1, timestamp=1464504

              863521, value=noah                 

 4            column=info:col2, timestamp=1464504

              863521, value=5                    

5 row(s) in 0.0720 seconds

hbase(main):010:0>

发现hive中的四列 key int, value1 string, value2 int, value3 int

key对应HBase的行健

value1,value2对应Hbase的列族info的col1,col2

value3  对应Hbase列族city:area

是不是发现了hive中的表,多列存放到hbase少量固定的列簇中。

本文参与 腾讯云自媒体分享计划,分享自作者个人站点/博客。
原始发表:2016-05-29 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
相关产品与服务
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档