[HBase] Setting Up an HBase Environment and Basic Usage

Copyright notice: this is an original article by the blogger. Please credit the source when reposting. https://blog.csdn.net/gongxifacai_believe/article/details/81151090

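1. HBase Architecture

2. HBase Features

HBase is a Hadoop database for storing and retrieving data. Compared with an RDBMS, HBase can hold very large data sets (100 million rows or more) and supports near-real-time retrieval, with lookups returning within seconds. Because HBase is built on HDFS, it inherits the advantages of HDFS: data is kept in multiple replicas, so it is well protected against loss, and the cluster can run on ordinary commodity PCs or servers, whereas RDBMS servers tend to be expensive.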

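3. HBase Table Design

HBase is a column-oriented database (data is grouped by column family) and a NoSQL database (NoSQL = Not Only SQL). Each column can hold multiple versions of a value. Every row in a table has a unique identifier, the rowkey, which serves as the primary key of that row. Each piece of data has the form: rowkey + columnfamily + column01 + timestamp : value => cell. The contents of a cell are stored as a byte array, and the utility class Bytes can be used to convert between byte arrays and other types.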
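The cell addressing described above can be sketched in a few lines of Python. This is only an in-memory model under stated assumptions, not the HBase API: the `table` dict and the `put` and `get_latest` helpers are hypothetical names, and the timestamps 1 and 2 are made up. The big-endian 4-byte integer layout is meant to mirror what the Java `Bytes.toBytes(int)` helper produces.

```python
import struct

# Hypothetical in-memory model of HBase's logical layout; not the HBase API.
# A cell is addressed by (rowkey, "family:qualifier", timestamp) and holds raw bytes.
table = {}

def put(rowkey: bytes, column: bytes, timestamp: int, value: bytes) -> None:
    """Store one version of a cell; older versions are kept alongside it."""
    table.setdefault(rowkey, {}).setdefault(column, {})[timestamp] = value

def get_latest(rowkey: bytes, column: bytes) -> bytes:
    """Return the value with the highest timestamp for this column."""
    versions = table[rowkey][column]
    return versions[max(versions)]

# Strings become bytes through an encoding; integers through a fixed-width layout.
# ">i" packs a 4-byte big-endian signed int.
put(b"10001", b"info:name", 1, "zhangsan".encode("utf-8"))
put(b"10001", b"info:age", 1, struct.pack(">i", 25))
put(b"10001", b"info:age", 2, struct.pack(">i", 26))  # a newer version of the same cell

name = get_latest(b"10001", b"info:name").decode("utf-8")
age = struct.unpack(">i", get_latest(b"10001", b"info:age"))[0]
print(name, age)  # prints: zhangsan 26
```

Both versions of info:age remain stored. Reading without a timestamp returns the newest one, which is roughly how a default get behaves.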

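4. Installing HBase

(1) Go to the /opt/software/ directory and upload the HBase installation package to the virtual machine.

(2) Grant execute permission on the HBase package:

software]$ chmod u+x hbase-0.98.6-hadoop2-bin.tar.gz

(3) Extract the HBase package:

software]$ tar -zxf hbase-0.98.6-hadoop2-bin.tar.gz -C /opt/modules/

(4) Go to the /opt/modules/hadoop-2.5.0 directory and start the NameNode and DataNode.

(5) Edit the configuration file /opt/modules/hbase-0.98.6-hadoop2/conf/hbase-site.xml.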

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://hadoop-senior.ibeifeng.com:8020/hbase</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>hadoop-senior.ibeifeng.com</value>
        </property>
</configuration>
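Before starting HBase it can help to sanity-check these three properties. The following is a minimal sketch using only the Python standard library. The XML is inlined so the example is self-contained, and the `load_properties` and `check` helpers are hypothetical names, not part of any HBase tooling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# The same three properties as the hbase-site.xml above, inlined for the sketch.
HBASE_SITE = """<?xml version="1.0"?>
<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://hadoop-senior.ibeifeng.com:8020/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>hadoop-senior.ibeifeng.com</value></property>
</configuration>
"""

def load_properties(xml_text):
    """Return {name: value} for every <property> element under <configuration>."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}

def check(props):
    """Collect readable problems; an empty list means the basic checks passed."""
    problems = []
    rootdir = urlparse(props.get("hbase.rootdir", ""))
    if rootdir.scheme != "hdfs" or not rootdir.hostname:
        problems.append("hbase.rootdir should be an hdfs:// URI with a host")
    if props.get("hbase.cluster.distributed") != "true":
        problems.append("hbase.cluster.distributed should be true for this setup")
    if not props.get("hbase.zookeeper.quorum"):
        problems.append("hbase.zookeeper.quorum is missing")
    return problems

props = load_properties(HBASE_SITE)
print(check(props))  # prints: []
```

To check a real installation, one could pass the contents of conf/hbase-site.xml to `load_properties` instead of the inlined string.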

(6) Edit the configuration file /opt/modules/hbase-0.98.6-hadoop2/conf/hbase-env.sh.

export JAVA_HOME=/opt/modules/jdk1.7.0_67
# export HBASE_MANAGES_ZK=true

(7) Edit the configuration file /opt/modules/hbase-0.98.6-hadoop2/conf/regionservers.

hadoop-senior.ibeifeng.com

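(8) Go to the /opt/modules/hbase-0.98.6-hadoop2/lib directory. hbase-0.98.6 ships with hadoop-2.2.0 by default, so switch it to the Hadoop version used here, hadoop-2.5.0. Delete all hadoop-2.2.0 jars in the lib directory (every jar whose name starts with hadoop), upload the hadoop-2.5.0 jars, and replace zookeeper-3.4.6.jar with zookeeper-3.4.5.jar: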

[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-annotations-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-auth-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-common-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-hdfs-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-mapreduce-client-app-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-mapreduce-client-common-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-mapreduce-client-core-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-mapreduce-client-jobclient-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-mapreduce-client-shuffle-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-yarn-api-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-yarn-client-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-yarn-common-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-yarn-server-common-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-yarn-server-nodemanager-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./hadoop-client-2.2.0.jar 
[beifeng@hadoop-senior lib]$ rm -rf ./zookeeper-3.4.6.jar 
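The sixteen rm commands above follow one rule: remove every jar whose name starts with hadoop, plus zookeeper-3.4.6.jar. As a rough sketch, that rule can be written as a glob. The `stale_jars` helper is a hypothetical name, and the demo runs against a temporary directory with a few placeholder files rather than a real lib directory.

```python
import tempfile
from pathlib import Path

def stale_jars(lib_dir):
    """Jars the step above removes: every hadoop-*.jar plus zookeeper-3.4.6.jar."""
    jars = sorted(lib_dir.glob("hadoop-*.jar"))
    zk = lib_dir / "zookeeper-3.4.6.jar"
    if zk.exists():
        jars.append(zk)
    return jars

# Demo on a throwaway directory with placeholder files (assumed sample names).
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp)
    for jar_name in ("hadoop-common-2.2.0.jar", "hadoop-hdfs-2.2.0.jar",
                     "zookeeper-3.4.6.jar", "hbase-client-0.98.6-hadoop2.jar"):
        (lib / jar_name).touch()
    removed = [p.name for p in stale_jars(lib)]
    for p in stale_jars(lib):
        p.unlink()  # delete the stale jar
    remaining = sorted(p.name for p in lib.iterdir())

print(removed)
print(remaining)
```

The pattern only matches names that begin with hadoop-, so jars whose names begin with hbase- are left alone. Printing the list first and deleting second makes it easy to review what would be removed.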

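(9) HBase startup method 1: go to the /opt/modules/hbase-0.98.6-hadoop2 directory and start the HBase processes, using the ZooKeeper bundled with HBase (we have already replaced zookeeper-3.4.6.jar with zookeeper-3.4.5.jar):

hbase-0.98.6-hadoop2]$ bin/start-hbase.sh

Check the HBase processes: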

[beifeng@hadoop-senior hbase-0.98.6-hadoop2]$ jps
2813 HRegionServer
3162 Jps
2724 HMaster
2670 HQuorumPeer
2196 DataNode
2137 NameNode

(10) HBase startup method 2: start the ZooKeeper we installed ourselves, then start the master and the regionserver separately:

zookeeper-3.4.5]$ bin/zkServer.sh start
hbase-0.98.6-hadoop2]$ bin/hbase-daemon.sh start master
hbase-0.98.6-hadoop2]$ bin/hbase-daemon.sh start regionserver

Check the HBase processes:

[beifeng@hadoop-senior hbase-0.98.6-hadoop2]$ jps
6283 QuorumPeerMain
6483 Jps
6334 HMaster
2196 DataNode
2137 NameNode
6431 HRegionServer

(11) Stop the HBase processes:

hbase-0.98.6-hadoop2]$ bin/stop-hbase.sh

5. Basic HBase Usage

(1) Start the HBase shell:

hbase-0.98.6-hadoop2]$ bin/hbase shell

(2) List the tables in HBase:

hbase(main):001:0> list

TABLE                                                                                                                                 
2018-07-22 11:46:58,921 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
0 row(s) in 3.0660 seconds

=> []

(3) Create a table named user with the column family info:

hbase(main):002:0> create 'user','info'

0 row(s) in 0.6260 seconds

=> Hbase::Table - user

(4) Show the description of table user:

hbase(main):003:0> describe 'user'

DESCRIPTION                                                                            ENABLED                                        
 'user', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICA true                                           
 TION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =                                                
 > 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false                                                
 ', BLOCKCACHE => 'true'}                                                                                                             
1 row(s) in 0.0700 seconds

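(5) Insert data into table user. Here the table is user, the rowkey is 10001, the column family is info, the columns are name and so on, and the cell value for name is zhangsan:

hbase(main):004:0> put 'user','10001','info:name','zhangsan'
hbase(main):005:0> put 'user','10001','info:age','25'
hbase(main):006:0> put 'user','10001','info:sex','male'
hbase(main):007:0> put 'user','10001','info:address','shanghai'

HBase offers three ways to query data:
1) Lookup by rowkey, which is the fastest, using the get command;
2) Range query, which is the most common, using scan with a rowkey range (for example STARTROW);
3) Full table scan, which is the slowest, using a plain scan.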
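The speed difference between the three access paths comes from rows being kept sorted by rowkey. The following is a minimal sketch of that idea in Python. It is an in-memory model, not the HBase client API. The `get`, `scan_range`, and `full_scan` helpers are hypothetical names, and the exclusive `stop` bound is an assumption modelled on the shell's STOPROW option, which the session below does not use.

```python
from bisect import bisect_left

# Rows kept sorted by rowkey (compared as bytes), mirroring how HBase orders rows.
rows = sorted([(b"10001", "zhangsan"), (b"10002", "wangwu"), (b"10003", "zhaoliu")])
keys = [key for key, _ in rows]

def get(rowkey):
    """Point lookup: binary search goes straight to the row."""
    i = bisect_left(keys, rowkey)
    if i < len(keys) and keys[i] == rowkey:
        return rows[i][1]
    return None

def scan_range(start, stop=None):
    """Range scan: seek to start (inclusive), read until stop (exclusive)."""
    lo = bisect_left(keys, start)
    hi = len(keys) if stop is None else bisect_left(keys, stop)
    return rows[lo:hi]

def full_scan(predicate):
    """Full scan: every row is visited, whatever the predicate selects."""
    return [row for row in rows if predicate(row)]

print(get(b"10002"))
print(scan_range(b"10002"))
print(full_scan(lambda row: row[1].startswith("zh")))
```

Because rowkeys are compared byte by byte, b"9" sorts after b"10001". Numeric rowkeys are therefore often zero-padded to a fixed width so that range scans return them in the expected order.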

(6) Query the row with rowkey 10001 in table user:

hbase(main):008:0> get 'user','10001'

COLUMN                             CELL                                                                                               
 info:address                      timestamp=1532231767144, value=shanghai                                                            
 info:age                          timestamp=1532231729180, value=25                                                                  
 info:name                         timestamp=1532231687833, value=zhangsan                                                            
 info:sex                          timestamp=1532231746853, value=male                                                                
4 row(s) in 0.0300 seconds

Query the name column of the row with rowkey 10001 in table user:

hbase(main):009:0> get 'user','10001','info:name'

COLUMN                             CELL                                                                                               
 info:name                         timestamp=1532231687833, value=zhangsan                                                            
1 row(s) in 0.0160 seconds

(7) Insert a row with rowkey 10002:

hbase(main):010:0> put 'user','10002','info:name','wangwu'
hbase(main):011:0> put 'user','10002','info:age','30'
hbase(main):012:0> put 'user','10002','info:tel','25354212'
hbase(main):013:0> put 'user','10002','info:qq','232523551'

Run a full scan of table user:

hbase(main):014:0> scan 'user'

ROW                                COLUMN+CELL                                                                                        
 10001                             column=info:address, timestamp=1532231767144, value=shanghai                                       
 10001                             column=info:age, timestamp=1532231729180, value=25                                                 
 10001                             column=info:name, timestamp=1532231687833, value=zhangsan                                          
 10001                             column=info:sex, timestamp=1532231746853, value=male                                               
 10002                             column=info:age, timestamp=1532232249589, value=30                                                 
 10002                             column=info:name, timestamp=1532232223162, value=wangwu                                            
 10002                             column=info:qq, timestamp=1532232294714, value=232523551                                           
 10002                             column=info:tel, timestamp=1532232273419, value=25354212                                           
2 row(s) in 0.0450 seconds

(8) Insert a row with rowkey 10003 into table user:

hbase(main):015:0> put 'user','10003','info:name','zhaoliu'

(9) Scan restricted by column: return only the name and age columns of table user:

hbase(main):016:0> scan 'user',{COLUMNS => ['info:name','info:age']}

ROW                                COLUMN+CELL                                                                                        
 10001                             column=info:age, timestamp=1532231729180, value=25                                                 
 10001                             column=info:name, timestamp=1532231687833, value=zhangsan                                          
 10002                             column=info:age, timestamp=1532232249589, value=30                                                 
 10002                             column=info:name, timestamp=1532232223162, value=wangwu                                            
 10003                             column=info:name, timestamp=1532232516020, value=zhaoliu                                           
3 row(s) in 0.0410 seconds

(10) Range query: scan the rows of table user starting from rowkey 10002:

hbase(main):017:0> scan 'user', {STARTROW=>'10002'}

ROW                                COLUMN+CELL                                                                                        
 10002                             column=info:age, timestamp=1532232249589, value=30                                                 
 10002                             column=info:name, timestamp=1532232223162, value=wangwu                                            
 10002                             column=info:qq, timestamp=1532232294714, value=232523551                                           
 10002                             column=info:tel, timestamp=1532232273419, value=25354212                                           
 10003                             column=info:name, timestamp=1532232516020, value=zhaoliu                                           
2 row(s) in 0.0340 seconds

(11) Delete the cell in table user at rowkey 10001, column family info, column name:

hbase(main):018:0> delete 'user','10001','info:name'

(12) Run a full scan of table user:

hbase(main):019:0> scan 'user'

ROW                                COLUMN+CELL                                                                                        
 10001                             column=info:address, timestamp=1532231767144, value=shanghai                                       
 10001                             column=info:age, timestamp=1532231729180, value=25                                                 
 10001                             column=info:sex, timestamp=1532231746853, value=male                                               
 10002                             column=info:age, timestamp=1532232249589, value=30                                                 
 10002                             column=info:name, timestamp=1532232223162, value=wangwu                                            
 10002                             column=info:qq, timestamp=1532232294714, value=232523551                                           
 10002                             column=info:tel, timestamp=1532232273419, value=25354212                                           
 10003                             column=info:name, timestamp=1532232516020, value=zhaoliu                                           
3 row(s) in 0.0340 seconds

(13) Delete all data for rowkey 10001 in table user:

hbase(main):020:0> deleteall 'user','10001'

Run a full scan of table user:

hbase(main):021:0> scan 'user'

ROW                                COLUMN+CELL                                                                                        
 10002                             column=info:age, timestamp=1532232249589, value=30                                                 
 10002                             column=info:name, timestamp=1532232223162, value=wangwu                                            
 10002                             column=info:qq, timestamp=1532232294714, value=232523551                                           
 10002                             column=info:tel, timestamp=1532232273419, value=25354212                                           
 10003                             column=info:name, timestamp=1532232516020, value=zhaoliu                                           
2 row(s) in 0.0230 seconds

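(14) Disable table user:

hbase(main):022:0> disable 'user'

(15) Enable table user:

hbase(main):023:0> enable 'user'

(16) Drop table user. A table must be disabled before it can be dropped, so disable it again first:

hbase(main):024:0> disable 'user'
hbase(main):025:0> drop 'user'

(17) Exit the HBase shell:

hbase(main):026:0> exit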
