前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >[Hadoop大数据]——Hive部署入门教程

[Hadoop大数据]——Hive部署入门教程

作者头像
用户1154259
发布2018-01-17 11:00:00
1.6K0
发布2018-01-17 11:00:00
举报

Hive是为了解决hadoop中mapreduce编写困难,提供给熟悉sql的人使用的。只要你对SQL有一定的了解,就能通过Hive写出mapreduce的程序,而不需要去学习hadoop中的api。

在部署前需要确认安装jdk以及Hadoop

如果需要安装jdk以及hadoop可以参考我之前的博客:

Linux下安装jdk Linux下安装hadoop伪分布式

在安装之前,先了解下Hive都有哪些东西。

下载并解压缩

去主页选择镜像地址:

http://www.apache.org/dyn/closer.cgi/hive/

在镜像地址中选择下载的版本,我这里使用的是最新的2.1.0

https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.1.0/

下载后,解压缩

代码语言:javascript
复制
# 解压缩
tar -zxvf apache-hive-2.1.0-bin.tar.gz

# 拷贝到指定目录
mv apache-hive-2.1.0-bin /usr

修改环境变量

代码语言:javascript
复制
# 编辑/etc/profile
vi /etc/profile

# set hive environment
export HIVE_HOME=/usr/apache-hive-2.1.0-bin
export PATH=$PATH:$HIVE_HOME/bin

# 配置文件生效
source /etc/profile

创建并修改配置文件

代码语言:javascript
复制
# 进入conf目录
cd $HIVE_HOME/conf

# 拷贝hive-site.xml
cp hive-default.xml.template hive-site.xml

# 修改其中的参数
修改所有的system.io.tmpdir为指定的目录(自己定,我放在/usr/hive/tmp下了)
修改所有的system.user.name为自己的名字(我直接修改成xingoo了)

初始化schema

由于Hive中所有的原信息都需要存储到关系型数据库里面,因此需要初始化数据库表。我这里直接使用的是默认的derby数据库,关于数据的配置就不用修改了,直接使用默认的就好了。

代码语言:javascript
复制
# 进入指定的目录
cd $HIVE_HOME/bin

# 初始化,如果是mysql则derby可以直接替换成mysql
./schematool -initSchema -dbType derby

注意这里有可能报错:

代码语言:javascript
复制
[root@localhost bin]# ./schematool -initSchema -dbType derby createDatabaseIfNotExist=true
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:    jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:   APP
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.derby.sql
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

这时需要修改derby初始化脚本

代码语言:javascript
复制
# 进入指定的目录
cd $HIVE_HOME/scripts/metastore/upgrade/derby

# 修改错误堆栈中的脚本hive-schema-2.1.0.derby.sql,注释上面两句
-- ----------------------------------------------
-- DDL Statements for functions
-- ----------------------------------------------

-- CREATE FUNCTION "APP"."NUCLEUS_ASCII" (C CHAR(1)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.ascii' ;

-- CREATE FUNCTION "APP"."NUCLEUS_MATCHES" (TEXT VARCHAR(8000),PATTERN VARCHAR(8000)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.matches' ;

再次执行,就可以了

代码语言:javascript
复制
[root@localhost bin]# vim ../scripts/metastore/upgrade/derby/hive-schema-2.1.0.derby.sql 
[root@localhost bin]# ./schematool -initSchema -dbType derby createDatabaseIfNotExist=true
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:    jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:   APP
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.derby.sql
Initialization script completed
schemaTool completed

启动hive cli测试

代码语言:javascript
复制
[root@localhost bin]# hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default
Time taken: 1.931 seconds, Fetched: 1 row(s)

至此,Hive就算安装完啦!

踩过的坑

没有初始化数据库,定义Schema
代码语言:javascript
复制
[root@localhost bin]# hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:578)
    at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:226)
    at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:366)
    at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:310)
    at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:290)
    at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:266)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545)
    ... 9 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3593)
    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:236)
    at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:221)
    ... 14 more
Caused by: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3364)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3336)
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3590)
    ... 16 more

解决办法:

执行命令$HIVE_HOME/bin/schematool -initSchema -dbType derby

derby数据库脚本定义有问题
代码语言:javascript
复制
[root@localhost bin]# ./schematool -initSchema -dbType derby createDatabaseIfNotExist=true
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:    jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:   APP
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.derby.sql
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***

解决办法:

注释掉sql脚本中的两句定义..

配置文件中${system.io.tmpdir}不识别
代码语言:javascript
复制
[root@localhost bin]# hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.fs.Path.initialize(Path.java:206)
    at org.apache.hadoop.fs.Path.<init>(Path.java:172)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:631)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)
    at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:203)
    ... 12 more

解决办法:

替换hive-site.xml中的${system.io.tmpdir}属性

hive-site.xml中${system.user.name}不识别
代码语言:javascript
复制
[root@localhost bin]# hive
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
Failed with exception java.io.IOException:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:user.name%7D
Time taken: 1.781 seconds

解决办法:

替换hive-site.xml中的${system.user.name}属性

参考

1 FUNCTION 'NUCLEUS_ASCII' already exists. 2 如何在hive中定义schema 3 Hive部署注意事项 4 Hive配置入门

本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2016-08-16 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 在部署前需要确认安装jdk以及Hadoop
  • 下载并解压缩
  • 修改环境变量
  • 创建并修改配置文件
  • 初始化schema
  • 启动hive cli测试
  • 踩过的坑
    • 没有初始化数据库,定义Schema
      • derby数据库脚本定义有问题
        • 配置文件中${system.io.tmpdir}不识别
          • hive-site.xml中${system.user.name}不识别
          • 参考
          相关产品与服务
          数据库
          云数据库为企业提供了完善的关系型数据库、非关系型数据库、分析型数据库和数据库生态工具。您可以通过产品选择和组合搭建,轻松实现高可靠、高可用性、高性能等数据库需求。云数据库服务也可大幅减少您的运维工作量,更专注于业务发展,让企业一站式享受数据上云及分布式架构的技术红利!
          领券
          问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档