1.下载tar包,在客户端窗口上执行
wget -c http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0.tar.gz
2.解压到app目录下
tar -zxvf hive-1.1.0-cdh5.7.0.tar.gz -C ~/app
3.配置~/.bash_profile
export HIVE_HOME=/home/hadoop/app/hive-1.1.0-cdh5.7.0 export PATH=$HIVE_HOME/bin:$PATH
4.环境配置
cd hive-1.1.0-cdh5.7.0/conf/ cp hive-env.sh.template hive-env.sh vi hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory # HADOOP_HOME=${bin}/../../hadoop HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
5.安装mysql,yum install mysql-xxx 6.配置hive-site.xml文件
<configuration> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://hadoop000:3306/sparksql?createDatabaseIfNotExist=true</value> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>Driver class name for a JDBC metastore</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>root</value> <description>username to use against metastore database</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>root</value> <description>password to use against metastore database</description> </property> </configuration>
7.拷贝数据库驱动到hive-1.1.0-cdh5.7.0/lib
目录下
cp /home/hadoop/software/mysql-connector-java-5.1.27-bin.jar
8.启动hive,bin/hive
image.png
9.到mysql数据库里查看sparksql
mysql> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | azkaban | | hive | | mysql | | spark | | sparksql | | sqoop | | ss | | test | +--------------------+ 9 rows in set (0.00 sec) mysql> use sparksql Database changed mysql> show tables; +---------------------------+ | Tables_in_sparksql | +---------------------------+ | BUCKETING_COLS | | CDS | | COLUMNS_V2 | | DATABASE_PARAMS | | DBS | | FUNCS | | FUNC_RU | | GLOBAL_PRIVS | | PARTITIONS | | PARTITION_KEYS | | PARTITION_KEY_VALS | | PART_COL_STATS | | ROLES | | SDS | | SEQUENCE_TABLE | | SERDES | | SKEWED_COL_NAMES | | SKEWED_COL_VALUE_LOC_MAP | | SKEWED_STRING_LIST | | SKEWED_STRING_LIST_VALUES | | SKEWED_VALUES | | SORT_COLS | | TAB_COL_STATS | | TBLS | | VERSION | +---------------------------+ 25 rows in set (0.00 sec)
10.在Hive中建表,
create table hive_count(context string );
报错:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
解决:
这是由于字符集的问题,需要配置MySQL的字符集: mysql> alter database hive character set latin1;
11.建好表以后,查看sparksql
里的TBLS
和COLUMNS_V2
image.png
12.查看hive官方文档
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
13.在/home/hadoop/data/data.log
下面有外部数据,把它导入到hive表里
LOAD DATA LOCAL INPATH '/home/hadoop/data/data.log' INTO TABLE hive_count;
14.select
下结果:
image.png
15.hive ql提交执行后生成MR作业,并在yarn上运行。
select word,count(1) from hive_count lateral view explode(split(context,'\t')) wc as word group by word;
lateral view explode(split(context,'\t')) :把每行记录按照指定分割符进行分解。
16.确认结果
image.png
yarn页面,
http://hostname:8088/cluster/apps
image.png
本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。
我来说两句