Importing and Exporting Data Between Hive and TencentDB for MySQL

Last updated: 2023-12-26 14:40:50

This document describes how to use Tencent Cloud's Sqoop service to import and export data between MySQL and Hive in both directions.

Development Preparations

Make sure you have activated Tencent Cloud and created an EMR cluster. When creating the EMR cluster, select the Sqoop and Hive components on the software configuration page.
Sqoop and related software are installed in the path /usr/local/service/ of the EMR cloud server.

Importing Relational Database into Hive

This section continues with the use case from the previous section.
Go to the Elastic MapReduce console and copy the instance ID of the target cluster (the instance ID is also the cluster's name). Then go to the TencentDB console, press Ctrl+F to search for the MySQL instance associated with the cluster, and note its private IP address, referred to below as $mysqlIP.
Log in to any machine in the EMR cluster, preferably the Master node. For how to log in to EMR, see Logging into Linux Instance. Here we use WebShell: click Login to the right of the corresponding cloud server to open the login page. The default username is root, and the password is the one you set when creating the EMR cluster. Once logged in, you reach the command-line interface.
In the EMR command line, first use the following command to switch to the Hadoop user and enter the Hive folder:
[root@172 ~]# su hadoop
[hadoop@172 ~]$ cd /usr/local/service/hive
Create a new Hive database:
[hadoop@172 hive]$ hive
hive> create database hive_from_sqoop;
OK
Time taken: 0.167 seconds
Use the sqoop-import command to import the MySQL table created in the previous section into Hive:
[hadoop@172 hive]$ cd /usr/local/service/sqoop
[hadoop@172 sqoop]$ bin/sqoop-import --connect jdbc:mysql://$mysqlIP/test --username root -P \
--table sqoop_test_back --hive-database hive_from_sqoop --hive-import --hive-table hive_from_sqoop
$mysqlIP: The internal address of Tencent Cloud Relational Database (CDB).
test: The name of the MySQL database.
--table: The name of the MySQL table to import.
--hive-database: The name of the Hive database.
--hive-table: The name of the Hive table to be imported.
When you run the command, you will be prompted for your MySQL password; by default, this is the password you set when creating the EMR cluster. After the command completes successfully, you can view the imported data in Hive:
hive> select * from hive_from_sqoop;
OK
1 first 2018-07-03 16:07:46.0 spark
2 second 2018-07-03 15:30:57.0 mr
3 third 2018-07-03 15:31:07.0 yarn
4 forth 2018-07-03 15:39:38.0 hbase
5 fifth 2018-07-03 16:02:29.0 hive
6 sixth 2018-07-03 16:09:58.0 sqoop
Time taken: 1.245 seconds, Fetched: 6 row(s)
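If you want to run the import non-interactively (for example, from a scheduled job), Sqoop can read the password from a file via --password-file instead of prompting with -P. A minimal sketch, in which the password value and the HDFS path are placeholders:

```shell
# Store the MySQL password in a protected file; printf avoids a trailing
# newline, which Sqoop would otherwise treat as part of the password.
printf '%s' 'YourMySQLPassword' > sqoop.pwd   # placeholder password
chmod 400 sqoop.pwd

# On the cluster, upload the file to HDFS and swap "-P" for --password-file:
#   hdfs dfs -put sqoop.pwd /user/hadoop/sqoop.pwd
#   bin/sqoop-import ... --password-file /user/hadoop/sqoop.pwd ...
```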

Exporting Hive Data to a Relational Database

Sqoop can export data from Hive tables to relational databases. First, create a new table in Hive and load data into it.
Log into any machine in the EMR cluster, preferably the Master node. In the EMR command line, first use the following command to switch to the Hadoop user and enter the Hive directory:
[root@172 ~]# su hadoop
[hadoop@172 ~]$ cd /usr/local/service/hive
Create a new bash script file named gen_data.sh and add the following code to it:
#!/bin/bash
MAXROW=1000000 # Specifies the number of data rows to generate
for ((i = 0; i < MAXROW; i++))
do
   echo "$RANDOM,\"$RANDOM\"" # No space after the comma, so field b carries no leading blank
done
Make the script executable and run it as follows:
[hadoop@172 hive]$ chmod +x gen_data.sh
[hadoop@172 hive]$ ./gen_data.sh > hive_test.data
This script file will generate 1,000,000 pairs of random numbers and save them to the file named hive_test.data.
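Before uploading, it can be worth confirming that the file splits the way the Hive table will expect. The sketch below builds a small sample in the same format and checks that every line has exactly two comma-separated fields; sample.data is a throwaway name. (The check always passes for $RANDOM values, which never contain commas.)

```shell
# Generate 100 sample rows of "number,\"number\"".
for ((i = 0; i < 100; i++)); do
    echo "$RANDOM,\"$RANDOM\""
done > sample.data

# Every line should have exactly two fields when split on ','.
awk -F',' 'NF != 2 { bad++ } END { print bad + 0, "malformed lines" }' sample.data
# → 0 malformed lines
```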
Use the following command to upload the generated test data to HDFS:
[hadoop@172 hive]$ hdfs dfs -put ./hive_test.data /$hdfspath
Where $hdfspath is the path on HDFS where you store your files.
Connect to Hive and create a test table:
[hadoop@172 hive]$ bin/hive
hive> create database hive_to_sqoop; #Create database hive_to_sqoop
OK
Time taken: 0.176 seconds
hive> use hive_to_sqoop; #Switch database
OK
Time taken: 0.176 seconds
hive> create table hive_test (a int, b string)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
                #Create data table hive_test, and specify the column separator as ','
OK
Time taken: 0.204 seconds
hive> load data inpath "/$hdfspath/hive_test.data" into table hive_test; #Import data
$hdfspath is the path where you store files on HDFS.
Upon successful completion, you can use the quit command to exit the Hive data warehouse. Connect to the relational database and create the corresponding tables:
[hadoop@172 hive]$ mysql -h $mysqlIP -p
Enter password:
Where $mysqlIP is the internal address of the database, and the password is the one you set when creating the cluster.
In the test database in MySQL, create a table named table_from_hive. Its field names must exactly match those of the Hive table:
mysql> create table table_from_hive (a int,b varchar(255));
Upon successful creation of the table, you may exit MySQL.
There are two ways to export data from the Hive data warehouse to the relational database with Sqoop: you can use the Hive data files stored directly in HDFS, or you can export through HCatalog.

Using the Hive Data Stored in HDFS

Switch to the Sqoop folder, then use the following command to export data from the Hive database to the relational database:
[hadoop@172 hive]$ cd ../sqoop/bin
[hadoop@172 bin]$ ./sqoop-export --connect jdbc:mysql://$mysqlIP/test --username root -P
--table table_from_hive --export-dir /usr/hive/warehouse/hive_to_sqoop.db/hive_test
Here, $mysqlIP is the private IP address of your relational database, and test is the name of the database in it. The parameter after --table is the name of the target table in the relational database, and the parameter after --export-dir is the HDFS location where the Hive table's data is stored.
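Note that with this method Sqoop parses the raw files under the table's warehouse directory, assuming ',' as the input field delimiter unless told otherwise; the export works here because hive_test was created with ',' as its separator. For tables using Hive's default '\001' delimiter you would pass --input-fields-terminated-by explicitly. A sketch that assembles such a command for review before running it on the cluster (the IP is a placeholder):

```shell
mysqlIP='10.0.0.10'   # placeholder: your database's private IP
# Collect the sqoop-export arguments in an array so they are easy to audit.
args=(
    --connect "jdbc:mysql://${mysqlIP}/test"
    --username root -P
    --table table_from_hive
    --export-dir /usr/hive/warehouse/hive_to_sqoop.db/hive_test
    --input-fields-terminated-by ','   # delimiter the Hive table was created with
)
# Print the assembled command; on the cluster you would run it instead.
echo ./sqoop-export "${args[@]}"
```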

Exporting Using HCatalog

Switch to the Sqoop folder, then use the following command to export data from the Hive database to the relational database:
[hadoop@172 hive]$ cd ../sqoop/bin
[hadoop@172 bin]$ ./sqoop-export --connect jdbc:mysql://$mysqlIP/test --username root -P
--table table_from_hive --hcatalog-database hive_to_sqoop --hcatalog-table hive_test
Here, $mysqlIP is the private IP address of your relational database, and test is the name of the database in it. The parameter after --table is the name of the target table in the relational database; --hcatalog-database specifies the Hive database containing the table to export, and --hcatalog-table specifies the Hive table to export.
Upon completion of the operation, you can enter the relational database to verify whether the import was successful:
[hadoop@172 hive]$ mysql -h $mysqlIP -p #Connect to MySQL
Enter password:
mysql> use test;
Database changed
mysql> select count(*) from table_from_hive; #The table now contains 1,000,000 records.
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.03 sec)
mysql> select * from table_from_hive limit 10; #View the first 10 records in the table
+-------+----------+
| a | b |
+-------+----------+
| 28523 | "3394" |
| 31065 | "24583" |
| 399 | "23629" |
| 18779 | "8377" |
| 25376 | "30798" |
| 20234 | "22048" |
| 30744 | "32753" |
| 21423 | "6117" |
| 26867 | "16787" |
| 18526 | "5856" |
+-------+----------+
10 rows in set (0.00 sec)
For more parameters related to the sqoop-export command, you can view them using the following command:
[hadoop@172 bin]$ ./sqoop-export --help

Exporting a Hive Table in ORC Format to a Relational Database

ORC is a columnar file storage format that significantly enhances the performance of Hive. This section introduces how to create a table in ORC format, load data into it, and then use Tencent Cloud's Sqoop service to export data stored in Hive in ORC format to a relational database.
Note
A Hive table stored in ORC format cannot be exported directly from its data files in HDFS; the export must go through HCatalog.
This section will continue to utilize the use case from the previous section.
After logging into the Master node of the EMR cluster, switch to the Hadoop user in the EMR command line and enter the Hive folder using the following command:
[root@172 ~]# su hadoop
[hadoop@172 ~]$ cd /usr/local/service/hive
Create a new table in the hive_to_sqoop database created in the previous section:
[hadoop@172 hive]$ hive
hive> use hive_to_sqoop;
OK
Time taken: 0.013 seconds
hive> create table if not exists orc_test(a int,b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as orc;
The storage format of the data in the table can be viewed using the following command:
hive> show create table orc_test;
OK
CREATE TABLE orc_test(
a int,
b string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://HDFS2789/usr/hive/warehouse/hive_to_sqoop.db/orc_test'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1533563293')
Time taken: 0.041 seconds, Fetched: 21 row(s)
The returned data indicates that the data storage format in this table is ORC.
There are several ways to load data into an ORC-format Hive table. Here we populate it from a temporary Hive table stored in text format: the hive_test table created in the previous section serves as the temporary table, and the data is imported with the following command:
hive> insert into table orc_test select * from hive_test;
After successful import, the data in the table can be viewed using the select command.
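As a side note, the separate CREATE plus INSERT above can also be collapsed into one statement with Hive's CREATE TABLE ... AS SELECT (CTAS). A sketch, where orc_test_ctas is a hypothetical table name; the snippet only prints the hive -e invocation so you can review it before running it on the cluster:

```shell
# CTAS: create the ORC table and copy the rows in a single statement.
hql='create table orc_test_ctas stored as orc as select * from hive_test'
printf 'hive -e "use hive_to_sqoop; %s;"\n' "$hql"
```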
Then, use Sqoop to export the Hive table in ORC format to MySQL. Connect to the relational database and create the corresponding table. For specific methods of connecting to the relational database, refer to the above text:
[hadoop@172 hive]$ mysql -h $mysqlIP -p
Enter password:
Where $mysqlIP is the internal address of the database, and the password is the one you set when creating the cluster.
In the test database in MySQL, create a table named table_from_orc. Its field names must exactly match those of the Hive table:
mysql> create table table_from_orc (a int,b varchar(255));
Upon successful creation of the table, you may exit MySQL.
Switch to the Sqoop folder, then use the following command to export data stored in ORC format in the Hive database to the relational database:
[hadoop@172 hive]$ cd ../sqoop/bin
[hadoop@172 bin]$ ./sqoop-export --connect jdbc:mysql://$mysqlIP/test --username root -P
--table table_from_orc --hcatalog-database hive_to_sqoop --hcatalog-table orc_test
Here, $mysqlIP is the private IP address of your relational database, and test is the name of the database in it. The parameter after --table is the name of the target table in the relational database; --hcatalog-database specifies the Hive database containing the table to export, and --hcatalog-table specifies the Hive table to export.
After successful import, you can view the data in the corresponding table in MySQL:
mysql> select count(*) from table_from_orc;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.24 sec)
mysql> select * from table_from_orc limit 10;
+-------+----------+
| a | b |
+-------+----------+
| 28523 | "3394" |
| 31065 | "24583" |
| 399 | "23629" |
| 18779 | "8377" |
| 25376 | "30798" |
| 20234 | "22048" |
| 30744 | "32753" |
| 21423 | "6117" |
| 26867 | "16787" |
| 18526 | "5856" |
+-------+----------+
10 rows in set (0.00 sec)
For more Sqoop operations, refer to the official documentation.