Import/Export of Relational Database and HDFS

Last updated: 2023-12-26 14:39:27

Sqoop is an open-source tool designed for transferring data between Hadoop and traditional relational databases such as MySQL, PostgreSQL, and Oracle. It can import data from a relational database into Hadoop's HDFS and export it back again, using Hadoop's MapReduce to parallelize the transfer.
This document describes how to use Tencent Cloud's Sqoop service to import and export data between MySQL and HDFS.

Development Preparations

Ensure that Tencent Cloud is activated and an EMR cluster has been created. When creating the EMR cluster, select the Sqoop component on the software configuration page.
Sqoop and related software are installed under /usr/local/service/ on the EMR cloud servers.
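To confirm that the component is present before proceeding, you can check the installation from the command line (a quick sketch, assuming the default installation path above):

```shell
# Switch to the hadoop user and enter the Sqoop installation directory
su hadoop
cd /usr/local/service/sqoop
# Print the installed Sqoop version (the exact version depends on the EMR image)
bin/sqoop version
# List the available Sqoop tools, such as import, export, and list-tables
bin/sqoop help
```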

Create a New MySQL Table

First, obtain the connection information of the MySQL database created with the cluster. Enter the EMR console and copy the instance ID of the target cluster (this is also the cluster's name). Then open the relational database console, use Ctrl+F to search for that ID, locate the MySQL database corresponding to the cluster, and note its private network address, referred to below as $mysqlIP.
Log in to any machine in the EMR cluster, preferably the Master node. For login instructions, see Logging into Linux Instance. Here we use WebShell: click Login on the right of the corresponding cloud server to open the login page. The username defaults to root, and the password is the one set when the EMR cluster was created. After logging in, you reach the command-line interface.
In the EMR command line, switch to the hadoop user and enter the Sqoop directory:
[root@172 ~]# su hadoop
[hadoop@172 ~]$ cd /usr/local/service/sqoop
Connect to MySQL Database:
[hadoop@172 sqoop]$ mysql -h $mysqlIP -p
Enter password:
The password is the one you set when creating the EMR cluster.
After connecting to the MySQL database, enter the test database and create a new table. Users can also choose their target database:
mysql> use test;
Database changed

mysql> create table sqoop_test(id int not null primary key auto_increment, title varchar(64), time timestamp, content varchar(255));
Query OK, 0 rows affected (0.00 sec)
This command creates a MySQL table with an auto-increment primary key id and three additional columns: title, time, and content. Data is inserted into this table as follows:
mysql> insert into sqoop_test values(null, 'first', now(), 'hdfs');
Query OK, 1 row affected (0.00 sec)

mysql> insert into sqoop_test values(null, 'second', now(), 'mr');
Query OK, 1 row affected (0.00 sec)

mysql> insert into sqoop_test values(null, 'third', now(), 'yarn');
Query OK, 1 row affected (0.00 sec)
The following command can be used to view the data in the table:
mysql> select * from sqoop_test;
+----+--------+---------------------+---------+
| id | title  | time                | content |
+----+--------+---------------------+---------+
|  1 | first  | 2018-07-03 15:29:37 | hdfs    |
|  2 | second | 2018-07-03 15:30:57 | mr      |
|  3 | third  | 2018-07-03 15:31:07 | yarn    |
+----+--------+---------------------+---------+
3 rows in set (0.00 sec)
Exit MySQL Database:
mysql> exit;

Importing MySQL data into HDFS

Use sqoop-import to import the data from the sqoop_test table created in the previous step into HDFS:
[hadoop@172 sqoop]$ bin/sqoop-import --connect jdbc:mysql://$mysqlIP/test --username root \
-P --table sqoop_test --target-dir /sqoop
--connect specifies the JDBC connection to the MySQL database ('test' can be replaced with your database name), -P prompts for the password interactively, --table is the name of the table to import, and --target-dir is the HDFS path to import into. The /sqoop directory must not exist before the command runs: Sqoop creates it, and the command fails with an error if the directory already exists.
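Beyond the required arguments, a couple of standard sqoop-import options are often useful. The following is an optional variant of the command above, not a required step; the flags -m and --fields-terminated-by are standard Sqoop options, and the values shown are illustrative:

```shell
# Optional variant: single map task and an explicit field delimiter.
# -m 1 runs one mapper, so no --split-by column is needed; a comma is
# Sqoop's default output delimiter, stated here explicitly for clarity.
bin/sqoop-import \
  --connect jdbc:mysql://$mysqlIP/test \
  --username root -P \
  --table sqoop_test \
  --target-dir /sqoop \
  -m 1 \
  --fields-terminated-by ','
```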
After pressing enter, you will be prompted to input your password, which is the password you set when creating EMR.
Upon successful execution, you can view the imported data in the corresponding path in HDFS:
[hadoop@172 sqoop]$ hadoop fs -cat /sqoop/*
1,first,2018-07-03 15:29:37.0,hdfs
2,second,2018-07-03 15:30:57.0,mr
3,third,2018-07-03 15:31:07.0,yarn
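Sqoop writes one output file per map task under the target directory, so you can also inspect the individual files rather than cat-ing the whole directory (part-m-00000 is the standard name of the first map task's output):

```shell
# List the files Sqoop produced; expect one part-m-* file per map task
hadoop fs -ls /sqoop
# View a single part file instead of the whole directory
hadoop fs -cat /sqoop/part-m-00000
```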

Exporting HDFS Data into MySQL

First, create a new table in MySQL to receive the data from HDFS:
[hadoop@172 sqoop]$ mysql -h $mysqlIP -p
Enter password:
mysql> use test;
Database changed

mysql> create table sqoop_test_back(id int not null primary key auto_increment, title varchar(64), time timestamp, content varchar(255));
Query OK, 0 rows affected (0.00 sec)
After verifying the successful creation of the table, exit MySQL:
mysql> show tables;
+-----------------+
| Tables_in_test  |
+-----------------+
| sqoop_test      |
| sqoop_test_back |
+-----------------+
2 rows in set (0.00 sec)

mysql> exit;
Use sqoop-export to export the data imported into HDFS in the previous step back into MySQL:
[hadoop@172 sqoop]$ bin/sqoop-export --connect jdbc:mysql://$mysqlIP/test --username root \
-P --table sqoop_test_back --export-dir /sqoop
The parameters mirror those of sqoop-import, except that --export-dir replaces --target-dir and specifies the HDFS path of the data to export. After pressing enter, input the password as before.
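If the HDFS files use a non-default delimiter, sqoop-export must be told how to parse them. --input-fields-terminated-by is the standard flag for this; the sketch below declares a comma explicitly, which is also the default and matches the files produced by the earlier import:

```shell
# Export variant declaring the input delimiter explicitly.
# --input-fields-terminated-by must match the delimiter of the files
# under --export-dir (a comma here, matching the earlier import).
bin/sqoop-export \
  --connect jdbc:mysql://$mysqlIP/test \
  --username root -P \
  --table sqoop_test_back \
  --export-dir /sqoop \
  --input-fields-terminated-by ','
```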
Upon successful execution, the data in the sqoop_test_back table can be verified:
[hadoop@172 sqoop]$ mysql -h $mysqlIP -p
Enter password:
mysql> use test;
Database changed

mysql> select * from sqoop_test_back;
+----+--------+---------------------+---------+
| id | title  | time                | content |
+----+--------+---------------------+---------+
|  1 | first  | 2018-07-03 15:29:37 | hdfs    |
|  2 | second | 2018-07-03 15:30:57 | mr      |
|  3 | third  | 2018-07-03 15:31:07 | yarn    |
+----+--------+---------------------+---------+
3 rows in set (0.00 sec)
For more Sqoop operations, refer to the official documentation.