Apache Sqoop: Importing MySQL into Hadoop HDFS

Chapter 21. Apache Sqoop

Table of Contents

  • 21.1. Installing Sqoop
  • 21.2. sqoop2-tool
    • 21.2.1. verify
    • 21.2.2. upgrade
  • 21.3. sqoop2-shell
    • 21.3.1. show version
    • 21.3.2. set
      • 21.3.2.1. server
      • 21.3.2.2. Enabling verbose error output
    • 21.3.3. show connector
    • 21.3.4. link
      • 21.3.4.1. hdfs-connector
      • 21.3.4.2. generic-jdbc-connector
    • 21.3.5. job
      • 21.3.5.1. create job
      • 21.3.5.2. show job
      • 21.3.5.3. start job
      • 21.3.5.4. status job
    • 21.3.6. update
      • 21.3.6.1. link
  • 21.4. FAQ
    • 21.4.1. Unable to load native-hadoop library for your platform

21.1. Installing Sqoop

One-step installation via OSCM:

curl -s https://raw.githubusercontent.com/oscm/shell/master/database/apache-sqoop/sqoop-1.99.7-bin-hadoop200.sh | bash		

Start the Sqoop server:

/srv/apache-sqoop/bin/sqoop.sh server start		

Check the Sqoop server process with jps:

		[hadoop@netkiller ~]$ jps
2512 SecondaryNameNode
23729 SqoopJettyServer
2290 DataNode
871 ResourceManager
23885 Jps
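
In a monitoring script, the same check can be automated by grepping the jps output for SqoopJettyServer. A minimal sketch; the sample output below is simulated so the snippet can run without a cluster, and on a real host you would use jps directly:

```shell
# Sketch: scripted check for the Sqoop server JVM. "sample" simulates jps
# output; on a real host replace the echo pipeline with the actual command:
#   jps | grep -q SqoopJettyServer && echo "Sqoop server is running"
sample='2512 SecondaryNameNode
23729 SqoopJettyServer
2290 DataNode'
echo "$sample" | grep -q SqoopJettyServer && echo "Sqoop server is running"
```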

21.2. sqoop2-tool

21.2.1. verify — configuration verification tool

[hadoop@iZj6ciilv2rcpgauqg2uuwZ ~]$ sqoop2-tool verify
Setting conf dir: /srv/apache-sqoop/bin/../conf
Sqoop home directory: /srv/apache-sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
6    [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.			

21.3. sqoop2-shell

Enter the sqoop2-shell:

		[hadoop@netkiller ~]$ sqoop2-shell 
Setting conf dir: /srv/apache-sqoop/bin/../conf
Sqoop home directory: /srv/apache-sqoop
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000>		

sqoop2-shell can also execute a script of commands non-interactively:

		sqoop2-shell /path/to/your/script.sqoop		
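
A minimal example of such a script, assembled with a heredoc. The commands are the same ones used interactively in this chapter; the file name /tmp/start-job.sqoop is an arbitrary choice, and the final sqoop2-shell invocation needs a running Sqoop server, so it is shown commented out:

```shell
# Sketch: build a batch script for sqoop2-shell (file name is arbitrary).
cat > /tmp/start-job.sqoop <<'EOF'
set server --host master --port 12000 --webapp sqoop
set option --name verbose --value true
start job -n from-mysql-to-hdfs
EOF

# Run it against a live Sqoop server:
#   sqoop2-shell /tmp/start-job.sqoop
```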

21.3.1. show version

			sqoop:000> show version
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016			
			sqoop:000> show version --all 
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]			

21.3.2. set

21.3.2.1. server

				sqoop:000> set server --host master --port 12000 --webapp sqoop
Server is set successfully			

21.3.2.2. Enabling verbose error output

				sqoop:000> set option --name verbose --value true
Verbose option was changed to true				

21.3.3. show connector

sqoop:000> show connector
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+
sqoop:000>			
For comparison, the classic Sqoop 1 command line can list databases directly:

sqoop list-databases --connect jdbc:mysql://192.168.1.1:3306/ --username root --password 123456

To see full configuration details for every connector:

			sqoop:000> show connector --all

21.3.4. link

21.3.4.1. hdfs-connector

				sqoop:000> create link -connector hdfs-connector
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs

HDFS cluster

URI: hdfs://127.0.0.1:9000
Conf directory: 
Additional configs:: 
There are currently 0 values in the map:
entry# 
New link was successfully created with validation status OK and name hdfs
sqoop:000> 
				sqoop:000> show link
+------+----------------+---------+
| Name | Connector Name | Enabled |
+------+----------------+---------+
| hdfs | hdfs-connector | true    |
+------+----------------+---------+			

21.3.4.2. generic-jdbc-connector

				sqoop:000> create link -connector generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://127.0.0.1:3306/test
Username: test
Password: ****
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# 

SQL Dialect

Identifier enclose: 
New link was successfully created with validation status OK and name mysql		
				sqoop:000> show link
+-------+------------------------+---------+
| Name  |     Connector Name     | Enabled |
+-------+------------------------+---------+
| mysql | generic-jdbc-connector | true    |
| hdfs  | hdfs-connector         | true    |
+-------+------------------------+---------+			

21.3.5. job

21.3.5.1. create job

				sqoop:000> create job -f "mysql" -t "hdfs"
Creating job for links with from name mysql and to name hdfs
Please fill following values to create new job object
Name: from-mysql-to-hdfs

Database source

Schema name: test
Table name: member
SQL statement: 
Column names: 
There are currently 0 values in the list:
element# 
Partition column: 
Partition column nullable: 
Boundary query: 

Incremental read

Check column: 
Last value: 

Target configuration

Override null value: 
Null value: 
File format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec: 
Output directory: /sqoop/member
Append mode: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name from-mysql-to-hdfs				

21.3.5.2. show job

				sqoop:000> show job
+----+--------------------+--------------------------------+-----------------------+---------+
| Id |        Name        |         From Connector         |     To Connector      | Enabled |
+----+--------------------+--------------------------------+-----------------------+---------+
| 1  | from-mysql-to-hdfs | mysql (generic-jdbc-connector) | hdfs (hdfs-connector) | true    |
+----+--------------------+--------------------------------+-----------------------+---------+				

21.3.5.3. start job

sqoop:000> start job -n from-mysql-to-hdfs
Submission details
Job Name: from-mysql-to-hdfs
Server URL: http://localhost:12000/sqoop/
Created by: hadoop
Creation date: 2017-07-22 23:18:02 CST
Lastly updated by: hadoop
External ID: job_1499236611045_0001
	http://iZj6ciilv2rcpgauqg2uuwZ:8088/proxy/application_1499236611045_0001/
2017-07-22 23:18:02 CST: BOOTING  - Progress is not available				

After the job starts, check the imported data in HDFS:

				[hadoop@netkiller ~]$ hdfs dfs -ls /sqoop	

[hadoop@netkiller ~]$ hdfs dfs -ls /member
Found 10 items
-rw-r--r--   3 hadoop supergroup          0 2017-07-22 23:18 /member/310af608-5533-4bc2-bfb8-eaa45470b04d.txt
-rw-r--r--   3 hadoop supergroup         48 2017-07-22 23:18 /member/36bc39a5-bc73-4065-a361-ff2d61c4922c.txt
-rw-r--r--   3 hadoop supergroup          0 2017-07-22 23:18 /member/3e855400-84a9-422d-b50c-1baa9666a719.txt
-rw-r--r--   3 hadoop supergroup        140 2017-07-22 23:18 /member/3e8dad92-e0f1-4a74-a337-642cf4e6d634.txt
-rw-r--r--   3 hadoop supergroup         55 2017-07-22 23:18 /member/4a9f47f1-0413-4149-a93a-ed8b51efbc87.txt
-rw-r--r--   3 hadoop supergroup          0 2017-07-22 23:18 /member/4dc5bfe7-1cd9-4d9b-96a8-07e82ed79a71.txt
-rw-r--r--   3 hadoop supergroup          0 2017-07-22 23:18 /member/60dbcc60-61f2-4433-af39-1dfdfc048940.txt
-rw-r--r--   3 hadoop supergroup          0 2017-07-22 23:18 /member/6d02ed89-94d9-4d4b-87ed-d5da9d2bf9fe.txt
-rw-r--r--   3 hadoop supergroup        209 2017-07-22 23:18 /member/cf7b7185-3ab6-4077-943a-26228b769c57.txt
-rw-r--r--   3 hadoop supergroup          0 2017-07-22 23:18 /member/f2e0780d-ad33-4b35-a1c7-b3fbc23e303d.txt	
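
A quick sanity check is to compare the number of imported rows against the source table. The sketch below simulates the HDFS side with made-up local files so it can run anywhere; on the cluster, the same pipeline would read the real output with hdfs dfs -cat:

```shell
# Sketch: count imported rows. The sample data here is fabricated; on the
# cluster the equivalent is:
#   hdfs dfs -cat /member/*.txt | wc -l
# and the result should match SELECT COUNT(*) FROM test.member in MySQL.
rm -rf /tmp/member && mkdir -p /tmp/member
printf '1,neo\n2,netkiller\n' > /tmp/member/sample.txt
rows=$(cat /tmp/member/*.txt | wc -l)
echo "imported rows: $rows"
```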

21.3.5.4. status job

				sqoop:000> status job -n from-mysql-to-hdfs		

21.3.6. update

21.3.6.1. link

				sqoop:000> update link -n  mysql
Updating link with name mysql
Please update link:
Name: mysql

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://127.0.0.1:3306/test
Username: test
Password: ****
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# 

SQL Dialect

Identifier enclose:  
link was successfully updated with status OK				

Originally published on the WeChat public account Netkiller (netkiller-ebook).

Original publication date: 2017-07-24
