
Datax助力轻松迁移SQLServer数据至GreatSQL

Author: GreatSQL社区
Published: 2024-04-25 18:24:47

1. Environment

1.1 Source: SQL Server

| Version | IP | Port |
| --- | --- | --- |
| Microsoft SQL Server 2017 | 192.168.140.160 | 1433 |

1.2 Target: GreatSQL

| Version | IP | Port |
| --- | --- | --- |
| GreatSQL-8.0.32 | 192.168.139.86 | 3308 |

2. Installation

2.1 Install SQL Server

Environment note: the database is started from a Docker image.

2.1.1 Install Docker

1. Install the base packages:

```
$ yum install -y wget net-tools nfs-utils lrzsz gcc gcc-c++ make cmake libxml2-devel openssl-devel curl curl-devel unzip sudo ntp libaio-devel wget vim ncurses-devel autoconf automake zlib-devel python-devel epel-release openssh-server socat ipvsadm conntrack yum-utils
```

2. Configure the docker-ce yum repository (Aliyun mirror):

```
$ yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
```

3. Install the Docker dependencies:

```
$ yum install -y device-mapper-persistent-data lvm2
```

4. Install docker-ce:

```
$ yum install docker-ce -y
```

5. Start and enable the Docker service:

```
$ systemctl start docker && systemctl enable docker
```

2.1.2 Pull the image

```
$ docker pull mcr.microsoft.com/mssql/server:2017-latest
```

2.1.3 Run the container

```
$ docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=********" \
    -p 1433:1433 --name sqlserver2017 \
    -d mcr.microsoft.com/mssql/server:2017-latest
```

Be sure to set a sufficiently complex password here. Parameter explanations:

  • -e "ACCEPT_EULA=Y": accepts the end-user license agreement
  • -e "SA_PASSWORD=********": sets the SA password; it must meet SQL Server's complexity policy, otherwise the container will stop
  • -p 1433:1433: maps a host port to the container port (host port first)
  • --name sqlserver2017: names the container
  • -d: runs the container in the background
  • mcr.microsoft.com/mssql/server:2017-latest: image name and tag
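Right after `docker run` returns, SQL Server is usually still initializing inside the container, so a first connection attempt may be refused. A small retry loop saves guessing at sleep times. This is a sketch: `probe` is a placeholder you would wire to a real check, such as running `sqlcmd -Q "select 1"` through `docker exec` and testing its exit code.

```python
import time

def wait_until(probe, retries=30, delay=2.0):
    """Call probe() until it returns True; give up after `retries` attempts.

    probe: zero-argument callable that returns True once the service answers
    (e.g. a wrapper around `docker exec ... sqlcmd -Q "select 1"` -- how the
    probe is implemented is up to the caller; this helper only retries it).
    """
    for _ in range(retries):
        if probe():
            return True
        time.sleep(delay)
    return False
```

With `subprocess.run`, a probe could simply return `completed.returncode == 0`; the helper itself is independent of how the check is performed.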

2.1.4 Use the database

1. Enter the container:

```
$ docker exec -it sqlserver2017 bash
```

2. Connect to the database:

```
$ /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "********"
```

3. List the databases:

```
1> select name from sys.Databases;
2> go
```

4. Create a database:

```
1> create database testdb;
2> go
```

5. Create a table and insert some rows:

```
use testdb
create table t1(id int)
go
insert into t1 values(1),(2)
go
```

2.2 Install GreatSQL

GreatSQL is also installed from a Docker image. Pull the image:

```
$ docker pull greatsql/greatsql
```

Then create the GreatSQL container:

```
$ docker run -d --name greatsql --hostname=greatsql -e MYSQL_ALLOW_EMPTY_PASSWORD=1 greatsql/greatsql
```

2.3 Install DataX

DataX requires:

  • JDK (1.8 or later; 1.8 recommended)
  • Python (2.6.x or later recommended)
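The JDK floor above can be checked mechanically before running any job. The sketch below compares dotted version strings numerically; the parsing rule (take every digit run) is an assumption that happens to cover the common `1.8.0_251` style reported by `java -version`.

```python
import re

def meets_requirement(version, minimum="1.8"):
    """True if `version` is at least `minimum`, comparing numeric components.

    "1.8.0_251" -> [1, 8, 0, 251]; Python then compares lists element-wise.
    """
    def parts(v):
        # Keep only the digit runs, ignoring separators like '.' and '_'.
        return [int(x) for x in re.findall(r"\d+", v)]
    return parts(version) >= parts(minimum)
```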

Installation itself is just unpacking the tarball, but running a job straight after unpacking, with no further steps, fails:

```
$ cd /soft
$ ll
total 3764708
-rw-r--r-- 1 root root 853734462 Dec  9 04:06 datax.tar.gz
$ tar xf datax.tar.gz
$ python /soft/datax/bin/datax.py /soft/datax/job/job.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-07-19 11:19:17.483 [main] WARN  ConfigParser - 插件[streamreader,streamwriter]加载失败,1s后重试... Exception:Code:[Common-00], Describe:[您提供的配置文件存在错误信息,请检查您的作业配置.] - 配置信息错误,您提供的配置文件[/soft/datax/plugin/reader/._mysqlreader/plugin.json]不存在. 请检查您的配置文件.
2023-07-19 11:19:18.488 [main] ERROR Engine - 经DataX智能分析,该任务最可能的错误原因是:
com.alibaba.datax.common.exception.DataXException: Code:[Common-00], Describe:[您提供的配置文件存在错误信息,请检查您的作业配置.] - 配置信息错误,您提供的配置文件[/soft/datax/plugin/reader/._mysqlreader/plugin.json]不存在. 请检查您的配置文件.
	at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:26)
	at com.alibaba.datax.common.util.Configuration.from(Configuration.java:95)
	at com.alibaba.datax.core.util.ConfigParser.parseOnePluginConfig(ConfigParser.java:153)
	at com.alibaba.datax.core.util.ConfigParser.parsePluginConfig(ConfigParser.java:125)
	at com.alibaba.datax.core.util.ConfigParser.parse(ConfigParser.java:63)
	at com.alibaba.datax.core.Engine.entry(Engine.java:137)
	at com.alibaba.datax.core.Engine.main(Engine.java:204)
```

The fix is to delete every file whose name starts with `._` in the plugin directory and in its reader/writer subdirectories. Hidden files need to be removed from three directories:

  • plugin/
  • plugin/reader/
  • plugin/writer/
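Those `._*` entries are AppleDouble metadata files that macOS archiving tools place next to the real plugin files. Instead of one `rm` per directory, the cleanup can also be done recursively; this is a sketch assuming DataX was unpacked under `/soft/datax` as shown above.

```python
from pathlib import Path

def remove_appledouble(root):
    """Recursively delete every regular file named ._* under root.

    Returns the number of files removed so the caller can log it.
    """
    removed = 0
    for p in Path(root).rglob("._*"):
        if p.is_file():
            p.unlink()
            removed += 1
    return removed

# Example (hypothetical path): remove_appledouble("/soft/datax/plugin")
```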

```
$ rm -rf /soft/datax/plugin/._*
$ rm -rf /soft/datax/plugin/reader/._*
$ rm -rf /soft/datax/plugin/writer/._*
```

Run the bundled sample job to confirm that DataX is now working:

```
$ python /soft/datax/bin/datax.py /soft/datax/job/job.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-07-19 11:22:12.298 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-07-19 11:22:12.305 [main] INFO Engine - the machine info =>
    osInfo:  Oracle Corporation 1.8 25.251-b08
    jvmInfo: Linux amd64 4.19.25-200.1.el7.bclinux.x86_64
    cpu num: 48
    totalPhysicalMemory: -0.00G
    freePhysicalMemory: -0.00G
    maxFileDescriptorCount: -1
    currentOpenFileDescriptorCount: -1
    GC Names [PS MarkSweep, PS Scavenge]

    MEMORY_NAME            | allocation_size | init_size
    PS Eden Space          | 256.00MB        | 256.00MB
    Code Cache             | 240.00MB        | 2.44MB
    Compressed Class Space | 1,024.00MB      | 0.00MB
    PS Survivor Space      | 42.50MB         | 42.50MB
    PS Old Gen             | 683.00MB        | 683.00MB
    Metaspace              | -0.00MB         | 0.00MB

2023-07-19 11:22:12.320 [main] INFO Engine -
{"content":[{"reader":{"name":"streamreader","parameter":{"column":[
{"type":"string","value":"DataX"},
{"type":"long","value":19890604},
{"type":"date","value":"1989-06-04 00:00:00"},
{"type":"bool","value":true},
{"type":"bytes","value":"test"}
],"sliceRecordCount":100000}},
"writer":{"name":"streamwriter","parameter":{"encoding":"UTF-8","print":false}}}],
"setting":{"errorLimit":{"percentage":0.02,"record":0},"speed":{"byte":10485760}}}

2023-07-19 11:22:12.336 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-07-19 11:22:12.337 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-07-19 11:22:12.338 [main] INFO JobContainer - DataX jobContainer starts job.
2023-07-19 11:22:12.339 [main] INFO JobContainer - Set jobId = 0
2023-07-19 11:22:12.352 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2023-07-19 11:22:12.352 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do prepare work .
2023-07-19 11:22:12.352 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2023-07-19 11:22:12.352 [job-0] INFO JobContainer - jobContainer starts to do split ...
2023-07-19 11:22:12.353 [job-0] INFO JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2023-07-19 11:22:12.354 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2023-07-19 11:22:12.354 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2023-07-19 11:22:12.371 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2023-07-19 11:22:12.375 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2023-07-19 11:22:12.376 [job-0] INFO JobContainer - Running by standalone Mode.
2023-07-19 11:22:12.384 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-07-19 11:22:12.388 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-07-19 11:22:12.388 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2023-07-19 11:22:12.396 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-07-19 11:22:12.697 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[302]ms
2023-07-19 11:22:12.698 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-07-19 11:22:22.402 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.020s | All Task WaitReaderTime 0.033s | Percentage 100.00%
2023-07-19 11:22:22.402 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2023-07-19 11:22:22.402 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do post work.
2023-07-19 11:22:22.403 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do post work.
2023-07-19 11:22:22.403 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2023-07-19 11:22:22.403 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-07-19 11:22:22.404 [job-0] INFO JobContainer -
[total cpu info] =>
    averageCpu | maxDeltaCpu | minDeltaCpu
    -1.00%     | -1.00%      | -1.00%

[total gc info] =>
    NAME         | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
    PS MarkSweep | 0            | 0               | 0               | 0.000s      | 0.000s         | 0.000s
    PS Scavenge  | 0            | 0               | 0               | 0.000s      | 0.000s         | 0.000s

2023-07-19 11:22:22.404 [job-0] INFO JobContainer - PerfTrace not enable!
2023-07-19 11:22:22.404 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.020s | All Task WaitReaderTime 0.033s | Percentage 100.00%
2023-07-19 11:22:22.406 [job-0] INFO JobContainer -
任务启动时刻 : 2023-07-19 11:22:12
任务结束时刻 : 2023-07-19 11:22:22
任务总计耗时 : 10s
任务平均流量 : 253.91KB/s
记录写入速度 : 10000rec/s
读出记录总数 : 100000
读写失败总数 : 0
```

3. Full Migration from SQL Server to GreatSQL

3.1 Create test data on the source (SQL Server)

```
$ docker exec -it 47bd0ed79c26 /bin/bash
$ /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "********"
1> create database testdb
2> go
1> use testdb
2> go
1> insert into t1 values(1),(2),(3);
2> go
1> select * from t1;
2> go
id
-----------
          1
          2
          3
```

3.2 Create the table structure on the target (GreatSQL)

```
greatsql> create database testdb;
greatsql> use testdb;
greatsql> create table t1 (id int primary key);
```

3.3 Write the DataX job file

```
$ cat /soft/datax/job/sqlserver_to_greatsql.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlserverreader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
                                "table": ["t1"]
                            }
                        ],
                        "password": "********",
                        "username": "SA",
                        "column": ["*"]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://10.17.139.86:3308/testdb",
                                "table": ["t1"]
                            }
                        ],
                        "password": "******",
                        "session": [],
                        "username": "admin",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}
```

3.4 Run the DataX migration job

```
$ python /soft/datax/bin/datax.py /soft/datax/job/sqlserver_to_greatsql.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-11-28 09:58:44.087 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-11-28 09:58:44.104 [main] INFO Engine - the machine info =>
    osInfo:  Oracle Corporation 1.8 25.181-b13
    jvmInfo: Linux amd64 3.10.0-957.el7.x86_64
    cpu num: 8
    totalPhysicalMemory: -0.00G
    freePhysicalMemory: -0.00G
    maxFileDescriptorCount: -1
    currentOpenFileDescriptorCount: -1
    GC Names [PS MarkSweep, PS Scavenge]

    MEMORY_NAME            | allocation_size | init_size
    PS Eden Space          | 256.00MB        | 256.00MB
    Code Cache             | 240.00MB        | 2.44MB
    Compressed Class Space | 1,024.00MB      | 0.00MB
    PS Survivor Space      | 42.50MB         | 42.50MB
    PS Old Gen             | 683.00MB        | 683.00MB
    Metaspace              | -0.00MB         | 0.00MB

2023-11-28 09:58:44.137 [main] INFO Engine -
{"content":[{"reader":{"name":"sqlserverreader","parameter":{"column":["*"],
"connection":[{"jdbcUrl":["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],"table":["t1"]}],
"password":"*************","username":"SA"}},
"writer":{"name":"mysqlwriter","parameter":{"column":["*"],
"connection":[{"jdbcUrl":"jdbc:mysql://10.17.139.86:3308/testdb","table":["t1"]}],
"password":"********","session":[],"username":"admin","writeMode":"insert"}}}],
"setting":{"speed":{"channel":"5"}}}

2023-11-28 09:58:44.176 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-11-28 09:58:44.179 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-11-28 09:58:44.180 [main] INFO JobContainer - DataX jobContainer starts job.
2023-11-28 09:58:44.183 [main] INFO JobContainer - Set jobId = 0
2023-11-28 09:58:44.542 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb.
2023-11-28 09:58:44.544 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2023-11-28 09:58:45.099 [job-0] INFO OriginalConfPretreatmentUtil - table:[t1] all columns:[id].
2023-11-28 09:58:45.099 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2023-11-28 09:58:45.102 [job-0] INFO OriginalConfPretreatmentUtil - Write data [ insert INTO %s (id) VALUES(?) ], which jdbcUrl like:[jdbc:mysql://10..17.139.86:16310/testdb?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2023-11-28 09:58:45.103 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2023-11-28 09:58:45.103 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] do prepare work .
2023-11-28 09:58:45.104 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2023-11-28 09:58:45.104 [job-0] INFO JobContainer - jobContainer starts to do split ...
2023-11-28 09:58:45.105 [job-0] INFO JobContainer - Job set Channel-Number to 5 channels.
2023-11-28 09:58:45.112 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2023-11-28 09:58:45.114 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2023-11-28 09:58:45.135 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2023-11-28 09:58:45.139 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2023-11-28 09:58:45.142 [job-0] INFO JobContainer - Running by standalone Mode.
2023-11-28 09:58:45.151 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-11-28 09:58:45.157 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-11-28 09:58:45.158 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2023-11-28 09:58:45.173 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-11-28 09:58:45.181 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from t1 ] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 09:58:45.398 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from t1 ] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 09:58:45.454 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[284]ms
2023-11-28 09:58:45.455 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-11-28 09:58:55.175 [job-0] INFO StandAloneJobContainerCommunicator - Total 3 records, 3 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 09:58:55.175 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2023-11-28 09:58:55.175 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2023-11-28 09:58:55.176 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2023-11-28 09:58:55.176 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2023-11-28 09:58:55.176 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-11-28 09:58:55.177 [job-0] INFO JobContainer -
[total cpu info] =>
    averageCpu | maxDeltaCpu | minDeltaCpu
    -1.00%     | -1.00%      | -1.00%

[total gc info] =>
    NAME         | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
    PS MarkSweep | 1            | 1               | 1               | 0.061s      | 0.061s         | 0.061s
    PS Scavenge  | 1            | 1               | 1               | 0.039s      | 0.039s         | 0.039s

2023-11-28 09:58:55.177 [job-0] INFO JobContainer - PerfTrace not enable!
2023-11-28 09:58:55.177 [job-0] INFO StandAloneJobContainerCommunicator - Total 3 records, 3 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 09:58:55.179 [job-0] INFO JobContainer -
任务启动时刻 : 2023-11-28 09:58:44
任务结束时刻 : 2023-11-28 09:58:55
任务总计耗时 : 10s
任务平均流量 : 0B/s
记录写入速度 : 0rec/s
读出记录总数 : 3
读写失败总数 : 0
```

3.5 Verify the data on the target

```
greatsql> select * from t1;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.01 sec)
```

4. Incremental Migration from SQL Server to GreatSQL

4.1 Create test data on the source (SQL Server)

```
1> create table t2 (id int,createtime datetime);
2> go
1> insert into t2 values(1,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(2,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(3,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(4,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(5,GETDATE());
2> go
(1 rows affected)
1> insert into t2 values(6,GETDATE());
2> go
(1 rows affected)
1> select * from t2;
2> go
id          createtime
----------- -----------------------
          1 2023-11-28 02:18:20.790
          2 2023-11-28 02:18:27.040
          3 2023-11-28 02:18:32.103
          4 2023-11-28 02:18:37.690
          5 2023-11-28 02:18:41.450
          6 2023-11-28 02:18:46.330
```

4.2 Write the DataX job file for the initial full load of t2

```
$ cat sqlserver_to_greatsql_inc.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlserverreader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
                                "table": ["t2"]
                            }
                        ],
                        "password": "********",
                        "username": "SA",
                        "column": ["*"]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://10.17.139.86:3308/testdb",
                                "table": ["t2"]
                            }
                        ],
                        "password": "********",
                        "session": [],
                        "username": "admin",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}
```

4.3 Run the DataX full-load job

```
$ python /soft/datax/bin/datax.py /soft/datax/job/sqlserver_to_greatsql_inc.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-11-28 10:19:59.279 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-11-28 10:19:59.286 [main] INFO Engine - the machine info =>
    osInfo:  Oracle Corporation 1.8 25.181-b13
    jvmInfo: Linux amd64 3.10.0-957.el7.x86_64
    cpu num: 8
    totalPhysicalMemory: -0.00G
    freePhysicalMemory: -0.00G
    maxFileDescriptorCount: -1
    currentOpenFileDescriptorCount: -1
    GC Names [PS MarkSweep, PS Scavenge]

    MEMORY_NAME            | allocation_size | init_size
    PS Eden Space          | 256.00MB        | 256.00MB
    Code Cache             | 240.00MB        | 2.44MB
    Compressed Class Space | 1,024.00MB      | 0.00MB
    PS Survivor Space      | 42.50MB         | 42.50MB
    PS Old Gen             | 683.00MB        | 683.00MB
    Metaspace              | -0.00MB         | 0.00MB

2023-11-28 10:19:59.302 [main] INFO Engine -
{"content":[{"reader":{"name":"sqlserverreader","parameter":{"column":["*"],
"connection":[{"jdbcUrl":["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],"table":["t2"]}],
"password":"*************","username":"SA"}},
"writer":{"name":"mysqlwriter","parameter":{"column":["*"],
"connection":[{"jdbcUrl":"jdbc:mysql://10..17.139.86:16310/testdb","table":["t2"]}],
"password":"********","session":[],"username":"admin","writeMode":"insert"}}}],
"setting":{"speed":{"channel":"5"}}}

2023-11-28 10:19:59.319 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-11-28 10:19:59.321 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-11-28 10:19:59.321 [main] INFO JobContainer - DataX jobContainer starts job.
2023-11-28 10:19:59.324 [main] INFO JobContainer - Set jobId = 0
2023-11-28 10:19:59.629 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb.
2023-11-28 10:19:59.630 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2023-11-28 10:20:00.027 [job-0] INFO OriginalConfPretreatmentUtil - table:[t2] all columns:[id,createtime].
2023-11-28 10:20:00.027 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2023-11-28 10:20:00.029 [job-0] INFO OriginalConfPretreatmentUtil - Write data [ insert INTO %s (id,createtime) VALUES(?,?) ], which jdbcUrl like:[jdbc:mysql://10..17.139.86:16310/testdb?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2023-11-28 10:20:00.030 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2023-11-28 10:20:00.031 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] do prepare work .
2023-11-28 10:20:00.031 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2023-11-28 10:20:00.032 [job-0] INFO JobContainer - jobContainer starts to do split ...
2023-11-28 10:20:00.032 [job-0] INFO JobContainer - Job set Channel-Number to 5 channels.
2023-11-28 10:20:00.037 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2023-11-28 10:20:00.038 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2023-11-28 10:20:00.060 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2023-11-28 10:20:00.063 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2023-11-28 10:20:00.066 [job-0] INFO JobContainer - Running by standalone Mode.
2023-11-28 10:20:00.073 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-11-28 10:20:00.080 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-11-28 10:20:00.080 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2023-11-28 10:20:00.093 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-11-28 10:20:00.101 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from t2 ] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:20:00.262 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from t2 ] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:20:00.334 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[243]ms
2023-11-28 10:20:00.335 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-11-28 10:20:10.087 [job-0] INFO StandAloneJobContainerCommunicator - Total 6 records, 54 bytes | Speed 5B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 10:20:10.088 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2023-11-28 10:20:10.088 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2023-11-28 10:20:10.089 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2023-11-28 10:20:10.090 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2023-11-28 10:20:10.091 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-11-28 10:20:10.094 [job-0] INFO JobContainer -
[total cpu info] =>
    averageCpu | maxDeltaCpu | minDeltaCpu
    -1.00%     | -1.00%      | -1.00%

[total gc info] =>
    NAME         | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
    PS MarkSweep | 1            | 1               | 1               | 0.034s      | 0.034s         | 0.034s
    PS Scavenge  | 1            | 1               | 1               | 0.031s      | 0.031s         | 0.031s

2023-11-28 10:20:10.094 [job-0] INFO JobContainer - PerfTrace not enable!
2023-11-28 10:20:10.095 [job-0] INFO StandAloneJobContainerCommunicator - Total 6 records, 54 bytes | Speed 5B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 10:20:10.097 [job-0] INFO JobContainer -
任务启动时刻 : 2023-11-28 10:19:59
任务结束时刻 : 2023-11-28 10:20:10
任务总计耗时 : 10s
任务平均流量 : 5B/s
记录写入速度 : 0rec/s
读出记录总数 : 6
读写失败总数 : 0
```

4.4 Verify the fully loaded data

```
greatsql> select * from t2;
+----+---------------------+
| id | createtime          |
+----+---------------------+
|  1 | 2023-11-28 02:18:21 |
|  2 | 2023-11-28 02:18:27 |
|  3 | 2023-11-28 02:18:32 |
|  4 | 2023-11-28 02:18:38 |
|  5 | 2023-11-28 02:18:41 |
|  6 | 2023-11-28 02:18:46 |
+----+---------------------+
```

You can also verify with `checksum table t2`; on large tables a full `select *` is not practical.

4.5 Insert incremental data on the source (SQL Server)

```
1> insert into t2 values(7,'2023-11-28 03:18:46.330');
2> go
(1 rows affected)
1> insert into t2 values(8,'2023-11-28 03:20:46.330');
2> go
(1 rows affected)
1> insert into t2 values(9,'2023-11-28 03:25:46.330');
2> go
(1 rows affected)
1> insert into t2 values(10,'2023-11-28 03:30:46.330');
2> go
(1 rows affected)
1> select * from t2;
2> go
id          createtime
----------- -----------------------
          1 2023-11-28 02:18:20.790
          2 2023-11-28 02:18:27.040
          3 2023-11-28 02:18:32.103
          4 2023-11-28 02:18:37.690
          5 2023-11-28 02:18:41.450
          6 2023-11-28 02:18:46.330
          7 2023-11-28 03:18:46.330
          8 2023-11-28 03:20:46.330
          9 2023-11-28 03:25:46.330
         10 2023-11-28 03:30:46.330
```

4.6 Write the DataX incremental job file

The only change from the full-load job is a `where` condition on the reader, parameterized with `${start_time}` and `${end_time}`:

```
$ cat sqlserver_to_greatsql_inc.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlserverreader",
                    "parameter": {
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],
                                "table": ["t2"]
                            }
                        ],
                        "password": "********",
                        "username": "SA",
                        "column": ["*"],
                        "where": "createtime > '${start_time}' and createtime < '${end_time}'"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://10.17.139.86:3308/testdb",
                                "table": ["t2"]
                            }
                        ],
                        "password": "********",
                        "session": [],
                        "username": "admin",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "5"
            }
        }
    }
}
```

4.7 Run the DataX incremental job

The time window is injected through the `-p` option:

```
$ python /soft/datax/bin/datax.py /soft/datax/job/sqlserver_to_greatsql_inc.json -p "-Dstart_time='2023-11-28 03:17:46.330' -Dend_time='2023-11-28 03:31:46.330'"

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-11-28 10:29:24.492 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-11-28 10:29:24.504 [main] INFO Engine - the machine info =>
    osInfo:  Oracle Corporation 1.8 25.181-b13
    jvmInfo: Linux amd64 3.10.0-957.el7.x86_64
    cpu num: 8
    totalPhysicalMemory: -0.00G
    freePhysicalMemory: -0.00G
    maxFileDescriptorCount: -1
    currentOpenFileDescriptorCount: -1
    GC Names [PS MarkSweep, PS Scavenge]

    MEMORY_NAME            | allocation_size | init_size
    PS Eden Space          | 256.00MB        | 256.00MB
    Code Cache             | 240.00MB        | 2.44MB
    Compressed Class Space | 1,024.00MB      | 0.00MB
    PS Survivor Space      | 42.50MB         | 42.50MB
    PS Old Gen             | 683.00MB        | 683.00MB
    Metaspace              | -0.00MB         | 0.00MB

2023-11-28 10:29:24.524 [main] INFO Engine -
{"content":[{"reader":{"name":"sqlserverreader","parameter":{"column":["*"],
"connection":[{"jdbcUrl":["jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb"],"table":["t2"]}],
"password":"*************","username":"SA",
"where":"createtime > '2023-11-28 03:17:46.330' and createtime < '2023-11-28 03:31:46.330'"}},
"writer":{"name":"mysqlwriter","parameter":{"column":["*"],
"connection":[{"jdbcUrl":"jdbc:mysql://10..17.139.86:16310/testdb","table":["t2"]}],
"password":"********","session":[],"username":"admin","writeMode":"insert"}}}],
"setting":{"speed":{"channel":"5"}}}

2023-11-28 10:29:24.542 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2023-11-28 10:29:24.544 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2023-11-28 10:29:24.544 [main] INFO JobContainer - DataX jobContainer starts job.
2023-11-28 10:29:24.546 [main] INFO JobContainer - Set jobId = 0
2023-11-28 10:29:24.830 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb.
2023-11-28 10:29:24.831 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2023-11-28 10:29:25.113 [job-0] INFO OriginalConfPretreatmentUtil - table:[t2] all columns:[id,createtime].
2023-11-28 10:29:25.113 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2023-11-28 10:29:25.115 [job-0] INFO OriginalConfPretreatmentUtil - Write data [ insert INTO %s (id,createtime) VALUES(?,?) ], which jdbcUrl like:[jdbc:mysql://10..17.139.86:16310/testdb?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2023-11-28 10:29:25.116 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2023-11-28 10:29:25.117 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] do prepare work .
2023-11-28 10:29:25.117 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2023-11-28 10:29:25.118 [job-0] INFO JobContainer - jobContainer starts to do split ...
2023-11-28 10:29:25.118 [job-0] INFO JobContainer - Job set Channel-Number to 5 channels.
2023-11-28 10:29:25.123 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2023-11-28 10:29:25.124 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2023-11-28 10:29:25.146 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2023-11-28 10:29:25.150 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2023-11-28 10:29:25.153 [job-0] INFO JobContainer - Running by standalone Mode.
2023-11-28 10:29:25.159 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-11-28 10:29:25.165 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-11-28 10:29:25.165 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2023-11-28 10:29:25.176 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-11-28 10:29:25.183 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from t2 where (createtime > '2023-11-28 03:17:46.330' and createtime < '2023-11-28 03:31:46.330') ] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:29:25.344 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from t2 where (createtime > '2023-11-28 03:17:46.330' and createtime < '2023-11-28 03:31:46.330') ] jdbcUrl:[jdbc:sqlserver://127.0.0.1:1433;DatabaseName=testdb].
2023-11-28 10:29:25.606 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[431]ms
2023-11-28 10:29:25.607 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-11-28 10:29:35.173 [job-0] INFO StandAloneJobContainerCommunicator - Total 4 records, 37 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 10:29:35.173 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2023-11-28 10:29:35.174 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2023-11-28 10:29:35.175 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2023-11-28 10:29:35.175 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2023-11-28 10:29:35.177 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /soft/datax/hook
2023-11-28 10:29:35.179 [job-0] INFO JobContainer -
[total cpu info] =>
    averageCpu | maxDeltaCpu | minDeltaCpu
    -1.00%     | -1.00%      | -1.00%

[total gc info] =>
    NAME         | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
    PS MarkSweep | 1            | 1               | 1               | 0.052s      | 0.052s         | 0.052s
    PS Scavenge  | 1            | 1               | 1               | 0.024s      | 0.024s         | 0.024s

2023-11-28 10:29:35.180 [job-0] INFO JobContainer - PerfTrace not enable!
2023-11-28 10:29:35.181 [job-0] INFO StandAloneJobContainerCommunicator - Total 4 records, 37 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-11-28 10:29:35.183 [job-0] INFO JobContainer -
任务启动时刻 : 2023-11-28 10:29:24
任务结束时刻 : 2023-11-28 10:29:35
任务总计耗时 : 10s
任务平均流量 : 3B/s
记录写入速度 : 0rec/s
读出记录总数 : 4
读写失败总数 : 0
```

4.8 Verify the incremental data on the target (GreatSQL)

```
greatsql> select * from t2;
+----+---------------------+
| id | createtime          |
+----+---------------------+
|  1 | 2023-11-28 02:18:21 |
|  2 | 2023-11-28 02:18:27 |
|  3 | 2023-11-28 02:18:32 |
|  4 | 2023-11-28 02:18:38 |
|  5 | 2023-11-28 02:18:41 |
|  6 | 2023-11-28 02:18:46 |
|  7 | 2023-11-28 03:18:46 |
|  8 | 2023-11-28 03:20:46 |
|  9 | 2023-11-28 03:25:46 |
| 10 | 2023-11-28 03:30:46 |
+----+---------------------+
10 rows in set (0.00 sec)
```

Incremental migration summary: the incremental pass is simply the full-load job plus a WHERE filter on createtime. By filtering out the rows already copied by the full load, the same job effectively performs an incremental migration.
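Scheduling repeated incremental runs means regenerating the `-p "-Dstart_time=... -Dend_time=..."` argument for each window, so it helps to keep the quoting in one place. A sketch, using the job path from above: note that because the job's where clause uses strict `>` and `<`, a row whose createtime equals a boundary would be skipped, so successive windows should reuse the previous end_time as the next start_time (and in practice one side of the comparison is often made inclusive).

```python
def build_datax_cmd(job_path, start_time, end_time):
    """Assemble the datax.py command line for one incremental window.

    start_time / end_time: strings like '2023-11-28 03:17:46.330',
    matching the ${start_time}/${end_time} placeholders in the job file.
    """
    params = "-Dstart_time='{}' -Dend_time='{}'".format(start_time, end_time)
    # One argv element for -p's value, exactly as the shell would pass it.
    return ["python", "/soft/datax/bin/datax.py", job_path, "-p", params]
```

The returned list can be handed straight to `subprocess.run(cmd)`; since no shell is involved, the embedded single quotes reach DataX exactly as they do on the command line above.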

Originally published 2024-04-21 on the GreatSQL社区 WeChat official account.