//pt-osc工具的一个细节//
在MySQL中,如果我们需要对大表进行变更,往往使用gh-ost或者pt-osc工具,我平日里使用pt-osc比较多,来说说这个工具使用过程中的一个细节吧。关于pt-osc工具,之前写过两篇文章,分别是:
有兴趣可以点击链接进行查看,今天来看pt工具的另外一个特性。
试想这样一个场景,假如你给一个4亿空间的表进行添加索引操作,添加到一半的时候,需要停止这个加索引的操作,这个时候,我们怎么停下来最快呢?怎样保证对线上的影响最小。其实这个场景也是我自己之前遇到的,加索引加到一般,发现主从复制发生了延迟,所以需要快速停止这个加索引的操作,如何操作?
今天做了个测试,我把测试的过程贴在这里,方便大家观看:
1、首先,创建一张表:
mysql--root@localhost:yeyztest 16:32:16>>show create table test\G
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
`id` int(11) NOT NULL,
`name` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
2、给这个表里面插入1000w数据
delimiter ;;
create procedure idata()
begin
declare i int;
set i=1;
while i<10000000 do
insert into test(id,name) values(i,'yeyz');
set i=i+1;
end while;
end;;
delimiter ;
3、使用pt-osc工具进行线上索引添加
[root@ ~]#pt-online-schema-change --user=dba_admin --password=xxxxxxx -h127.0.0.1 -P5720 --alter " ADD column score smallint(4) unsigned NOT NULL DEFAULT '0' COMMENT 'score字段'" D=yeyztest,t=test --alter-foreign-keys-method=auto --recursion-method=none --print --charset=utf8 --execute
No slaves found. See --recursion-method if host tk-dba-mysql10-202 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
analyze_table, 10, 1
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
No foreign keys reference `yeyztest`.`test`; ignoring --alter-foreign-keys-method.
Altering `yeyztest`.`test`...
Creating new table...
CREATE TABLE `yeyztest`.`_test_new` (
`id` int(11) NOT NULL,
`name` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Created new table yeyztest._test_new OK.
Altering new table...
ALTER TABLE `yeyztest`.`_test_new` ADD column score smallint(4) unsigned NOT NULL DEFAULT '0' COMMENT 'scoreå段'
Altered `yeyztest`.`_test_new` OK.
2020-06-29T12:20:31 Creating triggers...
2020-06-29T12:20:31 Created triggers OK.
2020-06-29T12:20:31 Copying approximately 9304048 rows...
INSERT LOW_PRIORITY IGNORE INTO `yeyztest`.`_test_new` (`id`, `name`) SELECT `id`, `name` FROM `yeyztest`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 10363 copy nibble*/
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `yeyztest`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) ORDER BY `id` LIMIT ?, 2 /*next chunk boundary*/
Copying `yeyztest`.`test`: 24% 01:33 remain
可以看到,pt-osc工具创建了中间表_test_new,然后创建了触发器、开始拷贝数据、最后给出了一个拷贝的百分比,以及预估的剩余时间。
那么现在,就是考虑,如何停止这个操作的问题了。
01
直接ctrl+c停止pt-osc工具这个命令
使用ctrl+c的方法,停止这个pt-osc的命令,可以得到如下的输出:
^C# Exiting on SIGINT.
Not dropping triggers because the tool was interrupted. To drop the triggers, execute:
DROP TRIGGER IF EXISTS `yeyztest`.`pt_osc_yeyztest_test_del`
DROP TRIGGER IF EXISTS `yeyztest`.`pt_osc_yeyztest_test_upd`
DROP TRIGGER IF EXISTS `yeyztest`.`pt_osc_yeyztest_test_ins`
Not dropping the new table `yeyztest`.`_test_new` because the tool was interrupted. To drop the new table, execute:
DROP TABLE IF EXISTS `yeyztest`.`_test_new`;
`yeyztest`.`test` was not altered.
它会给出2个提示:
第一个提示就是没有drop掉触发器,然后给出我们drop触发器的语句。
第二个提示没有drop掉中间表,而且给出了drop中间表的语句,感觉这个提示还是很友好的。
我们看一下是否真如他所言,触发器和中间表都在:
mysql>>show triggers from yeyztest;
+--------------------------+--------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+----------------------+----------------------+--------------------+
| Trigger | Event | Table | Statement | Timing | Created | sql_mode | Definer | character_set_client | collation_connection | Database Collation |
+--------------------------+--------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+----------------------+----------------------+--------------------+
| pt_osc_yeyztest_test_ins | INSERT | test | REPLACE INTO `yeyztest`.`_test_new` (`id`, `name`) VALUES (NEW.`id`, NEW.`name`) | AFTER | 2020-06-29 16:32:52.22 | ONLY_FULL_GROUP_BY,NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION | dba_admin@127.0.0.1 | utf8 | utf8_general_ci | utf8_general_ci |
| pt_osc_yeyztest_test_upd | UPDATE | test | BEGIN DELETE IGNORE FROM `yeyztest`.`_test_new` WHERE !(OLD.`id` <=> NEW.`id`) AND `yeyztest`.`_test_new`.`id` <=> OLD.`id`;REPLACE INTO `yeyztest`.`_test_new` (`id`, `name`) VALUES (NEW.`id`, NEW.`name`);END | AFTER | 2020-06-29 16:32:52.22 | ONLY_FULL_GROUP_BY,NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION | dba_admin@127.0.0.1 | utf8 | utf8_general_ci | utf8_general_ci |
| pt_osc_yeyztest_test_del | DELETE | test | DELETE IGNORE FROM `yeyztest`.`_test_new` WHERE `yeyztest`.`_test_new`.`id` <=> OLD.`id` | AFTER | 2020-06-29 16:32:52.22 | ONLY_FULL_GROUP_BY,NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION | dba_admin@127.0.0.1 | utf8 | utf8_general_ci | utf8_general_ci |
+--------------------------+--------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+----------------------+----------------------+--------------------+
mysql>>show tables;
+--------------------+
| Tables_in_yeyztest |
+--------------------+
| _test_new |
| test |
+--------------------+
2 rows in set (0.01 sec)
通过查看,我们发现这三个触发器真实存在。而且中间表也存在。
说明ctrl+c的方法,不会自动drop掉中间表和触发器,可以认为是不完善的一种方法。但是在这个过程中该表产生的新数据会写入_test_new中,数据相对完整。
02
直接drop中间表
这种方法,我们来看结果如何:
mysql--root@localhost:yeyztest 16:35:07>>show tables;
+--------------------+
| Tables_in_yeyztest |
+--------------------+
| _test_new |
| test |
+--------------------+
2 rows in set (0.01 sec)
mysql--root@localhost:yeyztest 16:35:11>>drop table _test_new;
Query OK, 0 rows affected (0.11 sec)
执行完成之后,pt-osc的命令输出结果如下:
2020-06-29T12:21:05 Dropping triggers...
DROP TRIGGER IF EXISTS `yeyztest`.`pt_osc_yeyztest_test_del`
DROP TRIGGER IF EXISTS `yeyztest`.`pt_osc_yeyztest_test_upd`
DROP TRIGGER IF EXISTS `yeyztest`.`pt_osc_yeyztest_test_ins`
2020-06-29T12:21:05 Dropped triggers OK.
`yeyztest`.`test` was not altered.
2020-06-29T12:21:05 Error copying rows from `yeyztest`.`test` to `yeyztest`.`_test_new`: 2020-06-29T12:21:05 DBD::mysql::st execute failed: Table 'yeyztest._test_new' doesn't exist [for Statement "INSERT LOW_PRIORITY IGNORE INTO `yeyztest`.`_test_new` (`id`, `name`) SELECT `id`, `name` FROM `yeyztest`.`test` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 10363 copy nibble*/" with ParamValues: 0='2578309', 1='2617152'] at /usr/local/bin/pt-online-schema-change line 11284.
可以看到,使用drop中间表的方法,pt工具会自动drop掉触发器,同时,给出报错信息。
03
对比总结
1、ctrl+c的方法,结束pt-osc命令,不会drop中间表和触发器,因此触发器执行过程中可能会产生报错信息。
但是中间表_test_new还在,触发器写入的数据还在。
2、drop 中间表_test_new的方式结束pt-osc命令,pt工具会自动drop掉触发器,但是触发器写入_test_new表中的部分数据会丢失。
3、最好避免这种pt-osc执行了一半,要强制终止的操作,本身是不安全的,如果必须要终止,就需要从业务侧进行评估。
如果表中的内容本身是日志类数据,数据丢失几分钟业务可以接受,那其实使用drop中间表的方法是可以的;
如果表中的内容无法容忍丢失,那么使用ctrl+c的方法处理,这种方法需要额外执行drop 触发器和drop中间表的操作。