使用pt-table-checksum校验MySQL主从复制

pt-table-checksum是一个基于MySQL数据库主从架构在线数据一致性校验工具。其工作原理在主库上运行, 通过对同步的表在主从段执行checksum, 从而判断数据是否一致。在校验完毕时,该工具将列出与主库存在差异的对象结果。

一、主从不一致的情形

    Master端使用了不确定的语句(如:CURRENT_USER(), UUID())
    不正确的故障转移(failover)流程
    误操作或直接在Slave进行DML操作
    持续的升级更新(Rolling upgrades)
    混合使用事务引擎和非事务引擎的表
    跳过了复制事件 (SET GLOBAL SQL_SLAVE_SKIP_COUNTER = N)
    使用临时表(Temporary Tables)
    复制过滤(Replication Filters)
    使用含LIMIT且没有order by的更新语句(update/delete with LIMIT clause without order by)

二、pt-table-checksum特性

    pt-table-checksum connects to the server you specify, and finds databases and tables that 
    match the filters you specify  (if any). It works one table at a time, so it does not accumulate
    large amounts of memory or do a lot of work before beginning to checksum. This makes it usable
    on very large servers. We have used it on servers with hundreds of thousands of databases and tables, 
    and trillions of rows. No matter how large the server is, pt-table-checksum works equally well.

    One reason it can work on very large tables is that it divides each table into chunks of rows, 
    and checksums each chunk with a single REPLACE..SELECT query. It varies the chunk size to make 
    the checksum queries run in the desired amount of time. The goal of chunking the tables, instead of 
    doing each table with a single big query, is to ensure that checksums are unintrusive and don’t cause too
    much replication lag or load on the server. That’s why the target time for each chunk is 0.5 seconds by default.

    The tool keeps track of how quickly the server is able to execute the queries, and adjusts the chunks 
    as it learns more about the server’s performance. It uses an exponentially decaying weighted average 
    to keep the chunk size stable, yet remain responsive if the server’s performance changes during checksumming 
    for any reason. This means that the tool will quickly throttle itself if your server becomes heavily loaded during 
    a trafficc spike or a background task, for example.

    After pt-table-checksum finishes checksumming all of the chunks in a table, it pauses and waits for all 
    detected replicas to finish executing the checksum queries. Once that is finished, it checks all of the replicas to 
    see if they have the same data as the master, and then prints a line of output with the results. 

三、演示pt-table-checksum

-- 环境:Master 192.168.1.8, Slave 192.168.1.12,主从已构建
-- 演示中,mysql提示符为:用户名@主机名[库名]
-- 如master@localhost[test],表示master用户表示在主,slave表示用户在slave上
-- 复制过滤器如下:
[root@vdbsrv4 ~]# mysql -uroot -p -e "show slave status\G"|grep "Replicate
Enter password: 
              Replicate_Do_DB: sakila,test
          Replicate_Ignore_DB: mysql
a、环境准备
--对用于执行checksum的用户授权,注,如果主从复制未开启mysql系统库复制,则从库也同样执行用户创建
master@localhost[test]> grant select, process, super, replication slave on *.* to
 ->  'checksums'@'192.168.1.%' identified by 'xxx';
Query OK, 0 rows affected (0.00 sec)

--主库建表及插入记录
master@localhost[test]> create table t(id tinyint primary key auto_increment,ename varchar(20));
Query OK, 0 rows affected (0.01 sec)

master@localhost[test]> insert into t(ename) values('Leshami'),('Henry'),('Jack');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

--从库查询结果
slave@localhost[test]> select * from t;
+----+---------+
| id | ename  |
+----+---------+
|  1 | Leshami |
|  2 | Henry  |
|  3 | Jack    |
+----+---------+

--模拟数据不一致,slave端删除记录
slave@localhost[test]> delete from t where id=2;

b、单表校验
-- 执行pt-table-checksum
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> -dtest -tt --nocheck-replication-filters \
> --no-check-binlog-format  --replicate=test.checksum
            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T10:14:32      0      1        3      1      0  0.031 test.t

TS            :完成检查的时间。
ERRORS        :检查时候发生错误和警告的数量。
DIFFS        :0表示一致,1表示不一致。当指定--no-replicate-check时,
                会一直为0,当指定--replicate-check-only会显示不同的信息。
ROWS          :表的行数。
CHUNKS        :被划分到表中的块的数目。
SKIPPED      :由于错误或警告或过大,则跳过块的数目。
TIME          :执行的时间。
TABLE        :被检查的表名。

--基于从库端SQL脚本查看checksum结果
slave@localhost[test]> system more check_sync_stat.sql;
SELECT 
    db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM
    test.checksum
WHERE
    (master_cnt <> this_cnt
        OR master_crc <> this_crc
        OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db , tbl;

slave@localhost[test]> source check_sync_stat.sql;
+------+-----+------------+--------+
| db  | tbl | total_rows | chunks |
+------+-----+------------+--------+
| test | t  |          2 |      1 |
+------+-----+------------+--------+

--从库端插入记录
slave@localhost[test]> insert into t(ename) values('Robin');
Query OK, 1 row affected (0.00 sec)

slave@localhost[test]> select * from t;
+----+---------+
| id | ename  |
+----+---------+
|  1 | Leshami |   #Author : Leshami
|  3 | Jack    |   #Blog     : http://blog.csdn.net/leshami
|  4 | Robin  |
+----+---------+

-- 再次在master端执行pt-table-checksum(此处略),后查看结果如下
slave@localhost[test]> source check_sync_stat.sql;
+------+-----+------------+--------+
| db  | tbl | total_rows | chunks |
+------+-----+------------+--------+
| test | t  |          3 |      1 |
+------+-----+------------+--------+

b、查看pt-table-checksum工作原理
-- 使用--explain参数,不执行checksum,列出checksum时真正执行的SQL语句
Show, but do not execute, checksum queries (disables --[no]empty-replicate-table). If specifed
twice, the tool actually iterates through the chunking algorithm, printing the upper and lower boundary values
for each chunk, but not executing the checksum queries.

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> -dtest -tt --nocheck-replication-filters \
> --no-check-binlog-format  --replicate=test.checksum --explain
--
-- test.t
--

REPLACE INTO `test`.`checksum` (db, tbl, chunk, chunk_index, lower_boundary,
upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, 
COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `ename`, 
CONCAT(ISNULL(`ename`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `test`.`t`
  /*checksum table*/

c、库级别校验
[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 \
> --databases=sakila --nocheck-replication-filters --no-check-binlog-format \
> --replicate=test.checksum 
            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T13:52:17      0      0      200      1      0  0.083 sakila.actor
08-06T13:52:17      0      0      603      1      0  0.024 sakila.address
08-06T13:52:17      0      0      16      1      0  0.012 sakila.category
08-06T13:52:17      0      0      600      1      0  0.025 sakila.city
08-06T13:52:17      0      0      109      1      0  0.019 sakila.country
08-06T13:52:17      0      0      599      1      0  0.019 sakila.customer
08-06T13:52:17      0      0    1000      1      0  0.035 sakila.film
08-06T13:52:17      0      0    5462      1      0  0.295 sakila.film_actor
08-06T13:52:17      0      0    1000      1      0  0.019 sakila.film_category
08-06T13:52:17      0      0    1000      1      0  0.015 sakila.film_text
08-06T13:52:17      0      0    4581      1      0  0.041 sakila.inventory
08-06T13:52:17      0      0        6      1      0  0.012 sakila.language
08-06T13:52:18      0      0    16049      1      0  0.367 sakila.payment
08-06T13:52:18      0      0    16044      1      0  0.357 sakila.rental
08-06T13:52:18      0      0        2      1      0  0.013 sakila.staff
08-06T13:52:18      0      0        2      1      0  0.012 sakila.store

--在从库删除一张表
slave@localhost[test]> drop table sakila.payment;
Query OK, 0 rows affected (0.01 sec)

-- 再次执行pt-table-checksum,收到如下提示
08-06T13:56:42 Skipping table sakila.payment because it has problems on these replicas:
Table sakila.payment does not exist on replica vdbsrv4
This can break replication.  If you understand the risks, specify --no-check-slave-tables to disable this check.
08-06T13:56:42 Error checksumming table sakila.payment: DBD::mysql::db selectrow_hashref failed:
Table 'sakila.payment' doesn't exist 
[for Statement "EXPLAIN SELECT * FROM `sakila`.`payment` WHERE 1=1"] at /usr/bin/pt-table-checksum line 6530.

d、多从校验
-- 下面演示多个从库时主从一致性校验
-- 缺省情况下
-- 参数:--recursion-method ; type: array; default: processlist,hosts.
--            Preferred recursion method for discovering replicas.
--  pt-table-checksum performs several “REPLICACHECKS” before and while running.

master@localhost[(none)]> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                          |
+-----------+------+------+-----------+--------------------------------------+
|        11 |      | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
|        1 |      | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
+-----------+------+------+-----------+--------------------------------------+

root@localhost[(none)]> show variables like 'port';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| port          | 3307  |
+---------------+-------+

root@localhost[(none)]> delete from test.t where id=1;
Query OK, 1 row affected (0.00 sec)

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest \
> -tt --nocheck-replication-filters --no-check-binlog-format --replicate=test.checksum \
> --recursion-method=hosts

# A software update is available:
#  * The current version for Percona::Toolkit is 2.2.14.

            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T16:12:52      0      1        3      1      0  0.034 test.t

四、参数描述

–nocheck-replication-filters   不检查复制过滤器,建议启用。后面可以用–databases来指定需要检查的数据库。 –no-check-binlog-format   不检查复制的binlog模式,要是binlog模式是ROW,则会报错。 –replicate-check-only   只显示不同步的信息。 –replicate=   把checksum的信息写入到指定表中,建议直接写到被检查的数据库当中。 –databases=   指定需要被检查的数据库,多个则用逗号隔开。 –tables=   指定需要被检查的表,多个用逗号隔开   h=127.0.0.1 :Master的地址   u=root :用户名   p=123456 :密码   P=3306 :端口

五、常见问题

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -d mysql \
> --nocheck-replication-filters --replicate=test.checksums
Replica vdbsrv4 has binlog_format MIXED which could cause pt-table-checksum to break replication. 
Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation.
  If you understand the risks, specify --no-check-binlog-format to disable this check.
上面描述的是关于使用mixed日志格式时的问题  

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -d mysql \
> --nocheck-replication-filters --no-check-binlog-format
DBD::mysql::db do failed: Access denied for user 'checksums'@'192.168.1.%' to database 'percona' 
[for Statement "CREATE DATABASE IF NOT EXISTS `percona` /* pt-table-checksum */"] 
at /usr/bin/pt-table-checksum line 10743.
07-29T08:42:03 --replicate database percona does not exist and it cannot be created automatically. 
You need to create the database.

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest -tt \
> --nocheck-replication-filters --no-check-binlog-format  --replicate=test.checksum
Cannot connect to P=3306,h=vdbsrv4,p=...,u=checksums
Diffs cannot be detected because no slaves were found.  
Please read the --recursion-method documentation for information.
            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T10:03:10      0      0        3      1      0  0.023 test.t

[root@vdbsrv3 ~]# pt-table-checksum h='192.168.1.8',u='checksums',p='xxx',P=3306 -dtest -tt \
> --nocheck-replication-filters --no-check-binlog-format \
> --replicate=test.checksum --recursion-method=hosts
Cannot connect to P=3306,h=,p=...,u=checksums
Cannot connect to P=3307,h=,p=...,u=checksums
Diffs cannot be detected because no slaves were found.  
Please read the --recursion-method documentation for information.
            TS ERRORS  DIFFS    ROWS  CHUNKS SKIPPED    TIME TABLE
08-06T16:02:27      0      0        3      1      0  0.016 test.t

master@localhost[(none)]> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                          |
+-----------+------+------+-----------+--------------------------------------+
|        1 |      | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
|        11 |      | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
+-----------+------+------+-----------+--------------------------------------+

-- 增加参数report_host后重启从库
[root@vdbsrv4 ~]# grep report_host /etc/my.cnf
report_host='192.168.1.12'

master@localhost[(none)]> show slave hosts;
+-----------+--------------+------+-----------+--------------------------------------+
| Server_id | Host        | Port | Master_id | Slave_UUID                          |
+-----------+--------------+------+-----------+--------------------------------------+
|        11 | 192.168.1.12 | 3307 |      1002 | 69fc46b6-3c06-11e5-94f0-000c29a05f26 |
|        1 | 192.168.1.12 | 3306 |      1002 | f2824060-e2cb-11e4-8f18-000c2926f457 |
+-----------+--------------+------+-----------+--------------------------------------+

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏杨建荣的学习笔记

使用shell生成orabbix自动化配置脚本(r6笔记第53天)

在使用Orabbix监控Oracle的时候,本身和zaabix agent最大的不同便是使用Orabbix不需要对每个数据库实例都安装单独的agent,而是一个...

32880
来自专栏杨建荣的学习笔记

系统级alias vs Oracle ADR功能(r5笔记第35天)

Oracle在11g中推出的新特性ADR,即Automatic Diagnostic Repository 个人理解这个工具就是能够高效的把一些日志文件轻松管理...

36880
来自专栏数据和云

Oracle数据库的初始化与跟踪学习方法

编辑说明:《Oracle性能优化与诊断案例精选》出版以来,收到很多读者的来信和评论,我们会通过连载的形式将书中内容公布出来,希望书中内容能够帮助到更多的读者朋友...

39990
来自专栏沃趣科技

数据库对象事件与属性统计 | performance_schema全方位介绍

上一篇《事件统计 | performance_schema全方位介绍》详细介绍了performance_schema的事件统计表,但这些统计数据粒度太粗,仅仅按...

41040
来自专栏Hadoop实操

如何使用StreamSets实现Oracle中变化数据实时写入Kudu

1K50
来自专栏杨建荣的学习笔记

由sqlplus中的一个小细节所做的折腾(r5笔记第11天)

sqlplus是Oracle DBA每天都要用到烂的工具了。如果没有sqlplus还真不知道怎么和oracle对话了。转念一想,自己已经很久没使用过toad了。...

37960
来自专栏杨建荣的学习笔记

分分钟搭建MySQL Group Replication测试环境(r11笔记第82天)

最近看了下MySQL 5.7中的闪亮特性Group Replication,也花了不少做了些测试,发现有些方面的表现确实不赖。当然要模拟这么一套环境还是需...

38570
来自专栏杨建荣的学习笔记

使用外部表关联MySQL数据到Oracle(r6笔记第100天)

因为业务需要,有个临时的活动需要DBA来支持一些数据业务,问题来了,需要从MySQL端同步一部分数据到Oracle端,然后从Oracle端匹配查 到相应的数据返...

31340
来自专栏「3306 Pai」社区

浅析ProxySQL用户管理

对于读写分离特别重要,保证了同一个事务中所有的语句都会路由到同一组示例,防止出现同一个事务中,上下文数据不一致的情况。例如,在不开启这个属性的情况下:

31710
来自专栏杨建荣的学习笔记

关于查看dba_data_files的一个小问题(r7笔记第72天)

今天帮一个朋友看一个pl/sql的问题,他已经钻到一个死胡同里列,可能明眼人一看就知道哪里有问题,但是当局者迷,所以我抽空看了一下这个pl/sql块。 pl/s...

38950

扫码关注云+社区

领取腾讯云代金券