主从复制延迟排查

导语 :目前生产环境中主从延迟常见于RO实例及备库中,由于历史原因目前部分用户的备库开启了只读服务,后续这块我们会逐步推荐客户使用RO实例。

背景:

接一线同时反馈,用户的RO实例延迟很大,达到1000多秒。

问题排查:

1.登上客户RO实例,通过执行show full processlist 查看确认是否有执行时长比较久的Select查询。

MySQL [(none)]> pager grep -v Sleep
PAGER set to 'grep -v Sleep'

MySQL [(none)]> show processlist;
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
| 196776913 | system user | | NULL | Connect | 7198 | Waiting for master to send event | NULL |
| 196776914 | system user | | NULL | Connect | 12957 | Reading event from the relay log | NULL |
| 197039676 | tencentroot | localhost | NULL | Query | 0 | init | show processlist |
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
69 rows in set (0.00 sec)

MySQL [(none)]> show processlist;
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
| 196776913 | system user | | NULL | Connect | 7200 | Waiting for master to send event | NULL |
| 196776914 | system user | | NULL | Connect | 12958 | Reading event from the relay log | NULL |
| 197039676 | tencentroot | localhost | NULL | Query | 0 | init | show processlist |
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
70 rows in set (0.00 sec)

这里看并没有发现执行时长的Select。

2.执行show slave status\G ,查看当前slave ,sql_thread执行到哪里.

MySQL [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.53.176.236
Master_User: tencentroot
Master_Port: 20133
Connect_Retry: 60
Master_Log_File: mysql-bin.000614
Read_Master_Log_Pos: 291987919
Relay_Log_File: relay-bin.001810
Relay_Log_Pos: 786595678
Relay_Master_Log_File: mysql-bin.000609 //当前Slave SQL_thread 执行事件所对应主库日志的名称
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: 
Replicate_Ignore_DB: 
Replicate_Do_Table: 
Replicate_Ignore_Table: 
Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
Last_Errno: 0
Last_Error: 
Skip_Counter: 0
Exec_Master_Log_Pos: 786595596////当前Slave SQL_thread 执行事件所对应主库日志的位置偏移量
Relay_Log_Space: 5661425134
Until_Condition: None
Until_Log_File: 
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File: 
Master_SSL_CA_Path: 
Master_SSL_Cert: 
Master_SSL_Cipher: 
Master_SSL_Key: 
Seconds_Behind_Master: 13108
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error: 
Last_SQL_Errno: 0
Last_SQL_Error: 
Replicate_Ignore_Server_Ids: 
Master_Server_Id: 87411
Master_UUID: 89421293-dd22-11e6-b4d7-6c0b84d538f9
Master_Info_File: /data1/mysql_root/data/20141/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Reading event from the relay log
...........

此时保存记录,在slave 上获取的主库日志名(mysql-bin.000609) 和位置偏移量(786595596)。

3.登录主库,根据步骤2中获取的信息,查看用户业务当时执行的什么操作。

         MySQL [(none)]> show binlog events in 'mysql-bin.000609' from 786595596 limit 100;
+------------------+-----------+-------------+-----------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+-----------+-------------+-----------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mysql-bin.000609 | 786595596 | Gtid | 87411 | 786595640 | SET @@SESSION.GTID_NEXT= '89421293-dd22-11e6-b4d7-6c0b84d538f9:1688021788' |
| mysql-bin.000609 | 786595640 | Query | 87411 | 786595710 | BEGIN |
| mysql-bin.000609 | 786595710 | Table_map | 87411 | 786595786 | table_id: 698 (ultrax.pre_common_credit_log) |
| mysql-bin.000609 | 786595786 | Delete_rows | 87411 | 786603968 | table_id: 698 |
| mysql-bin.000609 | 786603968 | Delete_rows | 87411 | 786612150 | table_id: 698 |
| mysql-bin.000609 | 786612150 | Delete_rows | 87411 | 786620332 | table_id: 698 |
| mysql-bin.000609 | 786620332 | Delete_rows | 87411 | 786628514 | table_id: 698 |
| mysql-bin.000609 | 786628514 | Delete_rows | 87411 | 786636696 | table_id: 698 |
| mysql-bin.000609 | 786636696 | Delete_rows | 87411 | 786644878 | table_id: 698 |
| mysql-bin.000609 | 786644878 | Delete_rows | 87411 | 786653060 | table_id: 698 |
| mysql-bin.000609 | 786653060 | Delete_rows | 87411 | 786661242 | table_id: 698 |
| mysql-bin.000609 | 786661242 | Delete_rows | 87411 | 786669424 | table_id: 698 |
| mysql-bin.000609 | 786669424 | Delete_rows | 87411 | 786677606 | table_id: 698 |
| mysql-bin.000609 | 786677606 | Delete_rows | 87411 | 786685788 | table_id: 698 

..........................................

这里看执行是一个delete操作。

     查看表信息 


MySQL [ultrax]> show table status like 'pre_common_credit_log'\G
*************************** 1. row ***************************
Name: pre_common_credit_log
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 513486861
Avg_row_length: 78
Data_length: 40310423552
Max_data_length: 0
Index_length: 65080279040
Data_free: 84828749824
Auto_increment: NULL
Create_time: 2017-01-18 17:50:19
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options: 
Comment: 
1 row in set (0.00 sec)

这里看到该表的数据量已经5亿多。

CREATE TABLE `pre_common_credit_log` (
`uid` int(10) unsigned NOT NULL DEFAULT '0',
`operation` char(3) NOT NULL DEFAULT '',
`relatedid` int(10) unsigned NOT NULL,
`dateline` int(10) unsigned NOT NULL,
`extcredits1` int(10) NOT NULL,
`extcredits2` int(10) NOT NULL,
`extcredits3` int(10) NOT NULL,
`extcredits4` int(10) NOT NULL,
`extcredits5` int(10) NOT NULL,
`extcredits6` int(10) NOT NULL,
`extcredits7` int(10) NOT NULL,
`extcredits8` int(10) NOT NULL,
KEY `uid` (`uid`),
KEY `operation` (`operation`),
KEY `relatedid` (`relatedid`),
KEY `dateline` (`dateline`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

MySQL [ultrax]> show index from ultrax.pre_common_credit_log;
+-----------------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| pre_common_credit_log | 1 | uid | 1 | uid | A | 17116232 | NULL | NULL | | BTREE | | |
| pre_common_credit_log | 1 | operation | 1 | operation | A | 13116 | NULL | NULL | | BTREE | | |
| pre_common_credit_log | 1 | relatedid | 1 | relatedid | A | 155696 | NULL | NULL | | BTREE | | |
| pre_common_credit_log | 1 | dateline | 1 | dateline | A | 32092936 | NULL | NULL | | BTREE | | |
+-----------------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

4 rows in set (0.00 sec)

这里看表只有普通index,木有主键和唯一index,且index的Cardinality 也不高,表的数据量已经在5亿多,index 效率会很差。由于slave sql_thread是串行执行,再加上index效率问题,slave 不能够及时将接收的日志执行完毕,故引发主从延迟。

总结:

1.反馈用户此种业务基于主键或唯一性较高的index删除。

2.主从环境中,update、delete 操作时一定要留意当前表中是否有主键或唯一索引。若无, 可将自增列做为主键。如果没有主键的话,update、delete将会全表扫描,特别是在binlog row模式的时候,延迟将会很大。本文中binlog 中以row模

原创声明,本文系作者授权云+社区-专栏发表,未经许可,不得转载。

如有侵权,请联系 yunjia_community@tencent.com 删除。

编辑于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏python3

mysql-数据库的操作

nformation_schema: 虚拟库,不占用磁盘空间,存储的是数据库启动后的一些参数,如用户表信息、列信息、权限信息、字符信息等

802
来自专栏Java成神之路

Oracle学习笔记_01_SQL初步

   DML: Data Manipulation Language        数据操纵语言 DDL: Data Definition Languag...

663
来自专栏Java帮帮-微信公众号-技术文章全总结

【数据库】MySQL进阶一、主外键讲解

MySQL进阶主外键讲解 1.什么是外键: 主键:是唯一标识一条记录,不能有重复的,不允许为空,用来保证数据完整性 外键:是另一表的主键, 外键可...

3177
来自专栏沃趣科技

应用示例荟萃 | performance_schema全方位介绍(下)

使用performance_schema中的语句当前事件记录表和语句事件历史记录表可以查询数据库中最近执行的一些SQL语句,以及语句相关的信息,这里我们以eve...

1353
来自专栏听雨堂

Mysql高效插入/更新数据

从tushare抓取到的财务数据,最开始只是想存下来,用的办法想简单点,是:插入--报错—update 但发现这个方法太蠢,异常会导致大量无效连接,改为: ...

1757
来自专栏听雨堂

Mysql高效插入/更新数据

从tushare抓取到的财务数据,最开始只是想存下来,用的办法想简单点,是:插入--报错—update 但发现这个方法太蠢,异常会导致大量无效连接,改为: ...

1815
来自专栏ShaoYL

iOS---数据离线缓存

36212
来自专栏用户2442861的专栏

MySQL性能优化的最佳20+条经验

今天,数据库的操作越来越成为整个应用的性能瓶颈了,这点对于Web应用尤其明显。关于数据库的性能,这并不只是DBA才需要担心的事,而这更是我们程序员需要去关注的...

482
来自专栏JAVA烂猪皮

用 Node + MySQL 如何处理 100G 数据

通过这个 Node.js 和 MySQL 示例项目,我们将看看如何有效地处理 数十亿行 占用 数百GB 存储空间的数据。

1425
来自专栏积累沉淀

MySQL 事务没有提交导致 锁等待 Lock wait timeout exceeded

Lock wait timeout exceeded 当一个SQL执行完了,但未COMMIT,后面的SQL想要执行就是被锁,超时结束; select *...

3028

扫码关注云+社区