主从复制延迟排查

导语 :目前生产环境中主从延迟常见于RO实例及备库中,由于历史原因目前部分用户的备库开启了只读服务,后续这块我们会逐步推荐客户使用RO实例。

背景:

接一线同时反馈,用户的RO实例延迟很大,达到1000多秒。

问题排查:

1.登上客户RO实例,通过执行show full processlist 查看确认是否有执行时长比较久的Select查询。

MySQL [(none)]> pager grep -v Sleep
PAGER set to 'grep -v Sleep'

MySQL [(none)]> show processlist;
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
| 196776913 | system user | | NULL | Connect | 7198 | Waiting for master to send event | NULL |
| 196776914 | system user | | NULL | Connect | 12957 | Reading event from the relay log | NULL |
| 197039676 | tencentroot | localhost | NULL | Query | 0 | init | show processlist |
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
69 rows in set (0.00 sec)

MySQL [(none)]> show processlist;
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
| 196776913 | system user | | NULL | Connect | 7200 | Waiting for master to send event | NULL |
| 196776914 | system user | | NULL | Connect | 12958 | Reading event from the relay log | NULL |
| 197039676 | tencentroot | localhost | NULL | Query | 0 | init | show processlist |
+-----------+-------------+--------------+--------+---------+-------+----------------------------------+------------------+
70 rows in set (0.00 sec)

这里看并没有发现执行时长的Select。

2.执行show slave status\G ,查看当前slave ,sql_thread执行到哪里.

MySQL [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.53.176.236
Master_User: tencentroot
Master_Port: 20133
Connect_Retry: 60
Master_Log_File: mysql-bin.000614
Read_Master_Log_Pos: 291987919
Relay_Log_File: relay-bin.001810
Relay_Log_Pos: 786595678
Relay_Master_Log_File: mysql-bin.000609 //当前Slave SQL_thread 执行事件所对应主库日志的名称
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: 
Replicate_Ignore_DB: 
Replicate_Do_Table: 
Replicate_Ignore_Table: 
Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
Last_Errno: 0
Last_Error: 
Skip_Counter: 0
Exec_Master_Log_Pos: 786595596////当前Slave SQL_thread 执行事件所对应主库日志的位置偏移量
Relay_Log_Space: 5661425134
Until_Condition: None
Until_Log_File: 
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File: 
Master_SSL_CA_Path: 
Master_SSL_Cert: 
Master_SSL_Cipher: 
Master_SSL_Key: 
Seconds_Behind_Master: 13108
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error: 
Last_SQL_Errno: 0
Last_SQL_Error: 
Replicate_Ignore_Server_Ids: 
Master_Server_Id: 87411
Master_UUID: 89421293-dd22-11e6-b4d7-6c0b84d538f9
Master_Info_File: /data1/mysql_root/data/20141/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Reading event from the relay log
...........

此时保存记录,在slave 上获取的主库日志名(mysql-bin.000609) 和位置偏移量(786595596)。

3.登录主库,根据步骤2中获取的信息,查看用户业务当时执行的什么操作。

         MySQL [(none)]> show binlog events in 'mysql-bin.000609' from 786595596 limit 100;
+------------------+-----------+-------------+-----------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+-----------+-------------+-----------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mysql-bin.000609 | 786595596 | Gtid | 87411 | 786595640 | SET @@SESSION.GTID_NEXT= '89421293-dd22-11e6-b4d7-6c0b84d538f9:1688021788' |
| mysql-bin.000609 | 786595640 | Query | 87411 | 786595710 | BEGIN |
| mysql-bin.000609 | 786595710 | Table_map | 87411 | 786595786 | table_id: 698 (ultrax.pre_common_credit_log) |
| mysql-bin.000609 | 786595786 | Delete_rows | 87411 | 786603968 | table_id: 698 |
| mysql-bin.000609 | 786603968 | Delete_rows | 87411 | 786612150 | table_id: 698 |
| mysql-bin.000609 | 786612150 | Delete_rows | 87411 | 786620332 | table_id: 698 |
| mysql-bin.000609 | 786620332 | Delete_rows | 87411 | 786628514 | table_id: 698 |
| mysql-bin.000609 | 786628514 | Delete_rows | 87411 | 786636696 | table_id: 698 |
| mysql-bin.000609 | 786636696 | Delete_rows | 87411 | 786644878 | table_id: 698 |
| mysql-bin.000609 | 786644878 | Delete_rows | 87411 | 786653060 | table_id: 698 |
| mysql-bin.000609 | 786653060 | Delete_rows | 87411 | 786661242 | table_id: 698 |
| mysql-bin.000609 | 786661242 | Delete_rows | 87411 | 786669424 | table_id: 698 |
| mysql-bin.000609 | 786669424 | Delete_rows | 87411 | 786677606 | table_id: 698 |
| mysql-bin.000609 | 786677606 | Delete_rows | 87411 | 786685788 | table_id: 698 

..........................................

这里看执行是一个delete操作。

     查看表信息 


MySQL [ultrax]> show table status like 'pre_common_credit_log'\G
*************************** 1. row ***************************
Name: pre_common_credit_log
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 513486861
Avg_row_length: 78
Data_length: 40310423552
Max_data_length: 0
Index_length: 65080279040
Data_free: 84828749824
Auto_increment: NULL
Create_time: 2017-01-18 17:50:19
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options: 
Comment: 
1 row in set (0.00 sec)

这里看到该表的数据量已经5亿多。

CREATE TABLE `pre_common_credit_log` (
`uid` int(10) unsigned NOT NULL DEFAULT '0',
`operation` char(3) NOT NULL DEFAULT '',
`relatedid` int(10) unsigned NOT NULL,
`dateline` int(10) unsigned NOT NULL,
`extcredits1` int(10) NOT NULL,
`extcredits2` int(10) NOT NULL,
`extcredits3` int(10) NOT NULL,
`extcredits4` int(10) NOT NULL,
`extcredits5` int(10) NOT NULL,
`extcredits6` int(10) NOT NULL,
`extcredits7` int(10) NOT NULL,
`extcredits8` int(10) NOT NULL,
KEY `uid` (`uid`),
KEY `operation` (`operation`),
KEY `relatedid` (`relatedid`),
KEY `dateline` (`dateline`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

MySQL [ultrax]> show index from ultrax.pre_common_credit_log;
+-----------------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| pre_common_credit_log | 1 | uid | 1 | uid | A | 17116232 | NULL | NULL | | BTREE | | |
| pre_common_credit_log | 1 | operation | 1 | operation | A | 13116 | NULL | NULL | | BTREE | | |
| pre_common_credit_log | 1 | relatedid | 1 | relatedid | A | 155696 | NULL | NULL | | BTREE | | |
| pre_common_credit_log | 1 | dateline | 1 | dateline | A | 32092936 | NULL | NULL | | BTREE | | |
+-----------------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

4 rows in set (0.00 sec)

这里看表只有普通index,木有主键和唯一index,且index的Cardinality 也不高,表的数据量已经在5亿多,index 效率会很差。由于slave sql_thread是串行执行,再加上index效率问题,slave 不能够及时将接收的日志执行完毕,故引发主从延迟。

总结:

1.反馈用户此种业务基于主键或唯一性较高的index删除。

2.主从环境中,update、delete 操作时一定要留意当前表中是否有主键或唯一索引。若无, 可将自增列做为主键。如果没有主键的话,update、delete将会全表扫描,特别是在binlog row模式的时候,延迟将会很大。本文中binlog 中以row模

原创声明,本文系作者授权云+社区发表,未经许可,不得转载。

如有侵权,请联系 yunjia_community@tencent.com 删除。

编辑于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏技术记录

java小技术之生成二维码

把我们需要的链接或者内容生成二维码其实是一件非常容易的事情,有很多办法可以实现,这里我们采用JS方法生成。 <!DOCTYPE HTML PUBLIC "-//...

1896
来自专栏一个会写诗的程序员的博客

Spring Reactor 项目核心库Reactor Core

Non-Blocking Reactive Streams Foundation for the JVM both implementing a Reactiv...

2712
来自专栏一个爱瞎折腾的程序猿

sqlserver使用存储过程跟踪SQL

USE [master] GO /****** Object: StoredProcedure [dbo].[sp_perfworkload_trace_s...

2810
来自专栏张善友的专栏

Mix 10 上的asp.net mvc 2的相关Session

Beyond File | New Company: From Cheesy Sample to Social Platform Scott Hansel...

2767
来自专栏闻道于事

js登录滑动验证,不滑动无法登陆

js的判断这里是根据滑块的位置进行判断,应该是用一个flag判断 <%@ page language="java" contentType="text/html...

8338
来自专栏我和未来有约会

Silverlight第三方控件专题

这里我收集整理了目前网上silverlight第三方控件的专题,若果有所遗漏请告知我一下。 名称 简介 截图 telerik 商 RadC...

4355
来自专栏魂祭心

原 canvas绘制clock

4994
来自专栏大内老A

The .NET of Tomorrow

Ed Charbeneau(http://developer.telerik.com/featured/the-net-of-tomorrow/) Exciti...

37810
来自专栏陈仁松博客

ASP.NET Core 'Microsoft.Win32.Registry' 错误修复

今天在发布Asp.net Core应用到Azure的时候出现错误InvalidOperationException: Cannot find compilati...

5198
来自专栏杨龙飞前端

scrollto 到指定位置

2894

扫码关注云+社区