A bug we hit in production that crashed MySQL outright

Source: 企鹅号 (Tencent Penguin Account) - 数据库随笔

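A short summary:

Step 1: The machine went down for no apparent reason.

Step 2: After a reboot, it would not come back up.

Step 3: We swapped hardware. After the CPU was replaced, the machine booted.

Step 4: I happily started the database and logged in. Everything looked fine.

Step 5: This database had been the master. When the machine went down, an automatic master-slave failover took place. So I set out to check whether the data on the new master matched the old master, and whether the new master had lost anything. The check showed that the new master had taken over without losing data.

Step 6: Since the data was neither ahead nor behind, I decided to attach this database directly as a slave of the new master, and ran the CHANGE MASTER command.

Step 7: I ran START SLAVE, and mysqld died instantly and restarted itself. I am a veteran DBA with many years of front-line MySQL operations (grunt work) behind me, yet I have rarely seen this: a machine crash that takes the MySQL database down with it.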

Then I looked at the mysqld error log.

2018-09-06T18:33:47.475065+08:00 5 [Warning] Slave SQL for channel '': If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2018-09-06T18:33:47.475172+08:00 5 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.000005' at position 1063042206, relay log '/mysqldata/myinst1/binlog/relay-log.000002' position: 425121
2018-09-06T18:33:47.478127+08:00 5 [Note] Slave for channel '': MTS Recovery has completed at relay log /mysqldata/myinst1/binlog/relay-log.000002, position 473555 master log mysql-bin.000005, position 1063090640.
2018-09-06T18:33:52.516543+08:00 6 [Warning] Timeout waiting for reply of binlog (file: mysql-bin.000015, pos: 7595), semi-sync up to file , position 0.
2018-09-06T18:33:52.516580+08:00 6 [Note] Semi-sync replication switched OFF.
2018-09-06 18:33:52 0x7fbce99d4700 InnoDB: Assertion failure in thread 140449349977856 in file fut0lst.ic line 85
InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
10:33:52 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=268435456
read_buffer_size=8388608
max_used_connections=1
max_threads=2000
thread_count=23
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 20768831 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

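The log above is long, but only two lines carry useful information:

InnoDB: Assertion failure in thread 140449349977856 in file fut0lst.ic line 85
InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA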

Following "in file fut0lst.ic line 85" leads to this function:

/********************************************************************//**
Reads a file address.
@return file address */
UNIV_INLINE
fil_addr_t
flst_read_addr(
/*===========*/
    const fil_faddr_t* faddr, /*!< in: pointer to file faddress */
    mtr_t*             mtr)   /*!< in: mini-transaction handle */
{
    fil_addr_t addr;

    ut_ad(faddr && mtr);

    addr.page = mtr_read_ulint(faddr + FIL_ADDR_PAGE, MLOG_4BYTES,
                               mtr);
    addr.boffset = mtr_read_ulint(faddr + FIL_ADDR_BYTE,
                                  MLOG_2BYTES,
                                  mtr);
    ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);
    ut_a(ut_align_offset(faddr, UNIV_PAGE_SIZE) >= FIL_PAGE_DATA);
    return(addr);
}

The failure is at "ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);". The addr that was read does not satisfy this condition.
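To make the failing condition concrete, here is a minimal standalone sketch of what the check tests. It is not InnoDB code: FileAddr, decode_addr and addr_is_sane are hypothetical names introduced for illustration, and the real function reads through mtr_read_ulint() rather than raw bytes. The constant values are those of InnoDB's fil0fil.h in MySQL 5.7 as far as I know; InnoDB stores integers on the page in big-endian order.

```cpp
#include <cassert>
#include <cstdint>

// Values as defined in InnoDB's fil0fil.h (MySQL 5.7).
constexpr std::uint32_t FIL_NULL      = 0xFFFFFFFFu; // "no page" marker
constexpr std::uint32_t FIL_PAGE_DATA = 38;          // first byte after the page header
constexpr int FIL_ADDR_PAGE = 0;                     // 4-byte page number
constexpr int FIL_ADDR_BYTE = 4;                     // 2-byte offset within the page

// Simplified stand-in for fil_addr_t (hypothetical name).
struct FileAddr {
    std::uint32_t page;
    std::uint16_t boffset;
};

// Decode a 6-byte on-page file address stored in big-endian order.
// Hypothetical helper; the real code goes through mtr_read_ulint().
FileAddr decode_addr(const unsigned char* faddr) {
    assert(faddr != nullptr); // plays the role of ut_ad(faddr && mtr)
    FileAddr a;
    a.page = (std::uint32_t(faddr[FIL_ADDR_PAGE]) << 24) |
             (std::uint32_t(faddr[FIL_ADDR_PAGE + 1]) << 16) |
             (std::uint32_t(faddr[FIL_ADDR_PAGE + 2]) << 8) |
              std::uint32_t(faddr[FIL_ADDR_PAGE + 3]);
    a.boffset = static_cast<std::uint16_t>(
        (faddr[FIL_ADDR_BYTE] << 8) | faddr[FIL_ADDR_BYTE + 1]);
    return a;
}

// The condition that ut_a() enforces at fut0lst.ic line 85:
// either the address is null, or it points past the page header.
bool addr_is_sane(const FileAddr& a) {
    return a.page == FIL_NULL || a.boffset >= FIL_PAGE_DATA;
}
```

Under these assumptions, the bytes FF FF FF FF 00 00 (a null address) pass the check, while 00 00 00 05 00 00 (page 5, offset 0) fail it, because offset 0 lies inside the 38-byte page header where no list node can live.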

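Why does the file address that was read differ from what the code expects? Possibly the server crash broke this consistency. But where exactly?

Let's keep tracing the code.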

/********************************************************************//**
Writes a file address. */
UNIV_INLINE
void
flst_write_addr(
/*============*/
    fil_faddr_t* faddr, /*!< in: pointer to file faddress */
    fil_addr_t   addr,  /*!< in: file address */
    mtr_t*       mtr)   /*!< in: mini-transaction handle */
{
    ut_ad(faddr && mtr);
    ut_ad(mtr_memo_contains_page_flagged(mtr, faddr,
                                         MTR_MEMO_PAGE_X_FIX
                                         | MTR_MEMO_PAGE_SX_FIX));

    ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);
    ut_a(ut_align_offset(faddr, UNIV_PAGE_SIZE) >= FIL_PAGE_DATA);

    mlog_write_ulint(faddr + FIL_ADDR_PAGE, addr.page, MLOG_4BYTES,
                     mtr);
    mlog_write_ulint(faddr + FIL_ADDR_BYTE, addr.boffset,
                     MLOG_2BYTES, mtr);
}

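The problem, as I see it, lies in the last two statements of this function. Suppose the server went down right after

mlog_write_ulint(faddr + FIL_ADDR_PAGE, addr.page, MLOG_4BYTES, mtr);

had executed, but before the next statement ran. The information recorded at faddr would then be incomplete: the page number is new, but the byte offset was never written. When the address is read back later, the check ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA); in flst_read_addr fails, and mysqld crashes.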
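The half-written state described above can be reproduced in a toy model. This is only a sketch of the hypothesis, not of InnoDB itself: RawAddr, write_addr and the other helpers are hypothetical names, the two fields are modelled as plain memory writes, and the redo log and doublewrite buffer are ignored entirely. It therefore shows what the failing state looks like, not how such a state could reach disk.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t FIL_NULL      = 0xFFFFFFFFu; // "no page" marker
constexpr std::uint32_t FIL_PAGE_DATA = 38;          // first byte after the page header

// Hypothetical stand-in for the 6 bytes a file address occupies on a page.
struct RawAddr {
    unsigned char bytes[6];
};

// Store the 4-byte page number in big-endian order (bytes 0..3).
void store_page(RawAddr& r, std::uint32_t page) {
    r.bytes[0] = static_cast<unsigned char>(page >> 24);
    r.bytes[1] = static_cast<unsigned char>(page >> 16);
    r.bytes[2] = static_cast<unsigned char>(page >> 8);
    r.bytes[3] = static_cast<unsigned char>(page);
}

// Store the 2-byte offset in big-endian order (bytes 4..5).
void store_boffset(RawAddr& r, std::uint16_t off) {
    r.bytes[4] = static_cast<unsigned char>(off >> 8);
    r.bytes[5] = static_cast<unsigned char>(off);
}

std::uint32_t load_page(const RawAddr& r) {
    return (std::uint32_t(r.bytes[0]) << 24) | (std::uint32_t(r.bytes[1]) << 16) |
           (std::uint32_t(r.bytes[2]) << 8)  |  std::uint32_t(r.bytes[3]);
}

std::uint16_t load_boffset(const RawAddr& r) {
    return static_cast<std::uint16_t>((r.bytes[4] << 8) | r.bytes[5]);
}

// Two-step write, mirroring the order used in flst_write_addr().
// If crash_between_fields is true, only the page number gets written.
void write_addr(RawAddr& r, std::uint32_t page, std::uint16_t off,
                bool crash_between_fields) {
    assert(page == FIL_NULL || off >= FIL_PAGE_DATA); // same pre-check as the original
    store_page(r, page);
    if (crash_between_fields) {
        return; // simulated crash: the offset is never written
    }
    store_boffset(r, off);
}

// The condition that flst_read_addr() asserts when reading the address back.
bool passes_assertion(const RawAddr& r) {
    return load_page(r) == FIL_NULL || load_boffset(r) >= FIL_PAGE_DATA;
}
```

Starting from a null address (FIL_NULL, offset 0), an interrupted write_addr(r, 5, 158, true) leaves page 5 with offset 0, and passes_assertion(r) returns false. Note that the toy model only fails when the old offset was below FIL_PAGE_DATA. If the slot had previously held a valid address, the stale offset would still pass the check and the address would be silently wrong instead.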

If you had neither a replica nor a backup, how would you handle this? Tell me what you think!


Published: 2018-09-07 06:23:07
Original link: https://kuaibao.qq.com/s/20180907G078RV00?refer=cp_1026