前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >show status和set gtid_mode 导致线程死锁案例

show status和set gtid_mode 导致线程死锁案例

作者头像
用户1278550
发布2019-05-07 15:06:54
8880
发布2019-05-07 15:06:54
举报
文章被收录于专栏:idbaidba

特别感谢远在重庆的高鹏和我们组的小伙伴王航威一起分析本次案例,推荐横屏查看

一 现象

我们数据库组今年上半年的计划之一是将所有数据库实例打开GTID特性。在线上进行灰度开启GITD过程中遇到数据库hang。具体表现是执行如下命令时:

set global gtid_mode=OFF_PERMISSIVE;

数据库hang住,阻塞读写请求,新建连接失败,业务不可用。

二 复现

高频率执行show global status 和 set gtid_mode ,MySQL实例很容易就hang住了。

代码语言:javascript
复制
for i in `seq  1 50000`;
do
 my 3307 -e 'show  global status' > /tmp/1 ;
 sleep 1
done
代码语言:javascript
复制
for i in `seq  1 50000`;
do
  my 3307 -e 'SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;';
  my 3307 -e 'SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;';
  my 3307 -e 'SET @@GLOBAL.GTID_MODE = ON;';
  my 3307 -e 'SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;';
  my 3307 -e 'SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;';
  my 3307 -e 'SET @@GLOBAL.GTID_MODE = OFF;';
done

三 分析

3.1 获取数据库hang的时候的堆栈

pstack $pidofmysql

核心堆栈信息分别是 thread 2 执行show global status 命令 thread 4 执行set global gtid_mode 命令

3.2 使用gdb 分析当时阻塞到什么地方

注意: 堆栈信息的执行顺序实际上是从下往上的,#n大的先执行,我们分析的时候也需要向上到这来,比如thread 4 从#23开始

3.3 show global status 的堆栈
代码语言:javascript
复制
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f8675f0b700 (LWP 49624))]#0  0x00007f86758ef334 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb)
[Switching to thread 4 (Thread 0x7f8675f0b700 (LWP 49624))]#0  0x00007f86758ef334 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f86758ef334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f86758ea60e in _L_lock_995 () from /lib64/libpthread.so.0
#2  0x00007f86758ea576 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000019219b9 in native_mutex_lock (mutex=0x2e20e48) at /root/percona-server-5.7.22-22/include/thr_mutex.h:84
#4  0x0000000001921c21 in safe_mutex_lock (mp=0x2e20e20, try_lock=0 '\000', file=0x21f2168 "/root/percona-server-5.7.22-22/sql/sql_show.cc", line=3127)
    at /root/percona-server-5.7.22-22/mysys/thr_mutex.c:109
#5  0x000000000162e890 in my_mutex_lock (mp=0x2e20e20, file=0x21f2168 "/root/percona-server-5.7.22-22/sql/sql_show.cc", line=3127)
    at /root/percona-server-5.7.22-22/include/thr_mutex.h:180
#6  0x000000000162e98b in inline_mysql_mutex_lock (that=0x2e20e20, src_file=0x21f2168 "/root/percona-server-5.7.22-22/sql/sql_show.cc", src_line=3127)
    at /root/percona-server-5.7.22-22/include/mysql/psi/mysql_thread.h:730
#7  0x0000000001636798 in show_status_array (thd=0x7f83a7c19000, wild=0x0, variables=0x2da0780, value_type=OPT_GLOBAL, status_var=0x7f8675f07ef0,
    prefix=0x7f8675f07dc0 "Innodb", tl=0x7f83a7c27c58, ucase_names=false, cond=0x0) at /root/percona-server-5.7.22-22/sql/sql_show.cc:3127
#8  0x0000000001636700 in show_status_array (thd=0x7f83a7c19000, wild=0x0, variables=0x7f8673c60420, value_type=OPT_GLOBAL, status_var=0x7f8675f07ef0, prefix=0x21f2f84 "",
    tl=0x7f83a7c27c58, ucase_names=false, cond=0x0) at /root/percona-server-5.7.22-22/sql/sql_show.cc:3116
#9  0x0000000001648deb in fill_status (thd=0x7f83a7c19000, tables=0x7f83a7c27c58, cond=0x0) at /root/percona-server-5.7.22-22/sql/sql_show.cc:8084
#10 0x000000000164b2e9 in do_fill_table (thd=0x7f83a7c19000, table_list=0x7f83a7c27c58, qep_tab=0x7f83a7c28d78) at /root/percona-server-5.7.22-22/sql/sql_show.cc:8791
#11 0x000000000164b7f3 in get_schema_tables_result (join=0x7f83a7c28698, executed_place=PROCESSED_BY_JOIN_EXEC) at /root/percona-server-5.7.22-22/sql/sql_show.cc:8921
#12 0x000000000161fc39 in JOIN::prepare_result (this=0x7f83a7c28698) at /root/percona-server-5.7.22-22/sql/sql_select.cc:909
#13 0x000000000157cb71 in JOIN::exec (this=0x7f83a7c28698) at /root/percona-server-5.7.22-22/sql/sql_executor.cc:124
#14 0x000000000161e7ff in handle_query (thd=0x7f83a7c19000, lex=0x7f83a7c1b630, result=0x7f83a7c285d0, added_options=0, removed_options=0)
    at /root/percona-server-5.7.22-22/sql/sql_select.cc:185
#15 0x00000000015d044f in execute_sqlcom_select (thd=0x7f83a7c19000, all_tables=0x7f83a7c27c58) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:5445
#16 0x00000000015c8704 in mysql_execute_command (thd=0x7f83a7c19000, first_level=true) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:2892
#17 0x00000000015d14b6 in mysql_parse (thd=0x7f83a7c19000, parser_state=0x7f8675f0a5c0) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:5901
#18 0x00000000015c504a in dispatch_command (thd=0x7f83a7c19000, com_data=0x7f8675f0ad30, command=COM_QUERY) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:1490
#19 0x00000000015c3dd7 in do_command (thd=0x7f83a7c19000) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:1021
#20 0x000000000170c998 in handle_connection (arg=0x7f83ad7ff460) at /root/percona-server-5.7.22-22/sql/conn_handler/connection_handler_per_thread.cc:312
#21 0x0000000001943acc in pfs_spawn_thread (arg=0x7f8673d87120) at /root/percona-server-5.7.22-22/storage/perfschema/pfs.cc:2190
#22 0x00007f86758e8aa1 in start_thread () from /lib64/libpthread.so.0
#23 0x00007f8674852bcd in clone () from /lib64/libc.so.6
(gdb) f 7
#7  0x0000000001636798 in show_status_array (thd=0x7f83a7c19000, wild=0x0, variables=0x2da0780, value_type=OPT_GLOBAL, status_var=0x7f8675f07ef0,
    prefix=0x7f8675f07dc0 "Innodb", tl=0x7f83a7c27c58, ucase_names=false, cond=0x0) at /root/percona-server-5.7.22-22/sql/sql_show.cc:3127
3127            mysql_mutex_lock(&LOCK_global_system_variables);
分析

根据线程执行的顺序来分析:

关键函数 fill_status 先获取mutex锁 mysql_mutex_lock(&LOCK_status)

代码位置 sql_show.cc:8019

代码语言:javascript
复制
int fill_status(THD *thd, TABLE_LIST *tables, Item *cond)
{
  if (thd->fill_status_recursion_level++ == 0)
    mysql_mutex_lock(&LOCK_status);
  if (thd->fill_status_recursion_level-- == 1)
    mysql_mutex_unlock(&LOCK_status);
}

通 执行show global status的线程阻塞在申请mutex锁 mysql_mutex_lock(&LOCK_global_system_variables) 的地方。

关键函数:show_status_array: (代码片段) 每次获取一个系统状态,就请求一次mysql_mutex_lock(&LOCK_global_system_variables)锁。

代码语言:javascript
复制
 if (!(wild && wild[0] && wild_case_compare(system_charset_info,
                                                 name_buffer, wild)) &&
          (!partial_cond || partial_cond->val_int()))
      {
        const char *pos;
        size_t length;

        mysql_mutex_lock(&LOCK_global_system_variables);
        pos= get_one_variable(thd, var, value_type, show_type, status_var,
                              &charset, buff, &length);
        table->field[1]->store(pos, (uint32) length, charset);
        thd->count_cuted_fields= CHECK_FIELD_IGNORE;
        table->field[1]->set_notnull();
        mysql_mutex_unlock(&LOCK_global_system_variables);

而 LOCK_global_system_variables 则是 mysql_mutex_t 结构。代码位置 mysqld.cc:686

代码语言:javascript
复制
mysql_mutex_t
  LOCK_status, LOCK_uuid_generator,
  LOCK_crypt,
  LOCK_global_system_variables,
  LOCK_user_conn, LOCK_slave_list,
  LOCK_error_messages;
3.4 修改gtid_mode的线程调用栈
代码语言:javascript
复制
gdb) thread 2
[Switching to thread 2 (Thread 0x7f8675ea9700 (LWP 27921))]#0  0x00007f86758ef334 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f86758ef334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f86758ea60e in _L_lock_995 () from /lib64/libpthread.so.0
#2  0x00007f86758ea576 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000019219b9 in native_mutex_lock (mutex=0x2e20cc8) at /root/percona-server-5.7.22-22/include/thr_mutex.h:84
#4  0x0000000001921c21 in safe_mutex_lock (mp=0x2e20ca0, try_lock=0 '\000', file=0x22c7b38 "/root/percona-server-5.7.22-22/sql/binlog.cc", line=10145)
    at /root/percona-server-5.7.22-22/mysys/thr_mutex.c:109
#5  0x0000000001873fa9 in my_mutex_lock (mp=0x2e20ca0, file=0x22c7b38 "/root/percona-server-5.7.22-22/sql/binlog.cc", line=10145)
    at /root/percona-server-5.7.22-22/include/thr_mutex.h:180
#6  0x000000000187427e in inline_mysql_mutex_lock (that=0x2e20ca0, src_file=0x22c7b38 "/root/percona-server-5.7.22-22/sql/binlog.cc", src_line=10145)
    at /root/percona-server-5.7.22-22/include/mysql/psi/mysql_thread.h:730
#7  0x000000000188a903 in MYSQL_BIN_LOG::publish_coordinates_for_global_status (this=0x2e68720) at /root/percona-server-5.7.22-22/sql/binlog.cc:10145
#8  0x0000000001885115 in MYSQL_BIN_LOG::rotate (this=0x2e68720, force_rotate=true, check_purge=0x7f8675ea6aff) at /root/percona-server-5.7.22-22/sql/binlog.cc:7777
#9  0x00000000016ba82c in Sys_var_gtid_mode::global_update (this=0x2e52820, thd=0x7f83a8419000, var=0x7f83a8427ce8) at /root/percona-server-5.7.22-22/sql/sys_vars.h:2951
#10 0x00000000014f848d in sys_var::update (this=0x2e52820, thd=0x7f83a8419000, var=0x7f83a8427ce8) at /root/percona-server-5.7.22-22/sql/set_var.cc:184
#11 0x00000000014f9d0c in set_var::update (this=0x7f83a8427ce8, thd=0x7f83a8419000) at /root/percona-server-5.7.22-22/sql/set_var.cc:836
#12 0x00000000014f94d7 in sql_set_variables (thd=0x7f83a8419000, var_list=0x7f83a841ba08, free_joins=true) at /root/percona-server-5.7.22-22/sql/set_var.cc:672
#13 0x00000000015cb410 in mysql_execute_command (thd=0x7f83a8419000, first_level=true) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:3858
#14 0x00000000015d14b6 in mysql_parse (thd=0x7f83a8419000, parser_state=0x7f8675ea85c0) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:5901
#15 0x00000000015c504a in dispatch_command (thd=0x7f83a8419000, com_data=0x7f8675ea8d30, command=COM_QUERY) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:1490
#16 0x00000000015c3dd7 in do_command (thd=0x7f83a8419000) at /root/percona-server-5.7.22-22/sql/sql_parse.cc:1021
#17 0x000000000170c998 in handle_connection (arg=0x7f835efff220) at /root/percona-server-5.7.22-22/sql/conn_handler/connection_handler_per_thread.cc:312
#18 0x0000000001943acc in pfs_spawn_thread (arg=0x7f8673d87320) at /root/percona-server-5.7.22-22/storage/perfschema/pfs.cc:2190
#19 0x00007f86758e8aa1 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f8674852bcd in clone () from /lib64/libc.so.6
(gdb) f 7
#7  0x000000000188a903 in MYSQL_BIN_LOG::publish_coordinates_for_global_status (this=0x2e68720) at /root/percona-server-5.7.22-22/sql/binlog.cc:10145
10145     mysql_mutex_lock(&LOCK_status);
(gdb) f 10
#10 0x00000000014f848d in sys_var::update (this=0x2e52820, thd=0x7fb63c019000, var=0x7fb63c027ce8) at /root/percona-server-5.7.22-22/sql/set_var.cc:184
184       (on_update && on_update(this, thd, OPT_GLOBAL));
(gdb) l
179       to make it safe we'll need value_ptr_unlock().
180     */
181     AutoWLock lock1(&PLock_global_system_variables);
182     AutoWLock lock2(guard);
183     return global_update(thd, var) ||
184       (on_update && on_update(this, thd, OPT_GLOBAL));
185   }
186   else
187   {
188     bool locked= false;
分析

根据线程执行的顺序来分析: sys_var::update 函数执行的时候需要申请 AutoWLock lock1(&PLock_global_system_variables);

代码位置 set_var.cc :32行

代码语言:javascript
复制
static PolyLock_mutex PLock_global_system_variables(&LOCK_global_system_variables);

其实PLock_global_system_variables 也是需要申请mutex锁 &LOCK_global_system_variables。

代码位置 binlog.cc:10145

代码语言:javascript
复制
/** Copy the current binlog coordinates to the variables used for the
not-in-consistent-snapshot case of SHOW STATUS */

void MYSQL_BIN_LOG::publish_coordinates_for_global_status(void) const
{
  mysql_mutex_assert_owner(&LOCK_log);

  mysql_mutex_lock(&LOCK_status);
  strcpy(binlog_global_snapshot_file, log_file_name);
  binlog_global_snapshot_position=
      my_b_inited(&log_file) ? my_b_tell(&log_file) : 0;
  mysql_mutex_unlock(&LOCK_status);
}

设置gtid_mode的线程等待在第7个步骤,进行binlog切换时,此时需要申请mysql_mutex_lock(&LOCK_status)。然而&LOCK_status已经被show globa status的 fill_status 函数先申请获得。show global status需要申请mutex锁&LOCK_global_system_variables 被set gtid_mode的sys_var::update函数持有。于是相互等待死锁。

一图胜千言:

3.4 解释为什么会hang住,其他新的请求无法建立连接。

连接的初始化需要LOCK_global_system_variables, 此mutex被占用,导致新连接无法初始化

代码位置 sql_class.cc:1629

代码语言:javascript
复制
void THD::init(void)
{
  mysql_mutex_lock(&LOCK_global_system_variables);
  ...
  mysql_mutex_unlock(&LOCK_global_system_variables);
3.5 SQL查询时也需要LOCK_global_system_variables, 此mutex被占用,导致查询被阻塞.

代码位置 item_func.cc:7275

代码语言:javascript
复制
void Item_func_get_system_var::fix_length_and_dec()
{
  ...
  mysql_mutex_lock(&LOCK_global_system_variables);
}

四 总结

首先 再次感谢远在重庆的高鹏不辞辛苦的帮忙分析定位以及给出各种解释。通过这次分析我们对gdb和MySQL 源码又有进一步了解。

其次如何规避,我们目前的规避方式是减少show 命令的执行,以后做gtid的变更的时候 会关闭监控以及有赞专门开发的数据库性能快照系统以便和 set gtid_mode 命令冲突。

参考文章

1 深度分析 | GDB定位MySQL5.7特定版本hang死的故障分析

2 使用gdb调试程序完全教程

代码语言:javascript
复制
https://blog.csdn.net/gatieme/article/details/51671430

3 一个 Linux 上分析死锁的简单方法

代码语言:javascript
复制
https://www.ibm.com/developerworks/cn/linux/l-cn-deadlock/index.html
本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2019-04-19,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 yangyidba 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 二 复现
  • 三 分析
    • 3.1 获取数据库hang的时候的堆栈
      • 3.3 show global status 的堆栈
        • 分析
          • 3.4 修改gtid_mode的线程调用栈
            • 分析
              • 3.5 SQL查询时也需要LOCK_global_system_variables, 此mutex被占用,导致查询被阻塞.
              • 四 总结
              • 参考文章
              相关产品与服务
              云数据库 SQL Server
              腾讯云数据库 SQL Server (TencentDB for SQL Server)是业界最常用的商用数据库之一,对基于 Windows 架构的应用程序具有完美的支持。TencentDB for SQL Server 拥有微软正版授权,可持续为用户提供最新的功能,避免未授权使用软件的风险。具有即开即用、稳定可靠、安全运行、弹性扩缩等特点。
              领券
              问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档