MySQL Case Study: analyze, Slow Queries, and Unresponsive Queries

Original

王文安@DBA

Last modified 2020-10-26 20:19:13

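Problem Description

Sometimes the same SQL statement takes wildly different amounts of time on the production primary and on a read-only instance. The first suspicion is usually that the sampled statistics differ, so the optimizer picks an inaccurate execution plan and an efficient query turns into a slow one. Once that is confirmed, the natural fix is to run analyze and re-collect the statistics. At that point, however, every select on the analyzed table suddenly hangs and returns nothing.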

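Solution

If this has already happened, try killing the "earliest" slow queries.

That is, if tb1 has slow queries and the problem appeared after running analyze on it, find the slow queries on tb1 that started before the analyze and have not yet finished, then kill all of them.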
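The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Proc` and `find_blockers` are hypothetical names, the rows are assumed to have been copied from `show processlist` output, and the table is matched by naive substring search on the statement text. It treats a statement as a blocker if it touches the table, is not itself waiting for the flush, and has been running longer than every flush waiter.

```python
from typing import List, NamedTuple, Optional


class Proc(NamedTuple):
    """One row of SHOW PROCESSLIST (only the columns used here)."""
    id: int
    command: str
    time: int            # seconds the thread has been in its current state
    state: str
    info: Optional[str]  # statement text, None for idle sessions


FLUSH_WAIT = "Waiting for table flush"


def find_blockers(rows: List[Proc], table: str) -> List[int]:
    """Return ids of running statements on `table` that predate every flush waiter."""
    # Naive assumption: a statement "uses" the table if its text contains the name.
    on_table = [r for r in rows if r.info is not None and table in r.info]
    waiters = [r for r in on_table if r.state == FLUSH_WAIT]
    if not waiters:
        return []  # nothing is stuck, so there is nothing to kill
    longest_wait = max(r.time for r in waiters)
    return sorted(
        r.id
        for r in on_table
        if r.command == "Query"
        and r.state != FLUSH_WAIT
        and r.time > longest_wait
    )


# Sample rows in the shape of a processlist: one long sleep, one stuck select.
rows = [
    Proc(477, "Query", 26, "User sleep", "select sleep(3600) from stu"),
    Proc(478, "Query", 10, FLUSH_WAIT, "select * from stu limit 1"),
    Proc(479, "Query", 0, "starting", "show processlist"),
    Proc(480, "Sleep", 958, "", None),
]

for pid in find_blockers(rows, "stu"):
    print(f"KILL {pid};")  # prints "KILL 477;"
```

The sketch only prints the `KILL` statements; whether to run them is still a judgment call for the DBA, since killing a long query may be expensive for the application that issued it.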

Reproducing the Problem

First, set up the scenario:

CREATE TABLE `stu` (
  `id` int(11) NOT NULL,
  `name` varchar(16) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`),
  KEY `idx_age` (`age`),
  KEY `idx_n_a` (`name`,`age`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `stu` VALUES (9,'adam',25),(7,'carlos',25),(1,'dave',19),(5,'sam',22),(3,'tom',22),(11,'zoe',29);

Now fake a long-running slow query:

mysql> select sleep(3600) from stu;

Then, in other sessions, simulate the analyze and select operations:

mysql> analyze table stu;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| test.stu | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> select * from stu limit 1;

At this point the limit 1 statement is blocked as well, and it does not trigger innodb_lock_wait_timeout either.

Looking at the processlist from another session shows the following wait event:

mysql> show processlist;
+-----+------+-----------------+--------------------+---------+------+-------------------------+-----------------------------+
| Id  | User | Host            | db                 | Command | Time | State                   | Info                        |
+-----+------+-----------------+--------------------+---------+------+-------------------------+-----------------------------+
| 457 | root | 127.0.0.1:48650 | sbtest             | Sleep   | 4860 |                         | NULL                        |
| 458 | root | 127.0.0.1:48652 | sbtest             | Sleep   | 4851 |                         | NULL                        |
| 473 | root | 127.0.0.1:49512 | performance_schema | Sleep   | 4834 |                         | NULL                        |
| 477 | root | 127.0.0.1:52364 | test               | Query   |   26 | User sleep              | select sleep(3600) from stu |
| 478 | root | 127.0.0.1:53124 | test               | Query   |   10 | Waiting for table flush | select * from stu limit 1   |
| 479 | root | 127.0.0.1:53944 | sbtest             | Query   |    0 | starting                | show processlist            |
| 480 | root | 127.0.0.1:53946 | sbtest             | Sleep   |  958 |                         | NULL                        |
+-----+------+-----------------+--------------------+---------+------+-------------------------+-----------------------------+
7 rows in set (0.00 sec)

mysql>

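Root Cause Analysis

The problem is now clear, and the wait event points squarely at Waiting for table flush, so start there and look for the cause. First, the explanation in the official documentation: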

Waiting for table flush: The thread is executing FLUSH TABLES and is waiting for all threads to close their tables, or the thread got a notification that the underlying structure for a table has changed and it needs to reopen the table to get the new structure. However, to reopen the table, it must wait until all other threads have closed the table in question. This notification takes place if another thread has used FLUSH TABLES or one of the following statements on the table in question: FLUSH TABLES tbl_name, ALTER TABLE, RENAME TABLE, REPAIR TABLE, ANALYZE TABLE, or OPTIMIZE TABLE.

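The reason for this state is spelled out clearly: the table's structure has "changed", so a new thread that wants to open the table has to wait until the other threads have closed it.

Next, look at what analyze actually does, again quoting the official documentation: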

ANALYZE TABLE removes the table from the table definition cache, which requires a flush lock. If there are long running statements or transactions still using the table, subsequent statements and transactions must wait for those operations to finish before the flush lock is released. Because ANALYZE TABLE itself typically finishes quickly, it may not be apparent that delayed transactions or statements involving the same table are due to the remaining flush lock. ...... ANALYZE TABLE clears table statistics from the INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS table and sets the STATS_INITIALIZED column to Uninitialized. Statistics are collected again the next time the table is accessed.

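According to this description, analyze tries to acquire a flush lock, and the actual re-sampling of statistics is triggered by the next select.

So the question becomes: when the statement is blocked, is it re-sampling the data, or is it waiting for other threads to close the table?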

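Some Background

First, two MySQL concepts need to be understood: the table definition cache (table_definition_cache) and the table open cache (table_open_cache). Put simply, when a client wants to open a table, it first tries to get it from the cache. If the table has a "new version", or the cache has no entry for it, a fresh copy is made from the table definition.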
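The lookup described above can be modeled with a small Python sketch. It is a toy built only from the simplified description in this article, not a rendering of MySQL's real data structures: `DefinitionCache`, `SessionTableCache`, and the "hit" / "miss" / "stale" labels are all hypothetical names introduced for illustration.

```python
class DefinitionCache:
    """Toy store of table definitions, each tagged with a version number."""

    def __init__(self):
        self._defs = {}  # table name -> (version, columns)

    def load(self, table, columns):
        # Every (re)load bumps the version, standing in for "the structure changed".
        old_version = self._defs.get(table, (0, ()))[0]
        self._defs[table] = (old_version + 1, tuple(columns))

    def get(self, table):
        return self._defs[table]


class SessionTableCache:
    """Toy per-client cache of opened tables, checked before the definitions."""

    def __init__(self, definitions):
        self._definitions = definitions
        self._open = {}  # table name -> (version, columns) copied earlier

    def open_table(self, table):
        current_version, columns = self._definitions.get(table)
        cached = self._open.get(table)
        if cached is not None and cached[0] == current_version:
            return "hit", cached  # the cached copy is still current
        outcome = "miss" if cached is None else "stale"
        # No usable copy: take a fresh one from the definition store.
        self._open[table] = (current_version, columns)
        return outcome, self._open[table]


defs = DefinitionCache()
defs.load("stu", ["id", "name", "age"])
session = SessionTableCache(defs)

print(session.open_table("stu")[0])  # miss: nothing cached yet
print(session.open_table("stu")[0])  # hit: the cached copy matches
defs.load("stu", ["id", "name", "age"])  # the table gets a "new version"
print(session.open_table("stu")[0])  # stale: the cached copy is outdated
```

What this toy leaves out is the part that matters for the hang: in the real server a stale copy cannot simply be replaced while other threads are still using the old one.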

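Detailed Analysis

In the environment built above, dump the thread stacks to see what is happening. With the noise removed, the stack of the select statement looks like this: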

futex_abstimed_wait_cancelable,
  __pthread_cond_wait_common,
    __pthread_cond_timedwait,
      MDL_wait::timed_wait,
        TABLE_SHARE::wait_for_old_version,
          open_table,
            open_tables,
              open_tables_for_query,
                ::??,
                  mysql_execute_command,
                    mysql_parse,
                      dispatch_command,
                        do_command,
                          handle_connection,
                            pfs_spawn_thread,
                              start_thread,clone

The statement is clearly waiting, and specifically in wait_for_old_version, which looks a little odd. So look at what the open_table function is doing:

open_table()
{
...
  if (!(flags & MYSQL_OPEN_IGNORE_FLUSH))
  {
    if (share->has_old_version()) // an old version exists
    {
      release_table_share(share);
      mysql_mutex_unlock(&LOCK_open);

      MDL_deadlock_handler mdl_deadlock_handler(ot_ctx);
      bool wait_result;
...

      wait_result= tdc_wait_for_old_version(thd, table_list->db,
                                            table_list->table_name,
                                            ot_ctx->get_timeout(),
                                            deadlock_weight);

      thd->pop_internal_handler();
...
    if (thd->open_tables && thd->open_tables->s->version != share->version)
    // If the versions differ, all cached copies of this table must be released, then the table is reopened
    {
      release_table_share(share);
      mysql_mutex_unlock(&LOCK_open);
      (void)ot_ctx->request_backoff_action(Open_table_context::OT_REOPEN_TABLES,
                                           NULL);
      DBUG_RETURN(TRUE);
    }
}
......

tdc_wait_for_old_version(THD *thd, const char *db, const char *table_name,
                         ulong wait_timeout, uint deadlock_weight)
{
  TABLE_SHARE *share;
  bool res= FALSE;

  mysql_mutex_lock(&LOCK_open);
  if ((share= get_cached_table_share(thd, db, table_name)) &&
      share->has_old_version())  
  // Look up the table share here and check its version; as long as the old version still exists, enter the if body
  {
    struct timespec abstime;
    set_timespec(&abstime, wait_timeout);
    res= share->wait_for_old_version(thd, &abstime, deadlock_weight);
  }
  mysql_mutex_unlock(&LOCK_open);
  return res;
}

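As shown, when open_table finds that an old version exists, it calls tdc_wait_for_old_version. If the table's old version never goes away, the call keeps waiting, bounded only by the timeout passed in from ot_ctx->get_timeout(). So the select statement is really just waiting for the old-version table cache entries to be released.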
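The waiting behaviour can be imitated with a condition variable, which is also what the `__pthread_cond_timedwait` frame in the stack suggests. The following Python sketch is a loose analogy and not a translation of the MySQL code: `TableShare`, `mark_old`, and `wait_until_unused` are hypothetical names, and "killing the slow query" is simulated by a timer that releases the share.

```python
import threading


class TableShare:
    """Toy stand-in for a cached table definition shared by several sessions."""

    def __init__(self):
        self._cond = threading.Condition()
        self.users = 0    # sessions that currently have the table open
        self.old = False  # set once the cached definition has been invalidated

    def acquire(self):
        with self._cond:
            self.users += 1

    def release(self):
        with self._cond:
            self.users -= 1
            self._cond.notify_all()  # wake anyone waiting for the share to drain

    def mark_old(self):
        with self._cond:
            self.old = True

    def wait_until_unused(self, timeout):
        """Wait until no session uses the share; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self.users == 0, timeout=timeout)


share = TableShare()
share.acquire()   # the long-running select has the table open
share.mark_old()  # analyze invalidates the cached definition

# A new select must wait for the old share to drain, and times out here.
print(share.wait_until_unused(timeout=0.1))  # False

# Simulate killing the slow query shortly afterwards.
threading.Timer(0.1, share.release).start()
print(share.wait_until_unused(timeout=5))    # True
```

The second wait returns as soon as the last user of the old share lets go, which matches the advice given earlier: the stuck statements resume once the statements that started before the analyze are gone.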

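This version is what MySQL uses to mark the version of a table definition. When it is updated, the table's structure has "changed", and every cached copy of the table is invalid and can no longer be used. In MySQL this variable is refresh_version.

From this we can infer that analyze table incremented refresh_version. The code comments state that it currently changes only on flush_table, but analyze was the only operation run in the test environment, and analyze is documented to acquire a flush lock, so analyze is probably implemented on top of the flush mechanism.

PS: if the later statement is another analyze on the same table rather than a select, it is blocked as well.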