MySQL的执行过程包括多个子阶段:语法分析、语义检查、逻辑优化、物理优化和执行。其中逻辑优化和物理优化统称为查询优化。一个查询优化器的输入是查询树,输出是查询执行计划。

逻辑优化也称为基于规则的查询优化(Rule Based Optimization,简称RBO)。主要是对查询进行逻辑上的等价变换,目的是通过这些变换提高查询的性能。 物理优化也称为基于代价的查询优化(Cost-based Optimization,简称CBO)。主要是通过一些模型,预测一个查询使用某种执行计划时的成本,并选择其中成本最小的一个。
SemiJoin是表上的一个运算。假定R和S是两个表,R SemiJoin S可记为R ⋉ S,结果是在S中满足条件的元组。例如下面的例子是“雇员”和“部门”和它们的半连接的表:
雇员SemiJoin部门 where 雇员.DeptName=部门.DeptName
雇员Join部门 where 雇员.DeptName=部门.DeptName
从上面的结果可以得出如下结论: ①半连接的结果是左表的子集。 ②增加右表一条重复记录,结果并不随之增多。
通用的转换格式如下:
SELECT ...
FROM ot1 ... otN
WHERE (oe1, ... oeM) IN (SELECT ie1, ..., ieM
FROM it1 ... itK
[WHERE inner-cond])
[AND outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]如上格式的子查询被转换为semijoin的格式如下(ot表示外表,it表示内表):
SELECT ...
FROM (ot1 ... otN) SJ (it1 ... itK)
ON (oe1, ... oeM) = (ie1, ..., ieM)
[AND inner-cond]
[WHERE outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]从上面的结构可以看到,转化是支持多列的。
当执行如下SQL时,也是先遍历雇员表,然后在部门表里查找是否匹配。实际上和上面SemiJoin的语义是一致的,结果也一样。
select * from 雇员 where 雇员.DeptName in (select 部门.DeptName from 部门)
具体Mysql将In转化成SemiJoin的代码逻辑实现在如下函数里:
bool SELECT_LEX::flatten_subqueries()
1. 必须是In或any子查询
2. 不包含UNION 按照扁平化规则后,UNION会导致结果不一致。比如:t1,t2和t3表定义和数据如下:

则执行如下SQL:select a from t1 where a in (select b from t2 UNION select c from t3); 正确结果应该为:

但是SemiJoin转换后的SQL为:select a from t1 semijoin t2 where t1.a=t2.b UNION select a from t1 semijoin t3 where t1.a=t3.c; 由于UNION的去重,导致结果为:

另外即使是UNION ALL,如果t2与t3存在相同记录,结果也是不正确的。
3. 不包含GROUP BY 或 HAVING 含有groupby的SQL,转换后(即扁平化后),导致与原来语义不一致了。比如 原来是:select a from t1 where b in (select b from t2 group by b) 按照规则转换成semijoin后为:select a from t1 semijoin (t2) where t1.b = t2.b group by t2.b; 由于a1不在group by内,语法错误。
4. 不包含聚集函数。通过测试,确实没有走SemiJoin。
mysql> explain select * from t1 where a in (select count(*) from t2 where t1.b=t2.b group by a);
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25.00 | Using where; Using temporary |
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
2 rows in set, 2 warnings (0.00 sec)
mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1276 | Field or reference 'ldy.t1.b' of SELECT #2 was resolved in SELECT #1 |
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b` from `ldy`.`t1` where <in_optimizer>(`ldy`.`t1`.`a`,<exists>(/* select#2 */ select 1 from `ldy`.`t2` where (`ldy`.`t1`.`b` = `ldy`.`t2`.`b`) group by `ldy`.`t2`.`a` having (<cache>(`ldy`.`t1`.`a`) = <ref_null_helper>(count(0))))) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)5. 不包含LIMIT。通过测试,控制台直接报错。
mysql> select * from t1 where a in (select a from t2 limit 2);
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
mysql>6. 子查询有ORDER BY时,转换后去掉ORDER BY
7. 外部查询没有STRAIGHT_JOIN(错) 关键字STRAIGHT_JOIN表明,该查询的JOIN顺序不需要优化,按照原来SQL的顺序依次进行。 MYSQL官方文档说有限制,经测试没有限制。比如,如下依然能转换为semijoin:
mysql> select * from t3;
mysql> explain select t1.*,t2.* from t2 STRAIGHT_JOIN t1 on t1.b=t2.b where t1.a in (select a from t3);
+----+--------------+-------------+------------+--------+---------------+------------+---------+----------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+--------+---------------+------------+---------+----------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | NULL |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 12.50 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | <subquery2> | NULL | eq_ref | <auto_key> | <auto_key> | 5 | ldy.t1.a | 1 | 100.00 | NULL |
| 2 | MATERIALIZED | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL |
+----+--------------+-------------+------------+--------+---------------+------------+---------+----------+------+----------+----------------------------------------------------+
4 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b`,`ldy`.`t2`.`a` AS `a`,`ldy`.`t2`.`b` AS `b` from `ldy`.`t2` semi join (`ldy`.`t3`) straight_join `ldy`.`t1` where ((`ldy`.`t1`.`b` = `ldy`.`t2`.`b`) and (`<subquery2>`.`a` = `ldy`.`t1`.`a`)) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)8. 是否相关子查询均支持。
注:测试所用的数据参考如下 数据库名为ldy,表如下:
mysql> select * from t1;
+------+------+
| a | b |
+------+------+
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
| 6 | 7 |
| 7 | 8 |
+------+------+
8 rows in set (0.00 sec)
mysql> select * from t2;
+------+------+
| a | b |
+------+------+
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
+------+------+
4 rows in set (0.00 sec)
mysql> select * from t3;
+------+------+
| a | b |
+------+------+
| 0 | 1 |
| 2 | 3 |
| 3 | 4 |
+------+------+
3 rows in set (0.00 sec)例子1: 二重In子查询嵌套SQL通过两次SemiJoin优化成单层SQL,参考如下:
mysql> explain select * from t1 where t1.a in (select a from t2 where b in (select b from t3));
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
| 1 | SIMPLE | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Start temporary |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25.00 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 12.50 | Using where; End temporary; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b` from `ldy`.`t1` semi join (`ldy`.`t3` join `ldy`.`t2`) where ((`ldy`.`t1`.`a` = `ldy`.`t2`.`a`) and (`ldy`.`t2`.`b` = `ldy`.`t3`.`b`)) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)例子2: 并列in查询,and条件下能够转成SemiJoin:
mysql> explain select * from t1 where t1.a in (select a from t2) and t1.b in (select b from t3);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+
| 1 | SIMPLE | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Start temporary |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 12.50 | Using where; End temporary; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25.00 | Using where; FirstMatch(t1); Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b` from `ldy`.`t1` semi join (`ldy`.`t2`) semi join (`ldy`.`t3`) where ((`ldy`.`t1`.`b` = `ldy`.`t3`.`b`) and (`ldy`.`t2`.`a` = `ldy`.`t1`.`a`)) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)例子3,并列in查询,or条件下不能转成semijoin:
mysql> explain select * from t1 where t1.a in (select a from t2) or t1.b in (select b from t3);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 100.00 | Using where |
| 3 | SUBQUERY | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL |
| 2 | SUBQUERY | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b` from `ldy`.`t1` where (<in_optimizer>(`ldy`.`t1`.`a`,`ldy`.`t1`.`a` in ( <materialize> (/* select#2 */ select `ldy`.`t2`.`a` from `ldy`.`t2` where 1 ), <primary_index_lookup>(`ldy`.`t1`.`a` in <temporary table> on <auto_key> where ((`ldy`.`t1`.`a` = `materialized-subquery`.`a`))))) or <in_optimizer>(`ldy`.`t1`.`b`,`ldy`.`t1`.`b` in ( <materialize> (/* select#3 */ select `ldy`.`t3`.`b` from `ldy`.`t3` where 1 ), <primary_index_lookup>(`ldy`.`t1`.`b` in <temporary table> on <auto_key> where ((`ldy`.`t1`.`b` = `materialized-subquery`.`b`)))))) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)例子4: 支持多列SemiJoin:
mysql> explain select * from t1 where (a, b) in (select a, b from t2);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Start temporary |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 12.50 | Using where; End temporary; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `ldy`.`t1`.`a` AS `a`,`ldy`.`t1`.`b` AS `b` from `ldy`.`t1` semi join (`ldy`.`t2`) where ((`ldy`.`t1`.`b` = `ldy`.`t2`.`b`) and (`ldy`.`t1`.`a` = `ldy`.`t2`.`a`)) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)3. SemiJoin的几种执行策略
转换成SemiJoin后,MySQL将要根据权重,采用如下执行策略(以下策略也能通过Mysql的修改系统变量optimizer_switch,将某些策略关闭或打开)。假设有两个表:国家表(Country)和城市表(City),查找存在某一城市人口大于该国人口33%的所有国家。查询语句如下:
select *
from Country
where
Country.code IN (select City.Country
from City
where
City.Population > 0.33 * Country.Population and
City.Population > 1*1000*1000);1. Duplicate Weedout:将semi-join当做普通innerJoin,之后使用临时表将重复项去除。 举例:Country SemiJoin City的执行过程如下:

Country与City内连接后有German有三条记录,通过建立临时表,由于临时表的主键是Country表的RowId,重复项将不能插入。具体可以看到,去重过程如下:

2. FirstMatch:执行join只挑选第一次匹配的项,非常类似于In语义。 举例,Country InnerJoin City的嵌套循环流程如下,先遍历Country再遍历City:

当是SemiJoin时,在City里找到一个,就不需要继续找了。

3. LooseScan:sj-inner表将重复项分组,选出一个即返回。 查询拥有人造卫星的国家,数据库有国家表(Country)和人造卫星表(Satellite)。查询SQL如下:
select * from Country
where
Country.code in (select country_code from Satellite)假如Satellite.country_code是索引,通过按索引顺序遍历,发现已经是分组的了。

利用索引很容易从一个组调到另一组,而一组内只需要判断第一条记录即可,具体如下:

4. Materialize:
假如不是相关子查询,可以先物化内表为一个临时表,由于该临时表在条件字段上采用了索引,保证了唯一性(即消除了重复字段),SemiJoin的结果就和innerJoin相同了。该临时表与外表执行JOIN。

根据临时表与外表Join过程,采用类似NestedLoop形式,根据先遍历临时表还是外表,进一步包括Lookup和Scan两种。Lookup为物化inner表(该表包含一个索引,该索引既可以删除重复元素,也可用于后面执行Join运算),先循环遍历outer表,并在物化表查找。
Scan为物化inner表(该表包含一个索引,该索引既可以删除重复元素,也可用于后面执行Join运算),先循环遍历物化表,并在outer表查找。
https://mariadb.com/kb/en/library/optimization-strategies/ https://dev.mysql.com/doc/refman/8.0/en/semi-joins.html

腾讯数据库技术团队对内支持微信红包,彩票、数据银行等集团内部业务,对外为腾讯云提供各种数据库产品,如CDB、CTSDB、CKV、CMongo, 腾讯数据库技术团队专注于增强数据库内核功能,提升数据库性能,保证系统稳定性并解决用户在生产过程中遇到的问题,并对生产环境中遇到的问题及知识进行分享。
var first_sceen__time = (+new Date());if ("" == 1 && document.getElementById('js_content')) { document.getElementById('js_content').addEventListener("selectstart",function(e){ e.preventDefault(); }); } (function(){ if (navigator.userAgent.indexOf("WindowsWechat") != -1){ var link = document.createElement('link'); var head = document.getElementsByTagName('head')[0]; link.rel = 'stylesheet'; link.type = 'text/css'; link.href = "//res.wx.qq.com/mmbizwap/zh_CN/htmledition/style/page/appmsg_new/winwx45ba31.css"; head.appendChild(link); } })();
腾讯数据库技术
赞赏
长按二维码向我转账

受苹果公司新规定影响,微信 iOS 版的赞赏功能被关闭,可通过二维码转账支持公众号。
阅读
分享 在看
已同步到看一看
取消 发送
我知道了
确定

已同步到看一看写下你的想法
最多200字,当前共字 发送
已发送
确定
写下你的想法...
取消
确定
最多200字,当前共字
发送中
微信扫一扫 关注该公众号
微信扫一扫 使用小程序
即将打开""小程序
取消 打开