首页
学习
活动
专区
圈层
工具
发布
社区首页 >专栏 >MySQL去重:DISTINCT vs GROUP BY,谁才是效率之王?

MySQL去重:DISTINCT vs GROUP BY,谁才是效率之王?

作者头像
俊才
发布2026-02-04 14:35:46
发布2026-02-04 14:35:46
340
举报
文章被收录于专栏:数据库干货铺数据库干货铺

在日常数据库查询中,去除重复数据是最常见的需求之一。面对这种需求,很多开发者习惯随手用 DISTINCT,也有人觉得 GROUP BY 更 “万能”,但很少有人深究:这两者底层原理有何不同?在不同场景下谁的效率更高?今天我们就从原理、效率、使用场景三个维度,彻底讲清楚这对 “去重兄弟” 的区别。

一、一个实际问题的起源

今天,团队的新成员小陈遇到了一个难题:他需要从用户表中统计不重复的城市列表,但查询速度极其缓慢。他尝试了两种写法:

代码语言:javascript
复制
SELECT DISTINCT city FROM users;

以及

代码语言:javascript
复制
SELECT city FROM users GROUP BY city;

但不确定哪种写法更好,以及如何优化。他说看到网上有人说禁止使用distinct,但是他实际测的时候发现其实distinct有的情况下效率比group by又要好,所以更加不确定该怎么用了。对于今天这个经典问题,促使我们重新进行一番审视。

二、 DISTINCT与GROUP BY的本质差异

DISTINCT是SQL专门为去重设计的关键字,它的核心使命很简单——去除重复行,返回唯一值。

代码语言:javascript
复制
-- 单列去重
SELECT DISTINCT department FROM employees;
-- 多列去重(所有列组合相同时才去重)
SELECT DISTINCT department, job_title FROM employees;

GROUP BY的设计初衷是对数据进行分组,以便进行聚合计算(如COUNT、SUM、AVG等)。

代码语言:javascript
复制
-- 计算每个部门的平均工资
SELECT department, AVG(salary) 
FROM employees GROUP BY department;

虽然GROUP BY也可以实现去重效果,但这其实是它分组行为的"副产品"。

三、 性能对决

说到大家最关心的性能问题,其实和是否有索引索引有很大的关系。

  • 当去重列有索引时,DISTINCT和GROUP BY都可以利用索引进行高效扫描,性能差异微乎其微。两者甚至可能使用相同的执行计划。
代码语言:javascript
复制
mysql> explain select  distinct registration_time from customers2;
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys                         | key                   | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | customers2 | NULL       | index | idx_registration_time,idx_vip_regtime | idx_registration_time | 4       | NULL | 53014 |   100.00 | Using index |
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select   registration_time from customers2 group by registration_time;
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys                         | key                   | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | customers2 | NULL       | index | idx_registration_time,idx_vip_regtime | idx_registration_time | 4       | NULL | 53014 |   100.00 | Using index |
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

  • 真正的性能分水岭出现在无索引的情况下:

在MySQL 8.0之前,GROUP BY会进行隐式排序,可能触发filesort,导致性能下降;而DISTINCT不需要这种排序,因此在无索引时性能更优。

代码语言:javascript
复制
mysql> explain select distinct c1 from  tb2  ;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra           |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
|  1 | SIMPLE      | tb2   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 15538816 |   100.00 | Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select  c1 from  tb2 group by c1 ;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                           |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
|  1 | SIMPLE      | tb2   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 15538816 |   100.00 | Using temporary; Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
1 row in set, 1 warning (0.00 sec)

不过,如果查询结果无需排序,则可以显示写上ORDER BY NULL来取消group by结果的排序,例如:

代码语言:javascript
复制
mysql> explain select  c1 from  tb2 group by c1 ;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                           |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
|  1 | SIMPLE      | tb2   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 15538816 |   100.00 | Using temporary; Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select  c1 from  tb2 group by c1  ORDER BY NULL;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra           |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
|  1 | SIMPLE      | tb2   | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 15538816 |   100.00 | Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)

但这里有个重要更新:从MySQL 8.0开始,GROUP BY的隐式排序已被移除,两者在无索引情况下的性能差异大大缩小。

代码语言:javascript
复制

mysql> explain select   age from customers2 group by age;
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra           |
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
|  1 | SIMPLE      | customers2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 53014 |   100.00 | Using temporary |
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select  distinct age from customers2;
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra           |
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
|  1 | SIMPLE      | customers2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 53014 |   100.00 | Using temporary |
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)

四、 总结

下面我们通过一张表来总结一下搞定逻辑

场景

推荐用法

核心原因

纯单字段 、多字段去重(无聚合)

DISTINCT

逻辑更简洁,无索引时效率略高

去重 + 聚合(COUNT/SUM/AVG 等)

GROUP BY

唯一能实现聚合的方式

去重 + 过滤分组结果

GROUP BY + HAVING

DISTINCT 无分组过滤能力

有索引的任意去重场景

两者皆可

索引消除了效率差异,按可读性选择

最后记住核心原则:DISTINCT 是去重的 “专用工具”,GROUP BY 是分组聚合的 “全能工具” —— 按需选择,而非盲目依赖某一个

在实际开发中,不用纠结 “谁绝对更好”,而是先明确业务需求(是否聚合、是否过滤),再结合索引情况选择;写完 SQL 后,养成看执行计划(EXPLAIN)的习惯,通过 Extra 字段(Using temporary/Using filesort)判断是否需要优化,这才是提升 MySQL 性能的核心思路。

关注微信公众号「数据库干货铺」,获取更多数据库运维干货。

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2026-01-29,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 数据库干货铺 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 下面我们通过一张表来总结一下搞定逻辑
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档