
在日常数据库查询中,去除重复数据是最常见的需求之一。面对这种需求,很多开发者习惯随手用 DISTINCT,也有人觉得 GROUP BY 更 “万能”,但很少有人深究:这两者底层原理有何不同?在不同场景下谁的效率更高?今天我们就从原理、效率、使用场景三个维度,彻底讲清楚这对 “去重兄弟” 的区别。
一、一个实际问题的起源
今天,团队的新成员小陈遇到了一个难题:他需要从用户表中统计不重复的城市列表,但查询速度极其缓慢。他尝试了两种写法:
SELECT DISTINCT city FROM users;以及
SELECT city FROM users GROUP BY city;但不确定哪种写法更好,以及如何优化。他说看到网上有人说禁止使用distinct,但是他实际测的时候发现其实distinct有的情况下效率比group by又要好,所以更加不确定该怎么用了。对于今天这个经典问题,促使我们重新进行一番审视。
二、 DISTINCT与GROUP BY的本质差异
DISTINCT是SQL专门为去重设计的关键字,它的核心使命很简单——去除重复行,返回唯一值。
-- 单列去重
SELECT DISTINCT department FROM employees;
-- 多列去重(所有列组合相同时才去重)
SELECT DISTINCT department, job_title FROM employees;GROUP BY的设计初衷是对数据进行分组,以便进行聚合计算(如COUNT、SUM、AVG等)。
-- 计算每个部门的平均工资
SELECT department, AVG(salary)
FROM employees GROUP BY department;虽然GROUP BY也可以实现去重效果,但这其实是它分组行为的"副产品"。
三、 性能对决
说到大家最关心的性能问题,其实和是否有索引索引有很大的关系。
mysql> explain select distinct registration_time from customers2;
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | customers2 | NULL | index | idx_registration_time,idx_vip_regtime | idx_registration_time | 4 | NULL | 53014 | 100.00 | Using index |
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select registration_time from customers2 group by registration_time;
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | customers2 | NULL | index | idx_registration_time,idx_vip_regtime | idx_registration_time | 4 | NULL | 53014 | 100.00 | Using index |
+----+-------------+------------+------------+-------+---------------------------------------+-----------------------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

在MySQL 8.0之前,GROUP BY会进行隐式排序,可能触发filesort,导致性能下降;而DISTINCT不需要这种排序,因此在无索引时性能更优。
mysql> explain select distinct c1 from tb2 ;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
| 1 | SIMPLE | tb2 | NULL | ALL | NULL | NULL | NULL | NULL | 15538816 | 100.00 | Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select c1 from tb2 group by c1 ;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
| 1 | SIMPLE | tb2 | NULL | ALL | NULL | NULL | NULL | NULL | 15538816 | 100.00 | Using temporary; Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
1 row in set, 1 warning (0.00 sec)

不过,如果查询结果无需排序,则可以显示写上ORDER BY NULL来取消group by结果的排序,例如:
mysql> explain select c1 from tb2 group by c1 ;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
| 1 | SIMPLE | tb2 | NULL | ALL | NULL | NULL | NULL | NULL | 15538816 | 100.00 | Using temporary; Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+---------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select c1 from tb2 group by c1 ORDER BY NULL;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
| 1 | SIMPLE | tb2 | NULL | ALL | NULL | NULL | NULL | NULL | 15538816 | 100.00 | Using temporary |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)

但这里有个重要更新:从MySQL 8.0开始,GROUP BY的隐式排序已被移除,两者在无索引情况下的性能差异大大缩小。
mysql> explain select age from customers2 group by age;
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| 1 | SIMPLE | customers2 | NULL | ALL | NULL | NULL | NULL | NULL | 53014 | 100.00 | Using temporary |
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select distinct age from customers2;
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| 1 | SIMPLE | customers2 | NULL | ALL | NULL | NULL | NULL | NULL | 53014 | 100.00 | Using temporary |
+----+-------------+------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)

四、 总结
场景 | 推荐用法 | 核心原因 |
|---|---|---|
纯单字段 、多字段去重(无聚合) | DISTINCT | 逻辑更简洁,无索引时效率略高 |
去重 + 聚合(COUNT/SUM/AVG 等) | GROUP BY | 唯一能实现聚合的方式 |
去重 + 过滤分组结果 | GROUP BY + HAVING | DISTINCT 无分组过滤能力 |
有索引的任意去重场景 | 两者皆可 | 索引消除了效率差异,按可读性选择 |
最后记住核心原则:DISTINCT 是去重的 “专用工具”,GROUP BY 是分组聚合的 “全能工具” —— 按需选择,而非盲目依赖某一个。
在实际开发中,不用纠结 “谁绝对更好”,而是先明确业务需求(是否聚合、是否过滤),再结合索引情况选择;写完 SQL 后,养成看执行计划(EXPLAIN)的习惯,通过 Extra 字段(Using temporary/Using filesort)判断是否需要优化,这才是提升 MySQL 性能的核心思路。
关注微信公众号「数据库干货铺」,获取更多数据库运维干货。