在SQL中,当需要仅返回表中重复的行时,通常是指查找在特定列或多个列组合中具有相同值的行。这与简单的去重操作(DISTINCT)相反,而是专门找出那些有重复值的记录。
这是最常用的方法,通过分组和计数来识别重复项:
SELECT column1, column2, ..., COUNT(*) as count
FROM table_name
GROUP BY column1, column2, ...
HAVING COUNT(*) > 1;
WITH duplicates AS (
SELECT *,
COUNT(*) OVER (PARTITION BY column1, column2, ...) as count
FROM table_name
)
SELECT * FROM duplicates WHERE count > 1;
SELECT DISTINCT a.*
FROM table_name a
JOIN table_name b ON a.column1 = b.column1
AND a.column2 = b.column2
AND ...
AND a.primary_key <> b.primary_key;
假设有一个员工表(employees),要找出重复的姓名和部门组合:
-- 方法1: GROUP BY
SELECT first_name, last_name, department_id, COUNT(*) as duplicate_count
FROM employees
GROUP BY first_name, last_name, department_id
HAVING COUNT(*) > 1;
-- 方法2: 窗口函数
WITH dup_check AS (
SELECT *,
COUNT(*) OVER (PARTITION BY first_name, last_name, department_id) as dup_count
FROM employees
)
SELECT * FROM dup_check WHERE dup_count > 1;
-- 方法3: 自连接
SELECT DISTINCT e1.*
FROM employees e1
JOIN employees e2 ON e1.first_name = e2.first_name
AND e1.last_name = e2.last_name
AND e1.department_id = e2.department_id
AND e1.employee_id <> e2.employee_id;
选择哪种方法取决于数据库类型、数据量和具体需求。窗口函数通常性能较好且更灵活,但需要数据库支持。