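In a database, duplicate rows are rows in a table that share the same values in some or all columns. Finding them is a common task in data analysis and data cleaning, and it helps surface data-quality problems or business anomalies.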
-- Method 1: GROUP BY with HAVING (one summary row per duplicated combination)
SELECT column1, column2, ..., COUNT(*) AS duplicate_count
FROM table_name
GROUP BY column1, column2, ...
HAVING COUNT(*) > 1;
-- Method 2: window function (returns every row that belongs to a duplicated group)
SELECT *
FROM (
    SELECT *,
           COUNT(*) OVER (PARTITION BY column1, column2, ...) AS duplicate_count
    FROM table_name
) t
WHERE duplicate_count > 1;
-- Method 3: self-join (needs a unique key column to tell the matching rows apart)
SELECT DISTINCT a.*
FROM table_name a
JOIN table_name b
  ON a.column1 = b.column1 AND a.column2 = b.column2 AND ...
WHERE a.primary_key_column <> b.primary_key_column;
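To make the difference between the three query shapes concrete, here is a minimal sketch that runs them against an in-memory SQLite database through Python's standard `sqlite3` module. The `users` table, its columns, and the sample rows are made up for illustration, and the window-function query assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: rows 1, 3 and 5 share the same name and email.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO users (id, name, email) VALUES
        (1, 'Ann', 'ann@example.com'),
        (2, 'Bob', 'bob@example.com'),
        (3, 'Ann', 'ann@example.com'),
        (4, 'Cid', 'cid@example.com'),
        (5, 'Ann', 'ann@example.com');
""")

# GROUP BY / HAVING: one summary row per duplicated (name, email) combination.
grouped = conn.execute("""
    SELECT name, email, COUNT(*) AS duplicate_count
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
""").fetchall()

# Window function: every individual row that sits in a duplicated group.
windowed = conn.execute("""
    SELECT id
    FROM (
        SELECT id, COUNT(*) OVER (PARTITION BY name, email) AS duplicate_count
        FROM users
    ) AS t
    WHERE duplicate_count > 1
    ORDER BY id
""").fetchall()

# Self-join: a row is a duplicate if another row with a different id matches it.
joined = conn.execute("""
    SELECT DISTINCT a.id
    FROM users AS a
    JOIN users AS b
      ON a.name = b.name AND a.email = b.email
    WHERE a.id <> b.id
    ORDER BY a.id
""").fetchall()

print(grouped)   # [('Ann', 'ann@example.com', 3)]
print(windowed)  # [(1,), (3,), (5,)]
print(joined)    # [(1,), (3,), (5,)]
```

The grouped query is the cheapest way to see which value combinations repeat and how often, while the window-function and self-join forms return the actual rows, which is what a later cleanup step usually needs.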
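Cause: querying for duplicate rows on a large table can lead to performance problems. Solution:
Cause: the chosen combination of columns is not enough to uniquely identify a record. Solution:
Cause: NULL values follow special comparison rules, which can produce unexpected results. Solution: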
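The NULL behavior mentioned above can be illustrated with a small sketch, again using Python's `sqlite3` module and a hypothetical `contacts` table. `GROUP BY` places NULLs in the same group, whereas an `=` comparison in a join never matches NULL with NULL, so the self-join form can silently miss duplicates. SQLite's `IS` operator is used here as a NULL-safe comparison; other databases spell this differently (for example `IS NOT DISTINCT FROM` in PostgreSQL or `<=>` in MySQL).

```python
import sqlite3

# Hypothetical data: rows 1 and 2 are duplicates whose phone is NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    INSERT INTO contacts (id, name, phone) VALUES
        (1, 'Ann', NULL),
        (2, 'Ann', NULL),
        (3, 'Bob', '555-0100');
""")

# GROUP BY treats the two NULL phones as belonging to the same group.
group_by_dupes = conn.execute("""
    SELECT name, phone, COUNT(*)
    FROM contacts
    GROUP BY name, phone
    HAVING COUNT(*) > 1
""").fetchall()

# "=" yields NULL (not true) when either side is NULL, so nothing matches.
join_equals = conn.execute("""
    SELECT DISTINCT a.id
    FROM contacts AS a
    JOIN contacts AS b
      ON a.name = b.name AND a.phone = b.phone
    WHERE a.id <> b.id
    ORDER BY a.id
""").fetchall()

# SQLite's IS compares like "=" but treats two NULLs as equal.
join_null_safe = conn.execute("""
    SELECT DISTINCT a.id
    FROM contacts AS a
    JOIN contacts AS b
      ON a.name = b.name AND a.phone IS b.phone
    WHERE a.id <> b.id
    ORDER BY a.id
""").fetchall()

print(group_by_dupes)  # [('Ann', None, 2)]
print(join_equals)     # []
print(join_null_safe)  # [(1,), (2,)]
```

Whether NULL-valued rows should count as duplicates at all is a business decision; the point of the sketch is only that the two query styles answer that question differently unless the comparison is made NULL-safe.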
-- Find duplicate users that share the same name and email
SELECT first_name, last_name, email, COUNT(*) as count
FROM users
GROUP BY first_name, last_name, email
HAVING COUNT(*) > 1;
-- Find duplicate orders with a window function
SELECT *
FROM (
    SELECT *,
           COUNT(*) OVER (PARTITION BY customer_id, order_date, total_amount) AS dup_count
    FROM orders
) subquery
WHERE dup_count > 1;
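-- Find and delete duplicate records, keeping one row per group (the lowest id)
-- Note: deleting through a CTE like this is SQL Server syntax; other databases may reject it
WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY product_code, batch_number ORDER BY id) AS rn
    FROM inventory
)
DELETE FROM CTE WHERE rn > 1;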
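For databases that do not accept a `DELETE` against a CTE, one possible alternative is to keep the lowest `id` in each group and delete everything else. The sketch below tries this against an in-memory SQLite database; the sample rows are invented, and this is a swapped-in technique rather than the statement shown above. Some engines (MySQL, for instance) may require wrapping the subquery in a derived table before they allow it.

```python
import sqlite3

# Hypothetical inventory rows: ids 2 and 4 duplicate id 1 on (product_code, batch_number).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (
        id INTEGER PRIMARY KEY,
        product_code TEXT,
        batch_number TEXT
    );
    INSERT INTO inventory (id, product_code, batch_number) VALUES
        (1, 'P1', 'B1'),
        (2, 'P1', 'B1'),
        (3, 'P2', 'B1'),
        (4, 'P1', 'B1'),
        (5, 'P2', 'B2');
""")

# Keep the smallest id per (product_code, batch_number) and delete the rest.
cursor = conn.execute("""
    DELETE FROM inventory
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM inventory
        GROUP BY product_code, batch_number
    )
""")
deleted = cursor.rowcount
conn.commit()

remaining = conn.execute("SELECT id FROM inventory ORDER BY id").fetchall()

print(deleted)    # 2
print(remaining)  # [(1,), (3,), (5,)]
```

Running the matching `SELECT` first, inside a transaction or on a copy of the table, is a sensible precaution before executing any bulk delete of this kind.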