在 MySQL 中删除重复数据需要保留一条记录而删除其他重复项,具体方法取决于表结构和重复判断标准。以下是几种常见场景的实现方法:
如果表中有唯一标识列(如 id),可以通过分组找到重复记录,保留 ID 最小(或最大)的那条。
示例表结构:
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50),
email VARCHAR(50)
);
删除重复邮箱,保留 ID 最小的记录:
DELETE FROM users
WHERE id NOT IN (
SELECT min_id FROM (
SELECT MIN(id) AS min_id FROM users GROUP BY email
) AS temp
);
说明:
-
内层子查询
SELECT MIN(id) AS min_id FROM users GROUP BY email 用于获取每个重复组中要保留的记录 ID
-
外层 DELETE 语句删除所有不在保留 ID 列表中的记录
-
使用临时表(
AS temp)是为了避免 MySQL 的 "不能从正在修改的表中选择" 限制
如果表中没有唯一标识列,可以通过创建临时表来处理:
CREATE TABLE temp_users AS
SELECT DISTINCT * FROM users;
TRUNCATE TABLE users;
INSERT INTO users SELECT * FROM temp_users;
DROP TABLE temp_users;
注意:
-
TRUNCATE 会清空表并重置自增 ID,谨慎使用
-
此方法适合小型表,大型表可能影响性能
如果需要根据多列组合判断重复(如 name + email):
DELETE FROM users
WHERE id NOT IN (
SELECT min_id FROM (
SELECT MIN(id) AS min_id
FROM users
GROUP BY name, email
) AS temp
);
如果需要保留最新插入的记录(假设表中有 create_time 字段):
DELETE FROM users
WHERE id NOT IN (
SELECT id FROM (
SELECT id FROM users
WHERE (name, email, create_time) IN (
SELECT name, email, MAX(create_time)
FROM users
GROUP BY name, email
)
) AS temp
);
-
操作前备份数据:
CREATE TABLE users_backup AS SELECT * FROM users;
-
先查询再删除:执行删除前,先通过查询确认要删除的记录:
SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1;
SELECT MIN(id) AS min_id FROM users GROUP BY email;
-
添加唯一索引防止 future 重复:
ALTER TABLE users ADD UNIQUE INDEX idx_email (email);
根据表的具体结构和重复判断标准选择合适的方法,对于大型表,建议分批次删除以避免锁表时间过长。