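I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?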
Answers:
Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:
SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;
This returns a result with the name value in the first column and the number of times that value appears in the second.
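To try the GROUP BY / HAVING approach without a MySQL server, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The table name people and the helper duplicate_counts are made up for illustration, and SQLite stands in for MySQL here. The HAVING clause repeats COUNT(*) instead of using the alias c, since some databases (PostgreSQL, for one) do not accept a column alias in HAVING.

```python
import sqlite3


def duplicate_counts(names):
    """Return (name, count) pairs for every name that occurs more than once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, only used for this illustration.
    conn.execute("CREATE TABLE people (name TEXT)")
    conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        "SELECT name, COUNT(*) AS c FROM people "
        "GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
    ).fetchall()
    conn.close()
    return rows


print(duplicate_counts(["ann", "bob", "ann", "cy", "bob", "ann"]))
```

Values that occur only once (cy in this sample) are filtered out by the HAVING clause.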
GROUP_CONCAT(id) will list the IDs. See my answer for an example.
ERROR: column "c" does not exist LINE 1?
SELECT varchar_col
FROM table
GROUP BY varchar_col
HAVING COUNT(*) > 1;
IN() / NOT IN().
SELECT *
FROM mytable mto
WHERE EXISTS
(
SELECT 1
FROM mytable mti
WHERE mti.varchar_column = mto.varchar_column
LIMIT 1, 1
)
This query returns complete records, not just the distinct varchar_column values.
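This query does not use COUNT(*). If there are lots of duplicates, COUNT(*) is expensive, and you do not need the whole COUNT(*): you only need to know whether there are two rows with the same value.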
Having an index on varchar_column will, of course, speed this query up greatly.
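The "stop as soon as a second matching row exists" idea can be tried out with the sketch below. Note that it is a variant, not the query above: instead of LIMIT 1, 1 it asks whether another row with a different id has the same value, which assumes a hypothetical id primary key on the table. SQLite stands in for MySQL, and the helper name rows_with_duplicates and the index name are my own.

```python
import sqlite3


def rows_with_duplicates(rows):
    """rows: list of (id, value). Return the full rows whose value also
    appears on at least one other row."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: an id primary key plus the column being checked.
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, varchar_column TEXT)")
    conn.execute("CREATE INDEX idx_varchar_column ON mytable (varchar_column)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT * FROM mytable mto WHERE EXISTS ("
        "  SELECT 1 FROM mytable mti"
        "  WHERE mti.varchar_column = mto.varchar_column"
        "    AND mti.id <> mto.id"  # any other row with the same value is enough
        ") ORDER BY mto.varchar_column, mto.id"
    ).fetchall()
    conn.close()
    return result


sample = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "a")]
print(rows_with_duplicates(sample))
```

As with the original query, no counting happens: EXISTS is satisfied by the first other matching row.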
ORDER BY varchar_column DESC can be added to the end of the query.
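GROUP BY with HAVING returns only one of the possible duplicates. Also, performance is better with an indexed field instead of COUNT(*), and ORDER BY can be used to group the duplicate records together.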
Building on levik's answer, to get the IDs of the duplicate rows you can use GROUP_CONCAT if your server supports it (this returns a comma-separated list of IDs).
SELECT GROUP_CONCAT(id), name, COUNT(*) c FROM documents GROUP BY name HAVING c > 1;
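For anyone who wants to see what the comma-separated list looks like, here is a small sketch using SQLite, which also provides GROUP_CONCAT. The documents table mirrors the query above; the id primary key and the helper name duplicate_ids are assumptions. The order of IDs inside the concatenated string is not guaranteed, so the sketch sorts them after splitting.

```python
import sqlite3


def duplicate_ids(docs):
    """docs: list of (id, name). Map each duplicated name to its sorted ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO documents VALUES (?, ?)", docs)
    rows = conn.execute(
        "SELECT GROUP_CONCAT(id), name, COUNT(*) AS c FROM documents "
        "GROUP BY name HAVING COUNT(*) > 1"
    ).fetchall()
    conn.close()
    # GROUP_CONCAT yields a string such as '1,3,4'; split it back into integers.
    return {name: sorted(int(i) for i in ids.split(",")) for ids, name, _ in rows}


print(duplicate_ids([(1, "a"), (2, "b"), (3, "a"), (4, "a")]))
```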
SELECT id, GROUP_CONCAT(id), name, COUNT(*) c [...] enables inline editing and should update all the rows involved (or at least the first one matched), but unfortunately the edit produces a JavaScript error...
Assuming your table is named TableABC, the column you want is Col, and the primary key of TableABC is Key.
SELECT a.Key, b.Key, a.Col
FROM TableABC a, TableABC b
WHERE a.Col = b.Col
AND a.Key <> b.Key
The advantage of this approach over the answers above is that it gives you the keys.
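One thing worth knowing about the self-join is that every duplicate pair comes back twice, once in each direction. The sketch below shows this on SQLite and also a possible tweak (my suggestion, not part of the answer): comparing with < instead of <> keeps each pair only once. The column names key_id and col and the helper duplicate_pairs are placeholders.

```python
import sqlite3


def duplicate_pairs(rows, ordered=False):
    """rows: list of (key_id, col). Return (key_a, key_b, col) for rows
    sharing the same col value. With ordered=True each pair appears once."""
    op = "<" if ordered else "<>"  # only these two fixed operators are used
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableABC (key_id INTEGER PRIMARY KEY, col TEXT)")
    conn.executemany("INSERT INTO TableABC VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT a.key_id, b.key_id, a.col "
        "FROM TableABC a, TableABC b "
        f"WHERE a.col = b.col AND a.key_id {op} b.key_id "
        "ORDER BY a.key_id, b.key_id"
    ).fetchall()
    conn.close()
    return result


data = [(1, "x"), (2, "y"), (3, "x")]
print(duplicate_pairs(data))                # both (1, 3) and (3, 1) are returned
print(duplicate_pairs(data, ordered=True))  # only (1, 3) is returned
```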
To get all the data that contains duplication, I used this:
SELECT * FROM TableName INNER JOIN(
SELECT DuplicatedData FROM TableName GROUP BY DuplicatedData HAVING COUNT(DuplicatedData) > 1 order by DuplicatedData)
temp ON TableName.DuplicatedData = temp.DuplicatedData;
TableName = the table you are working with.
DuplicatedData = the duplicated data you are looking for.
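Taking @maxyfc's answer further, I needed to find all of the rows that were returned with the duplicate values, so that I could edit them in MySQL Workbench: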
SELECT * FROM table
WHERE field IN (
SELECT field FROM table GROUP BY field HAVING count(*) > 1
) ORDER BY field
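Looking at the results above, those queries work fine if you need to check a single column for duplicate values, for example email.
But if you need to check several columns and want to check the combination of their values, this query will work: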
SELECT COUNT(CONCAT(name,email)) AS tot,
name,
email
FROM users
GROUP BY CONCAT(name,email)
HAVING tot>1 -- shows the users that occur more than once, along with the count
SELECT COUNT(CONCAT(userid,event,datetime)) AS total, userid, event, datetime FROM mytable GROUP BY CONCAT(userid, event, datetime ) HAVING total>1
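A caveat with grouping on a concatenation: two different combinations can concatenate to the same string, for example ('ab', 'c') and ('a', 'bc'). The sketch below compares grouping on the concatenated value with grouping on the columns themselves. It uses SQLite, where || plays the role of CONCAT; the helper name count_duplicate_groups is hypothetical, and grouping by the separate columns is my suggested alternative, not something from the answer above.

```python
import sqlite3


def count_duplicate_groups(users, by_concat):
    """users: list of (name, email). Return how many groups are reported as
    duplicated, grouping either by name || email or by (name, email)."""
    group_expr = "name || email" if by_concat else "name, email"
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    rows = conn.execute(
        f"SELECT COUNT(*) AS tot FROM users GROUP BY {group_expr} HAVING COUNT(*) > 1"
    ).fetchall()
    conn.close()
    return len(rows)


tricky = [("ab", "c"), ("a", "bc"), ("x", "y")]
print(count_duplicate_groups(tricky, by_concat=True))   # false positive: 'abc' twice
print(count_duplicate_groups(tricky, by_concat=False))  # no real duplicates
```

If the concatenation is kept, putting a separator between the fields that cannot occur in the data is one way to avoid this kind of false match.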
I prefer to use window functions (MySQL 8.0+) to find duplicates because I can see the entire row:
WITH cte AS (
SELECT *
,COUNT(*) OVER(PARTITION BY col_name) AS num_of_duplicates_group
,ROW_NUMBER() OVER(PARTITION BY col_name ORDER BY col_name2) AS pos_in_group
FROM table
)
SELECT *
FROM cte
WHERE num_of_duplicates_group > 1;
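Here is a runnable sketch of the same CTE on SQLite, which supports window functions from version 3.25 on (an assumption about the SQLite build bundled with your Python). The items table, its id column used for ordering inside each group, and the helper name rows_in_duplicate_groups are placeholders of mine.

```python
import sqlite3


def rows_in_duplicate_groups(rows):
    """rows: list of (id, col_name). Return every row whose col_name is
    duplicated, with the group size and the row's position in its group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, col_name TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    result = conn.execute(
        "WITH cte AS ("
        "  SELECT *,"
        "    COUNT(*) OVER (PARTITION BY col_name) AS num_of_duplicates_group,"
        "    ROW_NUMBER() OVER (PARTITION BY col_name ORDER BY id) AS pos_in_group"
        "  FROM items"
        ") "
        "SELECT * FROM cte WHERE num_of_duplicates_group > 1 "
        "ORDER BY col_name, pos_in_group"
    ).fetchall()
    conn.close()
    return result


print(rows_in_duplicate_groups([(1, "a"), (2, "b"), (3, "a"), (4, "a")]))
```

Filtering on pos_in_group > 1 instead would select only the extra copies, which can be handy when the goal is to clean them up.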
SELECT
t.*,
(SELECT COUNT(*) FROM city AS tt WHERE tt.name=t.name) AS count
FROM `city` AS t
WHERE
(SELECT count(*) FROM city AS tt WHERE tt.name=t.name) > 1 ORDER BY count DESC
The following will find every product_id that is used more than once. You only get a single record for each product_id.
SELECT product_id FROM oc_product_reward GROUP BY product_id HAVING count( product_id ) >1
Code taken from: http://chandreshrana.blogspot.in/2014/12/find-duplicate-records-based-on-any.html
CREATE TABLE tbl_master
(`id` int, `email` varchar(15));
INSERT INTO tbl_master
(`id`, `email`) VALUES
(1, 'test1@gmail.com'),
(2, 'test2@gmail.com'),
(3, 'test1@gmail.com'),
(4, 'test2@gmail.com'),
(5, 'test5@gmail.com');
Query:
SELECT id, email FROM tbl_master
WHERE email IN (SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1)
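To check what the query returns for the sample data, here is a small sketch that loads the same five rows into an in-memory SQLite database (standing in for MySQL) and runs the query. The helper name duplicated_emails and the added ORDER BY id are mine.

```python
import sqlite3


def duplicated_emails():
    """Load the tbl_master sample rows and return every row whose email
    appears more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_master (id int, email varchar(15))")
    conn.executemany(
        "INSERT INTO tbl_master (id, email) VALUES (?, ?)",
        [
            (1, "test1@gmail.com"),
            (2, "test2@gmail.com"),
            (3, "test1@gmail.com"),
            (4, "test2@gmail.com"),
            (5, "test5@gmail.com"),
        ],
    )
    rows = conn.execute(
        "SELECT id, email FROM tbl_master "
        "WHERE email IN ("
        "  SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1"
        ") ORDER BY id"  # ORDER BY added only to make the output stable
    ).fetchall()
    conn.close()
    return rows


for row in duplicated_emails():
    print(row)
```

Rows 1 to 4 come back, while row 5 (test5@gmail.com) is left out because its email occurs only once.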
SELECT DISTINCT a.email FROM `users` a LEFT JOIN `users` b ON a.email = b.email WHERE a.id != b.id;
Change a.email to a.* to get all the IDs of the rows with duplicates.
SELECT DISTINCT a.* resolved almost instantly.
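To remove duplicate rows across multiple fields, first concatenate the fields into a new key that is unique for each distinct row, then use "group by" to drop the duplicate rows that share the same new key: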
Create TEMPORARY table tmp select concat(f1,f2) as cfs,t1.* from mytable as t1;
Create index x_tmp_cfs on tmp(cfs);
Create table unduptable select f1,f2,... from tmp group by cfs;
CREATE TEMPORARY TABLE ...? A little explanation of your solution would be great.
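A very late contribution... in case it helps anyone further down the line... I had the task of finding matching pairs of transactions (actually both sides of account-to-account transfers) in a banking app, to identify which ones were the "from" and "to" of each inter-account transfer, so we ended up with this: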
SELECT
LEAST(primaryid, secondaryid) AS transactionid1,
GREATEST(primaryid, secondaryid) AS transactionid2
FROM (
SELECT table1.transactionid AS primaryid,
table2.transactionid AS secondaryid
FROM financial_transactions table1
INNER JOIN financial_transactions table2
ON table1.accountid = table2.accountid
AND table1.transactionid <> table2.transactionid
AND table1.transactiondate = table2.transactiondate
AND table1.sourceref = table2.destinationref
AND table1.amount = (0 - table2.amount)
) AS DuplicateResultsTable
GROUP BY transactionid1
ORDER BY transactionid1;
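The result is that DuplicateResultsTable provides rows containing the matching (i.e. duplicate) transactions, but it also provides the same transaction IDs in reverse the second time it matches the same pair. The outer SELECT is therefore there to group by the first transaction ID: using LEAST and GREATEST makes sure the two transactionids always come out in the same order, which makes it safe to GROUP by the first one and so eliminates all of the duplicate matches. It ran through nearly a million records and identified more than 12,000 matches in just under 2 seconds. Of course, transactionid is the primary index, which really helped.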
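The ordering trick can be seen in isolation with the reduced sketch below. It is not the query above: the schema is cut down to a hypothetical tx table with only transactionid and amount, SQLite's two-argument min() and max() stand in for LEAST and GREATEST, and SELECT DISTINCT is used in place of the GROUP BY. The helper name match_transfers is mine.

```python
import sqlite3


def match_transfers(transactions):
    """transactions: list of (transactionid, amount). Return a tuple of
    (raw matches in both directions, each matching pair listed once)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tx (transactionid INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT INTO tx VALUES (?, ?)", transactions)
    join = (
        "FROM tx a JOIN tx b "
        "ON a.amount = -b.amount AND a.transactionid <> b.transactionid "
    )
    # Without normalisation, each pair shows up as (x, y) and again as (y, x).
    raw = conn.execute(
        "SELECT a.transactionid, b.transactionid " + join + "ORDER BY 1, 2"
    ).fetchall()
    # Always putting the smaller id first makes both directions identical rows.
    pairs = conn.execute(
        "SELECT DISTINCT min(a.transactionid, b.transactionid) AS t1, "
        "max(a.transactionid, b.transactionid) AS t2 " + join + "ORDER BY t1, t2"
    ).fetchall()
    conn.close()
    return raw, pairs


raw, pairs = match_transfers([(1, 100), (2, -100), (3, 50), (4, -50), (5, 7)])
print(len(raw), pairs)
```

With the sample rows, the raw join finds four matches and the normalised query reduces them to two pairs.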
Select column_name, column_name1,column_name2, count(1) as temp from table_name group by column_name having temp > 1
SELECT ColumnA, COUNT( * )
FROM Table
GROUP BY ColumnA
HAVING COUNT( * ) > 1
If you want to remove duplicates, use DISTINCT.
Otherwise, use this query:
SELECT users.*,COUNT(user_ID) as user FROM users GROUP BY user_name HAVING user > 1;
Try using this query:
SELECT name, COUNT(*) value_count FROM company_master GROUP BY name HAVING value_count > 1;