Answers:
The key is to rewrite this query so that it can be used as a subquery.
SELECT firstname,
lastname,
list.address
FROM list
INNER JOIN (SELECT address
FROM list
GROUP BY address
HAVING COUNT(id) > 1) dup
ON list.address = dup.address;
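To see what the GROUP BY ... HAVING subquery joined back to list actually returns, here is a minimal sketch. It runs the same query through Python's built-in sqlite3 module instead of MySQL, against a hypothetical in-memory list table with made-up rows. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical sample data: two people share "1 Main St".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO list (id, firstname, lastname, address) VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "1 Main St"),
        (2, "Bob", "Lee", "1 Main St"),
        (3, "Cy", "Ray", "9 Elm St"),
    ],
)

# The inner query finds addresses used more than once;
# the join then brings back every row that has one of those addresses.
rows = conn.execute(
    """
    SELECT firstname, lastname, list.address
    FROM list
    INNER JOIN (SELECT address
                FROM list
                GROUP BY address
                HAVING COUNT(id) > 1) dup
      ON list.address = dup.address
    ORDER BY list.id
    """
).fetchall()
print(rows)  # [('Ann', 'Lee', '1 Main St'), ('Bob', 'Lee', '1 Main St')]
```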
SELECT date FROM logs group by date having count(*) >= 2
Just add ->having(DB::raw('count(*)'), '>', 1) to the query. Many thanks!
Why >= 2? Just use HAVING COUNT(*) > 1.
Why not just INNER JOIN the table with itself?
SELECT a.firstname, a.lastname, a.address
FROM list a
INNER JOIN list b ON a.address = b.address
WHERE a.id <> b.id
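If an address can occur more than twice, you need DISTINCT; otherwise the same row is returned once for every other row that shares its address.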
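The need for DISTINCT can be checked on a tiny hypothetical table where one address appears three times. This sketch uses Python's sqlite3 module in place of MySQL and runs the self-join above both with and without DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, address TEXT)"
)
# Hypothetical data: "1 Main St" appears three times, "9 Elm St" once.
conn.executemany(
    "INSERT INTO list VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "1 Main St"),
        (2, "Bob", "Lee", "1 Main St"),
        (3, "Cy", "Ray", "1 Main St"),
        (4, "Di", "Fox", "9 Elm St"),
    ],
)

# Same self-join as above; {distinct} is filled with "" or "DISTINCT".
self_join = """
    SELECT {distinct} a.firstname, a.lastname, a.address
    FROM list a
    INNER JOIN list b ON a.address = b.address
    WHERE a.id <> b.id
"""
plain = conn.execute(self_join.format(distinct="")).fetchall()
deduped = conn.execute(self_join.format(distinct="DISTINCT")).fetchall()
# Each of the 3 "1 Main St" rows matches the other 2, so it is returned twice.
print(len(plain), len(deduped))  # 6 3
```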
Using
WHERE a.id > b.id
returns only the newer duplicates, so I can run a DELETE directly on the result. Reverse the comparison to list the older duplicates instead.
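A minimal sketch of the a.id > b.id idea, using Python's sqlite3 module and a hypothetical list table. DISTINCT is added because, when an address is stored three times, its newest row has two older partners and would otherwise be listed twice. The helper name duplicate_ids is made up for illustration.

```python
import sqlite3

# Hypothetical data: "1 Main St" is stored three times (ids 1-3).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, address TEXT)")
conn.executemany(
    "INSERT INTO list VALUES (?, ?)",
    [(1, "1 Main St"), (2, "1 Main St"), (3, "1 Main St"), (4, "9 Elm St")],
)

def duplicate_ids(op):
    """Ids of rows that have a same-address partner with a smaller ('>') or larger ('<') id."""
    assert op in (">", "<")  # only these two fixed operators are ever interpolated
    sql = f"""
        SELECT DISTINCT a.id
        FROM list a
        INNER JOIN list b ON a.address = b.address
        WHERE a.id {op} b.id
        ORDER BY a.id
    """
    return [row[0] for row in conn.execute(sql)]

newer = duplicate_ids(">")  # every copy except the oldest
older = duplicate_ids("<")  # every copy except the newest
print(newer, older)  # [2, 3] [1, 2]

# Delete the newer copies, keeping the lowest id per address.
conn.executemany("DELETE FROM list WHERE id = ?", [(i,) for i in newer])
remaining = [row[0] for row in conn.execute("SELECT id FROM list ORDER BY id")]
print(remaining)  # [1, 4]
```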
Find duplicate users by email address with this query...
SELECT users.name, users.uid, users.mail, from_unixtime(created)
FROM users
INNER JOIN (
SELECT mail
FROM users
GROUP BY mail
HAVING count(mail) > 1
) dupes ON users.mail = dupes.mail
ORDER BY users.mail;
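Finding duplicate addresses is much more complex than it seems, especially if you require accuracy. A MySQL query is not enough in this case...
I work at SmartyStreets, where we address problems like validation and de-duplication, and I have seen many different challenges with similar problems.
There are several third-party services that will flag duplicates in a list for you. Doing this solely with a MySQL subquery will not account for differences in address formats and standards. The USPS (for US addresses) has certain guidelines for making these standard, but only a handful of vendors are certified to perform such operations.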
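So the best answer I can recommend is to export the table to a CSV file and submit it to a capable list processor. One such is the SmartyStreets Bulk Address Validation Tool, which will do it for you automatically in a few seconds to a few minutes. It flags duplicate rows with a new field called "Duplicate" containing the value Y.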
Another solution is to use table aliases, like so:
SELECT p1.id, p2.id, p1.address
FROM list AS p1, list AS p2
WHERE p1.address = p2.address
AND p1.id != p2.id
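All you are really doing here is taking the original list table, creating two pretend tables, p1 and p2, out of it, and then joining them on the address column (line 3). Line 4 makes sure that a record is not matched with itself. Note that each duplicate pair still appears twice (once as p1/p2 and once as p2/p1); use p1.id < p2.id if you want each pair listed only once.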
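The difference between != and < in the aliased join can be seen on a hypothetical two-duplicate table. This sketch runs the same comma-join through Python's sqlite3 module; the helper name pairs is made up, and the ORDER BY only fixes the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, address TEXT)")
conn.executemany(
    "INSERT INTO list VALUES (?, ?)",
    [(1, "1 Main St"), (2, "1 Main St"), (3, "9 Elm St")],
)

def pairs(op):
    """Run the aliased self-join with the given id comparison ('!=' or '<')."""
    assert op in ("!=", "<")  # only these two fixed operators are interpolated
    sql = f"""
        SELECT p1.id, p2.id, p1.address
        FROM list AS p1, list AS p2
        WHERE p1.address = p2.address
          AND p1.id {op} p2.id
        ORDER BY p1.id, p2.id
    """
    return conn.execute(sql).fetchall()

both_orders = pairs("!=")  # each pair once per direction
once = pairs("<")          # each pair exactly once
print(both_orders)  # [(1, 2, '1 Main St'), (2, 1, '1 Main St')]
print(once)         # [(1, 2, '1 Main St')]
```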
This selects duplicates in a single pass over the table, with no subqueries.
SELECT *
FROM (
    SELECT ao.*, (@r := @r + 1) AS rn
    FROM (
        SELECT @_address := 'N', @r := 0
    ) vars,
    (
        SELECT *
        FROM list a
        ORDER BY address, id
    ) ao
    WHERE CASE WHEN @_address <> address THEN @r := 0 ELSE 0 END IS NOT NULL
      AND (@_address := address) IS NOT NULL
) aoo
WHERE rn > 1
This query effectively emulates the ROW_NUMBER() function available in Oracle and SQL Server.
For details, see the article on my blog about MySQL.
FROM (SELECT ...) aoo is a subquery :-P
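MySQL 8.0 and later support ROW_NUMBER() natively, and setting user variables inside expressions, as the query above does, is deprecated as of MySQL 8.0. On such versions the variable trick can be replaced by a window function. This sketch shows the idea with Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the first release with window functions); the table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, address TEXT)")
conn.executemany(
    "INSERT INTO list VALUES (?, ?)",
    [(1, "1 Main St"), (2, "9 Elm St"), (3, "1 Main St"), (4, "1 Main St")],
)

# ROW_NUMBER() restarts at 1 for each address (ordered by id), which is what the
# @r / @_address variables emulate; rn > 1 keeps every copy after the first.
rows = conn.execute(
    """
    SELECT id, address, rn
    FROM (
        SELECT id, address,
               ROW_NUMBER() OVER (PARTITION BY address ORDER BY id) AS rn
        FROM list
    ) numbered
    WHERE rn > 1
    ORDER BY address, id
    """
).fetchall()
print(rows)  # [(3, '1 Main St', 2), (4, '1 Main St', 3)]
```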
SELECT firstname, lastname, address FROM list
WHERE
Address in
(SELECT address FROM list
GROUP BY address
HAVING count(*) > 1)
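The IN (subquery) form above and the INNER JOIN form from the first answer should return the same rows. A quick check with Python's sqlite3 module on hypothetical data; the ORDER BY clauses are added only so the two result lists can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO list VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "1 Main St"),
        (2, "Bob", "Lee", "1 Main St"),
        (3, "Cy", "Ray", "9 Elm St"),
    ],
)

# Filter with IN against the set of duplicated addresses.
in_form = conn.execute(
    """
    SELECT firstname, lastname, address FROM list
    WHERE address IN (SELECT address FROM list
                      GROUP BY address
                      HAVING COUNT(*) > 1)
    ORDER BY id
    """
).fetchall()

# Join against the same set of duplicated addresses.
join_form = conn.execute(
    """
    SELECT firstname, lastname, list.address FROM list
    INNER JOIN (SELECT address FROM list
                GROUP BY address
                HAVING COUNT(*) > 1) dup ON list.address = dup.address
    ORDER BY list.id
    """
).fetchall()

print(in_form == join_form)  # True
```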
SELECT users.name, users.uid, users.mail, from_unixtime(created) FROM users INNER JOIN ( SELECT mail FROM users GROUP BY mail HAVING count(mail) > 1 ) dup ON users.mail = dup.mail ORDER BY users.mail, users.created;
select * from table_name t1 inner join (select distinct <attribute list> from table_name as temp)t2 where t1.attribute_name = t2.attribute_name
For your table it would be something like:
select * from list l1 inner join (select distinct address from list as list2)l2 where l1.address=l2.address
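This query joins each entry in the list to the set of distinct addresses... Note that, as written, every row matches exactly one distinct address, so it returns all rows rather than only the duplicates. I am not sure how this would work if you have any primary key values such as names.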
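A fast duplicate-deletion procedure:
/* temp table with one primary-key column, id */
CREATE TEMPORARY TABLE temp (id INT PRIMARY KEY);
/* each run deletes only the lowest id of every duplicate group; repeat the next three statements until the INSERT adds no rows */
INSERT INTO temp(id) SELECT MIN(id) FROM list GROUP BY isbn HAVING COUNT(*) > 1;
DELETE FROM list WHERE id IN (SELECT id FROM temp);
DELETE FROM temp;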
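Because each run deletes only the MIN(id) of every duplicate group, a value stored three times needs two runs. This sketch loops until the INSERT adds no rows, using Python's sqlite3 module. The helper table is renamed to the hypothetical dup_ids (TEMP is an SQLite keyword), and the isbn values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, isbn TEXT)")
conn.executemany(
    "INSERT INTO list VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "C"), (6, "C")],
)
conn.execute("CREATE TEMP TABLE dup_ids (id INTEGER PRIMARY KEY)")

passes = 0
while True:
    # One pass: collect the lowest id of every isbn that still has duplicates.
    inserted = conn.execute(
        "INSERT INTO dup_ids(id) SELECT MIN(id) FROM list GROUP BY isbn HAVING COUNT(*) > 1"
    ).rowcount
    if inserted == 0:
        break  # no duplicate groups left
    conn.execute("DELETE FROM list WHERE id IN (SELECT id FROM dup_ids)")
    conn.execute("DELETE FROM dup_ids")
    passes += 1

# The highest id of each isbn survives.
remaining = conn.execute("SELECT id, isbn FROM list ORDER BY id").fetchall()
print(passes, remaining)  # 2 [(3, 'A'), (4, 'B'), (6, 'C')]
```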
Personally, this query solved my problem:
SELECT `SUB_ID`, COUNT(SRV_KW_ID) as subscriptions FROM `SUB_SUBSCR` group by SUB_ID, SRV_KW_ID HAVING subscriptions > 1;
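This script shows every subscriber ID (SUB_ID) that has the same SRV_KW_ID more than once in the table, along with the number of duplicates found.
These are the table columns: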
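+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| SUB_SUBSCR_ID | int(11)     | NO   | PRI | NULL    | auto_increment |
| MSI_ALIAS     | varchar(64) | YES  | UNI | NULL    |                |
| SUB_ID        | int(11)     | NO   | MUL | NULL    |                |
| SRV_KW_ID     | int(11)     | NO   | MUL | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+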
Hope this helps!
Find duplicate Records:
Suppose we have a table Student:
student_id int
student_name varchar
Records:
+------------+---------------------+
| student_id | student_name |
+------------+---------------------+
| 101 | usman |
| 101 | usman |
| 101 | usman |
| 102 | usmanyaqoob |
| 103 | muhammadusmanyaqoob |
| 103 | muhammadusmanyaqoob |
+------------+---------------------+
Now we want to see the duplicate records.
Use this query:
select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;
+---------------------+------------+---+
| student_name | student_id | c |
+---------------------+------------+---+
| usman | 101 | 3 |
| muhammadusmanyaqoob | 103 | 2 |
+---------------------+------------+---+
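The Student example can be reproduced with Python's sqlite3 module to confirm the output table. The sketch writes HAVING COUNT(*) > 1 instead of the alias c (the portable spelling) and adds ORDER BY student_id so the rows come back in the order shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER, student_name TEXT)")
# The same records as in the table above.
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [
        (101, "usman"), (101, "usman"), (101, "usman"),
        (102, "usmanyaqoob"),
        (103, "muhammadusmanyaqoob"), (103, "muhammadusmanyaqoob"),
    ],
)

# Group on both columns and keep only combinations that occur more than once.
dupes = conn.execute(
    """
    SELECT student_name, student_id, COUNT(*) AS c
    FROM student
    GROUP BY student_id, student_name
    HAVING COUNT(*) > 1
    ORDER BY student_id
    """
).fetchall()
print(dupes)  # [('usman', 101, 3), ('muhammadusmanyaqoob', 103, 2)]
```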
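To quickly see the duplicate rows, you can run a single simple query.
Here I query the table and list all duplicate rows with the same user_id, market_place and sku:
select user_id, market_place, sku, count(id) as totals from sku_analytics group by user_id, market_place, sku having count(id) > 1;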
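To delete the duplicate rows, you have to decide which row to delete, for example the one with the lower id (usually the older one) or one chosen by some other date information. In my case I just want to delete the lower ids, since the newer ids hold the latest information.
First, double-check that the right records will be deleted. Here I select, by unique id, the records among the duplicates that will be deleted: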
select a.user_id, a.market_place,a.sku from sku_analytics a inner join sku_analytics b where a.id< b.id and a.user_id= b.user_id and a.market_place= b.market_place and a.sku = b.sku;
Then I run the delete query to remove the duplicates:
delete a from sku_analytics a inner join sku_analytics b where a.id< b.id and a.user_id= b.user_id and a.market_place= b.market_place and a.sku = b.sku;
Back up, double-check, verify, verify the backup, then execute.
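MySQL's multi-table DELETE a FROM ... INNER JOIN syntax is not available in SQLite, so this sketch expresses the same rule (delete a row when a row with a higher id has the same user_id, market_place and sku) as a correlated EXISTS subquery, run through Python's sqlite3 module on hypothetical rows. Like the MySQL query, it keeps only the highest id of each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sku_analytics "
    "(id INTEGER PRIMARY KEY, user_id INTEGER, market_place TEXT, sku TEXT)"
)
# Hypothetical data: ids 1-3 are the same user/market/sku; 4 and 5 are unique.
conn.executemany(
    "INSERT INTO sku_analytics VALUES (?, ?, ?, ?)",
    [
        (1, 7, "US", "X1"), (2, 7, "US", "X1"), (3, 7, "US", "X1"),
        (4, 7, "UK", "X1"), (5, 8, "US", "X1"),
    ],
)

# Delete a row when a newer row (higher id) has the same user_id/market_place/sku.
conn.execute(
    """
    DELETE FROM sku_analytics
    WHERE EXISTS (
        SELECT 1 FROM sku_analytics b
        WHERE b.id > sku_analytics.id
          AND b.user_id = sku_analytics.user_id
          AND b.market_place = sku_analytics.market_place
          AND b.sku = sku_analytics.sku
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM sku_analytics ORDER BY id")]
print(remaining)  # [3, 4, 5]
```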
Powerlord's answer is indeed the best, and I would recommend one more change: use LIMIT to make sure the database does not get overloaded:
SELECT firstname, lastname, list.address FROM list
INNER JOIN (SELECT address FROM list
GROUP BY address HAVING count(id) > 1) dup ON list.address = dup.address
LIMIT 10
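It is a good habit to use LIMIT when there is no WHERE clause and when making joins. Start with a small value, check how heavy the query is, and then increase the limit.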