How do I find duplicates across multiple columns?


98

So I want to do something like this SQL code below:

select s.id, s.name,s.city 
from stuff s
group by s.name having count(where city and name are identical) > 1

to produce the following (but ignore rows where only the name or only the city matches; it has to match on both columns):

id      name  city   
904834  jim   London  
904835  jim   London  
90145   Fred  Paris   
90132   Fred  Paris
90133   Fred  Paris

Answers:


137

Duplicated ids for each (name, city) pair:

select s.id, t.* 
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city

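Note that if either name or city contains null, those rows will not be reported by the outer query, even though they are matched (grouped together) by the inner query.
– Adam Parkin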

3
If the values can possibly contain null, then (unless I'm missing something) you need to change it to a CROSS JOIN (full Cartesian product) and then add a WHERE clause such as: WHERE ((s.name = t.name) OR (s.name is null and t.name is null)) AND ((s.city = t.city) OR (s.city is null and t.city is null))
– Adam Parkin, 2015
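A minimal sketch of the null problem described in the comments, assuming SQLite through Python's standard sqlite3 module rather than SQL Server, with a small made-up table. The helper duplicate_ids is hypothetical. It also shows that the null-safe condition can go directly into the ON clause of the original inner JOIN; for an inner join, that filters exactly the same rows as a CROSS JOIN plus WHERE.

```python
import sqlite3


def duplicate_ids(conn, on_clause):
    """Hypothetical helper: run the join-to-grouped-subquery query with a given ON condition.

    on_clause is interpolated directly, so only pass fixed, trusted strings.
    """
    sql = f"""
        SELECT s.id
        FROM stuff s
        JOIN (
            SELECT name, city
            FROM stuff
            GROUP BY name, city
            HAVING COUNT(*) > 1
        ) t ON {on_clause}
        ORDER BY s.id
    """
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER NOT NULL, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Ann", None), (3, "Bob", "Rome"), (4, "Bob", "Rome"), (5, "Cy", "Oslo")],
)

# Plain equality: GROUP BY puts both (Ann, NULL) rows in one group of size 2,
# but NULL = NULL is not true, so the join drops them again.
plain = duplicate_ids(conn, "s.name = t.name AND s.city = t.city")

# The comment's null-safe condition, placed directly in the ON clause.
null_safe = duplicate_ids(
    conn,
    "((s.name = t.name) OR (s.name IS NULL AND t.name IS NULL))"
    " AND ((s.city = t.city) OR (s.city IS NULL AND t.city IS NULL))",
)
print(plain)      # [3, 4]
print(null_safe)  # [1, 2, 3, 4]
```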

55
 SELECT name, city, count(*) as qty 
 FROM stuff 
 GROUP BY name, city HAVING count(*)> 1
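This query returns one row per duplicated (name, city) pair together with its count, not the individual ids; the other answers join or filter back to the table to recover those. The sketch below checks it against the question's data. It assumes SQLite via Python's sqlite3 and adds two rows: the Barney row from another answer, and a made-up "jim in Paris" row that shares only a name. It also adds an ORDER BY so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER NOT NULL, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [
        (904834, "jim", "London"),
        (904835, "jim", "London"),
        (90145, "Fred", "Paris"),
        (90132, "Fred", "Paris"),
        (90133, "Fred", "Paris"),
        (923457, "Barney", "New York"),
        (1, "jim", "Paris"),  # same name as a duplicate but a different city: not a duplicate
    ],
)

# One row per (name, city) pair that occurs more than once, with its count.
groups = conn.execute(
    """
    SELECT name, city, COUNT(*) AS qty
    FROM stuff
    GROUP BY name, city
    HAVING COUNT(*) > 1
    ORDER BY name, city
    """
).fetchall()
print(groups)  # [('Fred', 'Paris', 3), ('jim', 'London', 2)]
```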

10

Something like this will do the trick. I don't know about its performance, so do some testing.

select
  id, name, city
from
  [stuff] s
where
1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)
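The answer suggests testing performance. Here is a sketch of one way to do that in SQLite (an assumption; other engines have their own plan tools). It creates a composite index on (name, city), prints the plan with EXPLAIN QUERY PLAN, and runs the query. The index name ix_stuff_name_city is hypothetical. The plan text differs between SQLite versions, so the code prints it rather than checking it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER NOT NULL, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [
        (904834, "jim", "London"),
        (904835, "jim", "London"),
        (90145, "Fred", "Paris"),
        (90132, "Fred", "Paris"),
        (90133, "Fred", "Paris"),
        (923457, "Barney", "New York"),
    ],
)

# Hypothetical index: lets the correlated subquery look up rows with the same
# (name, city) instead of scanning the whole table once per outer row.
conn.execute("CREATE INDEX ix_stuff_name_city ON stuff (name, city)")

query = """
SELECT id, name, city
FROM stuff s
WHERE 1 < (SELECT COUNT(*) FROM stuff i
           WHERE i.city = s.city AND i.name = s.name)
ORDER BY id
"""

# Show how SQLite plans to execute the query (wording is version-dependent).
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)

ids = [row[0] for row in conn.execute(query)]
print(ids)  # [90132, 90133, 90145, 904834, 904835]
```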

6

Using count(*) over(partition by...) provides a simple and efficient way to find unwanted duplicates, while also listing all affected rows and all the columns you want:

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

Most recent RDBMS versions support count(*) over(partition by...); MySQL only gained these "window functions" in version 8.0, as seen below (in MySQL 8.0):

CREATE TABLE stuff(
   id   INTEGER  NOT NULL
  ,name VARCHAR(60) NOT NULL
  ,city VARCHAR(60) NOT NULL
);
INSERT INTO stuff(id,name,city) VALUES 
  (904834,'jim','London')
, (904835,'jim','London')
, (90145,'Fred','Paris')
, (90132,'Fred','Paris')
, (90133,'Fred','Paris')

, (923457,'Barney','New York') # not expected in result
;
SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city
    id | name | city   | qty
-----: | :--- | :----- | --:
 90145 | Fred | Paris  |   3
 90132 | Fred | Paris  |   3
 90133 | Fred | Paris  |   3
904834 | jim  | London |   2
904835 | jim  | London |   2

db<>fiddle here

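Window functions. MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to that row. These include functions such as RANK(), LAG(), and NTILE(). In addition, several existing aggregate functions now can be used as window functions; for example, SUM() and AVG(). For more information, see Section 12.21, "Window Functions".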
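The same query also runs outside MySQL. This sketch assumes SQLite 3.25 or later (the first release with window functions) via Python's sqlite3. It adds t.id to the ORDER BY as a tie-breaker. Ordering only by name and city leaves the order within each group unspecified, which is why the Fred rows in the output above are not in id order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER NOT NULL, name TEXT NOT NULL, city TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [
        (904834, "jim", "London"),
        (904835, "jim", "London"),
        (90145, "Fred", "Paris"),
        (90132, "Fred", "Paris"),
        (90133, "Fred", "Paris"),
        (923457, "Barney", "New York"),  # not expected in result
    ],
)

# COUNT(*) OVER (PARTITION BY ...) attaches the size of each row's
# (name, city) group to the row itself, so no join back is needed.
query = """
SELECT t.*
FROM (
    SELECT s.*, COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
) t
WHERE t.qty > 1
ORDER BY t.name, t.city, t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```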


3

A little late to the game on this post, but I found this approach to be pretty flexible/efficient:

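-- distinct: each row matches every other row in its (name, city) group,
-- so rows from groups of 3 or more would otherwise be repeated
select distinct
    s1.id
    ,s1.name
    ,s1.city 
from 
    stuff s1
    ,stuff s2
Where
    s1.id <> s2.id
    and s1.name = s2.name
    and s1.city = s2.city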
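One caveat with this self-join: each row matches every other row in its (name, city) group, so a row in a group of 3 is returned twice. The sketch below counts the rows with and without DISTINCT, under the same SQLite/sqlite3 assumption. It writes the join as an explicit JOIN ... ON, which is equivalent to the comma join with a WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER NOT NULL, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [
        (904834, "jim", "London"),
        (904835, "jim", "London"),
        (90145, "Fred", "Paris"),
        (90132, "Fred", "Paris"),
        (90133, "Fred", "Paris"),
        (923457, "Barney", "New York"),
    ],
)

self_join = """
SELECT {distinct} s1.id, s1.name, s1.city
FROM stuff s1
JOIN stuff s2
  ON s1.id <> s2.id
 AND s1.name = s2.name
 AND s1.city = s2.city
ORDER BY s1.id
"""

# Without DISTINCT: a row in a group of k identical (name, city) rows
# matches the other k - 1 rows, so it comes back k - 1 times.
raw = conn.execute(self_join.format(distinct="")).fetchall()
# With DISTINCT: each duplicated row appears exactly once.
dedup = conn.execute(self_join.format(distinct="DISTINCT")).fetchall()
print(len(raw), len(dedup))  # 8 5
```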

2

You have to self-join stuff and match on name and city, then group by and filter on the count.

select 
   s.id, s.name, s.city 
from stuff s join stuff p ON (
   s.name = p.name AND s.city = p.city
)
group by s.name having count(s.name) > 1

Fails in SQL Server: all non-aggregated columns must be in the GROUP BY.
– gbn
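Here is a sketch of a version that addresses both issues, assuming SQLite via Python's sqlite3. It joins on name AND city, as the prose describes, and groups by every selected column, as the comment requires. Each row also joins to itself, so COUNT(*) equals the size of that row's (name, city) group, and > 1 keeps only duplicated rows. This assumes id is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER NOT NULL, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO stuff VALUES (?, ?, ?)",
    [
        (904834, "jim", "London"),
        (904835, "jim", "London"),
        (90145, "Fred", "Paris"),
        (90132, "Fred", "Paris"),
        (90133, "Fred", "Paris"),
        (923457, "Barney", "New York"),
    ],
)

# Match on both columns, then group by every non-aggregated column so the
# query satisfies standard SQL (and SQL Server) GROUP BY rules.
query = """
SELECT s.id, s.name, s.city
FROM stuff s
JOIN stuff p
  ON s.name = p.name AND s.city = p.city
GROUP BY s.id, s.name, s.city
HAVING COUNT(*) > 1
ORDER BY s.id
"""
ids = [row[0] for row in conn.execute(query)]
print(ids)  # [90132, 90133, 90145, 904834, 904835]
```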

0

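Given a staging table with 70 columns where only 4 of them determine whether a row is a duplicate, this code returns each duplicated combination of those 4 columns (with leading and trailing spaces trimmed), along with how many times it occurs: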

SELECT 
    COUNT(*)
    ,LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(GrossCost)) 
FROM Staging.dbo.Stage S
GROUP BY 
    LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(GrossCost)) 
HAVING COUNT(*) > 1
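The LTRIM(RTRIM(...)) calls make values that differ only in leading or trailing spaces count as the same key. The sketch below uses a made-up stage table whose table and column names are hypothetical, and runs in SQLite via Python's sqlite3. It compares trimmed and untrimmed grouping and shows that the non-key column (other) has no effect on either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging table where values were loaded as padded text.
conn.execute("CREATE TABLE stage (ticket TEXT, gross_cost TEXT, other TEXT)")
conn.executemany(
    "INSERT INTO stage VALUES (?, ?, ?)",
    [
        ("T100", "9.99", "a"),
        ("T100 ", " 9.99", "b"),  # same key as the first row once trimmed
        ("T200", "5.00", "c"),
    ],
)

# Group on the trimmed key columns only; "other" is ignored.
trimmed = conn.execute(
    """
    SELECT COUNT(*), LTRIM(RTRIM(ticket)), LTRIM(RTRIM(gross_cost))
    FROM stage
    GROUP BY LTRIM(RTRIM(ticket)), LTRIM(RTRIM(gross_cost))
    HAVING COUNT(*) > 1
    """
).fetchall()

# Same grouping without trimming: the padded row forms its own group.
untrimmed = conn.execute(
    "SELECT COUNT(*), ticket, gross_cost FROM stage "
    "GROUP BY ticket, gross_cost HAVING COUNT(*) > 1"
).fetchall()

print(trimmed)    # [(2, 'T100', '9.99')]
print(untrimmed)  # []
```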

Licensed under cc by-sa 3.0 with attribution required.