Answers:
The basic idea is to use a nested query with a count aggregation:
select * from yourTable ou
where (select count(*) from yourTable inr
where inr.sid = ou.sid) > 1
You can adjust the where clause in the inner query to narrow the search.
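To try the nested-query idea on a small data set, here is a minimal sketch. It assumes a hypothetical three-column layout for yourTable and uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, purely so it can be run without a server; the query text itself is unchanged apart from an ORDER BY added for stable output.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for PostgreSQL; table layout and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, sid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO yourTable (id, sid, name) VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 10, "b"), (3, 20, "c"), (4, 30, "d"), (5, 30, "e")],
)

# Return every row whose sid occurs more than once in the table.
duplicate_rows = conn.execute(
    """
    SELECT * FROM yourTable ou
    WHERE (SELECT count(*) FROM yourTable inr
           WHERE inr.sid = ou.sid) > 1
    ORDER BY ou.id
    """
).fetchall()

print(duplicate_rows)
# [(1, 10, 'a'), (2, 10, 'b'), (4, 30, 'd'), (5, 30, 'e')]
```

Every member of a duplicate group is returned, including the copy you may want to keep.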
For the problem mentioned in the comments, there is also a nice solution (but not everyone reads them):
select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1
Or shorter:
SELECT (yourTable.*)::text, count(*)
FROM yourTable
GROUP BY yourTable.*
HAVING count(*) > 1
select col1, col2, count(*) from tbl group by col1, col2 HAVING count(*)>1
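The GROUP BY ... HAVING form above can be checked on a few rows. This is only a sketch: the table contents are invented, and Python's sqlite3 module stands in for PostgreSQL (the `yourTable.*` shorthand is PostgreSQL-specific and is not used here).

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for PostgreSQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT)")
conn.executemany(
    "INSERT INTO yourTable (id, Column1, Column2) VALUES (?, ?, ?)",
    [(1, "x", "y"), (2, "x", "y"), (3, "x", "z"), (4, "p", "q"), (5, "x", "y")],
)

# One output row per (Column1, Column2) combination that appears more than once.
duplicate_groups = conn.execute(
    """
    SELECT Column1, Column2, count(*)
    FROM yourTable
    GROUP BY Column1, Column2
    HAVING count(*) > 1
    """
).fetchall()

print(duplicate_groups)
# [('x', 'y', 3)]
```

Unlike the nested query, this returns one row per duplicated combination together with its count, not the individual rows.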
From "Find duplicate rows with PostgreSQL", here is a clever solution:
select * from (
SELECT id,
ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id asc) AS Row
FROM tbl
) dups
where
dups.Row > 1
SELECT * FROM (
SELECT *, LEAD(row,1) OVER () AS nextrow
FROM (
SELECT *, ROW_NUMBER() OVER(w) AS row
FROM tbl
WINDOW w AS (PARTITION BY col1, col2 ORDER BY col3)
) x
) y
WHERE row > 1 OR nextrow > 1;
Alternatively, replace ROW_NUMBER() with COUNT(*), and add rows between unbounded preceding and unbounded following after ORDER BY id asc.
DELETE ... USING with a few minor tweaks works just as well.
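As a rough illustration of the ROW_NUMBER() approach above, the sketch below numbers the rows inside each (column1, column2) group and keeps only those numbered 2 or higher, that is, the surplus copies. The sample data is hypothetical, the alias row_num replaces Row to avoid keyword clashes, and Python's sqlite3 module (SQLite 3.25 or newer, which supports window functions) stands in for PostgreSQL.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (window functions) stands in for PostgreSQL; data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO tbl (id, column1, column2) VALUES (?, ?, ?)",
    [(1, "x", "y"), (2, "x", "y"), (3, "p", "q"), (4, "x", "y"), (5, "p", "q")],
)

# Number rows within each duplicate group by ascending id;
# anything numbered above 1 is an extra copy of an earlier row.
extra_copies = conn.execute(
    """
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id ASC) AS row_num
        FROM tbl
    ) dups
    WHERE dups.row_num > 1
    ORDER BY id
    """
).fetchall()

print([row[0] for row in extra_copies])
# [2, 4, 5]
```

The first row of each group (ids 1 and 3 here) is not returned, which makes the result a natural list of rows to delete.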
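You can join the table to itself on the fields that would be duplicated, then anti-join on the id field. Select the id field from the first table alias (tn1), and use the array_agg function on the id field of the second table alias. Finally, for array_agg to work properly, group the results by the tn1.id field. This produces a result set containing each record's id together with an array of all ids that match the join condition.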
select tn1.id,
array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id;
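Naturally, ids that appear in the duplicate_entries array for one id will also have their own entries in the result set. You will have to use this result set to decide which id should become the source of "truth", the one record that should not be deleted. Perhaps you could do something like this: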
with dupe_set as (
select tn1.id,
array_agg(tn2.id) as duplicate_entries
from table_name tn1 join table_name tn2 on
tn1.year = tn2.year
and tn1.sid = tn2.sid
and tn1.user_id = tn2.user_id
and tn1.cid = tn2.cid
and tn1.id <> tn2.id
group by tn1.id
order by tn1.id asc)
select ds.id from dupe_set ds where not exists
(select de from unnest(ds.duplicate_entries) as de where de < ds.id)
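This selects the lowest-numbered id within each set of duplicates (assuming the id is an incrementing int PK). These are the ids you would keep.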
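For comparison, the same set of "ids to keep" can be obtained without arrays. The sketch below is not the array_agg/unnest query above: it swaps in a plain MIN(id) per duplicate group, because Python's sqlite3 module (used here as a stand-in for PostgreSQL) has neither function. The column layout follows the join condition above; the rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; MIN(id) replaces array_agg/unnest; data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name ("
    "id INTEGER PRIMARY KEY, year INTEGER, sid INTEGER, user_id INTEGER, cid INTEGER)"
)
conn.executemany(
    "INSERT INTO table_name (id, year, sid, user_id, cid) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2020, 1, 1, 1),
        (2, 2020, 1, 1, 1),
        (3, 2021, 2, 2, 2),
        (4, 2020, 1, 1, 1),
        (5, 2022, 3, 3, 3),
        (6, 2022, 3, 3, 3),
    ],
)

# For every group that has duplicates, pick the lowest id as the record to keep.
ids_to_keep = conn.execute(
    """
    SELECT MIN(id) AS keep_id
    FROM table_name
    GROUP BY year, sid, user_id, cid
    HAVING count(*) > 1
    ORDER BY keep_id
    """
).fetchall()

print([row[0] for row in ids_to_keep])
# [1, 5]
```

Id 3 is absent because its row has no duplicate at all.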
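For simplicity, I assume that you want to apply a unique constraint only on the column year, and that the primary key is a column named id.
To find the duplicate values, you should run: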
SELECT year, COUNT(id)
FROM YOUR_TABLE
GROUP BY year
HAVING COUNT(id) > 1
ORDER BY COUNT(id);
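Using the SQL statement above, you get a table containing all the duplicated years in your table. To delete all duplicates except the latest entry, use the following SQL statement.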
DELETE
FROM YOUR_TABLE A USING YOUR_TABLE B
WHERE A.year=B.year AND A.id<B.id;
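The effect of that DELETE can be seen on a handful of rows. One caveat for this sketch: DELETE ... USING is a PostgreSQL extension, and the Python sqlite3 stand-in used here does not support it, so the same condition is expressed with a correlated EXISTS subquery instead. The data is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; EXISTS replaces DELETE ... USING; data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YOUR_TABLE (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany(
    "INSERT INTO YOUR_TABLE (id, year) VALUES (?, ?)",
    [(1, 2019), (2, 2020), (3, 2019), (4, 2021), (5, 2019), (6, 2020)],
)

# Delete a row whenever another row with the same year and a higher id exists,
# so only the highest id per year survives.
conn.execute(
    """
    DELETE FROM YOUR_TABLE
    WHERE EXISTS (
        SELECT 1 FROM YOUR_TABLE B
        WHERE B.year = YOUR_TABLE.year
          AND YOUR_TABLE.id < B.id
    )
    """
)

remaining = conn.execute("SELECT id, year FROM YOUR_TABLE ORDER BY id").fetchall()
print(remaining)
# [(4, 2021), (5, 2019), (6, 2020)]
```

After the delete, each year appears once, so a unique constraint on year could then be added.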