Delete duplicate records in SQL Server?

93

Consider a column named EmployeeName in a table named Employee. The goal is to delete duplicate records based on the EmployeeName field.

EmployeeName
------------
Anand
Anand
Anil
Dipak
Anil
Dipak
Dipak
Anil

I want to delete the duplicate records using a single query.

How can this be done with T-SQL in SQL Server?

— usr021986

You mean delete the duplicate records, right?

— Sarfraz

You could select the distinct values and their related IDs, and then delete the records whose IDs are not in that selected list?

— DaeMoohn, 2010

1

Do you have a unique ID column?

— Andrew Bullock

1

If the table lacks a unique ID, how could you accept the answer given by John Gibb? Where is the empId column that John uses in his example?

— Aman

2

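If you don't have a unique ID column, or anything else meaningful to order by, you could also order by the EmployeeName column itself, so your rn would be row_number() over (partition by EmployeeName order by EmployeeName). This keeps one arbitrary record for each name.

— John Gibb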

225

You can do this with window functions. The query below orders the duplicates within each EmployeeName by empId and deletes all but the first one.

delete x from (
  select *, rn=row_number() over (partition by EmployeeName order by empId)
  from Employee 
) x
where rn > 1;

Run it as a select first to see what would be deleted:

select *
from (
  select *, rn=row_number() over (partition by EmployeeName order by empId)
  from Employee 
) x
where rn > 1;

— John Gibb

2

If you don't have a primary key, you can use ORDER BY (SELECT NULL): stackoverflow.com/a/4812038

— Arithmomaniac
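
A minimal sketch of that suggestion, assuming the question's Employee table has no key column at all. ORDER BY (SELECT NULL) satisfies the syntax of ROW_NUMBER() without imposing a real order, so which row of each group survives is arbitrary. The CTE name Ranked is made up for illustration.

```sql
-- Assumes a table Employee(EmployeeName) with no unique column.
WITH Ranked AS
(
    SELECT rn = ROW_NUMBER() OVER (PARTITION BY EmployeeName
                                   ORDER BY (SELECT NULL))
    FROM Employee
)
-- Deleting through the CTE removes the underlying Employee rows.
DELETE FROM Ranked
WHERE rn > 1;
```

If this follows another statement in the same batch, that preceding statement has to end with a semicolon before WITH.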

35

Assuming that your Employee table also has a unique column (ID in the example below), the following will work:

delete from Employee 
where ID not in
(
    select min(ID)
    from Employee 
    group by EmployeeName 
);

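This keeps the row with the lowest ID for each EmployeeName and deletes the rest.

Edit
Re McGyver's comment: as of SQL 2012

MIN can be used with numeric, char, varchar, uniqueidentifier, or datetime columns, but not with bit columns

For 2008 R2 and earlier,

MIN can be used with numeric, char, varchar, or datetime columns, but not with bit columns (and it also doesn't work with GUIDs)

For 2008 R2, you need to cast the GUID to a type that MIN supports, e.g.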

delete from GuidEmployees
where CAST(ID AS binary(16)) not in
(
    select min(CAST(ID AS binary(16)))
    from GuidEmployees
    group by EmployeeName 
);

SqlFiddle for various types in Sql 2008

SqlFiddle for various types in Sql 2012

— Stuart

Also, in Oracle you can use "rowid" if there is no other unique ID column.

— Brandon Horsley

+1 Even if there is no ID column, one can be added as an identity field.

— Kyle B.
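
As a rough sketch of that idea, assuming the question's Employee table has no column called ID yet: add a temporary identity column, run the MIN(ID) delete from the answer, then drop the column again. The column name ID is an assumption. GO is the batch separator used by SSMS and sqlcmd; it is needed here because a batch that references the new column will not compile until the column exists.

```sql
-- Step 1: add a temporary identity column; existing rows get numbered automatically.
ALTER TABLE Employee ADD ID int IDENTITY(1,1) NOT NULL;
GO

-- Step 2: keep the lowest ID per EmployeeName, delete the rest.
DELETE FROM Employee
WHERE ID NOT IN
(
    SELECT MIN(ID)
    FROM Employee
    GROUP BY EmployeeName
);
GO

-- Step 3 (optional): remove the helper column once the table is clean.
ALTER TABLE Employee DROP COLUMN ID;
GO
```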

Excellent answer. Sharp and effective. Even if the table doesn't have an ID, it is better to add one so that this method can be used.

— MiBol

8

You could try something like the following:

delete T1
from MyTable T1, MyTable T2
where T1.dupField = T2.dupField
and T1.uniqueField > T2.uniqueField

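(This assumes that you have an integer-based unique field.)

Personally, though, I'd say you are better off preventing duplicate entries from being added to the database in the first place, rather than cleaning them up afterwards.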

— Ben Cawley
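
One way to prevent new duplicates, sketched under the assumption that EmployeeName should be unique in the question's Employee table and that its data type can be indexed (for example varchar(100) rather than varchar(max)). The constraint name UQ_Employee_EmployeeName is made up for illustration.

```sql
-- Run this only after the existing duplicates have been removed;
-- SQL Server rejects the constraint while duplicate values are still present.
ALTER TABLE Employee
ADD CONSTRAINT UQ_Employee_EmployeeName UNIQUE (EmployeeName);
```

Once the constraint is in place, an INSERT or UPDATE that would create a second row with the same EmployeeName fails with a constraint violation instead of silently adding a duplicate.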

I don't have a unique field (ID) in my table. How can I perform the operation then?

— usr021986, 2010

3

-- Option 1: keep the row with the highest ID in each group of duplicates.
DELETE
FROM MyTable
WHERE ID NOT IN (
     SELECT MAX(ID)
     FROM MyTable
     GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3);

-- Option 2: number the rows within each group and delete all but the first.
WITH TempUsers (FirstName, LastName, duplicateRecordCount)
AS
(
    SELECT FirstName, LastName,
    ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName) AS duplicateRecordCount
    FROM dbo.Users
)
DELETE
FROM TempUsers
WHERE duplicateRecordCount > 1;

— Kumar Manish

3

WITH CTE AS
(
   SELECT EmployeeName, 
          ROW_NUMBER() OVER(PARTITION BY EmployeeName ORDER BY EmployeeName) AS R
   FROM employee_table
)
DELETE CTE WHERE R > 1;

The magic of common table expressions.

— Mostafa Elmoghazi

SubPortal / a_horse_with_no_name: shouldn't this be selecting from an actual table? Also, ROW_NUMBER should be ROW_NUMBER() because it is a function, correct?

— MacGyver, 2014

1

Try

DELETE
FROM employee
WHERE rowid NOT IN (SELECT MAX(rowid) FROM employee
GROUP BY EmployeeName);

— Anurag Garg

1

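If you're looking for a way to remove duplicates, but you have a foreign key pointing to the table with the duplicates, you can take the following approach using a slow yet effective cursor.

It repoints the duplicate keys in the foreign key table to the row that is kept.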

create table #properOlvChangeCodes(
    id int not null,
    name nvarchar(max) not null
)

DECLARE @name VARCHAR(MAX);
DECLARE @id INT;
DECLARE @newid INT;
DECLARE @oldid INT;

DECLARE OLVTRCCursor CURSOR FOR SELECT id, name FROM Sales_OrderLineVersionChangeReasonCode; 
OPEN OLVTRCCursor;
FETCH NEXT FROM OLVTRCCursor INTO @id, @name;
WHILE @@FETCH_STATUS = 0  
BEGIN  
        -- determine if it should be replaced (its name is already in the temp table)
        if(exists(select * from #properOlvChangeCodes where Name=@name)) begin
            -- if it is, find the id of the row being kept for that name
            Select  top 1 @newid = id
            from    #properOlvChangeCodes
            where   Name = @name

            -- repoint ChangeReasonCodeId in Sales_OrderLineVersion to the kept id
            update Sales_OrderLineVersion set ChangeReasonCodeId = @newid where ChangeReasonCodeId = @id

            -- delete the duplicate row from the change reason code table
            delete from Sales_OrderLineVersionChangeReasonCode where Id = @id
        end else begin
            -- insert into temp table if new
            insert into #properOlvChangeCodes(Id, name)
            values(@id, @name)
        end

        FETCH NEXT FROM OLVTRCCursor INTO @id, @name;
END;
CLOSE OLVTRCCursor;
DEALLOCATE OLVTRCCursor;

drop table #properOlvChangeCodes

— Peter
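
For comparison, here is a set-based sketch of the same repoint-then-delete idea without a cursor. It is a different technique from the answer's loop: it assumes Id is unique and non-null, and it keeps the lowest Id for each Name rather than whichever row the cursor happens to fetch first. The table and column names are the ones used in the answer; the aliases olv, c, and k are made up for illustration.

```sql
BEGIN TRANSACTION;

-- Repoint child rows from each duplicate id to the lowest id with the same name.
UPDATE olv
SET    olv.ChangeReasonCodeId = k.KeepId
FROM   Sales_OrderLineVersion AS olv
JOIN   Sales_OrderLineVersionChangeReasonCode AS c
       ON c.Id = olv.ChangeReasonCodeId
JOIN   (SELECT Name, MIN(Id) AS KeepId
        FROM   Sales_OrderLineVersionChangeReasonCode
        GROUP BY Name) AS k
       ON k.Name = c.Name
WHERE  olv.ChangeReasonCodeId <> k.KeepId;

-- Remove every row that is not the kept one for its name.
DELETE FROM Sales_OrderLineVersionChangeReasonCode
WHERE  Id NOT IN (SELECT MIN(Id)
                  FROM   Sales_OrderLineVersionChangeReasonCode
                  GROUP BY Name);

COMMIT TRANSACTION;
```

Wrapping both statements in one transaction means the foreign key never points at a deleted row, even if the second statement fails.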

0

delete from person 
where ID not in
(
        select t.id from 
        (select min(ID) as id from person 
         group by email 
        ) as t
);

— ohsoifelse

-1

Please see the following way of deleting as well.

Declare @Employee table (EmployeeName varchar(10))

Insert into @Employee values 
('Anand'),('Anand'),('Anil'),('Dipak'),
('Anil'),('Dipak'),('Dipak'),('Anil')

Select * from @Employee

This creates a sample table variable named @Employee and loads it with the given data.

Delete  aliasName from (
Select  *,
        ROW_NUMBER() over (Partition by EmployeeName order by EmployeeName) as rowNumber
From    @Employee) aliasName 
Where   rowNumber > 1

Select * from @Employee

Result: one row each remains for Anand, Anil, and Dipak.

I know this was asked six years ago; I'm posting just in case it is helpful for anyone.

— Jithin Shaji

-1

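Here is a nice way of deduplicating records in a table that has an identity column, based on a desired primary key that you can define at runtime. Before I start, I will populate a sample data set to work with using the following code: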

if exists (select 1 from sys.all_objects where type='u' and name='_original')
drop table _original

declare @startyear int = 2017
declare @endyear int = 2018
declare @iterator int = 1
declare @income money = cast((SELECT round(RAND()*(5000-4990)+4990 , 2)) as money)
declare @salesrepid int = cast(floor(rand()*(9100-9000)+9000) as varchar(4))
create table #original (rowid int identity, monthyear varchar(max), salesrepid int, sale money)
while @iterator<=50000 begin
insert #original 
select (Select cast(floor(rand()*(@endyear-@startyear)+@startyear) as varchar(4))+'-'+ cast(floor(rand()*(13-1)+1) as varchar(2)) ),  @salesrepid , @income
set  @salesrepid  = cast(floor(rand()*(9100-9000)+9000) as varchar(4))
set @income = cast((SELECT round(RAND()*(5000-4990)+4990 , 2)) as money)
set @iterator=@iterator+1
end  
update #original
set monthyear=replace(monthyear, '-', '-0') where  len(monthyear)=6

select * into _original from #original

Next, I will create a type called ColumnNames:

create type ColumnNames AS table   
(Columnnames varchar(max))

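Finally, I will create a stored proc with the following 3 caveats:

1. The proc takes a required parameter @tablename, which defines the name of the table you are deleting from in your database.
2. The proc has an optional parameter @columns, which you can use to define the fields that make up the desired primary key you are deleting against. If this parameter is left empty, all fields besides the identity column are assumed to make up the desired primary key.
3. When duplicate records are deleted, the record with the lowest value in its identity column is kept.

Here is my delete_dupes stored proc: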

 create proc delete_dupes (@tablename varchar(max), @columns columnnames readonly) 
 as
 begin

declare @table table (iterator int, name varchar(max), is_identity int)
declare @tablepartition table (idx int identity, type varchar(max), value varchar(max))
declare @partitionby varchar(max)  
declare @iterator int= 1 


if exists (select 1 from @columns)  begin
declare @columns1 table (iterator int, columnnames varchar(max))
insert @columns1
select 1, columnnames from @columns
set @partitionby = (select distinct 
                substring((Select ', '+t1.columnnames 
                From @columns1 t1
                Where T1.iterator = T2.iterator
                ORDER BY T1.iterator
                For XML PATH ('')),2, 1000)  partition
From @columns1 T2 )

end

insert @table 
select 1, a.name, is_identity from sys.all_columns a join sys.all_objects b on a.object_id=b.object_id
where b.name = @tablename  

declare @identity varchar(max)= (select name from @table where is_identity=1)

while @iterator>=0 begin 
insert @tablepartition
Select          distinct case when @iterator=1 then 'order by' else 'over (partition by' end , 
                substring((Select ', '+t1.name 
                From @table t1
                Where T1.iterator = T2.iterator and is_identity=@iterator
                ORDER BY T1.iterator
                For XML PATH ('')),2, 5000)  partition
From @table T2
set @iterator=@iterator-1
end 

declare @originalpartition varchar(max)

if @partitionby is null begin
select @originalpartition  = replace(b.value+','+a.type+a.value ,'over (partition by','')  from @tablepartition a cross join @tablepartition b where a.idx=2 and b.idx=1
select @partitionby = a.type+a.value+' '+b.type+a.value+','+b.value+') rownum' from @tablepartition a cross join @tablepartition b where a.idx=2 and b.idx=1
 end
 else
 begin
 select @originalpartition=b.value +','+ @partitionby from @tablepartition a cross join @tablepartition b where a.idx=2 and b.idx=1
 set @partitionby = (select 'OVER (partition by'+ @partitionby  + ' ORDER BY'+ @partitionby + ','+b.value +') rownum'
 from @tablepartition a cross join @tablepartition b where a.idx=2 and b.idx=1)
 end


exec('select row_number() ' + @partitionby +', '+@originalpartition+' into ##temp from '+ @tablename+'')


exec(
'delete a from '+@tablename+' a 
left join ##temp b on a.'+@identity+'=b.'+@identity+' and rownum=1  
where b.rownum is null')

drop table ##temp

end

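Once this is compiled, you can delete all your duplicate records by running the proc. To delete dupes without defining a desired primary key, use this call: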

exec delete_dupes '_original'

To delete dupes based on a defined desired primary key, use this call:

declare @table1 as columnnames
insert @table1
values ('salesrepid'),('sale')
exec delete_dupes '_original' , @table1

— Daniel Marcus