C# LINQ: find duplicates in a list

333

Using LINQ, how can I retrieve from a List<int> a list of the entries that are repeated more than once, along with their values?

linq list duplicate-removal

— Mirko Arcese

567

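The easiest way to solve the problem is to group the elements by their value, and then pick a representative of the group if the group contains more than one element. In LINQ, this translates to: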

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(y => y.Key)
              .ToList();

If you want to know how many times each element is repeated, you can use:

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(y => new { Element = y.Key, Counter = y.Count() })
              .ToList();

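This returns a List of an anonymous type, and each element has the properties Element and Counter, from which you can retrieve the information you need.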

Lastly, if what you are looking for is a dictionary, you can use:

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .ToDictionary(x => x.Key, y => y.Count());

This returns a dictionary with your elements as keys and the number of times each one is repeated as values.
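
As an illustration, here is a minimal self-contained sketch of the first and third queries wrapped in helper methods. The class and method names (GroupByDuplicatesSketch, DuplicateValues, DuplicateCounts) are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, used only to make the queries above callable.
public static class GroupByDuplicatesSketch
{
    // Values that occur more than once, each reported a single time.
    public static List<int> DuplicateValues(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(g => g.Key)
              .ToList();

    // Duplicated value -> number of occurrences.
    public static Dictionary<int, int> DuplicateCounts(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .ToDictionary(g => g.Key, g => g.Count());
}
```

For the input { 1, 2, 3, 1, 4, 2, 1 }, DuplicateValues should return 1 and 2, and DuplicateCounts should map 1 to 3 and 2 to 2.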

— Save

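Now just a wonder: let's say the duplicated ints are distributed across n int arrays. I'm using a dictionary and a for loop to find out which arrays contain a duplicate and remove it according to a distribution logic. Is there a faster way (a LINQ one, I wonder) to achieve that result? Thanks in advance for your attention.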

— Mirko Arcese

I am doing something like this: for (int i = 0; i < duplicates.Count; i++) { int duplicate = duplicates[i]; duplicatesLocation.Add(duplicate, new List<int>()); for (int k = 0; k < hitsList.Length; k++) { if (hitsList[k].Contains(duplicate)) { duplicatesLocation.ElementAt(i).Value.Add(k); } } // remove duplicates according to some rules. }

— Mirko Arcese

If you want to find duplicates in a list of arrays, take a look at SelectMany.
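
One way to read this hint, sketched under assumptions: flatten the arrays with SelectMany while remembering which array each value came from, then group by value. The class and method names (CrossArrayDuplicatesSketch, FindLocations) are hypothetical, and "duplicate" is assumed here to mean a value that appears in more than one array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class CrossArrayDuplicatesSketch
{
    // Maps each value found in more than one array to the indexes of the arrays that contain it.
    public static Dictionary<int, List<int>> FindLocations(IEnumerable<int[]> arrays) =>
        arrays
            // Flatten to (value, array index) pairs; Distinct ignores repeats inside a single array.
            .SelectMany((array, index) => array.Distinct()
                                               .Select(value => new { Value = value, Index = index }))
            // Group the array indexes by value.
            .GroupBy(pair => pair.Value, pair => pair.Index)
            // Keep only values seen in at least two arrays.
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.ToList());
}
```

The resulting dictionary plays the role of the duplicatesLocation dictionary from the comment above; removing entries according to custom rules would still be a separate step.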

— Save

I am searching for duplicates in an array of lists, but I don't see how SelectMany can help me achieve that.

— Mirko Arcese, 2013

1

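To check whether a collection has more than one element, it is more efficient to use Skip(1).Any() than Count(). Imagine a collection with 1000 elements. Skip(1).Any() detects that there is more than one as soon as it finds the second element. Count() has to visit the complete collection.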
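
A small sketch can make the difference visible for a lazily produced sequence. The names below (MoreThanOneSketch, Numbers, Pulled) are hypothetical. Note the assumption: this applies to sources that must be enumerated to be counted, whereas for sources that implement ICollection<T>, such as lists and arrays, Count() reads the stored size without iterating.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class that counts how many elements each approach pulls.
public static class MoreThanOneSketch
{
    // Number of elements yielded by the lazy source during the most recent call.
    public static int Pulled;

    // Lazy sequence that records every element it yields.
    private static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Count() has to enumerate the whole iterator to know its length.
    public static bool ViaCount(int size)
    {
        Pulled = 0;
        return Numbers(size).Count() > 1;
    }

    // Skip(1).Any() stops as soon as a second element is produced.
    public static bool ViaSkipAny(int size)
    {
        Pulled = 0;
        return Numbers(size).Skip(1).Any();
    }
}
```

After ViaCount(1000), Pulled is 1000; after ViaSkipAny(1000), Pulled is 2.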

— Harald Coppoolse

133

Find out whether an enumerable contains any duplicates:

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

Find out whether all values in an enumerable are unique:

var allUnique = enumerable.GroupBy(x => x.Key).All(g => g.Count() == 1);

— 麦克白豆素

Aren't these always Boolean opposites? anyDuplicate == !allUnique in all cases.

— Garr Godfrey

1

@GarrGodfrey They are always Boolean opposites.

— Caltor

21

Another way is to use a HashSet:

var hash = new HashSet<int>();
var duplicates = list.Where(i => !hash.Add(i));

If you want unique values in the list of duplicates:

var myhash = new HashSet<int>();
var mylist = new List<int>(){1,1,2,2,3,3,3,4,4,4};
var duplicates = mylist.Where(item => !myhash.Add(item)).Distinct().ToList();

Here is the same solution as generic extension methods:

public static class Extensions
{
  public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector, IEqualityComparer<TKey> comparer)
  {
    var hash = new HashSet<TKey>(comparer);
    return source.Where(item => !hash.Add(selector(item))).ToList();
  }

  public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
  {
    return source.GetDuplicates(x => x, comparer);      
  }

  public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
  {
    return source.GetDuplicates(selector, null);
  }

  public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source)
  {
    return source.GetDuplicates(x => x, null);
  }
}

— HuBeZa

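This does not work as expected. Using List<int> { 1, 2, 3, 4, 5, 2 } as the source, the result is an IEnumerable<int> with one element whose value is 1 (where the correct duplicate value is 2).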

— BCA

@BCA yesterday, I think you are wrong. Take a look at this example: dotnetfiddle.net/GUnhUl

— HuBeZa

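Your fiddle prints the correct result. However, I added the line Console.WriteLine("Count: {0}", duplicates.Count()); directly below it, and it prints 6. Unless I am missing something about the requirements for this function, there should be only 1 item in the resulting collection.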

— BCA

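@BCA yesterday, that is a bug caused by LINQ deferred execution. I have added ToList to fix it, but that means the method is executed as soon as it is called, not when you iterate over the results.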
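
The effect described here can be reproduced with a short sketch. The class and method names (DeferredHashSetSketch, LazyDuplicates, EagerDuplicates) are hypothetical. The point is that the HashSet captured by the lambda keeps its contents between enumerations of the same lazy query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo of the deferred-execution pitfall discussed in these comments.
public static class DeferredHashSetSketch
{
    // Lazy version: the query is re-run on every enumeration, but the set is never reset.
    public static IEnumerable<int> LazyDuplicates(IEnumerable<int> source)
    {
        var seen = new HashSet<int>();
        return source.Where(item => !seen.Add(item));
    }

    // Eager version: ToList runs the query exactly once, so the result is stable.
    public static List<int> EagerDuplicates(IEnumerable<int> source)
    {
        var seen = new HashSet<int>();
        return source.Where(item => !seen.Add(item)).ToList();
    }
}
```

With { 1, 2, 3, 4, 5, 2 } as input, the first enumeration of LazyDuplicates yields only 2. A second enumeration, for example through Count(), finds every value already in the set and reports all six elements, which matches the count of 6 observed above. EagerDuplicates always contains the single element 2.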

— HuBeZa

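var hash = new HashSet<int>(); var duplicates = list.Where(i => !hash.Add(i)); will produce a list that includes every repeated occurrence. So if your list contains four occurrences of 2, the duplicates list will contain three occurrences of 2, since only one of the 2s can be added to the HashSet. If you want the list to contain each duplicate only once, use this code instead: var duplicates = mylist.Where(item => !myhash.Add(item)).ToList().Distinct().ToList();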

— solid_luffy '18

10

You can do this:

var list = new[] {1,2,3,1,4,2};
var duplicateItems = list.Duplicates();

Using these extension methods:

public static class Extensions
{
    public static IEnumerable<TSource> Duplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        var grouped = source.GroupBy(selector);
        var moreThan1 = grouped.Where(i => i.IsMultiple());
        return moreThan1.SelectMany(i => i);
    }

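    public static IEnumerable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source)
    {
        // Only TSource can be inferred from the argument, so this overload takes a single type parameter.
        return source.Duplicates(i => i);
    }

    public static bool IsMultiple<T>(this IEnumerable<T> source)
    {
        // Advance at most twice, and dispose the enumerator afterwards.
        using (var enumerator = source.GetEnumerator())
        {
            return enumerator.MoveNext() && enumerator.MoveNext();
        }
    }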
}

Using IsMultiple() in the Duplicates method is faster than Count(), because it does not iterate the whole collection.

— Alex Siepman

If you look at the reference source for Grouping, you can see that Count() is precomputed, so your solution is likely slower.

— Johnbot

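@Johnbot. You are right, in this case Count() is faster and the implementation will probably never change... but it depends on an implementation detail of the class behind IGrouping. With my implementation, you know it will never iterate the whole collection.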

— Alex Siepman

So counting [Count()] is basically different from iterating the whole list. Count() is precomputed, but iterating the whole list is not.

— Jogi'2

@rehan khan: I don't understand the difference between Count() and Count().

— Alex Siepman '17

2

@RehanKhan: IsMultiple does not do a Count(); it stops immediately after 2 items. It is just like Take(2).Count() >= 2;

— Alex Siepman

6

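I created an extension in response to this question that you can include in your projects. I think it covers the most common cases when you search for duplicates in a List or with LINQ.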

Example:

//Dummy class to compare in list
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Surname { get; set; }
    public Person(int id, string name, string surname)
    {
        this.Id = id;
        this.Name = name;
        this.Surname = surname;
    }
}


//The extension static class
public static class Extention
{
    public static IEnumerable<T> getMoreThanOnceRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    { //Return only the second and later repetitions
        return extList
            .GroupBy(groupProps)
            .SelectMany(z => z.Skip(1)); //Skip the first occurrence and return all the others that repeat
    }
    public static IEnumerable<T> getAllRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    {
        //Get all the items that have repetitions
        return extList
            .GroupBy(groupProps)
            .Where(z => z.Count() > 1) //Keep only the groups with more than one item
            .SelectMany(z => z);//Return every item of the remaining groups
    }
}

//how to use it:
void DuplicateExample()
{
    //Populate List
    List<Person> PersonsLst = new List<Person>(){
    new Person(1,"Ricardo","Figueiredo"), //first duplicate in the example
    new Person(2,"Ana","Figueiredo"),
    new Person(3,"Ricardo","Figueiredo"),//second duplicate in the example
    new Person(4,"Margarida","Figueiredo"),
    new Person(5,"Ricardo","Figueiredo")//third duplicate in the example
    };

    Console.WriteLine("All:");
    PersonsLst.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        All:
        1 -> Ricardo Figueiredo
        2 -> Ana Figueiredo
        3 -> Ricardo Figueiredo
        4 -> Margarida Figueiredo
        5 -> Ricardo Figueiredo
        */

    Console.WriteLine("All lines with repeated data");
    PersonsLst.getAllRepeated(z => new { z.Name, z.Surname })
        .ToList()
        .ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        All lines with repeated data
        1 -> Ricardo Figueiredo
        3 -> Ricardo Figueiredo
        5 -> Ricardo Figueiredo
        */
    Console.WriteLine("Only Repeated more than once");
    PersonsLst.getMoreThanOnceRepeated(z => new { z.Name, z.Surname })
        .ToList()
        .ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        Only Repeated more than once
        3 -> Ricardo Figueiredo
        5 -> Ricardo Figueiredo
        */
}

— Ricardo Figueiredo

1

Consider using Skip(1).Any() instead of Count(). If there are 1000 duplicates, Skip(1).Any() stops after it finds the second one. Count() visits all 1000 elements.

— Harald Coppoolse

1

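If you add this extension method, consider using HashSet.Add instead of GroupBy, as shown in one of the other answers. As soon as HashSet.Add finds a duplicate, it stops. Your GroupBy keeps grouping all elements, even after a group with more than one element has been found.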
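
A minimal sketch of that early-exit idea, assuming only a yes/no answer is needed. The names (EarlyExitSketch, HasDuplicates) are hypothetical and not taken from any answer in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the HashSet.Add suggestion.
public static class EarlyExitSketch
{
    // HashSet<T>.Add returns false for a value that is already in the set,
    // and Any stops enumerating at the first element for which the predicate is true.
    public static bool HasDuplicates<T>(IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        return source.Any(item => !seen.Add(item));
    }
}
```

For { 1, 2, 1, 3 } this returns true after reading the third element; for { 1, 2, 3 } it reads everything and returns false.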

— Harald Coppoolse 17-10-26

6

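To check only whether there are duplicate values:

var duplicates = list.GroupBy(x => x).Any(g => g.Count() > 1);

E.g. var list = new[] {1,2,3,1,4,2};

GroupBy groups the numbers by their key and keeps a count (the number of repetitions) with each group. After that, we just check whether any value is repeated more than once.

To check only whether all values are unique:

var unique = list.GroupBy(x => x).All(g => g.Count() == 1);

E.g. var list = new[] {1,2,3,1,4,2};

GroupBy groups the numbers by their key and keeps a count (the number of repetitions) with each group. After that, we just check that every value occurs exactly once, which means it is unique.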

— 洗礼

The code below will also find the unique items: var unique = list.Distinct();

— 马鲁MN

1

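A complete set of LINQ to SQL extension functions for duplicates, checked against MS SQL Server. They do not use .ToList() or IEnumerable. These queries are executed in SQL Server rather than in memory; only the results are returned to memory.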

public static class Linq2SqlExtensions {

    public class CountOfT<T> {
        public T Key { get; set; }
        public int Count { get; set; }
    }

    public static IQueryable<TKey> Duplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => s.Key);

    public static IQueryable<TSource> GetDuplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).SelectMany(s => s);

    public static IQueryable<CountOfT<TKey>> DuplicatesCounts<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(y => new CountOfT<TKey> { Key = y.Key, Count = y.Count() });

    public static IQueryable<Tuple<TKey, int>> DuplicatesCountsAsTuble<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
        => source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => Tuple.Create(s.Key, s.Count()));
}

— GeoB

0

There is an answer, but I did not understand why it does not work:

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

In this case, my solution is this:

var duplicates = model.list
                    .GroupBy(s => s.SAME_ID)
                    .Where(g => g.Count() > 1).Count() > 0;
if(duplicates) {
    doSomething();
}

— Aykut Gündoğdu