使用Linq获取集合的最后N个元素？

284

给定一个集合，有没有办法获取该集合的最后N个元素？如果框架中没有方法，那么编写扩展方法来执行此操作的最佳方法是什么？

c# linq

422

collection.Skip(Math.Max(0, collection.Count() - N));

这种方法保留了项目顺序，而无需依赖任何排序，并且在多个LINQ提供程序之间具有广泛的兼容性。

重要的是要注意不要以Skip负数呼叫。某些提供程序（例如，实体框架）在出现负参数时将产生ArgumentException。呼吁Math.Max巧妙地避免这种情况。

下面的类具有扩展方法的所有基本要素，这些要素是：静态类，静态方法和this关键字的使用。

public static class MiscExtensions
{
    // Ex: collection.TakeLast(5);
    public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int N)
    {
        return source.Skip(Math.Max(0, source.Count() - N));
    }
}

关于性能的简要说明：

因为对的调用Count()可能导致某些数据结构的枚举，所以这种方法具有导致两次通过数据的风险。对于大多数枚举对象而言，这并不是真正的问题。实际上，已经存在针对列表，数组甚至EF查询的优化，以评估Count()O（1）时间的操作。

但是，如果必须使用仅向前枚举，并且希望避免进行两次遍历，请考虑像Lasse V. Karlsen或Mark Byers所描述的单遍算法。这两种方法都使用临时缓冲区在枚举时保存项目，一旦找到集合的末尾便会产生这些项目。

— kbrimington
source

2

+1，因为这在Linq to Entities / SQL中有效。我猜它在Linq to Objects中也比James Curran的策略表现更好。

— StriplingWarrior 2010年

11

取决于集合的性质。Count（）可能是O（N）。

— James Curran 2010年

3

@詹姆斯：完全正确。如果严格处理IEnumerable集合，则可能是两次通过查询。我会对看到有保证的1次通过算法非常感兴趣。这可能很有用。

— kbrimington

4

做了一些基准测试。事实证明LINQ to Objects根据您使用的集合类型执行一些优化。使用数组Lists和LinkedLists，James的解决方案趋向于更快，尽管幅度不大。如果计算IEnumerable（例如，通过Enumerable.Range），James的解决方案将花费更长的时间。我无法想出任何一种方法来保证一次通过，而不需要了解实现的某些知识或将值复制到不同的数据结构中。

— StriplingWarrior

1

@RedFilter-足够公平。我想我的传呼习惯在这里泄漏了。感谢您敏锐的目光。

— kbrimington 2012年

59

coll.Reverse().Take(N).Reverse().ToList();


public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> coll, int N)
{
    return coll.Reverse().Take(N).Reverse();
}

更新：解决clintp的问题：a）使用我上面定义的TakeLast（）方法可以解决此问题，但是如果您确实希望在不使用其他方法的情况下执行此操作，则只需认识到Enumerable.Reverse（）可以用作扩展方法，则不需要以这种方式使用它：

List<string> mystring = new List<string>() { "one", "two", "three" }; 
mystring = Enumerable.Reverse(mystring).Take(2).Reverse().ToList();

— 詹姆斯·柯伦
source

List<string> mystring = new List<string>() { "one", "two", "three" }; mystring = mystring.Reverse().Take(2).Reverse(); 我遇到的问题是，如果我说：我收到一个编译器错误，因为.Reverse（）返回void并且编译器选择该方法而不是返回IEnumerable的Linq方法。有什么建议吗？

— 克林顿·皮尔斯

1

您可以通过将mystring显式转换为IEnumerable <String>来解决此问题：（（IEnumerable <String>）mystring）.Reverse（）。Take（2）.Reverse（）

— Jan Hettich

足够容易和简单，但是需要将顺序完全反转两次。这可能是最好的方法

— shashwat 2013年

除了kbrimington接受的答案外，我还喜欢它。如果您不在乎最后一条N记录后的顺序，则可以跳过第二条记录Reverse。

— ZoolWay 2014年

@shashwat它不会“完全”两次反转订单。第二次冲销仅适用于N个物料的集合。此外，根据Reverse（）的实现方式，对其的第一次调用可能只会反转N个项目。（.NET 4.0实现会将集合复制到数组，并向后索引）

— James Curran，

47

注意：我想念您的问题标题“ 使用Linq”，所以我的答案实际上并未使用Linq。

如果要避免缓存整个集合的非惰性副本，则可以编写一个使用链接列表执行此操作的简单方法。

以下方法将在原始集合中找到的每个值添加到链接列表中，并将链接列表缩小为所需的项目数。由于它通过遍历整个集合始终将链接列表修剪为该项目数，因此它将仅保留原始集合中最多N个项目的副本。

它不需要您知道原始集合中的项目数，也不需要多次对其进行迭代。

用法：

IEnumerable<int> sequence = Enumerable.Range(1, 10000);
IEnumerable<int> last10 = sequence.TakeLast(10);
...

扩展方式：

public static class Extensions
{
    public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> collection,
        int n)
    {
        if (collection == null)
            throw new ArgumentNullException(nameof(collection));
        if (n < 0)
            throw new ArgumentOutOfRangeException(nameof(n), $"{nameof(n)} must be 0 or greater");

        LinkedList<T> temp = new LinkedList<T>();

        foreach (var value in collection)
        {
            temp.AddLast(value);
            if (temp.Count > n)
                temp.RemoveFirst();
        }

        return temp;
    }
}

— 拉瑟诉卡尔森
source

我仍然认为即使您不是从技术上使用Linq，您也可以得到一个很好的答案，所以我还是给您+1 :)

— Matthew Groves 2010年

干净，整洁且可扩展的+1！

— Yasser Shaikh

1

我认为这是不会导致源枚举器运行两次（或多次）并且不会强制枚举实现的唯一解决方案，因此在大多数应用程序中，我会说这将在效率上大得多的内存和速度。

— Sprotty

30

这是一种适用于任何枚举但仅使用O（N）临时存储的方法：

public static class TakeLastExtension
{
    public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int takeCount)
    {
        if (source == null) { throw new ArgumentNullException("source"); }
        if (takeCount < 0) { throw new ArgumentOutOfRangeException("takeCount", "must not be negative"); }
        if (takeCount == 0) { yield break; }

        T[] result = new T[takeCount];
        int i = 0;

        int sourceCount = 0;
        foreach (T element in source)
        {
            result[i] = element;
            i = (i + 1) % takeCount;
            sourceCount++;
        }

        if (sourceCount < takeCount)
        {
            takeCount = sourceCount;
            i = 0;
        }

        for (int j = 0; j < takeCount; ++j)
        {
            yield return result[(i + j) % takeCount];
        }
    }
}

用法：

List<int> l = new List<int> {4, 6, 3, 6, 2, 5, 7};
List<int> lastElements = l.TakeLast(3).ToList();

它通过使用大小为N的环形缓冲区来存储所看到的元素，并用新元素覆盖旧元素来工作。当达到可枚举的末尾时，环形缓冲区将包含最后N个元素。

— 马克·拜尔斯
source

2

+1：这应该比我的性能更好，但是当集合包含的元素少于时，您应该确保它做对了n。

— 拉瑟五世卡尔森

好吧，在大多数情况下，我认为人们会从SO复制代码以用于生产时自行添加此类内容时会小心，这可能不是问题。如果要添加它，请考虑同时检查collection变量是否为null。否则，这是一个很好的解决方案：）我当时正在考虑自己使用环形缓冲区，因为链表会增加GC压力，但是距离我这样做已经有一段时间了，而且我不想麻烦测试代码来弄清楚如果我做对了。我必须说我爱上LINQPad了：）linqpad.net

— Lasse V. Karlsen 2010年

2

一种可能的优化方法是检查可枚举的IList是否实现，如果可以，则使用简单的解决方案。那么，只有真正“流化” IEnumerables时才需要临时存储方法

— piers7 2011年

1

琐碎的nit-pick：您对ArgumentOutOfRangeException的参数顺序错误（R＃说）

— piers7 2011年

28

.NET Core 2.0+提供了LINQ方法TakeLast()：

https://docs.microsoft.com/zh-cn/dotnet/api/system.linq.enumerable.takelast

例如：

Enumerable
    .Range(1, 10)
    .TakeLast(3) // <--- takes last 3 items
    .ToList()
    .ForEach(i => System.Console.WriteLine(i))

// outputs:
// 8
// 9
// 10

— 射线
source

我正在使用：NET Standard 2.0，但没有可用。怎么了？:(

— SuperJMN

@SuperJMN尽管您可能引用了.net标准2.0库，但是您可能未在项目中定位到正确版本的dotnet core。此方法不适用于v1.x（netcoreapp1.x），仅适用于dotnetcore（netcoreapp2.x）的v2.0和v2.1 。您可能会针对net472也不支持的完整框架（例如）。（.NET标准库可以通过任何上述使用，但可以仅暴露某些API特定于目标框架看到。docs.microsoft.com/en-us/dotnet/standard/frameworks）

— 雷

1

这些现在需要更高。不需要重新发明轮子

— 詹姆斯伍德利

11

我很惊讶没有人提及它，但是SkipWhile确实有一种使用元素的index的方法。

public static IEnumerable<T> TakeLastN<T>(this IEnumerable<T> source, int n)
{
    if (source == null)
        throw new ArgumentNullException("Source cannot be null");

    int goldenIndex = source.Count() - n;
    return source.SkipWhile((val, index) => index < goldenIndex);
}

//Or if you like them one-liners (in the spirit of the current accepted answer);
//However, this is most likely impractical due to the repeated calculations
collection.SkipWhile((val, index) => index < collection.Count() - N)

与其他解决方案相比，此解决方案提供的唯一可感知的好处是，您可以选择添加谓词以进行更强大和高效的LINQ查询，而不用具有两次遍历IEnumerable的单独操作。

public static IEnumerable<T> FilterLastN<T>(this IEnumerable<T> source, int n, Predicate<T> pred)
{
    int goldenIndex = source.Count() - n;
    return source.SkipWhile((val, index) => index < goldenIndex && pred(val));
}

— 尼克·巴布科克
source

9

在RX的System.Interactive程序集中使用EnumerableEx.TakeLast。它是类似于@Mark的O（N）实现，但它使用队列而不是环形缓冲区构造（并且在达到缓冲区容量时将项目出队）。

（注意：这是IEnumerable版本-而不是IObservable版本，尽管两者的实现几乎相同）

— 码头7
source

这是最好的答案。如果有合适的库做这项工作并且RX团队素质很高，请不要自己动手做。

— bradgonesurfing

-如果你有这个打算，从的NuGet安装nuget.org/packages/Ix-Async

— nikib3ro

C＃不Queue<T>使用循环缓冲区实现吗？

— tigrou

@tigrou。不，它不是圆形的

— citykid

1

然后，文档必须在撒谎

— tigrou

6

如果您正在处理带有键的集合（例如，来自数据库的条目），那么快速（即比所选答案更快）的解决方案将是

collection.OrderByDescending(c => c.Key).Take(3).OrderBy(c => c.Key);

— dav_i
source

+1对我

— 有用

5

如果您不介意将Rx作为monad的一部分，可以使用TakeLast：

IEnumerable<int> source = Enumerable.Range(1, 10000);

IEnumerable<int> lastThree = source.AsObservable().TakeLast(3).AsEnumerable();

— 理查德·萨雷（Richard Szalay）
source

2

如果引用RX的System.Interactive而不是System.Reactive，则不需要AsObservable（）（请参阅我的回答）

— piers7 2011年

2

如果可以选择使用第三方库，则MoreLinq会定义使用TakeLast()哪一种库。

— 短信
source

2

我试图结合效率和简单性，最终得出以下结论：

public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int count)
{
    if (source == null) { throw new ArgumentNullException("source"); }

    Queue<T> lastElements = new Queue<T>();
    foreach (T element in source)
    {
        lastElements.Enqueue(element);
        if (lastElements.Count > count)
        {
            lastElements.Dequeue();
        }
    }

    return lastElements;
}

关于性能：在C＃中，Queue<T>使用循环缓冲区实现，因此每个循环都没有完成对象实例化（仅当队列长大时）。我没有设置队列容量（使用专用的构造函数），因为有人可能会使用来调用此扩展count = int.MaxValue。为了获得更高的性能，您可以检查源工具是否实现IList<T>，如果是，则使用数组索引直接提取最后一个值。

— 提格鲁
source

1

使用LINQ获取集合的最后N个是有点效率低下的，因为上述所有解决方案都需要在集合中进行迭代。TakeLast(int n)在System.Interactive也有这个问题。

如果您有列表，则更有效的方法是使用以下方法对其进行切片

/// Select from start to end exclusive of end using the same semantics
/// as python slice.
/// <param name="list"> the list to slice</param>
/// <param name="start">The starting index</param>
/// <param name="end">The ending index. The result does not include this index</param>
public static List<T> Slice<T>
(this IReadOnlyList<T> list, int start, int? end = null)
{
    if (end == null)
    {
        end = list.Count();
    }
     if (start < 0)
    {
        start = list.Count + start;
    }
     if (start >= 0 && end.Value > 0 && end.Value > start)
    {
        return list.GetRange(start, end.Value - start);
    }
     if (end < 0)
    {
        return list.GetRange(start, (list.Count() + end.Value) - start);
    }
     if (end == start)
    {
        return new List<T>();
    }
     throw new IndexOutOfRangeException(
        "count = " + list.Count() + 
        " start = " + start +
        " end = " + end);
}

与

public static List<T> GetRange<T>( this IReadOnlyList<T> list, int index, int count )
{
    List<T> r = new List<T>(count);
    for ( int i = 0; i < count; i++ )
    {
        int j=i + index;
        if ( j >= list.Count )
        {
            break;
        }
        r.Add(list[j]);
    }
    return r;
}

和一些测试用例

[Fact]
public void GetRange()
{
    IReadOnlyList<int> l = new List<int>() { 0, 10, 20, 30, 40, 50, 60 };
     l
        .GetRange(2, 3)
        .ShouldAllBeEquivalentTo(new[] { 20, 30, 40 });
     l
        .GetRange(5, 10)
        .ShouldAllBeEquivalentTo(new[] { 50, 60 });

}
 [Fact]
void SliceMethodShouldWork()
{
    var list = new List<int>() { 1, 3, 5, 7, 9, 11 };
    list.Slice(1, 4).ShouldBeEquivalentTo(new[] { 3, 5, 7 });
    list.Slice(1, -2).ShouldBeEquivalentTo(new[] { 3, 5, 7 });
    list.Slice(1, null).ShouldBeEquivalentTo(new[] { 3, 5, 7, 9, 11 });
    list.Slice(-2)
        .Should()
        .BeEquivalentTo(new[] {9, 11});
     list.Slice(-2,-1 )
        .Should()
        .BeEquivalentTo(new[] {9});
}

— bradgone冲浪
source

1

我知道回答这个问题已经晚了。但是，如果您使用的是IList <>类型的集合，并且您不关心返回的集合的顺序，则此方法的运行速度会更快。我已经用Mark Byers的答案进行了一些更改。所以现在方法TakeLast是：

public static IEnumerable<T> TakeLast<T>(IList<T> source, int takeCount)
{
    if (source == null) { throw new ArgumentNullException("source"); }
    if (takeCount < 0) { throw new ArgumentOutOfRangeException("takeCount", "must not be negative"); }
    if (takeCount == 0) { yield break; }

    if (source.Count > takeCount)
    {
        for (int z = source.Count - 1; takeCount > 0; z--)
        {
            takeCount--;
            yield return source[z];
        }
    }
    else
    {
        for(int i = 0; i < source.Count; i++)
        {
            yield return source[i];
        }
    }
}

为了进行测试，我使用了Mark Byers方法和kbrimington的andswer。这是测试：

IList<int> test = new List<int>();
for(int i = 0; i<1000000; i++)
{
    test.Add(i);
}

Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();

IList<int> result = TakeLast(test, 10).ToList();

stopwatch.Stop();

Stopwatch stopwatch1 = new Stopwatch();
stopwatch1.Start();

IList<int> result1 = TakeLast2(test, 10).ToList();

stopwatch1.Stop();

Stopwatch stopwatch2 = new Stopwatch();
stopwatch2.Start();

IList<int> result2 = test.Skip(Math.Max(0, test.Count - 10)).Take(10).ToList();

stopwatch2.Stop();

以下是采用10个元素的结果：

在此处输入图片说明

对于1000001个元素，结果为：在此处输入图片说明

— 萨沙
source

1

这是我的解决方案：

public static class EnumerationExtensions
{
    public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> input, int count)
    {
        if (count <= 0)
            yield break;

        var inputList = input as IList<T>;

        if (inputList != null)
        {
            int last = inputList.Count;
            int first = last - count;

            if (first < 0)
                first = 0;

            for (int i = first; i < last; i++)
                yield return inputList[i];
        }
        else
        {
            // Use a ring buffer. We have to enumerate the input, and we don't know in advance how many elements it will contain.
            T[] buffer = new T[count];

            int index = 0;

            count = 0;

            foreach (T item in input)
            {
                buffer[index] = item;

                index = (index + 1) % buffer.Length;
                count++;
            }

            // The index variable now points at the next buffer entry that would be filled. If the buffer isn't completely
            // full, then there are 'count' elements preceding index. If the buffer *is* full, then index is pointing at
            // the oldest entry, which is the first one to return.
            //
            // If the buffer isn't full, which means that the enumeration has fewer than 'count' elements, we'll fix up
            // 'index' to point at the first entry to return. That's easy to do; if the buffer isn't full, then the oldest
            // entry is the first one. :-)
            //
            // We'll also set 'count' to the number of elements to be returned. It only needs adjustment if we've wrapped
            // past the end of the buffer and have enumerated more than the original count value.

            if (count < buffer.Length)
                index = 0;
            else
                count = buffer.Length;

            // Return the values in the correct order.
            while (count > 0)
            {
                yield return buffer[index];

                index = (index + 1) % buffer.Length;
                count--;
            }
        }
    }

    public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> input, int count)
    {
        if (count <= 0)
            return input;
        else
            return input.SkipLastIter(count);
    }

    private static IEnumerable<T> SkipLastIter<T>(this IEnumerable<T> input, int count)
    {
        var inputList = input as IList<T>;

        if (inputList != null)
        {
            int first = 0;
            int last = inputList.Count - count;

            if (last < 0)
                last = 0;

            for (int i = first; i < last; i++)
                yield return inputList[i];
        }
        else
        {
            // Aim to leave 'count' items in the queue. If the input has fewer than 'count'
            // items, then the queue won't ever fill and we return nothing.

            Queue<T> elements = new Queue<T>();

            foreach (T item in input)
            {
                elements.Enqueue(item);

                if (elements.Count > count)
                    yield return elements.Dequeue();
            }
        }
    }
}

该代码虽然有点笨拙，但是作为一个可重用的嵌入式组件，它在大多数情况下都应具有良好的性能，并且将使使用它的代码保持简洁。:-)

My TakeLastfor non- IList`1基于与@Mark Byers和@MackieChan的答案相同的环形缓冲区算法。它们多么相似很有趣-我完全独立地撰写了我的文章。猜猜实际上只有一种正确处理环形缓冲区的方法。:-)

查看@kbrimington的答案，可以为此添加一个额外的检查，IQuerable<T>以退回到与Entity Framework一起很好地使用的方法-假设我目前没有。

— 乔纳森·吉尔伯特
source

0

在真实示例下面，如何从集合（数组）中获取最后3个元素：

// split address by spaces into array
string[] adrParts = adr.Split(new string[] { " " },StringSplitOptions.RemoveEmptyEntries);
// take only 3 last items in array
adrParts = adrParts.SkipWhile((value, index) => { return adrParts.Length - index > 3; }).ToArray();

— 亚历山大·蒂姆科夫（Aleksey Timkov）
source

0

使用此方法获取所有范围而不会出现错误

 public List<T> GetTsRate( List<T> AllT,int Index,int Count)
        {
            List<T> Ts = null;
            try
            {
                Ts = AllT.ToList().GetRange(Index, Count);
            }
            catch (Exception ex)
            {
                Ts = AllT.Skip(Index).ToList();
            }
            return Ts ;
        }

— 阿里·阿斯加尔·芬德斯基
source

0

使用循环缓冲区的实现几乎没有什么不同。基准测试表明，该方法比使用Queue（在System.Linq中实现TakeLast的方法）快大约两倍，但是并非没有代价-它需要一个缓冲区，该缓冲区随所请求的元素数量而增长，即使您拥有一个小集合可以获取巨大的内存分配。

public IEnumerable<T> TakeLast<T>(IEnumerable<T> source, int count)
{
    int i = 0;

    if (count < 1)
        yield break;

    if (source is IList<T> listSource)
    {
        if (listSource.Count < 1)
            yield break;

        for (i = listSource.Count < count ? 0 : listSource.Count - count; i < listSource.Count; i++)
            yield return listSource[i];

    }
    else
    {
        bool move = true;
        bool filled = false;
        T[] result = new T[count];

        using (var enumerator = source.GetEnumerator())
            while (move)
            {
                for (i = 0; (move = enumerator.MoveNext()) && i < count; i++)
                    result[i] = enumerator.Current;

                filled |= move;
            }

        if (filled)
            for (int j = i; j < count; j++)
                yield return result[j];

        for (int j = 0; j < i; j++)
            yield return result[j];

    }
}

— 罗马斯
source