Split List into Sublists with LINQ

377

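Is there any way I can separate a List<SomeObject> into several separate lists of SomeObject, using the item index as the delimiter of each split?

Let me exemplify:

I have a List<SomeObject> and I need a List<List<SomeObject>> or List<SomeObject>[], so that each of these resulting lists contains a group of 3 items of the original list (sequentially).

For example:

Original list: [a, g, e, w, p, s, q, f, x, y, i, m, c]
Resulting lists: [a, g, e], [w, p, s], [q, f, x], [y, i, m], [c]

I'd also need the size of the resulting lists to be a parameter of this function.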

c# linq data-structures

— Felipe Lima
source

378

Try the following code.

public static List<List<T>> Split<T>(IList<T> source)
{
    return  source
        .Select((x, i) => new { Index = i, Value = x })
        .GroupBy(x => x.Index / 3)
        .Select(x => x.Select(v => v.Value).ToList())
        .ToList();
}

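The idea is to first group the elements by index. Dividing by three has the effect of grouping them into groups of 3. Then convert each group to a list, and the IEnumerable of List to a List of Lists.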
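The question also asks for the group size to be a parameter. A minimal sketch of the same index-grouping idea with the size passed in might look like this; the names `GroupBySplitSketch`, `SplitBySize` and `chunkSize` are illustrative and are not part of the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupBySplitSketch
{
    // Hypothetical variant of the answer's Split: the divisor is a parameter.
    public static List<List<T>> SplitBySize<T>(IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        return source
            // Pair each item with its position in the sequence.
            .Select((value, index) => new { Index = index, Value = value })
            // Integer division maps positions 0,1,2 to key 0, positions 3,4,5 to key 1, and so on.
            .GroupBy(x => x.Index / chunkSize)
            // Drop the index again and materialize each group.
            .Select(g => g.Select(x => x.Value).ToList())
            .ToList();
    }
}
```

Calling `GroupBySplitSketch.SplitBySize("agewpsqfxyimc", 3)` on the question's sample gives five lists, the last of which holds only `c`. Like the original, this reads the whole source before returning anything.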

— JaredPar
source

21

GroupBy does an implicit sort. That can kill performance. What we need is some kind of inverse of SelectMany.

— yfeldblum

5

@Justice, GroupBy might be implemented by hashing. How do you know that GroupBy's implementation "can kill performance"?

— Amy B

5

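GroupBy doesn't return anything until it has enumerated all elements. That's why it's slow. The lists the OP wants are contiguous, so a better method could yield the first sublist [a,g,e] before enumerating any more of the original list.

— Colonel Panic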

9

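Take the extreme example of an infinite IEnumerable. GroupBy(x=>f(x)).First() will never yield a group. The OP asked about lists, but if we write code that works with IEnumerable and makes only a single iteration, we reap the performance advantage.

— Colonel Panic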

8

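@Nick Order isn't preserved your way. It's still a good thing to know, but you would be grouping them into (0,3,6,9,...), (1,4,7,10,...), (2,5,8,11,...). If order doesn't matter then it's fine, but in this case it sounds like it matters.

— Reafexus, 2013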

325

This question is a bit old, but I just wrote this, and I think it's a little more elegant than the other proposed solutions:

/// <summary>
/// Break a list of items into chunks of a specific size
/// </summary>
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
    while (source.Any())
    {
        yield return source.Take(chunksize);
        source = source.Skip(chunksize);
    }
}

— CaseyB
source

14

Love this solution. I'd recommend adding this sanity check to prevent an infinite loop: if (chunksize <= 0) throw new ArgumentException("Chunk size must be greater than zero.", "chunksize");

— mroach

10

I like this, but it is not very efficient.

— Sam Saffron

51

I like this one, but its time efficiency is O(n²). You can iterate through the list once and get O(n) time.

— hIpPy

8

@hIpPy, how is it n^2? It seems linear to me.

— vivekmaharajh

13

@vivekmaharajh source is replaced by a wrapped IEnumerable each time. So taking elements from source has to go through layers of Skips.

— Lasse Espeholt
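The cost discussed in these comments can be made visible with a counting source. The sketch below is only an illustration: `SkipChainDemo`, `CountingRange`, `NaiveChunk` and `CountPulls` are made-up names, and `NaiveChunk` is a copy of the answer's loop written as a plain static method. Exact counts depend on the LINQ implementation in use, so none are quoted here; the point is that the source is read again from the start for every chunk, so the total number of items pulled grows much faster than the length of the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipChainDemo
{
    // How many items the counting source has handed out so far.
    public static int Pulled;

    // A lazy source that counts every item it produces.
    public static IEnumerable<int> CountingRange(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Same shape as the answer's Chunk: source is re-wrapped in Skip on every pass.
    public static IEnumerable<IEnumerable<T>> NaiveChunk<T>(IEnumerable<T> source, int chunkSize)
    {
        while (source.Any())
        {
            yield return source.Take(chunkSize);
            source = source.Skip(chunkSize);
        }
    }

    // Consumes every chunk fully and reports how many items the source produced in total.
    public static int CountPulls(int n, int chunkSize)
    {
        Pulled = 0;
        foreach (var chunk in NaiveChunk(CountingRange(n), chunkSize))
        {
            foreach (var item in chunk) { /* consume the item */ }
        }
        return Pulled;
    }
}
```

Comparing `CountPulls(100, 10)` with `CountPulls(1000, 10)` shows the growth: a source ten times longer costs far more than ten times as many pulls.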

99

In general the approach suggested by CaseyB works fine; in fact, if you are passing in a List<T> it is hard to fault it. Perhaps I would change it to:

public static IEnumerable<IEnumerable<T>> ChunkTrivialBetter<T>(this IEnumerable<T> source, int chunksize)
{
   var pos = 0; 
   while (source.Skip(pos).Any())
   {
      yield return source.Skip(pos).Take(chunksize);
      pos += chunksize;
   }
}

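This avoids massive call chains. Nonetheless, the approach has a general flaw: it materializes two enumerations per chunk. To highlight the issue, try running: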

foreach (var item in Enumerable.Range(1, int.MaxValue).Chunk(8).Skip(100000).First())
{
   Console.WriteLine(item);
}
// wait forever

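To overcome this we can try Cameron's approach, which passes the above test with flying colors because it only walks the enumeration once.

The trouble is that it has a different flaw: it materializes every item in each chunk, so memory consumption runs high.

To illustrate that, try running: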

foreach (var item in Enumerable.Range(1, int.MaxValue)
               .Select(x => x + new string('x', 100000))
               .Clump(10000).Skip(100).First())
{
   Console.Write('.');
}
// OutOfMemoryException

Finally, any implementation should be able to handle out-of-order iteration of chunks, for example:

Enumerable.Range(1,3).Chunk(2).Reverse().ToArray()
// should return [3],[1,2]

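Many highly optimized solutions, like my first revision of this answer, failed there. The same issue can be seen in casperOne's optimized answer.

To address all these issues you can use the following: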

using System;
using System.Collections.Generic;
using System.Linq;

namespace ChunkedEnumerator
{
    public static class Extensions 
    {
        class ChunkedEnumerable<T> : IEnumerable<T>
        {
            class ChildEnumerator : IEnumerator<T>
            {
                ChunkedEnumerable<T> parent;
                int position;
                bool done = false;
                T current;


                public ChildEnumerator(ChunkedEnumerable<T> parent)
                {
                    this.parent = parent;
                    position = -1;
                    parent.wrapper.AddRef();
                }

                public T Current
                {
                    get
                    {
                        if (position == -1 || done)
                        {
                            throw new InvalidOperationException();
                        }
                        return current;

                    }
                }

                public void Dispose()
                {
                    if (!done)
                    {
                        done = true;
                        parent.wrapper.RemoveRef();
                    }
                }

                object System.Collections.IEnumerator.Current
                {
                    get { return Current; }
                }

                public bool MoveNext()
                {
                    position++;

                    if (position + 1 > parent.chunkSize)
                    {
                        done = true;
                    }

                    if (!done)
                    {
                        done = !parent.wrapper.Get(position + parent.start, out current);
                    }

                    return !done;

                }

                public void Reset()
                {
                    // per http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx
                    throw new NotSupportedException();
                }
            }

            EnumeratorWrapper<T> wrapper;
            int chunkSize;
            int start;

            public ChunkedEnumerable(EnumeratorWrapper<T> wrapper, int chunkSize, int start)
            {
                this.wrapper = wrapper;
                this.chunkSize = chunkSize;
                this.start = start;
            }

            public IEnumerator<T> GetEnumerator()
            {
                return new ChildEnumerator(this);
            }

            System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }

        }

        class EnumeratorWrapper<T>
        {
            public EnumeratorWrapper (IEnumerable<T> source)
            {
                SourceEumerable = source;
            }
            IEnumerable<T> SourceEumerable {get; set;}

            Enumeration currentEnumeration;

            class Enumeration
            {
                public IEnumerator<T> Source { get; set; }
                public int Position { get; set; }
                public bool AtEnd { get; set; }
            }

            public bool Get(int pos, out T item) 
            {

                if (currentEnumeration != null && currentEnumeration.Position > pos)
                {
                    currentEnumeration.Source.Dispose();
                    currentEnumeration = null;
                }

                if (currentEnumeration == null)
                {
                    currentEnumeration = new Enumeration { Position = -1, Source = SourceEumerable.GetEnumerator(), AtEnd = false };
                }

                item = default(T);
                if (currentEnumeration.AtEnd)
                {
                    return false;
                }

                while(currentEnumeration.Position < pos) 
                {
                    currentEnumeration.AtEnd = !currentEnumeration.Source.MoveNext();
                    currentEnumeration.Position++;

                    if (currentEnumeration.AtEnd) 
                    {
                        return false;
                    }

                }

                item = currentEnumeration.Source.Current;

                return true;
            }

            int refs = 0;

            // needed for dispose semantics 
            public void AddRef()
            {
                refs++;
            }

            public void RemoveRef()
            {
                refs--;
                if (refs == 0 && currentEnumeration != null)
                {
                    var copy = currentEnumeration;
                    currentEnumeration = null;
                    copy.Source.Dispose();
                }
            }
        }

        public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
        {
            if (chunksize < 1) throw new InvalidOperationException();

            var wrapper =  new EnumeratorWrapper<T>(source);

            int currentPos = 0;
            T ignore;
            try
            {
                wrapper.AddRef();
                while (wrapper.Get(currentPos, out ignore))
                {
                    yield return new ChunkedEnumerable<T>(wrapper, chunksize, currentPos);
                    currentPos += chunksize;
                }
            }
            finally
            {
                wrapper.RemoveRef();
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            int i = 10;
            foreach (var group in Enumerable.Range(1, int.MaxValue).Skip(10000000).Chunk(3))
            {
                foreach (var n in group)
                {
                    Console.Write(n);
                    Console.Write(" ");
                }
                Console.WriteLine();
                if (i-- == 0) break;
            }


            var stuffs = Enumerable.Range(1, 10).Chunk(2).ToArray();

            foreach (var idx in new [] {3,2,1})
            {
                Console.Write("idx " + idx + " ");
                foreach (var n in stuffs[idx])
                {
                    Console.Write(n);
                    Console.Write(" ");
                }
                Console.WriteLine();
            }

            /*

10000001 10000002 10000003
10000004 10000005 10000006
10000007 10000008 10000009
10000010 10000011 10000012
10000013 10000014 10000015
10000016 10000017 10000018
10000019 10000020 10000021
10000022 10000023 10000024
10000025 10000026 10000027
10000028 10000029 10000030
10000031 10000032 10000033
idx 3 7 8
idx 2 5 6
idx 1 3 4
             */

            Console.ReadKey();


        }

    }
}

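You could also introduce a round of optimizations for out-of-order iteration of chunks, which is out of scope here.

As to which method you should choose? It totally depends on the problem you are trying to solve. If you are not concerned with the first flaw, the simple answer is incredibly appealing.

Note that, as with most methods, this is not safe for multithreading. Things can get weird if you wish to make it thread safe; you would need to amend EnumeratorWrapper.

— Sam Saffron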
source

Would the bug be that Enumerable.Range(0, 100).Chunk(3).Reverse().ToArray() gives a wrong result, or that Enumerable.Range(0, 100).ToArray().Chunk(3).Reverse().ToArray() throws an exception?

— Cameron MacFarland

@SamSaffron I've updated my answer and simplified the code tremendously for what I feel is the most prominent use case (and I acknowledge the caveats).

— casperOne

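What about chunking IQueryable<>? My guess is that a Take/Skip approach would be optimal if we want to delegate as many of the operations as possible to the provider.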

— Guillaume86

@Guillaume86 I agree. If you have an IList or IQueryable you can take all sorts of shortcuts that would make this much faster (Linq does this internally for all sorts of other methods).

— Sam Saffron, 2012

1

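This is by far the best answer for efficiency. I was having an issue using SqlBulkCopy with an IEnumerable that runs additional processes on each column, so it must run through efficiently in a single pass. This will allow me to break the IEnumerable up into manageable-sized chunks. (For those wondering, I did enable SqlBulkCopy's streaming mode, which seems to be broken.)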

— Brain2000 '16

64

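I believe you could use a number of queries that use Take and Skip, but that would add too many iterations over the original list.

Rather, I think you should create an iterator of your own, like so: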

public static IEnumerable<IEnumerable<T>> GetEnumerableOfEnumerables<T>(
  IEnumerable<T> enumerable, int groupSize)
{
   // The list to return.
   List<T> list = new List<T>(groupSize);

   // Cycle through all of the items.
   foreach (T item in enumerable)
   {
     // Add the item.
     list.Add(item);

     // If the list has the number of elements, return that.
     if (list.Count == groupSize)
     {
       // Return the list.
       yield return list;

       // Set the list to a new list.
       list = new List<T>(groupSize);
     }
   }

   // Return the remainder if there is any,
   if (list.Count != 0)
   {
     // Return the list.
     yield return list;
   }
}

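You can then call this, and because it is LINQ enabled you can perform other operations on the resulting sequences.

In light of Sam's answer, I felt there was an easier way to do this without:

Iterating through the list again (which I didn't do originally)
Materializing the items in groups before releasing the chunk (for large chunks of items, there would be memory issues)
All of the code that Sam posted

That said, here's another pass, which I've codified in an extension method on IEnumerable<T> called Chunk: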

public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, 
    int chunkSize)
{
    // Validate parameters.
    if (source == null) throw new ArgumentNullException("source");
    if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize",
        "The chunkSize parameter must be a positive value.");

    // Call the internal implementation.
    return source.ChunkInternal(chunkSize);
}

Nothing surprising up there, just basic error checking.

Moving on to ChunkInternal:

private static IEnumerable<IEnumerable<T>> ChunkInternal<T>(
    this IEnumerable<T> source, int chunkSize)
{
    // Validate parameters.
    Debug.Assert(source != null);
    Debug.Assert(chunkSize > 0);

    // Get the enumerator.  Dispose of when done.
    using (IEnumerator<T> enumerator = source.GetEnumerator())
    do
    {
        // Move to the next element.  If there's nothing left
        // then get out.
        if (!enumerator.MoveNext()) yield break;

        // Return the chunked sequence.
        yield return ChunkSequence(enumerator, chunkSize);
    } while (true);
}

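Basically, it gets the IEnumerator<T> and manually iterates through each item. It checks whether there are any items currently left to enumerate. After each chunk has been enumerated, if there aren't any items left, it breaks out.

Once it detects that there are items in the sequence, it delegates the responsibility for the inner IEnumerable<T> implementation to ChunkSequence: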

private static IEnumerable<T> ChunkSequence<T>(IEnumerator<T> enumerator, 
    int chunkSize)
{
    // Validate parameters.
    Debug.Assert(enumerator != null);
    Debug.Assert(chunkSize > 0);

    // The count.
    int count = 0;

    // There is at least one item.  Yield and then continue.
    do
    {
        // Yield the item.
        yield return enumerator.Current;
    } while (++count < chunkSize && enumerator.MoveNext());
}

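Since MoveNext was already called on the IEnumerator<T> passed to ChunkSequence, it yields the item returned by Current and then increments the count, making sure never to return more than chunkSize items, and moves to the next item in the sequence after every iteration (short-circuiting once the number of items yielded reaches the chunk size).

If there are no items left, the ChunkInternal method will make another pass in the outer loop, but when MoveNext is called a second time it will still return false, as per the documentation (emphasis mine):

If MoveNext passes the end of the collection, the enumerator is positioned after the last element in the collection and MoveNext returns false. When the enumerator is at this position, subsequent calls to MoveNext also return false until Reset is called.

At this point, the loop will break, and the sequence of sequences will terminate.

This is a simple test: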

static void Main()
{
    string s = "agewpsqfxyimc";

    int count = 0;

    // Group by three.
    foreach (IEnumerable<char> g in s.Chunk(3))
    {
        // Print out the group.
        Console.Write("Group: {0} - ", ++count);

        // Print the items.
        foreach (char c in g)
        {
            // Print the item.
            Console.Write(c + ", ");
        }

        // Finish the line.
        Console.WriteLine();
    }
}

Output:

Group: 1 - a, g, e,
Group: 2 - w, p, s,
Group: 3 - q, f, x,
Group: 4 - y, i, m,
Group: 5 - c,

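An important note: this will not work if you don't drain the entire child sequence, or if you break at any point in the parent sequence. This is an important caveat, but if your use case is that you will consume every element of the sequence of sequences, then this will work for you.

Additionally, it will do strange things if you play with the order, just as Sam's did at one point.

— casperOne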
source

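I think this is the best solution... the only problem is that List doesn't have Length... it has Count. But that's easy to change. We can make this better by not even constructing Lists, and instead returning IEnumerables that hold references to the main list with an offset/length combination. Then, if the group size is big, we don't waste memory. Comment if you want me to write it up.

— Amir, 2009

@Amir I'd like to see that written up.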

— samandmoore

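This is nice and fast. Cameron posted a very similar one after yours as well. The only caveat is that it buffers chunks, which can lead to out-of-memory errors if chunks and item sizes are big. See my answer for an alternative, albeit a much hairier one.

— Sam Saffron

@SamSaffron Yeah, if you have a large number of items in the List<T>, you'll obviously have memory issues because of the buffering. In retrospect, I should have noted that in the answer, but at the time the focus seemed to be on too many iterations. That said, your solution is indeed hairier. I haven't tested it, but now it has me wondering whether there's a less hairy solution.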

— casperOne'5

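@casperOne Yeah... Google gave me this page when I was searching for a way to split enumerables. For my specific use case, if I materialized them into a list it would blow up (in fact dapper has a buffer:false option just for this use case).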

— Sam Saffron'5

48

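OK, here's my take on it:

completely lazy: works on infinite enumerables
no intermediate copying/buffering
O(n) execution time
works also when inner sequences are only partially consumed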

public static IEnumerable<IEnumerable<T>> Chunks<T>(this IEnumerable<T> enumerable,
                                                    int chunkSize)
{
    if (chunkSize < 1) throw new ArgumentException("chunkSize must be positive");

    using (var e = enumerable.GetEnumerator())
    while (e.MoveNext())
    {
        var remaining = chunkSize;    // elements remaining in the current chunk
        var innerMoveNext = new Func<bool>(() => --remaining > 0 && e.MoveNext());

        yield return e.GetChunk(innerMoveNext);
        while (innerMoveNext()) {/* discard elements skipped by inner iterator */}
    }
}

private static IEnumerable<T> GetChunk<T>(this IEnumerator<T> e,
                                          Func<bool> innerMoveNext)
{
    do yield return e.Current;
    while (innerMoveNext());
}

Example usage

var src = new [] {1, 2, 3, 4, 5, 6}; 

var c3 = src.Chunks(3);      // {{1, 2, 3}, {4, 5, 6}}; 
var c4 = src.Chunks(4);      // {{1, 2, 3, 4}, {5, 6}}; 

var sum   = c3.Select(c => c.Sum());    // {6, 15}
var count = c3.Count();                 // 2
var take2 = c3.Select(c => c.Take(2));  // {{1, 2}, {4, 5}}

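Explanation

The code works by nesting two yield-based iterators.

The outer iterator must keep track of how many elements have been effectively consumed by the inner (chunk) iterator. This is done by closing over remaining with innerMoveNext(). Unconsumed elements of a chunk are discarded before the next chunk is yielded by the outer iterator. This is necessary because otherwise you get inconsistent results when the inner enumerables are not (completely) consumed (e.g. c3.Count() would return 6).

Note: The answer has been updated to address the shortcomings pointed out by @aolszowka.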

— 3dGrabber
source

2

Very nice. My "correct" solution was far more complicated than this. This is the #1 answer IMHO.

— CaseyB, 2014

This suffers from unexpected (from an API standpoint) behavior when ToArray() is called; it is also not thread-safe.

— aolszowka

@aolszowka: could you please elaborate?

— 3dGrabber

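@3dGrabber Perhaps it was how I refactored your code (sorry, it's a bit too long to post here; basically, instead of an extension method, I passed in the sourceEnumerator). The test case I used was something to this effect: int[] arrayToSort = new int[] { 9, 7, 2, 6, 3, 4, 8, 5, 5, 1, 10, 11, 12, 13 }; var source = Chunkify<int>(arrayToSort, 3).ToArray(); This resulted in source indicating that there were 13 chunks (the number of elements). That made sense to me, because unless you query the inner enumerations the enumerator is not advanced.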

— aolszowka

1

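@aolszowka: very valid points. I've added a warning and a usage section. The code assumes that you iterate over the inner enumerable. With your solution you forfeit the laziness, though. I think it should be possible to get the best of both worlds with a custom, caching IEnumerator. If I find a solution I'll post it here...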

— 3dGrabber

18

Completely lazy, no counting or copying:

public static class EnumerableExtensions
{

  public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int len)
  {
     if (len == 0)
        throw new ArgumentNullException();

     var enumer = source.GetEnumerator();
     while (enumer.MoveNext())
     {
        yield return Take(enumer.Current, enumer, len);
     }
  }

  private static IEnumerable<T> Take<T>(T head, IEnumerator<T> tail, int len)
  {
     while (true)
     {
        yield return head;
        if (--len == 0)
           break;
        if (tail.MoveNext())
           head = tail.Current;
        else
           break;
     }
  }
}

— xtofs
source

This solution is so elegant that I'm sorry I can't upvote this answer more than once.

— 2015

3

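I don't think this would ever fail, exactly. But it could certainly show some odd behavior. If you had 100 items, split them into batches of 10, and enumerated through all the batches without enumerating any items of those batches, you'd end up with 100 batches of 1.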

— CaseyB
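The behavior described in the comment above can be reproduced with a short sketch. It copies the answer's Split and Take (as plain static methods, without the argument check) so that it stands alone; the class name `SharedEnumeratorDemo` and the two helper methods are made up for the illustration. Because every inner sequence reads from the same shared enumerator, the number of batches depends on whether the caller consumes them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedEnumeratorDemo
{
    // Copy of the answer's Split: the outer loop advances the shared enumerator once per batch.
    public static IEnumerable<IEnumerable<T>> Split<T>(IEnumerable<T> source, int len)
    {
        var enumer = source.GetEnumerator();
        while (enumer.MoveNext())
        {
            yield return Take(enumer.Current, enumer, len);
        }
    }

    // Copy of the answer's Take: the inner loop advances the same enumerator.
    private static IEnumerable<T> Take<T>(T head, IEnumerator<T> tail, int len)
    {
        while (true)
        {
            yield return head;
            if (--len == 0) break;
            if (tail.MoveNext()) head = tail.Current;
            else break;
        }
    }

    // Every batch is fully enumerated before the next one is requested.
    public static int BatchesWhenInnerConsumed()
    {
        int batches = 0;
        foreach (var batch in Split(Enumerable.Range(1, 100), 10))
        {
            foreach (var item in batch) { /* consume the item */ }
            batches++;
        }
        return batches; // 10
    }

    // The batches are only counted; their items are never enumerated.
    public static int BatchesWhenInnerIgnored()
    {
        return Split(Enumerable.Range(1, 100), 10).Count(); // 100
    }
}
```

In the second case each outer step consumes exactly one source item, so the 100 items turn into 100 batches, which matches the comment.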

1

As @CaseyB mentioned, this suffers from the same failing as 3dGrabber's, which is addressed here: stackoverflow.com/a/20953521/1037948, but man is it fast!

— drzaus

1

This is a beautiful solution. It does exactly what it promises.

— Rod Hartzell

By far the most elegant and to-the-point solution. The only thing is, you should add a check for negative numbers and replace the ArgumentNullException with an ArgumentException.

— Romain Vergnory

13

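I think the following suggestion would be the fastest. I am sacrificing the laziness of the source Enumerable for the ability to use Array.Copy and to know the length of each sublist ahead of time.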

public static IEnumerable<T[]> Chunk<T>(this IEnumerable<T> items, int size)
{
    T[] array = items as T[] ?? items.ToArray();
    for (int i = 0; i < array.Length; i+=size)
    {
        T[] chunk = new T[Math.Min(size, array.Length - i)];
        Array.Copy(array, i, chunk, 0, chunk.Length);
        yield return chunk;
    }
}

— Marc-André Bertrand
source

Not only the fastest, it also correctly handles further enumerable operations on the result, e.g. items.Chunk(5).Reverse().SelectMany(x => x)

— 也是

9

We can improve @JaredPar's solution to do true lazy evaluation. We use a GroupAdjacentBy method that yields groups of consecutive elements with the same key:

sequence
.Select((x, i) => new { Value = x, Index = i })
.GroupAdjacentBy(x=>x.Index/3)
.Select(g=>g.Select(x=>x.Value))

Because the groups are yielded one by one, this solution works efficiently with long or infinite sequences.
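GroupAdjacentBy is not part of LINQ and the answer does not show it, so the following is only a guess at what such a helper could look like. The signature, the `AdjacentGrouping` class name and the choice to buffer each run in a List<T> are assumptions made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AdjacentGrouping
{
    // Hypothetical helper: yields runs of consecutive elements that share a key.
    // Each run is buffered, but a run is yielded as soon as the key changes,
    // so only one run is held in memory at a time.
    public static IEnumerable<List<T>> GroupAdjacentBy<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        List<T> run = null;
        TKey runKey = default(TKey);

        foreach (var item in source)
        {
            var key = keySelector(item);

            // A different key closes the current run.
            if (run != null && !comparer.Equals(key, runKey))
            {
                yield return run;
                run = null;
            }

            // Start a new run on the first item or after a key change.
            if (run == null)
            {
                run = new List<T>();
                runKey = key;
            }

            run.Add(item);
        }

        // Emit the final, possibly shorter, run.
        if (run != null) yield return run;
    }
}
```

With this helper the query from the answer yields the first group after reading only four source items (the fourth is what reveals the key change), which is why it also works on unbounded input.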

— Colonel Panic
source

8

A few years ago I wrote a Clump extension method. It works great, and it is the fastest implementation here. :P

/// <summary>
/// Clumps items into same size lots.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="source">The source list of items.</param>
/// <param name="size">The maximum size of the clumps to make.</param>
/// <returns>A list of list of items, where each list of items is no bigger than the size given.</returns>
public static IEnumerable<IEnumerable<T>> Clump<T>(this IEnumerable<T> source, int size)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (size < 1)
        throw new ArgumentOutOfRangeException("size", "size must be greater than 0");

    return ClumpIterator<T>(source, size);
}

private static IEnumerable<IEnumerable<T>> ClumpIterator<T>(IEnumerable<T> source, int size)
{
    Debug.Assert(source != null, "source is null.");

    T[] items = new T[size];
    int count = 0;
    foreach (var item in source)
    {
        items[count] = item;
        count++;

        if (count == size)
        {
            yield return items;
            items = new T[size];
            count = 0;
        }
    }
    if (count > 0)
    {
        if (count == size)
            yield return items;
        else
        {
            T[] tempItems = new T[count];
            Array.Copy(items, tempItems, count);
            yield return tempItems;
        }
    }
}

— Cameron MacFarland
source

It should work, but it buffers 100% of the chunks. I was trying to avoid that... but it turns out to be incredibly hairy.

— Sam Saffron

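@SamSaffron Yep. Especially if you throw things like plinq into the mix, which is what my implementation was originally for.

— Cameron MacFarland

Expanded my answer, let me know what you think.

— Sam Saffron

@CameronMacFarland - can you explain why the second check for count == size is necessary? Thanks.

— dugas, 2013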

8

System.Interactive provides Buffer() for this purpose. Some quick testing shows that performance is similar to Sam's solution.

— dahlbyk
source

1

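Do you know the buffering semantics? E.g.: if you have an enumerator that spits out strings that are 300k big and you try to split it into chunks of 10,000, will you run out of memory?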

— Sam Saffron

Buffer() returns IEnumerable<IList<T>>, so yeah, you'd probably have a problem there. It doesn't stream like yours.

— dahlbyk'5

7

Here's a list splitting routine I wrote a couple of months ago:

public static List<List<T>> Chunk<T>(
    List<T> theList,
    int chunkSize
)
{
    List<List<T>> result = theList
        .Select((x, i) => new {
            data = x,
            indexgroup = i / chunkSize
        })
        .GroupBy(x => x.indexgroup, x => x.data)
        .Select(g => new List<T>(g))
        .ToList();

    return result;
}

— Amy B
source

6

I find this little snippet does the job quite nicely.

public static IEnumerable<List<T>> Chunked<T>(this List<T> source, int chunkSize)
{
    var offset = 0;

    while (offset < source.Count)
    {
        yield return source.GetRange(offset, Math.Min(source.Count - offset, chunkSize));
        offset += chunkSize;
    }
}

— erlando
source

5

What about this one?

var input = new List<string> { "a", "g", "e", "w", "p", "s", "q", "f", "x", "y", "i", "m", "c" };
var k = 3;

var res = Enumerable.Range(0, (input.Count - 1) / k + 1)
                    .Select(i => input.GetRange(i * k, Math.Min(k, input.Count - i * k)))
                    .ToList();

As far as I know, GetRange() is linear in the number of items taken. So this should perform well.
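The chunk count in this answer comes from integer arithmetic, which is worth checking at the edges. The sketch below compares the answer's expression with the usual ceiling-division form; `ChunkCountSketch` and both method names are invented for the comparison, and integer overflow for very large counts is ignored.

```csharp
using System;

public static class ChunkCountSketch
{
    // The expression used in the answer: (input.Count - 1) / k + 1.
    public static int AnswerFormula(int count, int k)
    {
        return (count - 1) / k + 1;
    }

    // Common ceiling-division form, shown for comparison.
    public static int CeilingDivision(int count, int k)
    {
        return (count + k - 1) / k;
    }
}
```

For the 13-item sample with k = 3 both give 5, and for 12 items both give 4. They differ only for an empty list: C# integer division truncates toward zero, so (0 - 1) / 3 + 1 is 1 and the answer's query returns a single empty sublist, whereas the ceiling form gives 0 chunks.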

— Roman Pekar
source

5

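This is an old question, but this is what I ended up with. It enumerates the enumerable only once, but it does create a list for each of the partitions. It doesn't suffer from unexpected behavior when ToArray() is called, as some of the implementations do: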

    public static IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int chunkSize)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }

        if (chunkSize < 1)
        {
            throw new ArgumentException("Invalid chunkSize: " + chunkSize);
        }

        using (IEnumerator<T> sourceEnumerator = source.GetEnumerator())
        {
            IList<T> currentChunk = new List<T>();
            while (sourceEnumerator.MoveNext())
            {
                currentChunk.Add(sourceEnumerator.Current);
                if (currentChunk.Count == chunkSize)
                {
                    yield return currentChunk;
                    currentChunk = new List<T>();
                }
            }

            if (currentChunk.Any())
            {
                yield return currentChunk;
            }
        }
    }

— aolszowka
source

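It would be good to convert this into an extension method: public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int chunkSize)

— krizzzn, 2014

+1 for your answer. However, I recommend two things: 1. Use foreach instead of while and the using block. 2. Pass chunkSize to the constructor of List so that the list knows its maximum expected size.

— Usman Zafar, 2014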
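The suggestions from the comments above (extension method, foreach, and a list capacity hint) could be combined roughly as follows. This is a sketch rather than the answer author's code: the `PartitionSketch` class, the `PartitionIterator` helper and the IList<T> element type are choices made here, and the split into two methods is only there so that the argument checks run at call time instead of on first enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionSketch
{
    public static IEnumerable<IList<T>> Partition<T>(this IEnumerable<T> source, int chunkSize)
    {
        // Checked eagerly, because this method is not itself an iterator.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize < 1) throw new ArgumentException("Invalid chunkSize: " + chunkSize);

        return PartitionIterator(source, chunkSize);
    }

    private static IEnumerable<IList<T>> PartitionIterator<T>(IEnumerable<T> source, int chunkSize)
    {
        // The capacity hint avoids regrowing the list while a chunk fills up.
        var currentChunk = new List<T>(chunkSize);

        // foreach obtains and disposes the enumerator for us.
        foreach (var item in source)
        {
            currentChunk.Add(item);
            if (currentChunk.Count == chunkSize)
            {
                yield return currentChunk;
                currentChunk = new List<T>(chunkSize);
            }
        }

        // Emit the remainder, if any.
        if (currentChunk.Count > 0) yield return currentChunk;
    }
}
```

One trade-off of the capacity hint: with a very large chunkSize and a short source, the list reserves far more memory than it ends up using.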

4

We found David B's solution worked the best. But we adapted it to a more general solution:

list.GroupBy(item => item.SomeProperty) 
   .Select(group => new List<T>(group)) 
   .ToArray();

— 麦克杰森
source

3

This is nice, but it is quite different from what the original asker was asking for.

— Amy B

4

The following is the most compact solution I could come up with that is O(n).

public static IEnumerable<T[]> Chunk<T>(IEnumerable<T> source, int chunksize)
{
    var list = source as IList<T> ?? source.ToList();
    for (int start = 0; start < list.Count; start += chunksize)
    {
        T[] chunk = new T[Math.Min(chunksize, list.Count - start)];
        for (int i = 0; i < chunk.Length; i++)
            chunk[i] = list[start + i];

        yield return chunk;
    }
}

— Marc-André Bertrand
source

4

Old code, but this is what I've been using:

    public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
    {
        var toReturn = new List<T>(max);
        foreach (var item in source)
        {
            toReturn.Add(item);
            if (toReturn.Count == max)
            {
                yield return toReturn;
                toReturn = new List<T>(max);
            }
        }
        if (toReturn.Any())
        {
            yield return toReturn;
        }
    }

— Robert McKee
source

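After posting, I realized this is pretty much exactly the same code casperOne posted 6 years ago, except that it uses .Any() instead of .Count(), as I don't need the entire count, just whether any items exist.

— Robert McKee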

3

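If the list is a System.Collections.Generic list, you can use the "CopyTo" method to copy its elements to other sub-arrays. You specify the start element and the number of elements to copy.

You could also make 3 clones of your original list and use "RemoveRange" on each one to shrink it to the size you want.

Or just create a helper method to do it for you.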

— o
source

2

It's an old solution, but I had a different approach. I use Skip to move to the desired offset and Take to extract the desired number of elements:

public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, 
                                                   int chunkSize)
{
    if (chunkSize <= 0)
        throw new ArgumentOutOfRangeException($"{nameof(chunkSize)} should be > 0");

    var nbChunks = (int)Math.Ceiling((double)source.Count()/chunkSize);

    return Enumerable.Range(0, nbChunks)
                     .Select(chunkNb => source.Skip(chunkNb*chunkSize)
                     .Take(chunkSize));
}

— Bertrand
source

1

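Very similar to an approach I used, but I recommend that source not be an IEnumerable. For example, if source is the result of a LINQ query, the Skip/Take would trigger nbChunks enumerations of the query. That could get expensive. It would be better to use IList or ICollection as the type of source. That avoids the problem altogether.

— Davidson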

2

For anyone interested in a packaged/maintained solution, the MoreLINQ library provides the Batch extension method, which matches your requested behavior:

IEnumerable<char> source = "Example string";
IEnumerable<IEnumerable<char>> chunksOfThreeChars = source.Batch(3);

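The Batch implementation is similar to Cameron MacFarland's answer, with the addition of an overload for transforming the chunk/batch before returning it, and it performs quite well.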

— Kevinoid
source

This should be the accepted answer. Instead of reinventing the wheel, morelinq should be used.

1

Using modular partitioning:

public IEnumerable<IEnumerable<string>> Split(IEnumerable<string> input, int chunkSize)
{
    var chunks = (int)Math.Ceiling((double)input.Count() / (double)chunkSize);
    return Enumerable.Range(0, chunks).Select(id => input.Where(s => s.GetHashCode() % chunks == id));
}

— Janos G.
source

1

Just putting in my two cents. If you want to "bucket" the list (visualize it left to right), you could do the following:

 public static List<List<T>> Buckets<T>(this List<T> source, int numberOfBuckets)
    {
        List<List<T>> result = new List<List<T>>();
        for (int i = 0; i < numberOfBuckets; i++)
        {
            result.Add(new List<T>());
        }

        int count = 0;
        while (count < source.Count())
        {
            var mod = count % numberOfBuckets;
            result[mod].Add(source[count]);
            count++;
        }
        return result;
    }

— mattylantz
source

1

Another way is to use the Rx Buffer operator:

//using System.Linq;
//using System.Reactive.Linq;
//using System.Reactive.Threading.Tasks;

var observableBatches = anEnumerable.ToObservable().Buffer(size);

var batches = aList.ToObservable().Buffer(size).ToList().ToTask().GetAwaiter().GetResult();

— 弗雷克
source

IMHO the most proper answer.

— Stanislav Berkov

1

public static List<List<T>> GetSplitItemsList<T>(List<T> originalItemsList, short number)
    {
        var listGroup = new List<List<T>>();
        int j = number;
        for (int i = 0; i < originalItemsList.Count; i += number)
        {
            var cList = originalItemsList.Take(j).Skip(i).ToList();
            j += number;
            listGroup.Add(cList);
        }
        return listGroup;
    }

— 朱
source

0

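I took the primary answer and made it an IOC container to determine where to split. (Who, reading this post while searching for an answer, is really looking to split only on 3 items?)

This method allows one to split on any type of item as needed.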

public static List<List<T>> SplitOn<T>(List<T> main, Func<T, bool> splitOn)
{
    int groupIndex = 0;

    return main.Select( item => new 
                             { 
                               Group = (splitOn.Invoke(item) ? ++groupIndex : groupIndex), 
                               Value = item 
                             })
                .GroupBy( it2 => it2.Group)
                .Select(x => x.Select(v => v.Value).ToList())
                .ToList();
}

So for the OP the code would be:

var it = new List<string>()
                       { "a", "g", "e", "w", "p", "s", "q", "f", "x", "y", "i", "m", "c" };

int index = 0; 
var result = SplitOn(it, (itm) => (index++ % 3) == 0 );

— Omega
source

0

As performant as Sam Saffron's approach.

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size), "Size must be greater than zero.");

    return BatchImpl(source, size).TakeWhile(x => x.Any());
}

static IEnumerable<IEnumerable<T>> BatchImpl<T>(this IEnumerable<T> source, int size)
{
    var values = new List<T>();
    var group = 1;
    var disposed = false;
    var e = source.GetEnumerator();

    try
    {
        while (!disposed)
        {
            yield return GetBatch(e, values, group, size, () => { e.Dispose(); disposed = true; });
            group++;
        }
    }
    finally
    {
        if (!disposed)
            e.Dispose();
    }
}

static IEnumerable<T> GetBatch<T>(IEnumerator<T> e, List<T> values, int group, int size, Action dispose)
{
    var min = (group - 1) * size + 1;
    var max = group * size;
    var hasValue = false;

    while (values.Count < min && e.MoveNext())
    {
        values.Add(e.Current);
    }

    for (var i = min; i <= max; i++)
    {
        if (i <= values.Count)
        {
            hasValue = true;
        }
        else if (hasValue = e.MoveNext())
        {
            values.Add(e.Current);
        }
        else
        {
            dispose();
        }

        if (hasValue)
            yield return values[i - 1];
        else
            yield break;
    }
}

}

— Leandromoh
source

0

Can work with infinite generators:

a.Zip(a.Skip(1), (x, y) => Enumerable.Repeat(x, 1).Concat(Enumerable.Repeat(y, 1)))
 .Zip(a.Skip(2), (xy, z) => xy.Concat(Enumerable.Repeat(z, 1)))
 .Where((x, i) => i % 3 == 0)

Demo code: https://ideone.com/GKmL7M

using System;
using System.Collections.Generic;
using System.Linq;

public class Test
{
  private static void DoIt(IEnumerable<int> a)
  {
    Console.WriteLine(String.Join(" ", a));

    foreach (var x in a.Zip(a.Skip(1), (x, y) => Enumerable.Repeat(x, 1).Concat(Enumerable.Repeat(y, 1))).Zip(a.Skip(2), (xy, z) => xy.Concat(Enumerable.Repeat(z, 1))).Where((x, i) => i % 3 == 0))
      Console.WriteLine(String.Join(" ", x));

    Console.WriteLine();
  }

  public static void Main()
  {
    DoIt(new int[] {1});
    DoIt(new int[] {1, 2});
    DoIt(new int[] {1, 2, 3});
    DoIt(new int[] {1, 2, 3, 4});
    DoIt(new int[] {1, 2, 3, 4, 5});
    DoIt(new int[] {1, 2, 3, 4, 5, 6});
  }
}

1

1 2

1 2 3
1 2 3

1 2 3 4
1 2 3

1 2 3 4 5
1 2 3

1 2 3 4 5 6
1 2 3
4 5 6

But actually I would prefer to write the corresponding method without linq.
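A plain-loop counterpart might look like the sketch below. It is not the answer author's code: the `FullChunksSketch` class, the method names and the `size` parameter are assumptions. It follows the demo output above in one respect that is easy to miss, namely that an incomplete trailing group is dropped rather than returned.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FullChunksSketch
{
    public static IEnumerable<T[]> FullChunks<T>(IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterate(source, size);
    }

    private static IEnumerable<T[]> Iterate<T>(IEnumerable<T> source, int size)
    {
        var buffer = new T[size];
        int filled = 0;

        foreach (var item in source)
        {
            buffer[filled++] = item;

            // Yield only once the buffer is completely filled.
            if (filled == size)
            {
                yield return buffer;
                buffer = new T[size]; // fresh array so earlier chunks stay intact
                filled = 0;
            }
        }
        // A partially filled buffer is discarded, as in the demo output above.
    }
}
```

Since each chunk is yielded as soon as it is full, this also works on an unbounded source as long as the caller stops asking for chunks at some point.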

— Qwertiy
source

0

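Check this out! I have a list of elements with a sequence counter and a date. Each time the sequence restarts, I want to create a new list.

For example, a list of messages: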

 List<dynamic> messages = new List<dynamic>
        {
            new { FcntUp = 101, CommTimestamp = "2019-01-01 00:00:01" },
            new { FcntUp = 102, CommTimestamp = "2019-01-01 00:00:02" },
            new { FcntUp = 103, CommTimestamp = "2019-01-01 00:00:03" },

            //restart of sequence
            new { FcntUp = 1, CommTimestamp = "2019-01-01 00:00:04" },
            new { FcntUp = 2, CommTimestamp = "2019-01-01 00:00:05" },
            new { FcntUp = 3, CommTimestamp = "2019-01-01 00:00:06" },

            //restart of sequence
            new { FcntUp = 1, CommTimestamp = "2019-01-01 00:00:07" },
            new { FcntUp = 2, CommTimestamp = "2019-01-01 00:00:08" },
            new { FcntUp = 3, CommTimestamp = "2019-01-01 00:00:09" }
        };

I want to split the list into separate lists whenever the counter restarts. Here is the code:

var arraylist = new List<List<dynamic>>();

        List<dynamic> messages = new List<dynamic>
        {
            new { FcntUp = 101, CommTimestamp = "2019-01-01 00:00:01" },
            new { FcntUp = 102, CommTimestamp = "2019-01-01 00:00:02" },
            new { FcntUp = 103, CommTimestamp = "2019-01-01 00:00:03" },

            //restart of sequence
            new { FcntUp = 1, CommTimestamp = "2019-01-01 00:00:04" },
            new { FcntUp = 2, CommTimestamp = "2019-01-01 00:00:05" },
            new { FcntUp = 3, CommTimestamp = "2019-01-01 00:00:06" },

            //restart of sequence
            new { FcntUp = 1, CommTimestamp = "2019-01-01 00:00:07" },
            new { FcntUp = 2, CommTimestamp = "2019-01-01 00:00:08" },
            new { FcntUp = 3, CommTimestamp = "2019-01-01 00:00:09" }
        };

        //group by FcntUp and CommTimestamp
        var query = messages.GroupBy(x => new { x.FcntUp, x.CommTimestamp });

        //declare the current item
        dynamic currentItem = null;

        //declare the list of ranges
        List<dynamic> range = null;

        //loop through the sorted list
        foreach (var item in query)
        {
            //check if start of new range
            if (currentItem == null || item.Key.FcntUp < currentItem.Key.FcntUp)
            {
                //create a new list if the FcntUp starts on a new range
                range = new List<dynamic>();

                //add the list to the parent list
                arraylist.Add(range);
            }

            //add the item to the sublist
            range.Add(item);

            //set the current item
            currentItem = item;
        }

— Claes-Philip Staiger
source

-1

To put in my two cents...

By using the list type for the source to be chunked, I found another very compact solution:

public static IEnumerable<IEnumerable<TSource>> Chunk<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
    // copy the source into a list
    var chunkList = source.ToList();

    // return chunks of 'chunkSize' items
    while (chunkList.Count > chunkSize)
    {
        yield return chunkList.GetRange(0, chunkSize);
        chunkList.RemoveRange(0, chunkSize);
    }

    // return the rest
    yield return chunkList;
}

— Patrick
source