Split a List into smaller lists of N size

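I am attempting to split a list into a series of smaller lists.

My problem: my function to split lists doesn't split them into lists of the correct size. It should split them into lists of size 30, but instead it splits them into lists of size 114.

How can I make my function split a list into X lists of size 30 or less?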

public static List<List<float[]>> splitList(List <float[]> locations, int nSize=30) 
{       
    List<List<float[]>> list = new List<List<float[]>>();

    for (int i=(int)(Math.Ceiling((decimal)(locations.Count/nSize))); i>=0; i--) {
        List <float[]> subLocat = new List <float[]>(locations); 

        if (subLocat.Count >= ((i*nSize)+nSize))
            subLocat.RemoveRange(i*nSize, nSize);
        else subLocat.RemoveRange(i*nSize, subLocat.Count-(i*nSize));

        Debug.Log ("Index: "+i.ToString()+", Size: "+subLocat.Count.ToString());
        list.Add (subLocat);
    }

    return list;
}

If I use the function on a list that is 144 in size, the output is:

Index: 4, Size: 120
Index: 3, Size: 114
Index: 2, Size: 114
Index: 1, Size: 114
Index: 0, Size: 114

c# list split

— sazr

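If a LINQ solution is acceptable, this question may be of some help.

Specifically Sam Saffron's answer to that previous question. And unless this is for a school assignment, I would just use his code and stop.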

— jcolebrand
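Reading the code against that output, two problems appear to explain it. First, locations.Count/nSize is computed with integer division before the cast to decimal (144/30 is 4), so Math.Ceiling has nothing to round up. Second, every iteration copies the whole list and then removes the current chunk from the copy instead of keeping it, so each sublist holds 144 - 30 = 114 items (120 at index 4, where only the 24-item tail is removed). A minimal corrected sketch that keeps the question's signature and assumes the chunks should come out in their original order (ListSplitter is an illustrative class name):

```csharp
using System;
using System.Collections.Generic;

public static class ListSplitter
{
    // Sketch of a fixed SplitList: same signature as in the question, but each chunk
    // is copied out with GetRange instead of being removed from a full copy.
    public static List<List<float[]>> SplitList(List<float[]> locations, int nSize = 30)
    {
        if (locations == null) throw new ArgumentNullException(nameof(locations));
        if (nSize <= 0) throw new ArgumentOutOfRangeException(nameof(nSize));

        var list = new List<List<float[]>>();

        // Divide as double *before* rounding, so 144 / 30.0 = 4.8 rounds up to 5 chunks.
        int chunkCount = (int)Math.Ceiling(locations.Count / (double)nSize);

        // Count upwards so the chunks come out in their original order.
        for (int i = 0; i < chunkCount; i++)
        {
            int start = i * nSize;
            int count = Math.Min(nSize, locations.Count - start); // last chunk may be shorter
            list.Add(locations.GetRange(start, count));
        }

        return list;
    }
}
```

On the question's 144 items this should produce five lists with 30, 30, 30, 30 and 24 items.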

public static List<List<float[]>> SplitList(List<float[]> locations, int nSize=30)  
{        
    var list = new List<List<float[]>>(); 

    for (int i = 0; i < locations.Count; i += nSize) 
    { 
        list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i))); 
    } 

    return list; 
}

Generic version:

public static IEnumerable<List<T>> SplitList<T>(List<T> locations, int nSize=30)  
{        
    for (int i = 0; i < locations.Count; i += nSize) 
    { 
        yield return locations.GetRange(i, Math.Min(nSize, locations.Count - i)); 
    }  
}

— Serj-Tm

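So if I have a list of zillion length, and I want to split it into smaller lists of length 30, and from every smaller list I only want to Take(1), then I still create lists of 30 items of which I throw away 29. This can be done smarter!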

— Harald Coppoolse
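If, as in this comment, only the first item of each chunk is needed, no sublists have to be built at all. A minimal sketch, assuming the source is an IList<T> so elements can be reached by index; ChunkHeads and FirstOfEachChunk are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class ChunkHeads
{
    // Hypothetical helper: yields the first item of every chunk of size chunkSize
    // by indexing 0, chunkSize, 2 * chunkSize, ... directly. No sublist is created.
    public static IEnumerable<T> FirstOfEachChunk<T>(IList<T> source, int chunkSize)
    {
        // Validate eagerly; the iterator below only runs when the result is enumerated.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return Iterate(source, chunkSize);
    }

    private static IEnumerable<T> Iterate<T>(IList<T> source, int chunkSize)
    {
        for (int i = 0; i < source.Count; i += chunkSize)
        {
            yield return source[i];
        }
    }
}
```

For the list 0..7 and a chunk size of 3 this yields 0, 3 and 6. The validation sits in a separate non-iterator method so that bad arguments are reported at the call rather than at the first MoveNext.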

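Does this actually work? Wouldn't it fail on the first split, because you are getting the range nSize to nSize? For example, if nSize is 3 and my array has size 5, then the first index range returned is GetRange(3, 3).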

— Matthew Pigram 18-3-22

@MatthewPigram tested and it's working. Math.Min takes the minimum value, so if the last chunk is smaller than nSize (2 < 3), it creates a list with the remaining items.

— Phate01

@HaraldCoppoolse the OP didn't ask about selecting, only about splitting lists.

— Phate01

@MatthewPigram first iteration: GetRange(0,3), second iteration: GetRange(3,2)

— Serj-Tm
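The exchange above is easy to check by replaying only the arithmetic of the accepted answer's loop and recording the arguments it would pass to GetRange. A sketch (GetRangeTrace and Ranges are illustrative names, and no list is actually copied):

```csharp
using System;
using System.Collections.Generic;

public static class GetRangeTrace
{
    // Replays the accepted answer's loop and records each GetRange(index, count)
    // call instead of actually copying anything.
    public static List<(int Start, int Length)> Ranges(int totalCount, int nSize)
    {
        var ranges = new List<(int Start, int Length)>();
        for (int i = 0; i < totalCount; i += nSize)
        {
            // Math.Min shortens the final range to the items that are left.
            ranges.Add((i, Math.Min(nSize, totalCount - i)));
        }
        return ranges;
    }
}
```

With 5 items and nSize = 3 it returns (0, 3) and (3, 2), which matches Serj-Tm's reply.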

I'd suggest using this extension method to chunk the source list into sub-lists of a specified chunk size:

/// <summary>
/// Helper methods for the lists.
/// </summary>
public static class ListExtensions
{
    public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize) 
    {
        return source
            .Select((x, i) => new { Index = i, Value = x })
            .GroupBy(x => x.Index / chunkSize)
            .Select(x => x.Select(v => v.Value).ToList())
            .ToList();
    }
}

For example, if you chunk a list of 18 items by 5 items per chunk, it gives you a list of 4 sub-lists with the following numbers of items: 5-5-5-3.

— Dmitry Pavlov
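The 5-5-5-3 result comes from the GroupBy key: each element's index is divided by chunkSize using integer division, so with a chunk size of 5, indices 0-4 get key 0, 5-9 get key 1, 10-14 get key 2 and 15-17 get key 3. A small sketch that computes only the group sizes (ChunkKeyDemo and ChunkSizes are illustrative names):

```csharp
using System.Linq;

public static class ChunkKeyDemo
{
    // Groups the positions 0..itemCount-1 the same way ChunkBy does (index / chunkSize)
    // and returns how many positions fall into each group, in order of first appearance.
    public static int[] ChunkSizes(int itemCount, int chunkSize)
    {
        return Enumerable.Range(0, itemCount)
            .GroupBy(i => i / chunkSize)   // integer division; with chunkSize 5: 0-4 -> 0, 5-9 -> 1, ...
            .Select(g => g.Count())
            .ToArray();
    }
}
```

ChunkSizes(18, 5) returns { 5, 5, 5, 3 }. GroupBy yields groups in the order in which their keys first appear, which is why the chunks keep the source order.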

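Before you use this in production, make sure you understand the run-time implications for memory and performance. Just because LINQ can be succinct doesn't mean it's a good idea.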

— Nick

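Definitely, @Nick; in general I would suggest thinking before doing anything. Chunking with LINQ should not be an operation that is repeated thousands of times. Usually you need to chunk lists in order to process items batch by batch and/or in parallel.

— Dmitry Pavlov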

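I don't think memory and performance should be a big issue here. I happened to have a requirement to split a list of over 200,000 records into smaller lists of about 3,000 each, which brought me to this thread. I tested both methods and found the running time almost the same. After that I tested splitting that list into lists of 3 records each, and the performance was still OK. I do think Serj-Tm's solution is more straightforward and more maintainable, though.

— 沉默寄居者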

Note that it might be best to leave out the ToList() calls and let lazy evaluation do its magic.

— Yair Halberstadt

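@DmitryPavlov All this time I never knew you could project the index like that in a Select statement! I thought it was a new feature until I noticed you posted this in 2014, which really surprised me. Thanks for sharing. Also, wouldn't it be better to make this extension method available on IEnumerable and have it return an IEnumerable as well?

— Aydin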
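A sketch of what the last two comments suggest: take and return IEnumerable and drop the ToList() calls. This is an illustrative variant, not code from the answer. Execution is deferred, but GroupBy is a buffering operator, so the whole source is still read as soon as the first chunk is requested; it will not help with very long or endless sequences.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableChunkExtensions
{
    // IEnumerable in, IEnumerable out, no ToList(): nothing runs until the result
    // is enumerated, but GroupBy then reads the entire source before the first group.
    public static IEnumerable<IEnumerable<T>> ChunkBy<T>(this IEnumerable<T> source, int chunkSize)
    {
        // Not an iterator method, so argument errors surface immediately.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        return source
            .Select((x, i) => new { Index = i, Value = x })  // project the index, as in the answer
            .GroupBy(x => x.Index / chunkSize)
            .Select(g => g.Select(v => v.Value));
    }
}
```

Callers that need lists can still add .Select(c => c.ToList()) at the point where they actually need them.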

How about:

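var list = new List<List<float[]>>();
while (locations.Any())
{
    list.Add(locations.Take(nSize).ToList());      // copy the next nSize items
    locations = locations.Skip(nSize).ToList();   // keep the rest as a new list
}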

— Rafal

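Is this going to consume a lot of memory? Each time locations.Skip.ToList happens, I wonder whether more memory is allocated and the unskipped items are referenced by a new list.

— Zasz, 2014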

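Yes, a new list is created on every loop. Yes, it consumes memory. But if you are having memory issues, this is not the place to optimize, because the instances of that list are ready to be collected on the next loop. You could trade performance for memory by skipping the ToList, but I wouldn't bother trying to optimize it: it is so trivial and unlikely to be a bottleneck. The main gain of this implementation is its triviality; it is easy to understand. If you like, you can use the accepted answer; it does not create those lists, but it is a little more complex.

— Rafal, 2014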

.Skip(n) iterates over n elements each time it's called; while this may be fine, it's important to consider for performance-critical code. stackoverflow.com/questions/20002975/...

— Chakrava

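@Chakrava Sure, my solution is not meant for performance-critical code, but in my experience you first write working code and then determine what is performance-critical, and it is rarely my LINQ to Objects operations on, say, 50 objects. This should be evaluated case by case.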

— Rafal

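@Rafal I agree. I've found numerous .Skip() calls in my company's code base, and while they may not be "optimal", they work just fine. Things like DB operations take much longer anyway. But I think it's important to note that .Skip() "touches" each element < n on its way, instead of jumping directly to the nth element (as you might expect). If your iterator has side effects from touching an element, .Skip() can be the cause of hard-to-find bugs.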

— Chakrava
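The point that .Skip() touches every element on its way can be observed with a source that counts the elements it produces. A minimal sketch, assuming a plain iterator as the source (for a List<T> or an array, the runtime's LINQ implementation may be able to jump by index instead); SkipCostDemo, Numbers and ReadsToGetElementAt are illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipCostDemo
{
    // A plain iterator (not a List<T> or array), so Skip has no index to jump to.
    // onRead runs once for every element the iterator produces.
    public static IEnumerable<int> Numbers(int count, Action onRead)
    {
        for (int i = 0; i < count; i++)
        {
            onRead();
            yield return i;
        }
    }

    // Counts how many elements are produced in order to fetch the element at position n.
    public static int ReadsToGetElementAt(int n, int total)
    {
        int reads = 0;
        Numbers(total, () => reads++).Skip(n).First();
        return reads;
    }
}
```

Fetching the element at position 1000 should require 1001 reads from the iterator: the 1000 skipped elements plus the one returned. If onRead had a visible side effect, it would run for every skipped element too.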

Serj-Tm's solution is fine; here is also the generic version as an extension method for lists (put it into a static class):

public static List<List<T>> Split<T>(this List<T> items, int sliceSize = 30)
{
    List<List<T>> list = new List<List<T>>();
    for (int i = 0; i < items.Count; i += sliceSize)
        list.Add(items.GetRange(i, Math.Min(sliceSize, items.Count - i)));
    return list;
}

— 当量

I find the accepted answer (Serj-Tm) the most robust, but I'd like to suggest a generic version.

public static List<List<T>> splitList<T>(List<T> locations, int nSize = 30)
{
    var list = new List<List<T>>();

    for (int i = 0; i < locations.Count; i += nSize)
    {
        list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i)));
    }

    return list;
}

— 利纳斯

The MoreLinq library has a method called Batch:

List<int> ids = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 }; // 10 elements
int counter = 1;
foreach(var batch in ids.Batch(2))
{
    foreach(var eachId in batch)
    {
        Console.WriteLine("Batch: {0}, Id: {1}", counter, eachId);
    }
    counter++;
}

The result is:

Batch: 1, Id: 1
Batch: 1, Id: 2
Batch: 2, Id: 3
Batch: 2, Id: 4
Batch: 3, Id: 5
Batch: 3, Id: 6
Batch: 4, Id: 7
Batch: 4, Id: 8
Batch: 5, Id: 9
Batch: 5, Id: 0

ids is split into 5 chunks of 2 elements each.

— 西德隆

This needs to be the accepted answer, or at least much higher on this page.

— Zar Shardan

I have a generic method that takes any type, including float, and it has been unit-tested; I hope it helps:

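    /// <summary>
    /// Breaks the list into groups with each group containing no more than the specified group size
    /// </summary>
    /// <typeparam name="T">The element type.</typeparam>
    /// <param name="values">The values.</param>
    /// <param name="groupSize">Size of the group.</param>
    /// <param name="maxCount">Optional limit: no new group is started once at least this many values
    /// have been grouped (the last group is not truncated). Ignored when all values fit into one group.</param>
    /// <returns>The groups, in source order.</returns>
    public static List<List<T>> SplitList<T>(IEnumerable<T> values, int groupSize, int? maxCount = null)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (groupSize <= 0) throw new ArgumentOutOfRangeException(nameof(groupSize)); // would loop forever

        List<List<T>> result = new List<List<T>>();
        // Enumerate the source only once (Count() followed by ToList() would enumerate it twice)
        List<T> valueList = values.ToList();
        // Quick and special scenario
        if (valueList.Count <= groupSize)
        {
            result.Add(valueList);
        }
        else
        {
            int startIndex = 0;
            int count = valueList.Count;
            int elementCount = 0;

            while (startIndex < count && (!maxCount.HasValue || startIndex < maxCount.Value))
            {
                elementCount = (startIndex + groupSize > count) ? count - startIndex : groupSize;
                result.Add(valueList.GetRange(startIndex, elementCount));
                startIndex += elementCount;
            }
        }

        return result;
    }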

— 林天真

Thanks. Could you update the comments with a definition of the maxCount parameter? Is it a safety net?

— Andrew Jens

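Be careful with multiple enumerations of the enumerable. values.Count() causes a full enumeration, and values.ToList() then another one. It's safer to do values = values.ToList() first, so it's already materialized.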

— mhand
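mhand's warning can be made concrete with a lazily generated sequence that counts how often an enumeration of it begins. A sketch (MultipleEnumerationDemo, Source and the two method names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    // A lazily generated sequence; onStart runs every time an enumeration of it begins.
    private static IEnumerable<int> Source(Action onStart)
    {
        onStart();
        for (int i = 0; i < 5; i++)
        {
            yield return i;
        }
    }

    // The pattern criticized above: Count() and then ToList() on the same IEnumerable<T>.
    public static int EnumerationsWithCountThenToList()
    {
        int starts = 0;
        IEnumerable<int> values = Source(() => starts++);
        _ = values.Count();   // first full enumeration
        _ = values.ToList();  // second full enumeration
        return starts;
    }

    // The suggested fix: materialize once and work with the list afterwards.
    public static int EnumerationsWithToListFirst()
    {
        int starts = 0;
        List<int> valueList = Source(() => starts++).ToList(); // the only enumeration
        _ = valueList.Count;                                   // List<T>.Count is a stored property
        return starts;
    }
}
```

The first method reports two enumerations, the second one. With an in-memory list this is only wasted work, but with a sequence backed by a query or by I/O, each enumeration repeats that work.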

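While many of the answers above do the job, they all fail horribly on a never-ending sequence (or a really long one). The following is a completely online implementation that guarantees the best possible time and memory complexity. We iterate the source enumerable exactly once and use yield return for lazy evaluation. The consumer can throw away the list on each iteration, making the memory footprint equal to that of a list with batchSize elements.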

public static IEnumerable<List<T>> BatchBy<T>(this IEnumerable<T> enumerable, int batchSize)
{
    using (var enumerator = enumerable.GetEnumerator())
    {
        List<T> list = null;
        while (enumerator.MoveNext())
        {
            if (list == null)
            {
                list = new List<T> {enumerator.Current};
            }
            else if (list.Count < batchSize)
            {
                list.Add(enumerator.Current);
            }
            else
            {
                yield return list;
                list = new List<T> {enumerator.Current};
            }
        }

        if (list?.Count > 0)
        {
            yield return list;
        }
    }
}

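Edit: I just realized the OP asks about breaking a List<T> into smaller List<T>s, so my comments regarding infinite enumerables don't apply to the OP, but they may help others who end up here. These comments were a response to other posted solutions that do take IEnumerable<T> as input but enumerate the source enumerable multiple times.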

— mhand
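mhand's BatchBy hands out a full batch only when the first element of the next batch arrives, because its yield return sits in the else branch. For slow or blocking sources it can matter that a batch is released as soon as it is full. A sketch of that variation, which is an assumption rather than part of the answer (EagerBatchExtensions, BatchEager and Naturals are illustrative names):

```csharp
using System;
using System.Collections.Generic;

public static class EagerBatchExtensions
{
    // Variant of BatchBy: a batch is handed out as soon as it is full,
    // instead of when the first element of the next batch arrives.
    public static IEnumerable<List<T>> BatchEager<T>(this IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate(source, batchSize);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (T item in source)   // the source is enumerated exactly once
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // fresh list, so callers may keep the previous one
            }
        }
        if (batch.Count > 0)
        {
            yield return batch;      // trailing, partially filled batch
        }
    }

    // Endless counting source, used to show that batching works without an end.
    public static IEnumerable<int> Naturals(Action onRead)
    {
        for (int i = 0; ; i++)
        {
            onRead();
            yield return i;
        }
    }
}
```

Pulling two batches of 3 from the endless Naturals sequence reads exactly 6 elements; with BatchBy the second batch would only be returned after a seventh element had been read.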

I think the IEnumerable<IEnumerable<T>> version is better, since it doesn't involve so much List construction.

— NetMage

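@NetMage - one issue with IEnumerable<IEnumerable<T>> is that the implementation is likely to rely on the consumer fully enumerating each inner enumerable that is yielded. I'm sure a solution could be phrased in a way that avoids that issue, but I think the resulting code would get complex pretty quickly. Also, since it's lazy, we only generate a single list at a time, and memory allocation happens exactly once per list since we know the size up front.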

— mhand

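You're right - my implementation uses a new type of enumerator (a position enumerator) that wraps a standard enumerator to track the current position and lets you move to a new position.

— NetMage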

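Added a very useful comment by mhand at the end

Original answer

Although most solutions might work, I think they are not very efficient. Suppose you only want the first few items of the first few chunks. Then you wouldn't want to iterate over all (zillion) items in your sequence.

The following will enumerate at most twice: once for the Take and once for the Skip. It won't enumerate over more elements than you will use: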

public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>
    (this IEnumerable<TSource> source, int chunkSize)
{
    while (source.Any())                     // while there are elements left
    {   // still something to chunk:
        yield return source.Take(chunkSize); // return a chunk of chunkSize
        source = source.Skip(chunkSize);     // skip the returned chunk
    }
}

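How many times will this enumerate the sequence?

Suppose you divide your source into chunks of chunkSize. You enumerate only the first N chunks. From every enumerated chunk you only enumerate the first M elements.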

while (source.Any())
{
     ...
}

Any will get the enumerator, do 1 MoveNext() and return the result after disposing the enumerator. This will be done N times.

yield return source.Take(chunkSize);

According to the reference source, this will do something like:

public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source, int count)
{
    return TakeIterator<TSource>(source, count);
}

static IEnumerable<TSource> TakeIterator<TSource>(IEnumerable<TSource> source, int count)
{
    foreach (TSource element in source)
    {
        yield return element;
        if (--count == 0) break;
    }
}

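This doesn't do a lot until you start enumerating over the fetched chunk. If you fetch several chunks but decide not to enumerate over the first chunk, the foreach is not executed, as your debugger will show you.

If you decide to take the first M elements of the first chunk, then the yield return is executed exactly M times. This means:

- Get the enumerator
- Call MoveNext() and Current M times
- Dispose the enumerator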

After the first chunk has been yield returned, we skip this first chunk:

source = source.Skip(chunkSize);

Once again, we'll look at the reference source to find the SkipIterator:

static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count)
{
    using (IEnumerator<TSource> e = source.GetEnumerator()) 
    {
        while (count > 0 && e.MoveNext()) count--;
        if (count <= 0) 
        {
            while (e.MoveNext()) yield return e.Current;
        }
    }
}

As you can see, the SkipIterator calls MoveNext() once for every element in the chunk. It doesn't call Current.

So per chunk we see that the following is done:

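- Any(): GetEnumerator(); 1 MoveNext(); Dispose enumerator;
- Take():
  - nothing if the content of the chunk is not enumerated.
  - if the content is enumerated: GetEnumerator(), one MoveNext() and one Current per enumerated item, Dispose enumerator;
- Skip(): for every chunk that is enumerated (NOT the contents of the chunk): GetEnumerator(), MoveNext() chunkSize times, no Current! Dispose enumerator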

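If you look at what happens with the enumerator, you'll see that there are a lot of calls to MoveNext(), and calls to Current only for the TSource items you actually decide to access.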

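If you take N chunks of size chunkSize, then MoveNext() is called:

- N times for Any()
- not yet at all for Take, as long as you don't enumerate the chunks
- N times chunkSize for Skip()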

If you decide to enumerate only the first M elements of every fetched chunk, then MoveNext is called M times per enumerated chunk.

The total:

MoveNext calls: N + N*M + N*chunkSize
Current calls: N*M; (only the items you really access)

So if you decide to enumerate all elements of all chunks:

MoveNext: numberOfChunks + all elements + all elements = about twice the sequence
Current: every item is accessed exactly once

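Whether MoveNext is a lot of work depends on the type of the source sequence. For lists and arrays it is a simple index increment, with perhaps an out-of-range check.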

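But if your IEnumerable is the result of a database query, make sure the data is really materialized on your computer, otherwise it will be fetched several times. DbContext and Dapper will properly transfer the data to the local process before it can be accessed. If you enumerate the same sequence several times, it is not fetched several times: Dapper returns an object that is a List, and DbContext remembers that the data has already been fetched.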

Whether it is wise to call AsEnumerable() or ToList() before you start dividing the items into chunks depends on your repository.

— Harald Coppoolse

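Won't this enumerate each batch twice? So we're really enumerating the source 2*chunkSize times? Depending on the source of the enumerable (perhaps DB-backed, or some other non-memoized source), this is deadly. Imagine this enumerable as input: Enumerable.Range(0, 10000).Select(i => DateTime.UtcNow). Every time you enumerate it you'll get a different time, since it's not memoized.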

— mhand, Mar '18

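Consider: Enumerable.Range(0, 10).Select(i => DateTime.UtcNow). By calling Any you'll be recomputing the current time each time. Not so bad for DateTime.UtcNow, but consider an enumerable backed by a database connection, a SQL cursor or similar. I've seen cases where thousands of DB calls were issued because the developer didn't understand the potential repercussions of "multiple enumeration of an enumerable". ReSharper provides a hint for this as well.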

— mhand
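The comments' concern can be measured. The sketch below copies Harald Coppoolse's ChunkBy unchanged and feeds it a plain counting iterator (ReEnumerationDemo, Counted and Measure are illustrative names). Because each source.Skip(chunkSize) wraps the previous sequence, every Any() call and every chunk starts reading again from the first element of the original source, so in this measurement the total number of reads grows roughly quadratically with the number of chunks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    // Harald Coppoolse's Any/Take/Skip chunker, copied from the answer above.
    public static IEnumerable<IEnumerable<T>> ChunkBy<T>(this IEnumerable<T> source, int chunkSize)
    {
        while (source.Any())
        {
            yield return source.Take(chunkSize);
            source = source.Skip(chunkSize);
        }
    }

    // A non-list source that counts every element it produces.
    public static IEnumerable<int> Counted(int count, Action onRead)
    {
        for (int i = 0; i < count; i++)
        {
            onRead();
            yield return i;
        }
    }

    // Enumerates every chunk fully; returns the chunk sizes and the total number of reads.
    public static (List<int> Sizes, int Reads) Measure(int count, int chunkSize)
    {
        int reads = 0;
        var sizes = new List<int>();
        foreach (var chunk in Counted(count, () => reads++).ChunkBy(chunkSize))
        {
            sizes.Add(chunk.Count());
        }
        return (sizes, reads);
    }
}
```

For 10 elements and a chunk size of 3 the chunk sizes are 3, 3, 3 and 1, but the iterator produces far more than 10 elements in total. With a list or array in memory this mostly costs CPU time; with a database-backed or otherwise non-memoized source, each of those reads can be a repeated query.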

public static IEnumerable<IEnumerable<T>> SplitIntoSets<T>
    (this IEnumerable<T> source, int itemsPerSet) 
{
    var sourceList = source as List<T> ?? source.ToList();
    for (var index = 0; index < sourceList.Count; index += itemsPerSet)
    {
        yield return sourceList.Skip(index).Take(itemsPerSet);
    }
}

— 斯科特·汉嫩

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
{
    return items.Select((item, index) => new { item, index })
                .GroupBy(x => x.index / maxItems)
                .Select(g => g.Select(x => x.item));
}

— 代号

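How about this? The idea is to use only one loop. And, who knows, maybe you're only using IList implementations throughout your code and don't want to convert to List.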

private IEnumerable<IList<T>> SplitList<T>(IList<T> list, int totalChunks)
{
    // Note: totalChunks is used as the maximum number of items per chunk, not the number of chunks.
    IList<T> auxList = new List<T>();
    int totalItems = list.Count;

    if (totalChunks <= 0)
    {
        yield return auxList;
    }
    else 
    {
        for (int i = 0; i < totalItems; i++)
        {               
            auxList.Add(list[i]);           

            if ((i + 1) % totalChunks == 0)
            {
                yield return auxList;
                auxList = new List<T>();                
            }

            else if (i == totalItems - 1)
            {
                yield return auxList;
            }
        }
    }   
}

— Diego Romar

One more:

public static IList<IList<T>> SplitList<T>(this IList<T> list, int chunkSize)
{
    var chunks = new List<IList<T>>();
    List<T> chunk = null;
    for (var i = 0; i < list.Count; i++)
    {
        if (i % chunkSize == 0)
        {
            chunk = new List<T>(chunkSize);
            chunks.Add(chunk);
        }
        chunk.Add(list[i]);
    }
    return chunks;
}

— Gabriel Medeiros

public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
{
    var result = new List<List<T>>();
    for (int i = 0; i < source.Count; i += chunkSize)
    {
        var rows = new List<T>();
        for (int j = i; j < i + chunkSize; j++)
        {
            if (j >= source.Count) break;
            rows.Add(source[j]);
        }
        result.Add(rows);
    }
    return result;
}

— Baskovli3

List<int> list = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12 };
Dictionary<int, List<int>> dic = new Dictionary<int, List<int>>();
int parts = 2; // number of parts; use 3 to split into three parts, and so on
int batchCount = (int)Math.Ceiling(list.Count / (double)parts); // items per part
int threadId = 0;
List<int> lst = new List<int>();
for (int i = 0; i < list.Count; i++)
{
    lst.Add(list[i]);
    if ((i + 1) % batchCount == 0) // the current part is full
    {
        dic.Add(threadId, lst);
        lst = new List<int>();
        threadId++;
    }
}
if (lst.Count > 0) // the last part may be smaller
{
    dic.Add(threadId, lst);
}

— ANNAPUREDDY PRAVEEN KUMAR REDD

It's better to explain your answer rather than only providing a code snippet.

— Kevin

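I had the same need, and I used a combination of LINQ's Skip() and Take() methods. I multiply the number of iterations so far by the number of items I take, which gives me the number of items to skip, and then I take the next group.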

        var categories = Properties.Settings.Default.MovementStatsCategories;
        var items = summariesWithinYear
            .Select(s =>  s.sku).Distinct().ToList();

        //need to run by chunks of 10,000
        var count = items.Count;
        var counter = 0;
        var numToTake = 10000;

        while (count > 0)
        {
            var itemsChunk = items.Skip(numToTake * counter).Take(numToTake).ToList();
            counter += 1;

            MovementHistoryUtilities.RecordMovementHistoryStatsBulk(itemsChunk, categories, nLogger);

            count -= numToTake;
        }

— BeccaGirl
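The loop above can be written as a self-contained generic method. A sketch, assuming the per-chunk work is passed in as a callback; SkipTakeChunker, ProcessInChunks and process are hypothetical names that stand in for the project-specific settings and the RecordMovementHistoryStatsBulk call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeChunker
{
    // Generic form of the loop above. The project-specific pieces are replaced
    // by a hypothetical 'process' callback that receives each chunk.
    public static void ProcessInChunks<T>(List<T> items, int numToTake, Action<List<T>> process)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (numToTake <= 0) throw new ArgumentOutOfRangeException(nameof(numToTake));

        int remaining = items.Count;
        int counter = 0;
        while (remaining > 0)
        {
            // Chunk number 'counter' starts at numToTake * counter.
            List<T> itemsChunk = items.Skip(numToTake * counter).Take(numToTake).ToList();
            counter++;
            process(itemsChunk);
            remaining -= numToTake;
        }
    }
}
```

With 25 items and numToTake = 10, process receives chunks of 10, 10 and 5 items. Whether Skip can jump straight to the start index of a List<T> or has to walk past the skipped items depends on the runtime's LINQ implementation, which becomes relevant for long lists.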

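Based on Dmitry Pavlov's answer, I would remove the .ToList() and also avoid the anonymous class. Instead I like to use a struct, which does not require a heap allocation. (A ValueTuple would also do.)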

public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
    if (source is null)
    {
        throw new ArgumentNullException(nameof(source));
    }
    if (chunkSize <= 0)
    {
        throw new ArgumentOutOfRangeException(nameof(chunkSize), chunkSize, "The argument must be greater than zero.");
    }

    return source
        .Select((x, i) => new ChunkedValue<TSource>(x, i / chunkSize))
        .GroupBy(cv => cv.ChunkIndex)
        .Select(g => g.Select(cv => cv.Value));
} 

[StructLayout(LayoutKind.Auto)]
[DebuggerDisplay("{" + nameof(ChunkedValue<T>.ChunkIndex) + "}: {" + nameof(ChunkedValue<T>.Value) + "}")]
private struct ChunkedValue<T>
{
    public ChunkedValue(T value, int chunkIndex)
    {
        this.ChunkIndex = chunkIndex;
        this.Value = value;
    }

    public int ChunkIndex { get; }

    public T Value { get; }
}

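It can be used as follows. It iterates over the collection only once and avoids a heap allocation per element; GroupBy still buffers all elements of the source before it yields the first chunk, though.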

int chunkSize = 30;
foreach (var chunk in collection.ChunkBy(chunkSize))
{
    foreach (var item in chunk)
    {
        // your code for item here.
    }
}

If a concrete list is actually needed, I would do it like this:

int chunkSize = 30;
var chunkList = new List<List<T>>();
foreach (var chunk in collection.ChunkBy(chunkSize))
{
    // create a list with the correct capacity to be able to contain one chunk
    // to avoid the resizing (additional memory allocation and memory copy) within the List<T>.
    var list = new List<T>(chunkSize);
    list.AddRange(chunk);
    chunkList.Add(list);
}

— 蒂尔顿·JH