Parallel foreach with asynchronous lambda


138

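I would like to process a collection in parallel, but I'm having trouble implementing it, so I'm hoping for some help.

The trouble arises if I want to call a method marked async in C# within the lambda of the parallel loop. For example: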

var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, async item =>
{
  // some pre stuff
  var response = await GetData(item);
  bag.Add(response);
  // some post stuff
});
var count = bag.Count;

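The problem is that count comes back as 0, because all the threads created are effectively just background threads and the Parallel.ForEach call doesn't wait for them to complete. If I remove the async keyword, the method looks like this: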

var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, item =>
{
  // some pre stuff
  var responseTask = await GetData(item);
  responseTask.Wait();
  var response = responseTask.Result;
  bag.Add(response);
  // some post stuff
});
var count = bag.Count;

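It works, but it completely defeats the cleverness of await, and I have to do some manual exception handling (removed for brevity).

How can I implement a Parallel.ForEach loop that uses the await keyword within the lambda? Is it possible?

The prototype of the Parallel.ForEach method takes an Action<T> as its parameter, but I want it to wait for my asynchronous lambda.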
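
A minimal sketch of why the count comes back as 0, assuming a recent C# compiler with top-level statements. Because Parallel.ForEach takes an Action<T>, the async lambda is compiled as an async void delegate, and the loop treats each call as finished as soon as the lambda reaches its first incomplete await. The gate object and the sample array are illustrative assumptions, not part of the code above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for a slow operation: stays incomplete until released.
var gate = new TaskCompletionSource<bool>();
var bag = new ConcurrentBag<int>();

// The async lambda is converted to Action<int>, so it is effectively "async void".
Parallel.ForEach(new[] { 1, 2, 3 }, async item =>
{
    await gate.Task;   // the delegate returns to Parallel.ForEach at this point
    bag.Add(item);     // runs later, after the loop has already returned
});

// Parallel.ForEach has returned although no item has been added yet.
int countAfterLoop = bag.Count;
Console.WriteLine($"Count right after Parallel.ForEach: {countAfterLoop}");

// Release the pending continuations so nothing is left waiting on the gate.
gate.SetResult(true);
```

Nothing in the loop ever holds a Task for the lambda, so there is nothing it could wait on; that is why the answers below switch to APIs that accept a Func<T, Task>.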


1
I assume you meant to remove the await from await GetData(item) in the second code block, as it would produce a compilation error as-is.
Josh M.

Answers:


186

If you just want simple parallelism, you can do this:

var bag = new ConcurrentBag<object>();
var tasks = myCollection.Select(async item =>
{
  // some pre stuff
  var response = await GetData(item);
  bag.Add(response);
  // some post stuff
});
await Task.WhenAll(tasks);
var count = bag.Count;

If you need something more complex, check out Stephen Toub's ForEachAsync post.


46
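You probably need a throttling mechanism. This immediately creates as many tasks as there are items, which may end up as 10k network requests and the like.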
usr

10
@usr The last example in Stephen Toub's article addresses that.
svick

@svick I was puzzled by that last example. It looks to me like it just batches the work into more tasks, but they all get started at once.
Luke Puplett

2
@LukePuplett It creates dop tasks, and each of them then processes some subset of the input collection in series.
svick
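
The partitioning idea described in the comment above might look roughly like the following sketch. It is written from the description in this thread rather than copied from the article; the helper name ForEachAsync, the dop parameter, and the sample body with its counters are assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: starts `dop` worker tasks; each worker drains its own
// partition of the source one item at a time.
static Task ForEachAsync<T>(IEnumerable<T> source, int dop, Func<T, Task> body)
{
    return Task.WhenAll(
        Partitioner.Create(source).GetPartitions(dop).Select(partition =>
            Task.Run(async () =>
            {
                using (partition)
                {
                    while (partition.MoveNext())
                    {
                        await body(partition.Current); // serial within one worker
                    }
                }
            })));
}

var processed = new ConcurrentBag<int>();
var sync = new object();
int running = 0, maxRunning = 0;

await ForEachAsync(Enumerable.Range(1, 50), 4, async item =>
{
    // Track how many bodies run at the same time.
    lock (sync) { running++; maxRunning = Math.Max(maxRunning, running); }
    await Task.Delay(5);   // stand-in for real asynchronous work
    processed.Add(item);
    lock (sync) { running--; }
});

Console.WriteLine($"Processed {processed.Count} items, at most {maxRunning} at a time");
```

Since only dop workers exist and each awaits one body at a time, the number of in-flight operations never exceeds dop, which is the throttling asked about earlier in these comments.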

4
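@Afshin_Zavvar: If you call Task.Run without awaiting the result, then that's just throwing fire-and-forget work onto the thread pool. That is almost always a mistake.
Stephen Cleary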
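
A small sketch of the fire-and-forget problem described in the comment above. FailAsync is a hypothetical operation invented for the example; the point is only that a dropped task hides its failure, while an awaited one reports it.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical operation that fails asynchronously.
static async Task FailAsync()
{
    await Task.Yield();
    throw new InvalidOperationException("lost unless someone observes the task");
}

// Fire-and-forget: the returned task is dropped, so this line never throws
// and the caller has no way to learn that the work failed.
_ = Task.Run(() => FailAsync());

// Keeping the task and awaiting it surfaces the failure at the call site.
bool observed = false;
try
{
    await Task.Run(() => FailAsync());
}
catch (InvalidOperationException)
{
    observed = true;
}

Console.WriteLine($"Exception observed when awaited: {observed}");
```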

74

You can use the ParallelForEachAsync extension method from the AsyncEnumerator NuGet Package:

using Dasync.Collections;

var bag = new ConcurrentBag<object>();
await myCollection.ParallelForEachAsync(async item =>
{
  // some pre stuff
  var response = await GetData(item);
  bag.Add(response);
  // some post stuff
}, maxDegreeOfParallelism: 10);
var count = bag.Count;

1
Is this your package? I have seen you post this in a few places now. :D Oh wait.. your name is on the package :D +1
Piotr Kula

17
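@ppumkin, yes, it's mine. I've seen this problem over and over again, so I decided to solve it in the simplest way possible and spare others the struggle as well :)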
Serge Semenov

Thanks.. it definitely makes sense and helped me out big time!
Piotr Kula

2
You have a typo: maxDegreeOfParallelism > maxDegreeOfParalellism
Shiran Dror

3
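The correct spelling is indeed maxDegreeOfParallelism, but @ShiranDror's comment has a point: in your package you mistakenly named the parameter maxDegreeOfParalellism (so the quoted code won't compile until you change it).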
BornToCode

17

With SemaphoreSlim you can control the degree of parallelism.

var bag = new ConcurrentBag<object>();
var maxParallel = 20;
var throttler = new SemaphoreSlim(initialCount: maxParallel);
var tasks = myCollection.Select(async item =>
{
  // Acquire a slot before entering try, so Release only runs for a slot that was actually taken.
  await throttler.WaitAsync();
  try
  {
     var response = await GetData(item);
     bag.Add(response);
  }
  finally
  {
     throttler.Release();
  }
});
await Task.WhenAll(tasks);
var count = bag.Count;

3

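My lightweight implementation of ParallelForEach async.

Features:

  1. Throttling (max degree of parallelism).
  2. Exception handling (an aggregate exception is thrown at completion).
  3. Memory efficient (no need to store the list of tasks).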

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncEx
{
    public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism = 10)
    {
        var semaphoreSlim = new SemaphoreSlim(maxDegreeOfParallelism);
        var exceptions = new ConcurrentBag<Exception>();

        foreach (T item in source)
        {
            // Wait for a free slot before starting the next action.
            await semaphoreSlim.WaitAsync();
            _ = asyncAction(item).ContinueWith(t =>
            {
                // Record the failure before freeing the slot, so it is visible once the slot is free.
                if (t.Exception != null)
                {
                    exceptions.Add(t.Exception);
                }

                semaphoreSlim.Release();
            });
        }

        // Re-acquire every slot: this completes only after all started actions have released theirs.
        // It also completes immediately for an empty source, which would otherwise never signal completion.
        for (int i = 0; i < maxDegreeOfParallelism; i++)
        {
            await semaphoreSlim.WaitAsync();
        }

        if (exceptions.Count > 0)
        {
            throw new AggregateException(exceptions);
        }
    }
}

Usage example:

await Enumerable.Range(1, 10000).ParallelForEachAsync(async (i) =>
{
    var data = await GetData(i);
}, maxDegreeOfParallelism: 100);

2

I created an extension method for this that makes use of SemaphoreSlim and also allows setting the maximum degree of parallelism.

    /// <summary>
    /// Concurrently executes async actions for each item of <see cref="IEnumerable{T}"/>.
    /// </summary>
    /// <typeparam name="T">Type of the items in the IEnumerable</typeparam>
    /// <param name="enumerable">Instance of <see cref="IEnumerable{T}"/></param>
    /// <param name="action">An async delegate (<see cref="Func{T, TResult}"/> returning a <see cref="Task"/>) to execute</param>
    /// <param name="maxDegreeOfParallelism">Optional. An integer that represents the maximum degree of parallelism.
    /// Must be greater than 0</param>
    /// <returns>A Task representing an async operation</returns>
    /// <exception cref="ArgumentOutOfRangeException">If <paramref name="maxDegreeOfParallelism"/> is less than 1</exception>
    public static async Task ForEachAsyncConcurrent<T>(
        this IEnumerable<T> enumerable,
        Func<T, Task> action,
        int? maxDegreeOfParallelism = null)
    {
        if (maxDegreeOfParallelism.HasValue)
        {
            using (var semaphoreSlim = new SemaphoreSlim(
                maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
            {
                var tasksWithThrottler = new List<Task>();

                foreach (var item in enumerable)
                {
                    // Increment the number of currently running tasks and wait if they are more than limit.
                    await semaphoreSlim.WaitAsync();

                    tasksWithThrottler.Add(Task.Run(async () =>
                    {
                        try
                        {
                            await action(item);
                        }
                        finally
                        {
                            // action is completed (or has failed), so decrement the number of currently running tasks
                            semaphoreSlim.Release();
                        }
                    }));
                }

                // Wait for all tasks to complete.
                await Task.WhenAll(tasksWithThrottler.ToArray());
            }
        }
        else
        {
            await Task.WhenAll(enumerable.Select(item => action(item)));
        }
    }

Usage example:

await enumerable.ForEachAsyncConcurrent(
    async item =>
    {
        await SomeAsyncMethod(item);
    },
    5);

'using' will not help. The foreach loop will wait for the semaphore indefinitely. Just try this simple code that reproduces the issue: await Enumerable.Range(1, 4).ForEachAsyncConcurrent(async (i) => { Console.WriteLine(i); throw new Exception("test exception"); }, maxDegreeOfParallelism: 2);
nicolay.anykienko

@nicolay.anykienko you are right about #2. That memory problem can be solved by adding tasksWithThrottler.RemoveAll(x => x.IsCompleted);
Askids '18

1
I've tried this in my code, and if maxDegreeOfParallelism is not null the code deadlocks. Here you can see all the code needed to reproduce it: stackoverflow.com/questions/58793118/...
Massimo Savazzi
Licensed under cc by-sa 3.0 with attribution required.