Multiprocessing: How to use Pool.map on a function defined in a class?

When I run something like:

from multiprocessing import Pool

# define f before creating the Pool: with fork, workers only see names that exist when they start
def f(x):
     return x*x

p = Pool(5)
p.map(f, [1,2,3])

it works fine. However, when I make it a function of a class:

class calculate(object):
    def run(self):
        def f(x):
            return x*x

        p = Pool()
        return p.map(f, [1,2,3])

cl = calculate()
print cl.run()

it gives me the following error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/sw/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/sw/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/sw/lib/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

I've seen a post by Alex Martelli dealing with the same kind of problem, but it wasn't explicit enough.

python multiprocessing pickle

— Mermoz

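"As a function of a class"? Can you post the code that actually produces the error? Without the actual code, we can only guess what you're doing wrong.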

— S.Lott

As a general remark, there are pickling modules more powerful than Python's standard pickle module (like the picloud module mentioned in this answer).

— klaus se, 2013

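I had a similar problem with closures in IPython.Parallel, but there you could get around it by pushing the objects to the nodes. Getting around this problem with multiprocessing seems pretty annoying.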

— Alex S

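calculate is picklable, so it seems like this could be solved by 1) creating a function object whose constructor copies over a calculate instance, and then 2) passing an instance of this function object to Pool's map method. No?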

— 2014
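
A minimal sketch of the function-object idea from the comment above, assuming the state the worker needs is itself picklable. `Square` and its `offset` attribute are hypothetical stand-ins for whatever a `calculate` instance would copy in. An instance of a module-level class with a `__call__` method pickles as a reference to its class plus its attribute dict, so `Pool.map` can ship it to the workers.

```python
import pickle
from multiprocessing import Pool


class Square(object):
    """Hypothetical picklable function object, defined at module level."""

    def __init__(self, offset=0):
        # state copied in from the caller; it must itself be picklable
        self.offset = offset

    def __call__(self, x):
        return x * x + self.offset


# the object survives a pickle round trip, which is what Pool.map relies on
restored = pickle.loads(pickle.dumps(Square(offset=1)))
print(restored(3))  # 10

if __name__ == '__main__':
    with Pool(2) as p:
        print(p.map(Square(), [1, 2, 3]))  # [1, 4, 9]
```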

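@math I don't believe any "recent changes" to Python would help. Some limitations of the multiprocessing module come from its goal of being a cross-platform implementation and from the lack of a fork(2)-like system call on Windows. If you don't care about Win32 support, there may be a simpler process-based workaround. Or, if you are prepared to use threads instead of processes, you can replace from multiprocessing import Pool with from multiprocessing.pool import ThreadPool as Pool.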

— 2016
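
A sketch of the thread-based substitution suggested in the comment above, applied to the question's class (written for Python 3). Threads share memory with the caller, so `ThreadPool.map` never pickles `f`, and a nested function is fine. The trade-off is that CPU-bound pure-Python work is still serialized by the GIL, so this mainly helps when `f` waits on I/O or releases the GIL.

```python
from multiprocessing.pool import ThreadPool as Pool


class Calculate(object):
    def run(self):
        # a nested function is fine here: threads never pickle it
        def f(x):
            return x * x

        with Pool(4) as p:
            return p.map(f, [1, 2, 3])


print(Calculate().run())  # [1, 4, 9]
```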

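I was also annoyed by the restrictions on what sort of functions pool.map can accept. I wrote the following to get around them. It appears to work, even for recursive use of parmap.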

from multiprocessing import Process, Pipe
from itertools import izip

def spawn(f):
    def fun(pipe, x):
        pipe.send(f(x))
        pipe.close()
    return fun

def parmap(f, X):
    pipe = [Pipe() for x in X]
    proc = [Process(target=spawn(f), args=(c, x)) for x, (p, c) in izip(X, pipe)]
    [p.start() for p in proc]
    [p.join() for p in proc]
    return [p.recv() for (p, c) in pipe]

if __name__ == '__main__':
    print parmap(lambda x: x**x, range(1, 5))

— mrule

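This worked very well for me, thank you. I found one weakness: I tried using parmap on some functions that passed around a defaultdict and got the PicklingError again. I did not find a solution; I just reworked my code so it doesn't use the defaultdict.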

— SANS
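
The comment does not say what the `defaultdict` contained, so this is only an assumption about the cause: a `defaultdict` pickles its `default_factory` by reference, so one built with a lambda (or a nested function) fails just like the question's nested `f`, while one built with a module-level callable such as the builtin `list` pickles fine.

```python
import pickle
from collections import defaultdict

bad = defaultdict(lambda: [])  # default_factory is a lambda: no importable name
good = defaultdict(list)       # default_factory is the builtin list: picklable

try:
    pickle.dumps(bad)
    bad_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    bad_ok = False

restored = pickle.loads(pickle.dumps(good))
restored['k'].append(1)  # still behaves as a defaultdict after the round trip
print(bad_ok, dict(restored))  # False {'k': [1]}
```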

Doesn't work in Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32.

This does work on Python 2.7.3 (Aug 1 2012, 05:14:39). It does not work for giant iterables: it fails with OSError: [Errno 24] Too many open files, because of the number of pipes it opens.

— Eiyrioü von Kauyf

This solution spawns one process per work item. The solution by "klaus se" below is more efficient.

— ypnos

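I could not use the code posted so far, because the code using "multiprocessing.Pool" does not work with lambda expressions, and the code not using "multiprocessing.Pool" spawns as many processes as there are work items.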

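I adapted the code so that it spawns a predefined number of workers and only iterates through the input list when there is an idle worker. I also enabled "daemon" mode for the workers so that Ctrl-C works as expected.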

import multiprocessing


def fun(f, q_in, q_out):
    while True:
        i, x = q_in.get()
        if i is None:
            break
        q_out.put((i, f(x)))


def parmap(f, X, nprocs=multiprocessing.cpu_count()):
    q_in = multiprocessing.Queue(1)
    q_out = multiprocessing.Queue()

    proc = [multiprocessing.Process(target=fun, args=(f, q_in, q_out))
            for _ in range(nprocs)]
    for p in proc:
        p.daemon = True
        p.start()

    sent = [q_in.put((i, x)) for i, x in enumerate(X)]
    [q_in.put((None, None)) for _ in range(nprocs)]
    res = [q_out.get() for _ in range(len(sent))]

    [p.join() for p in proc]

    return [x for i, x in sorted(res)]


if __name__ == '__main__':
    print(parmap(lambda i: i * 2, [1, 2, 3, 4, 6, 7, 8]))

— klaus se

How would you get a progress bar to work properly with this parmap function?

— shockburner, 2014

One issue: I used this solution but noticed that the Python processes I spawned stayed alive in memory. Any quick thoughts on how to kill them when parmap exits?

— CompEcon, 2014

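@klaus-se I know we are discouraged from just saying thanks in comments, but your answer is too valuable for me, I couldn't resist. I wish I could give you more than one reputation point...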

— deshtop, 2015

@greole Passing (None, None) as the last item tells fun that it has reached the end of the item sequence for each process.

— aganders3, 2015

@deshtop: you can, with a bounty, if you have enough reputation yourself :-)

— 马克

Multiprocessing and pickling are broken and limited unless you jump outside the standard library.

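If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because it uses dill instead of pickle or cPickle, and dill can serialize almost anything in Python.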

pathos.multiprocessing also provides an asynchronous map function… and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6])).

See the discussion: What can multiprocessing and dill do together?

And: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization

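It even handles the code you wrote initially, without modification, and from the interpreter. Why use anything else that's more fragile and specific to a single case?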

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> class calculate(object):
...  def run(self):
...   def f(x):
...    return x*x
...   p = Pool()
...   return p.map(f, [1,2,3])
... 
>>> cl = calculate()
>>> print cl.run()
[1, 4, 9]

Get the code here: https://github.com/uqfoundation/pathos

And, just to show off a bit more of what it can do:

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> 
>>> p = Pool(4)
>>> 
>>> def add(x,y):
...   return x+y
... 
>>> x = [0,1,2,3]
>>> y = [4,5,6,7]
>>> 
>>> p.map(add, x, y)
[4, 6, 8, 10]
>>> 
>>> class Test(object):
...   def plus(self, x, y): 
...     return x+y
... 
>>> t = Test()
>>> 
>>> p.map(Test.plus, [t]*4, x, y)
[4, 6, 8, 10]
>>> 
>>> res = p.amap(t.plus, x, y)
>>> res.get()
[4, 6, 8, 10]

— Mike McKerns
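
For comparison, the standard library can cover the multi-argument case mentioned in the answer above (though not the nested function from the question) with `Pool.starmap`, available since Python 3.3: zip the argument lists into tuples and let `starmap` unpack each one. A sketch, assuming the mapped callable is picklable by reference, as the built-in `math.pow` is:

```python
import math
from multiprocessing import Pool

xs = [1, 2, 3]
ys = [4, 5, 6]

if __name__ == '__main__':
    with Pool(2) as p:
        # starmap(f, [(a, b), ...]) calls f(a, b) for each tuple
        result = p.starmap(math.pow, zip(xs, ys))
    print(result)  # [1.0, 32.0, 729.0]
```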

pathos.multiprocessing also has an asynchronous map (amap) that enables progress bars and other asynchronous programming.

— Mike McKerns
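
With the standard library alone, a rough equivalent for progress reporting is `Pool.imap`, which yields results lazily and in input order, so a counter (or a wrapper such as `tqdm`, not used here) can advance once per item. A sketch with a hypothetical `slow_square` worker:

```python
import time
from multiprocessing import Pool


def slow_square(x):
    # hypothetical work item: sleep briefly, then square
    time.sleep(0.1)
    return x * x


def run_with_progress(items, processes=2):
    results = []
    with Pool(processes) as p:
        # imap hands back results one at a time, so progress can be reported per item
        for done, value in enumerate(p.imap(slow_square, items), start=1):
            results.append(value)
            print(f"{done}/{len(items)} done")
    return results


if __name__ == '__main__':
    print(run_with_progress(list(range(5))))  # [0, 1, 4, 9, 16]
```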

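I like pathos.multiprocessing: it can serve as an almost drop-in replacement for a non-parallel map while giving you the benefits of multiprocessing. I have a simple wrapper around pathos.multiprocessing.map that makes it more memory-efficient when processing a large read-only data structure across multiple cores; see the git repository.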

— Fashandge, 2014

Seems interesting, but it doesn't install. This is the message pip gives: Could not find a version that satisfies the requirement pp==1.5.7-pathos (from pathos)

— xApple

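Yes. I haven't released in a while, because I have been splitting the functionality into separate packages and converting to 2/3-compatible code. Much of the above has been modularized in multiprocess, which is 2/3 compatible. See stackoverflow.com/questions/27873093/… and pypi.python.org/pypi/multiprocess.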

— Mike McKerns, 2016

@xApple: As a follow-up, pathos has had a new stable release and is compatible with both 2.x and 3.x.

— Mike McKerns '16

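As far as I know, there is currently no solution to your problem: the function you give to map() must be accessible through an import of your module. This is why robert's code works: the function f() can be obtained by importing the following code: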

from multiprocessing import Pool

def f(x):
    return x*x

class Calculate(object):
    def run(self):
        p = Pool()
        return p.map(f, [1,2,3])

if __name__ == '__main__':
    cl = Calculate()
    print cl.run()

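I actually added a "main" section, because it follows the recommendations for the Windows platform ("Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects").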

I also capitalized Calculate, so as to follow PEP 8. :)

— Eric O Lebigot
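
The "must be accessible through an import" rule in the answer above follows from how the standard `pickle` module handles functions: it stores only the module and qualified name and looks the function up again when unpickling, so only functions reachable by name at module level survive. A small check, using a hypothetical `can_pickle` helper:

```python
import pickle


def f(x):
    return x * x


def make_nested():
    def g(x):  # local function: its qualified name contains '<locals>'
        return x * x
    return g


def can_pickle(obj):
    """Hypothetical helper: True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(f))              # True: found again by its module-level name
print(can_pickle(make_nested()))  # False: no importable name
print(can_pickle(lambda x: x))    # False: '<lambda>' is not a module attribute
```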

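mrule's solution is correct but has a bug: if the child sends back a large amount of data, it can fill the pipe's buffer, so the child blocks in pipe.send() while the parent is waiting for the child to exit in pipe.join(). The solution is to read the child's data before join()ing the child. Furthermore, the child should close the parent's end of the pipe to prevent a deadlock. The code below fixes that. Also be aware that this parmap creates one process per element of X. A more advanced solution is to use multiprocessing.cpu_count() to divide X into a number of chunks, and then merge the results before returning. I leave that as an exercise for the reader, so as not to spoil the conciseness of mrule's nice answer. ;)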

from multiprocessing import Process, Pipe
from itertools import izip

def spawn(f):
    def fun(ppipe, cpipe,x):
        ppipe.close()
        cpipe.send(f(x))
        cpipe.close()
    return fun

def parmap(f,X):
    pipe=[Pipe() for x in X]
    proc=[Process(target=spawn(f),args=(p,c,x)) for x,(p,c) in izip(X,pipe)]
    [p.start() for p in proc]
    ret = [p.recv() for (p,c) in pipe]
    [p.join() for p in proc]
    return ret

if __name__ == '__main__':
    print parmap(lambda x:x**x,range(1,5))

— Bob McElrath
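
A sketch of the chunked variant that the answer above leaves as an exercise, assuming Python 3 and, for non-fork start methods (Windows, and macOS by default), a picklable module-level `f`. `chunked_parmap` and `_chunk_worker` are hypothetical names. It starts at most `cpu_count()` processes, sends each chunk's results back as a single message, and keeps that answer's fix of draining the pipes before `join()`.

```python
from multiprocessing import Pipe, Process, cpu_count


def _chunk_worker(f, chunk, conn):
    # apply f to one chunk and send the whole result list back in one message
    conn.send([f(x) for x in chunk])
    conn.close()


def chunked_parmap(f, X, nchunks=None):
    X = list(X)
    if not X:
        return []
    nchunks = max(1, min(nchunks or cpu_count(), len(X)))
    size = -(-len(X) // nchunks)  # ceiling division: items per chunk
    chunks = [X[i:i + size] for i in range(0, len(X), size)]

    pipes = [Pipe(duplex=False) for _ in chunks]  # (receive end, send end)
    procs = [Process(target=_chunk_worker, args=(f, chunk, send_end))
             for chunk, (_, send_end) in zip(chunks, pipes)]
    for p in procs:
        p.start()
    for _, send_end in pipes:
        send_end.close()  # the parent only reads

    # read before join(), so a large result cannot leave a child blocked in send()
    parts = [recv_end.recv() for recv_end, _ in pipes]
    for p in procs:
        p.join()
    return [y for part in parts for y in part]


def square(x):
    return x * x


if __name__ == '__main__':
    print(chunked_parmap(square, range(10), nchunks=3))
    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Sending one message per chunk keeps the number of pipes bounded by the number of chunks, which also sidesteps the "Too many open files" error reported for the one-process-per-item versions.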
How do you choose the number of processes?

— patapouf_ai, 2016

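But it dies pretty quickly with the error OSError: [Errno 24] Too many open files. I think there needs to be some sort of limit on the number of processes for it to work properly...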

— patapouf_ai

I also struggled with this. As a simplified example, I had functions as data members of a class:

from multiprocessing import Pool
import itertools
pool = Pool()
class Example(object):
    def __init__(self, my_add): 
        self.f = my_add  
    def add_lists(self, list1, list2):
        # Needed to do something like this (the following line won't work)
        return pool.map(self.f,list1,list2)

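I needed to use the function self.f in a Pool.map() call from within the same class, and self.f did not take a tuple as its argument. Since this function was embedded in a class, it was not clear to me how to write the kind of wrapper other answers suggested.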

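I solved this by using a different wrapper, eval_func_tuple(f_args), which takes a tuple/list whose first element is the function and whose remaining elements are the arguments to that function. With it, the problematic line can be replaced by return pool.map(eval_func_tuple, itertools.izip(itertools.repeat(self.f), list1, list2)). Here is the full code: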

File: util.py

def add(a, b): return a+b

def eval_func_tuple(f_args):
    """Takes a tuple of a function and args, evaluates and returns result"""
    return f_args[0](*f_args[1:])

File: main.py

from multiprocessing import Pool
import itertools
import util  

pool = Pool()
class Example(object):
    def __init__(self, my_add): 
        self.f = my_add  
    def add_lists(self, list1, list2):
        # The following line will now work
        return pool.map(util.eval_func_tuple, 
            itertools.izip(itertools.repeat(self.f), list1, list2)) 

if __name__ == '__main__':
    myExample = Example(util.add)
    list1 = [1, 2, 3]
    list2 = [10, 20, 30]
    print myExample.add_lists(list1, list2)

Running main.py gives [11, 22, 33]. Feel free to improve this; for example, eval_func_tuple could also be modified to take keyword arguments.

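On another note, the "parmap" function from another answer can be made more efficient when there are more processes than available CPUs. I'm copying an edited version below. This is my first post, and I wasn't sure whether I should edit the original answer directly. I also renamed some variables.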

import multiprocessing
from multiprocessing import Process, Pipe
from itertools import izip

def spawn(f):
    def fun(pipe,x):
        pipe.send(f(x))
        pipe.close()
    return fun

def parmap(f,X):
    pipe=[Pipe() for x in X]
    processes=[Process(target=spawn(f),args=(c,x)) for x,(p,c) in izip(X,pipe)]
    numProcesses = len(processes)
    processNum = 0
    outputList = []
    while processNum < numProcesses:
        # run at most cpu_count() processes per batch
        endProcessNum = min(processNum+multiprocessing.cpu_count(), numProcesses)
        for proc in processes[processNum:endProcessNum]:
            proc.start()
        # receive before join() so a large result cannot block the child in send()
        for p,c in pipe[processNum:endProcessNum]:
            outputList.append(p.recv())
        for proc in processes[processNum:endProcessNum]:
            proc.join()
        processNum = endProcessNum
    return outputList

if __name__ == '__main__':  
    print parmap(lambda x:x**x,range(1,5))

— Brandt
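
On Python 3 this wrapper is less necessary: `Pool.starmap` (Python 3.3+) unpacks argument tuples itself, and a module-level function stored on `self` can be passed to it directly. For the keyword-argument extension suggested in the answer above, a hypothetical `eval_func_kwargs` helper could take `(func, args, kwargs)` triples. A sketch under those assumptions:

```python
from multiprocessing import Pool


def add(a, b):
    return a + b


def eval_func_kwargs(task):
    """Hypothetical helper: task is (func, args_tuple, kwargs_dict)."""
    func, args, kwargs = task
    return func(*args, **kwargs)


class Example(object):
    def __init__(self, my_add):
        self.f = my_add

    def add_lists(self, pool, list1, list2):
        # starmap calls self.f(a, b) for each (a, b) pair
        return pool.starmap(self.f, zip(list1, list2))


if __name__ == '__main__':
    with Pool(2) as pool:
        print(Example(add).add_lists(pool, [1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
        tasks = [(add, (1,), {'b': 10}), (add, (2,), {'b': 20})]
        print(pool.map(eval_func_kwargs, tasks))  # [11, 22]
```

Passing the pool in, rather than creating it at module level as main.py does, keeps importing the module free of side effects, which matters on platforms where workers re-import the main module.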

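I took the answers by klaus se and aganders3 and turned them into a documented module that is more readable and fits in one file. You can just add it to your project. It even has an optional progress bar!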

"""
The ``processes`` module provides some convenience functions
for using parallel processes in python.

Adapted from http://stackoverflow.com/a/16071616/287297

Example usage:

    print prll_map(lambda i: i * 2, [1, 2, 3, 4, 6, 7, 8], 32, verbose=True)

Comments:

"It spawns a predefined amount of workers and only iterates through the input list
 if there exists an idle worker. I also enabled the "daemon" mode for the workers so
 that KeyboardInterupt works as expected."

Pitfalls: all the stdouts are sent back to the parent stdout, intertwined.

Alternatively, use this fork of multiprocessing: 
https://github.com/uqfoundation/multiprocess
"""

# Modules #
import multiprocessing
from tqdm import tqdm

################################################################################
def apply_function(func_to_apply, queue_in, queue_out):
    while not queue_in.empty():
        num, obj = queue_in.get()
        queue_out.put((num, func_to_apply(obj)))

################################################################################
def prll_map(func_to_apply, items, cpus=None, verbose=False):
    # Number of processes to use #
    if cpus is None: cpus = min(multiprocessing.cpu_count(), 32)
    # Create queues #
    q_in  = multiprocessing.Queue()
    q_out = multiprocessing.Queue()
    # Process list #
    new_proc  = lambda t,a: multiprocessing.Process(target=t, args=a)
    processes = [new_proc(apply_function, (func_to_apply, q_in, q_out)) for x in range(cpus)]
    # Put all the items (objects) in the queue #
    sent = [q_in.put((i, x)) for i, x in enumerate(items)]
    # Start them all #
    for proc in processes:
        proc.daemon = True
        proc.start()
    # Display progress bar or not #
    if verbose:
        results = [q_out.get() for x in tqdm(range(len(sent)))]
    else:
        results = [q_out.get() for x in range(len(sent))]
    # Wait for them to finish #
    for proc in processes: proc.join()
    # Return results #
    return [x for i, x in sorted(results)]

################################################################################
def test():
    def slow_square(x):
        import time
        time.sleep(2)
        return x**2
    objs    = range(20)
    squares = prll_map(slow_square, objs, 4, verbose=True)
    print "Result: %s" % squares

Edit: added @alexander-mcfarlane's suggestion and a test function

— xApple
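One issue with the progress bar... it only measures how unevenly the workload was split among the processors. If the workload is split perfectly, all processors will join() at the same time, and you will just see a flash of 100% completed in the tqdm display. It is only useful when the processors have uneven workloads.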

— Alexander McFarlane

Move tqdm() to wrap this line instead: result = [q_out.get() for _ in tqdm(sent)]. It works a lot better. Great effort though, I really appreciate it, +1.

— Alexander McFarlane

Thanks for the advice, I will try it and then update the answer!

— xApple

The answer is updated, and the progress bar works much better!

— xApple, 2016

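I know this question was asked 6 years ago now, but I just wanted to add my solution, as some of the suggestions above seem horribly complicated, while my solution was actually very simple.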

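All I had to do was wrap the method call in a helper function that pool.map() can call, passing the class object and the args for the method as a tuple. It looked a bit like this: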

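from multiprocessing import Pool

class MyClass(object):  # hypothetical example class
    def method(self, x):
        return x * x

def run_in_parallel(args):
    # args is (instance, argument): call the method inside the worker
    return args[0].method(args[1])

if __name__ == '__main__':
    myclass = MyClass()
    method_args = [1,2,3,4,5,6]
    args_map = [ (myclass, arg) for arg in method_args ]
    pool = Pool()
    print(pool.map(run_in_parallel, args_map))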

— nightowl

Functions defined in a class (even within functions inside the class) don't really pickle. However, this works:

from multiprocessing import Pool

def f(x):
    return x*x

class calculate(object):
    def run(self):
        p = Pool()
        return p.map(f, [1,2,3])

cl = calculate()
print cl.run()

— robert
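
That holds for nested functions, but on current Python 3 versions a bound method of an instance of a module-level class does pickle: it is reduced roughly to "look up attribute `f` on this pickled instance". So `Pool.map(self.f, ...)` can work as long as the instance itself is picklable. A sketch with a hypothetical `Calculate` class:

```python
import pickle
from multiprocessing import Pool


class Calculate(object):
    def __init__(self, power=2):
        self.power = power

    def f(self, x):
        return x ** self.power


calc = Calculate(power=3)

# pickling calc.f pickles calc as well, so the instance must be picklable
restored = pickle.loads(pickle.dumps(calc.f))
print(restored(2))  # 8

if __name__ == '__main__':
    with Pool(2) as p:
        print(p.map(calc.f, [1, 2, 3]))  # [1, 8, 27]
```

Because the instance travels with every task batch, an instance that holds something unpicklable (such as the Pool itself) still fails.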

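Thanks, but I find it a bit dirty to define the function outside the class. The class should bundle everything it needs to achieve a given task.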

— Mermoz

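@Memoz: "The class should bundle everything it needs"? Really? I can't find many examples of this. Most classes depend on other classes or functions. Why call a class dependency "dirty"? What's wrong with a dependency?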

— S.Lott

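Well, the function shouldn't modify existing class data, because it would only modify the copy in the other process, so it could be a static method. You can sort of pickle a static method: stackoverflow.com/questions/1914261/… Or, for something this trivial, you could use a lambda.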

— robert

I know this question was asked 8 years and 10 months ago, but I want to present my solution:

from multiprocessing import Pool

class Test:

    def __init__(self):
        self.main()

    @staticmethod
    def methodForMultiprocessing(x):
        print(x*x)

    def main(self):
        if __name__ == "__main__":
            p = Pool()
            p.map(Test.methodForMultiprocessing, list(range(1, 11)))
            p.close()

TestObject = Test()

You just need to make your class function a static method. But it also works with a class method:

from multiprocessing import Pool

class Test:

    def __init__(self):
        self.main()

    @classmethod
    def methodForMultiprocessing(cls, x):
        print(x*x)

    def main(self):
        if __name__ == "__main__":
            p = Pool()
            p.map(Test.methodForMultiprocessing, list(range(1, 11)))
            p.close()

TestObject = Test()

Tested in Python 3.7.3

— TornaxO7

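I modified klaus se's method because, while it worked for me with small lists, it would hang when the number of items was 1000 or more. Instead of pushing the jobs one at a time with the None stop condition, I load the input queue all at once and just let the processes munch on it until it's empty.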

from multiprocessing import cpu_count, Queue, Process

def apply_func(f, q_in, q_out):
    while not q_in.empty():
        i, x = q_in.get()
        q_out.put((i, f(x)))

# map a function using a pool of processes
def parmap(f, X, nprocs = cpu_count()):
    q_in, q_out   = Queue(), Queue()
    proc = [Process(target=apply_func, args=(f, q_in, q_out)) for _ in range(nprocs)]
    sent = [q_in.put((i, x)) for i, x in enumerate(X)]
    [p.start() for p in proc]
    res = [q_out.get() for _ in sent]
    [p.join() for p in proc]

    return [x for i,x in sorted(res)]

Edit: unfortunately, I am now running into this error on my system: Multiprocessing Queue maxsize limit is 32767. Hopefully the workarounds there will help.

— aganders3

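You can run your code without any problems if you manually leave the Pool object out of the class's attributes when the instance is pickled, because, as the error says, the Pool cannot be pickled. You can do this with the __getstate__ function (see here too) as follows. When you run map, map_async and so on, pickle looks for the __getstate__ and __setstate__ methods and calls them if it finds them: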

from multiprocessing import Pool

class calculate(object):
    def __init__(self):
        self.p = Pool()
    def __getstate__(self):
        self_dict = self.__dict__.copy()
        del self_dict['p']
        return self_dict
    def __setstate__(self, state):
        self.__dict__.update(state)

    def f(self, x):
        return x*x
    def run(self):
        return self.p.map(self.f, [1,2,3])

Then do:

cl = calculate()
cl.run()

which gives you the output:

[1, 4, 9]

I have tested the above code in Python 3.x, and it works.

— Amir
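
The same `__getstate__`/`__setstate__` pattern works for any unpicklable attribute, and it can be checked without starting any processes. In this sketch a `threading.Lock` is an assumed stand-in for the `Pool`, since pickling either one raises an error:

```python
import pickle
import threading


class Calculate(object):
    def __init__(self):
        # stand-in for an unpicklable member such as a multiprocessing Pool
        self.lock = threading.Lock()
        self.factor = 3

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['lock']  # leave the unpicklable member out of the pickle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()  # optionally recreate it on the receiving side

    def f(self, x):
        return x * self.factor


c = Calculate()
clone = pickle.loads(pickle.dumps(c))
print(clone.factor, clone.f(2))  # 3 6
```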

I'm not sure whether this approach has been suggested yet, but the workaround I'm using is:

from multiprocessing import Pool

t = None

def run(n):
    return t.f(n)

class Test(object):
    def __init__(self, number):
        self.number = number

    def f(self, x):
        print x * self.number

    def pool(self):
        pool = Pool(2)
        pool.map(run, range(10))

if __name__ == '__main__':
    t = Test(9)
    t.pool()
    pool = Pool(2)
    pool.map(run, range(10))

The output should be:

— CpILL

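from functools import partial
from multiprocessing import Pool

class Calculate(object):
  # Your instance method to be executed
  def f(self, x, y):
    return x*y

if __name__ == '__main__':
  inp_list = [1,2,3]
  y = 2
  cal_obj = Calculate()
  pool = Pool(2)
  # the standard pickle cannot handle a lambda, so bind y with partial instead
  results = pool.map(partial(cal_obj.f, y=y), inp_list)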

You may want to apply this function to each of several different instances of the class. Here is a solution for that as well:

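from functools import partial
from multiprocessing import Pool

class Calculate(object):
  # Your instance method to be executed
  def __init__(self, x):
    self.x = x

  def f(self, y):
    return self.x*y

def call_f(obj, y):
  # module-level helper: picklable by name, unlike a lambda
  return obj.f(y)

if __name__ == '__main__':
  inp_list = [Calculate(i) for i in range(3)]
  y = 2
  pool = Pool(2)
  results = pool.map(partial(call_f, y=y), inp_list)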

— 希卡杜

Here is my solution, which I think is a bit less hackish than most of the other solutions here. It is similar to nightowl's answer.

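from functools import partial
from multiprocessing import Pool

class MyClass(object):  # hypothetical class for illustration
    def othermethod(self):
        return 42

def method_caller(some_object, some_method='the method'):
    # look up the method by name on the object and call it
    return getattr(some_object, some_method)()

othermethod = partial(method_caller, some_method='othermethod')

if __name__ == '__main__':
    someclasses = [MyClass(), MyClass(), MyClass()]
    with Pool(6) as pool:
        result = pool.map(othermethod, someclasses)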

— 艾伦·奥恩
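
The standard library already ships a picklable "call this method by name" object, `operator.methodcaller`, which can be pickled on Python 3.5 and later. A sketch of the same idea as the answer above, with a hypothetical `MyClass`:

```python
import pickle
from multiprocessing import Pool
from operator import methodcaller


class MyClass(object):
    """Hypothetical class with a method to call on every instance."""

    def __init__(self, n):
        self.n = n

    def othermethod(self, scale=1):
        return self.n * scale


# methodcaller('othermethod', scale=10)(obj) is obj.othermethod(scale=10)
call_other = methodcaller('othermethod', scale=10)
restored = pickle.loads(pickle.dumps(call_other))
print(restored(MyClass(4)))  # 40

if __name__ == '__main__':
    someclasses = [MyClass(1), MyClass(2), MyClass(3)]
    with Pool(2) as pool:
        print(pool.map(call_other, someclasses))  # [10, 20, 30]
```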

Based on http://www.rueckstiess.net/research/snippets/show/ca1d7d90 and http://qingkaikong.blogspot.com/2016/12/python-parallel-method-in-class.html

We can create an external function and seed it with the class's self object:

from joblib import Parallel, delayed
def unwrap_self(arg, **kwarg):
    return square_class.square_int(*arg, **kwarg)

class square_class:
    def square_int(self, i):
        return i * i

    def run(self, num):
        results = []
        results = Parallel(n_jobs= -1, backend="threading")\
            (delayed(unwrap_self)(i) for i in zip([self]*len(num), num))
        print(results)

Or without joblib:

from multiprocessing import Pool
import time

def unwrap_self_f(arg, **kwarg):
    return C.f(*arg, **kwarg)

class C:
    def f(self, name):
        print 'hello %s,'%name
        time.sleep(5)
        print 'nice to meet you.'

    def run(self):
        pool = Pool(processes=2)
        names = ('frank', 'justin', 'osi', 'thomas')
        pool.map(unwrap_self_f, zip([self]*len(names), names))

if __name__ == '__main__':
    c = C()
    c.run()

— Bob Baxley

This may not be a very good solution, but in my case I solved it like this.

from multiprocessing import Pool

def foo1(data):
    self = data.get('slf')
    lst = data.get('lst')
    return sum(lst) + self.foo2()

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def foo2(self):
        return self.a**self.b   

    def foo(self):
        p = Pool(5)
        lst = [1, 2, 3]
        result = p.map(foo1, (dict(slf=self, lst=lst),))
        return result

if __name__ == '__main__':
    print(Foo(2, 4).foo())

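I had to pass self to the function, because I need to access the attributes and methods of my class from that function. This works for me. Corrections and suggestions are always welcome.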

— Muhammad Hassan

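Here is a boilerplate I wrote for using a multiprocessing Pool in Python 3; I ran the tests with Python 3.7.7 specifically. I got my fastest runs with imap_unordered. Just plug in your scenario and try it out. You can use timeit or just time.time() to figure out which works best for you.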

import multiprocessing
import time

NUMBER_OF_PROCESSES = multiprocessing.cpu_count()
MP_FUNCTION = 'starmap'  # 'imap_unordered' or 'starmap' or 'apply_async'


def process_chunk(a_chunk):
    print(f"processing mp chunk {a_chunk}")
    return a_chunk


if __name__ == '__main__':  # required where workers are spawned (Windows, macOS)
    map_jobs = [1, 2, 3, 4]
    result_sum = 0

    s = time.time()
    # the context manager tears the pool down once all results are collected
    with multiprocessing.Pool(processes=NUMBER_OF_PROCESSES) as pool:
        if MP_FUNCTION == 'imap_unordered':
            for i in pool.imap_unordered(process_chunk, map_jobs):
                result_sum += i
        elif MP_FUNCTION == 'starmap':
            # starmap expects an iterable of argument tuples
            star_jobs = [(i,) for i in map_jobs]
            result_sum = sum(pool.starmap(process_chunk, star_jobs))
        elif MP_FUNCTION == 'apply_async':
            # submit every job before calling .get(); calling .get() right after
            # each submission would wait for one job at a time
            async_results = [pool.apply_async(process_chunk, (i,)) for i in map_jobs]
            result_sum = sum(r.get() for r in async_results)
    print(f"result_sum is {result_sum}, took {time.time() - s}s")

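In the scenario above, imap_unordered actually seems to perform the worst for me. Try out your own case and benchmark it on the machine you plan to run it on. Also read up on process pools. Cheers!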

— radtek