Threading pool similar to the multiprocessing Pool?

347

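Is there a Pool class for worker threads, similar to the multiprocessing module's Pool class?

I like, for example, the easy way to parallelize a map function: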

import multiprocessing

def long_running_func(p):
    c_func_no_gil(p)

p = multiprocessing.Pool(4)
xs = p.map(long_running_func, range(100))

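However, I would like to do this without the overhead of creating new processes.

I know about the GIL. However, in my use case, the function will be an IO-bound C function for which the Python wrapper will release the GIL before the actual function call.

Do I have to write my own threading pool?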

python multithreading missing-features

— Martin

This looks promising in the Python Cookbook: Recipe 576519: Thread pool with same API as (multi)processing.Pool (Python)

— otherchirps, 2010

1

Nowadays it's built in: from multiprocessing.pool import ThreadPool.

— martineau

Can you elaborate on this?

I know about the GIL. However, in my usecase, the function will be an IO-bound C function for which the python wrapper will release the GIL before the actual function call.

— mrgloom

@mrgloom stackoverflow.com/questions/1294382

— Darklighter

448

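I just found out that there actually is a thread-based Pool interface in the multiprocessing module; however, it is somewhat hidden and not properly documented.

It can be imported via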

from multiprocessing.pool import ThreadPool

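It is implemented using a dummy Process class that wraps a Python thread. This thread-based Process class can be found in multiprocessing.dummy, which is mentioned briefly in the docs. This dummy module supposedly provides the whole multiprocessing interface based on threads.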
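As a minimal sketch of how this pool can replace the process pool from the question (not part of the original answer): fake_io_call is a hypothetical stand-in for the GIL-releasing C function, and run_in_threads is an illustrative helper name. time.sleep is used only because it releases the GIL while waiting.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_io_call(p):
    """Hypothetical stand-in for an IO-bound call that releases the GIL."""
    time.sleep(0.05)  # time.sleep releases the GIL while it waits
    return p * 2


def run_in_threads(values, num_threads=4):
    # ThreadPool offers the same map() interface as multiprocessing.Pool,
    # but its workers are threads inside the current process.
    pool = ThreadPool(num_threads)
    try:
        return pool.map(fake_io_call, values)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker threads to exit


if __name__ == "__main__":
    print(run_in_threads(range(8)))
```

Because map() keeps the input order, the results line up with the inputs even though the calls overlap in time.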

— Martin

5

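That's awesome. I had a problem creating ThreadPools outside the main thread, although you can use them from a child thread once they are created. I filed an issue for it: bugs.python.org/issue10015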

— Olson

82

I don't get why this class has no documentation. Such helper classes are so important nowadays.

— 2012

18

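@Wernight: it isn't public primarily because nobody has offered a patch that provides it (or something similar) as threading.ThreadPool, including documentation and tests. It would indeed be a good battery to include in the standard library, but it won't happen if nobody writes it. One nice advantage of this existing implementation in multiprocessing is that it should make any such threading patch much easier to write (docs.python.org/devguide)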

— ncoghlan

3

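@daniel.gindi: multiprocessing.dummy.Pool and multiprocessing.pool.ThreadPool are the same thing, and both are thread pools. They mimic the interface of a process pool, but they are implemented entirely in terms of threading. Reread the docs; you got it backwards.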

— ShadowRanger

9

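@daniel.gindi: Read further: "multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module." multiprocessing in general is about processes, but to allow switching between processes and threads, they (mostly) replicated the multiprocessing API in multiprocessing.dummy, backed by threads rather than processes. The goal is to let you write import multiprocessing.dummy as multiprocessing to change process-based code to thread-based code.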

— ShadowRanger

235

In Python 3 you can use concurrent.futures.ThreadPoolExecutor, i.e.:

from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=10)
a = executor.submit(my_function)

See the docs for more information and examples.
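For comparison with the map-style call in the question, here is a short sketch of the two common ways to drive a ThreadPoolExecutor. The names square, squares_with_map and squares_with_submit are hypothetical and only serve the illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    """Hypothetical task; any callable works."""
    return x * x


def squares_with_map(values, max_workers=4):
    # executor.map yields results in input order, like the built-in map.
    # Leaving the with-block waits for all submitted work to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(square, values))


def squares_with_submit(values, max_workers=4):
    # submit() returns one Future per call; as_completed yields the futures
    # as they finish, so results are keyed by input value, not by position.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(square, v): v for v in values}
        return {futures[f]: f.result() for f in as_completed(futures)}


if __name__ == "__main__":
    print(squares_with_map(range(5)))
    print(squares_with_submit([1, 2, 3]))
```

The map form is the closest match to Pool.map; the submit form is useful when results should be handled as soon as each task finishes.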

— Adrian Adamiak

6

To use this on Python 2, install the backport with sudo pip install futures

— 反向

This is the most efficient and fastest way to do multiprocessing

— Haritsinh Gohil

2

What is the difference between ThreadPoolExecutor and multiprocessing.dummy.Pool?

— 周杰伦

2

from concurrent.futures import ThreadPoolExecutor

63

Yes, and it seems to have (more or less) the same API.

import multiprocessing
import multiprocessing.pool  # ThreadPool lives in this submodule

def worker(lnk):
    ....    
def start_process():
    .....
....

if(PROCESS):
    pool = multiprocessing.Pool(processes=POOL_SIZE, initializer=start_process)
else:
    pool = multiprocessing.pool.ThreadPool(processes=POOL_SIZE, 
                                           initializer=start_process)

pool.map(worker, inputs)
....

— 军费

9

The import path for ThreadPool is different from that of Pool. The correct import is from multiprocessing.pool import ThreadPool.

— 万寿菊

2

Strangely, this is not a documented API, and multiprocessing.pool is only briefly mentioned as providing AsyncResult. But it is available in both 2.x and 3.x.

— Marvin

2

This is what I was looking for. It takes just a single import line and a small change to my existing pool line, and it works perfectly.

— Danegraphics '18

39

For something very simple and lightweight (slightly modified from here):

from queue import Queue
from threading import Thread


class Worker(Thread):
    """Thread executing tasks from a given tasks queue"""
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.start()

    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception as e:
                # Report the error and keep the worker alive
                print(e)
            finally:
                self.tasks.task_done()


class ThreadPool:
    """Pool of threads consuming tasks from a queue"""
    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            Worker(self.tasks)

    def add_task(self, func, *args, **kargs):
        """Add a task to the queue"""
        self.tasks.put((func, args, kargs))

    def wait_completion(self):
        """Wait for completion of all the tasks in the queue"""
        self.tasks.join()

if __name__ == '__main__':
    from random import randrange
    from time import sleep

    delays = [randrange(1, 10) for i in range(100)]

    def wait_delay(d):
        print('sleeping for (%d)sec' % d)
        sleep(d)

    pool = ThreadPool(20)

    for i, d in enumerate(delays):
        pool.add_task(wait_delay, d)

    pool.wait_completion()

To support callbacks on task completion, you can just add the callback to the task tuple.
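One possible way to do that is sketched below. This is an assumption about how the idea could be implemented, not the original author's code: the class names CallbackWorker and CallbackThreadPool, the keyword-only callback parameter, and the collect_squares helper are all hypothetical.

```python
from queue import Queue
from threading import Thread


class CallbackWorker(Thread):
    """Worker that passes each task's return value to an optional callback."""

    def __init__(self, tasks):
        super().__init__(daemon=True)
        self.tasks = tasks
        self.start()

    def run(self):
        while True:
            # The task tuple carries the callback as a fourth element.
            func, args, kwargs, callback = self.tasks.get()
            try:
                result = func(*args, **kwargs)
                if callback is not None:
                    callback(result)  # runs in the worker thread
            except Exception as e:
                print(e)
            finally:
                self.tasks.task_done()


class CallbackThreadPool:
    """Pool of threads consuming (func, args, kwargs, callback) tuples."""

    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            CallbackWorker(self.tasks)

    def add_task(self, func, *args, callback=None, **kwargs):
        # 'callback' is reserved by the pool, so the task function itself
        # cannot receive a keyword argument with that name.
        self.tasks.put((func, args, kwargs, callback))

    def wait_completion(self):
        self.tasks.join()


def collect_squares(values, num_threads=4):
    results = []
    pool = CallbackThreadPool(num_threads)
    for v in values:
        # Appending to a list is atomic in CPython; other callbacks may need a lock.
        pool.add_task(pow, v, 2, callback=results.append)
    pool.wait_completion()
    return sorted(results)  # completion order is not guaranteed


if __name__ == "__main__":
    print(collect_squares(range(5)))
```

Since the callback runs before task_done() is called, wait_completion() only returns after every callback has finished.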

— dgorissen

How can the threads ever join if they unconditionally loop forever?

— Joseph Garvin

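@JosephGarvin I've tested it, and the threads keep blocking on an empty queue (because the call to Queue.get() is blocking) until the program ends, after which they are terminated automatically.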

— 论坛管理员

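@JosephGarvin, good question. Queue.join() actually joins the task queue, not the worker threads. So when the queue is empty, wait_completion returns, the program ends, and the threads are reaped by the OS.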

— randomir

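If all of this code is wrapped up in a neat function, it doesn't seem to stop the threads even when the queue is empty and pool.wait_completion() returns. The result is that threads just keep building up.

— ubiquibacon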

16

Hi, to use a thread pool in Python you can use this library:

from multiprocessing.dummy import Pool as ThreadPool

and then use it like this:

def run_tasks(service, tasks, threads):  # hypothetical wrapper so that return is valid
    pool = ThreadPool(threads)
    results = pool.map(service, tasks)
    pool.close()
    pool.join()
    return results

Here threads is the number of threads you want, and tasks is the list of tasks to map over the service function.

— Manochehr Rasouli

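Thanks, that's a great suggestion! From the docs: multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module. One correction: I think you meant to say that the pool API is (function, iterable)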

— 奠定了

2

We missed the .close() and .join() calls, and that caused .map() to finish before all the threads were done. Just a warning.

— 安纳托利·谢尔巴科夫

8

Here's what I eventually ended up using. It's a modified version of the classes by dgorissen above.

File: threadpool.py

from queue import Queue, Empty
import threading
from threading import Thread


class Worker(Thread):
    """Thread executing tasks from a given tasks queue.

    The thread can be signalled to exit.
    """
    _TIMEOUT = 2  # seconds to wait on an empty queue before re-checking the exit flag
    def __init__(self, tasks, th_num):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon, self.th_num = True, th_num
        self.done = threading.Event()
        self.start()

    def run(self):       
        while not self.done.is_set():
            try:
                func, args, kwargs = self.tasks.get(block=True,
                                                   timeout=self._TIMEOUT)
                try:
                    func(*args, **kwargs)
                except Exception as e:
                    print(e)
                finally:
                    self.tasks.task_done()
            except Empty as e:
                pass
        return

    def signal_exit(self):
        """ Signal to thread to exit """
        self.done.set()


class ThreadPool:
    """Pool of threads consuming tasks from a queue"""
    def __init__(self, num_threads, tasks=()):
        self.tasks = Queue(num_threads)
        self.workers = []
        self.done = False
        self._init_workers(num_threads)
        for task in tasks:
            self.tasks.put(task)

    def _init_workers(self, num_threads):
        for i in range(num_threads):
            self.workers.append(Worker(self.tasks, i))

    def add_task(self, func, *args, **kwargs):
        """Add a task to the queue"""
        self.tasks.put((func, args, kwargs))

    def _close_all_threads(self):
        """ Signal all threads to exit and lose the references to them """
        for workr in self.workers:
            workr.signal_exit()
        self.workers = []

    def wait_completion(self):
        """Wait for completion of all the tasks in the queue"""
        self.tasks.join()

    def __del__(self):
        self._close_all_threads()


def create_task(func, *args, **kwargs):
    return (func, args, kwargs)

To use the pool

from threadpool import ThreadPool  # the threadpool.py file above
from random import randrange
from time import sleep

delays = [randrange(1, 10) for i in range(30)]

def wait_delay(d):
    print('sleeping for (%d)sec' % d)
    sleep(d)

pool = ThreadPool(20)
for i, d in enumerate(delays):
    pool.add_task(wait_delay, d)
pool.wait_completion()

— 论坛者

Note for other readers: this code is Python 3 (shebang #!/usr/bin/python3)

— Daniel Marschall

Why do you use for i, d in enumerate(delays): and then ignore the i value?

— martineau

@martineau: probably just a relic from development, where they likely wanted to print i during a run.

— n1k31t4

Why is create_task there? What is it for?

— MrR

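I can't believe that an answer with 4 votes on SO is the way to do thread pooling in Python. Is the thread pool in the official Python distribution still broken? What am I missing?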

— MrR

2

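The overhead of creating new processes is minimal, especially when there are only 4 of them. I doubt this is a performance hot spot in your application. Keep it simple, and optimize where you have to and where profiling results point.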

— Unbeli

5

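If the questioner is on Windows (which I do not believe he specified), then I think process startup can be a significant expense. At least it is on the projects I have been working on recently. :-)

— Brandon Rhodes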

1

There is no built-in thread-based pool. However, it can be very quick to implement a producer/consumer queue with the Queue class.

From: https://docs.python.org/2/library/queue.html

from threading import Thread
from Queue import Queue
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
     t = Thread(target=worker)
     t.daemon = True
     t.start()

for item in source():
    q.put(item)

q.join()       # block until all tasks are done

— Yann Ramin

3

This is no longer the case with the concurrent.futures module.

— Thanatos, 2014

11

I don't think this is true at all anymore: from multiprocessing.pool import ThreadPool

— Randall Hunt

1

multiprocessing.pool.ThreadPool is not documented because its implementation has never been completed. It lacks tests and documentation.

— MrR