Asynchronous Requests with Python requests

142

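I tried the sample provided in the documentation of the requests library for Python.

With async.map(rs), I get the response codes, but I want to get the content of each page requested. This, for example, does not work: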

out = async.map(rs)
print out[0].content

— trbck

Maybe the responses you are getting have an empty body?

— Mariusz Jamro '02

Works for me. Please post the full error you are getting.

— Chewie

There is no error. It just runs forever with the provided test URLs.

— trbck '02

It apparently happens when I use URLs over https. http works just fine.

— trbck 2012

Looks like requests-threads exists now.

— OrangeDog

154

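Note

The answer below is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

I've left this answer as is to reflect the original question, which was about using requests < v0.13.0.

To do multiple tasks with async.map asynchronously you have to:

Define a function for what you want to do with each object (your task)
Add that function as an event hook in your request
Call async.map on a list of all the requests / actions

Example: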

from requests import async
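# With requests >= v0.13.0 this import fails; per the comments below, use
# `import grequests as async` instead (Python 2 only: `async` is a keyword since Python 3.7)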

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print response.url

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    # 
    # Note the lack of parentheses following do_something, this is
    # because the response will be used as the first argument automatically
    action_item = async.get(u, hooks = {'response' : do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)

— Jeff

2

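Good idea to have left your note: because of compatibility issues between the latest requests and grequests (the max_retries option is missing in requests 1.1.0), I had to downgrade requests to get async back, and I found that the asynchronous functionality was moved with versions 0.13+ (pypi.python.org/pypi/requests)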

— outforawhile

1

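Dumb question: how much of a speed increase do you get from grequests compared with plain requests? What limits are there on the number of requests? For example, would putting 3500 requests into async.map be OK?

— droope 2014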

10

from grequests import async does not work, but this definition of do_something works for me: def do_something(response, **kwargs):. I found it at stackoverflow.com/questions/15594015/…

— Allan Ruin 2014

3

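If the async.map call still blocks, then how is this asynchronous? Apart from the requests themselves being sent asynchronously, is the retrieval still synchronous?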

— bryanph
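
One way to picture the answer to this question is the sketch below. It does not use requests or grequests at all: fake_fetch is a hypothetical stand-in that sleeps instead of touching the network, and a standard-library thread pool plays the role of the async machinery. The call to fetch_all blocks the caller until everything is finished, yet the individual fetches overlap, so the total time should be roughly that of one fetch rather than the sum.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for a blocking HTTP GET; sleeps instead of using the network."""
    time.sleep(delay)
    return "body of " + url

def fetch_all(urls):
    """Block until every fetch is done, while the fetches themselves run concurrently."""
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        # pool.map yields results in the same order as the input URLs
        return list(pool.map(fake_fetch, urls))

urls = ["http://example.com/%d" % i for i in range(5)]
start = time.perf_counter()
bodies = fetch_all(urls)
elapsed = time.perf_counter() - start
# elapsed is expected to be close to one delay, not five delays added together
print(len(bodies), "responses in %.2fs" % elapsed)
```

The caller only gets control back once all responses are in, which is why the map call looks synchronous from the outside even though the waiting happens in parallel.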

3

Replacing from requests import async with import grequests as async worked for me.

— Martin Thoma

79

async is now an independent module: grequests.

See here: https://github.com/kennethreitz/grequests

And there: Ideal method for sending multiple HTTP requests over Python?

Installation:

$ pip install grequests

Usage:

Build a stack:

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)

Send the stack:

grequests.map(rs)

The result looks like

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]

grequests does not seem to set a limit on concurrent requests, i.e. when multiple requests are sent to the same server.

— outforawhile

11

Regarding the limit on concurrent requests: you can specify a pool size when running map()/imap(), e.g. grequests.map(rs, size=20) to have 20 concurrent grabs.

— synthesizerpatel
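
To illustrate what a pool size means in practice, here is a minimal standard-library sketch. It is not grequests code: run_with_limit and fake_request are hypothetical names, and a ThreadPoolExecutor stands in for the gevent pool. The function records the highest number of fake requests that were in flight at once, which should never exceed the chosen size.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_with_limit(n_tasks, size):
    """Run n_tasks fake requests with at most `size` in flight; return the peak observed."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_request(_):
        # Count this request as active and update the peak under the lock
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.05)  # stand-in for network latency
        with lock:
            state["active"] -= 1

    # The pool never runs more than `size` workers at the same time
    with ThreadPoolExecutor(max_workers=size) as pool:
        list(pool.map(fake_request, range(n_tasks)))
    return state["peak"]

print(run_with_limit(n_tasks=12, size=3))
```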

1

As of now this is not python3-capable (gevent fails to build v2.6 on py3.4).

— saarp 2014

1

I don't quite understand the async part. If I write results = grequests.map(rs), the code after this line is blocked, so how can I see the async effect?

— Allan Ruin

47

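I tested both requests-futures and grequests. grequests is faster but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own and simply wrapped requests in a ThreadPoolExecutor; it was almost as fast as grequests, but without external dependencies.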

import concurrent.futures

import requests

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# Counters for failed and successful responses
resp_err = 0
resp_ok = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    # Submit every request and remember which future belongs to which URL
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1

— Hodza

What types of exceptions are possible here?

— Slow Harry

requests.exceptions.Timeout

— Hodza 2015

2

Sorry, I don't understand your question. Use only a single URL in multiple threads? There is only one case for that: DDoS attacks ))

— Hodza 2015

1

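I don't understand why this answer got so many upvotes. The OP's question was about async requests. ThreadPoolExecutor runs threads. Yes, you can make requests in multiple threads, but that will never be an async program, so how can it be an answer to the original question?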

— nagylzs

1

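Actually, the question was about how to load URLs in parallel. And yes, a thread pool executor is not the best option, it is better to use async io, but it works well in Python. And I don't understand why threads couldn't be used for async. What if you need to run a CPU-bound task asynchronously?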

— Hodza

29

Maybe requests-futures is another option.

from requests_futures.sessions import FuturesSession

session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second requests is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)

It is also recommended in the official documentation. If you don't want to involve gevent, it is a good choice.

— Dreampuf

1

One of the easiest solutions. The number of concurrent requests can be increased by defining the max_workers parameter.

— Jose Cherian

1

It would be nice to see an example of this scaled up, so we are not using one variable name per item to loop over.

— user1717828

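Having one thread per request is a huge waste of resources! It is not possible to do, for example, 500 requests simultaneously; it will kill your CPU. This should never be considered a good solution.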

— Corneliu Maftuleac '18

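@CorneliuMaftuleac Good point. Regarding thread usage, you definitely need to care about it, and the library provides an option to enable a thread pool or a process pool: ThreadPoolExecutor(max_workers=10)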

— Dreampuf

@Dreampuf A process pool is even worse, I believe?

— Corneliu Maftuleac '18

11

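I have had a lot of issues with most of the answers posted. They either use deprecated libraries that have been ported over with limited features, or provide a solution with too much magic in the execution of the request, making it difficult to handle errors. If they do not fall into one of the above categories, they are third-party libraries or deprecated.

Some of the solutions work fine purely for HTTP requests, but fall short for any other kind of request, which is ludicrous. A highly customized solution is not necessary here.

Simply using the Python built-in library asyncio is sufficient to perform asynchronous requests of any type, and it provides enough fluidity for complex and use-case-specific error handling.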

import asyncio

import requests

loop = asyncio.get_event_loop()

def do_thing(params):
    async def get_rpc_info_and_do_chores(id):
        # do things; perform_grpc_call and do_chores are placeholders
        response = perform_grpc_call(id)
        do_chores(response)

    async def get_httpapi_info_and_do_chores(id):
        # do things; URL is a placeholder for the endpoint being called
        response = requests.get(URL)
        do_chores(response)

    async_tasks = []
    for element in list(params.list_of_things):
        # each element is assumed to be the id that both coroutines expect
        async_tasks.append(loop.create_task(get_rpc_info_and_do_chores(element)))
        async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(element)))

    loop.run_until_complete(asyncio.gather(*async_tasks))

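How it works is simple. You create a series of tasks that you would like to run asynchronously, then ask the loop to execute those tasks and exit upon completion. There are no extra libraries that might go unmaintained, and no functionality is missing.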
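
The create-tasks-then-gather pattern described above can be sketched in a self-contained form as follows. fake_call is a hypothetical coroutine that uses asyncio.sleep in place of real I/O, so the example runs without any network access; asyncio.run (Python 3.7+) is assumed here in place of managing the loop by hand.

```python
import asyncio

async def fake_call(name, delay):
    """Hypothetical stand-in for one awaitable request."""
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    return name + " done"

async def main():
    # Create one task per job; each is scheduled on the running event loop
    tasks = [asyncio.create_task(fake_call("job%d" % i, 0.1)) for i in range(3)]
    # gather waits for all tasks and returns their results in the order given
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results)  # ['job0 done', 'job1 done', 'job2 done']
```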

— Arshbot

2

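If I understand correctly, this will block the event loop while the GRPC and HTTP calls are being made? So if these calls take seconds to complete, your entire event loop will block for seconds? To avoid this, you need to use GRPC or HTTP libraries that are async. Then you can, for example, do response = await requests.get(URL). No?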

— Coder Nr 23

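Unfortunately, when trying this out, I found that making a wrapper around requests is barely faster (and in some cases slower) than just calling a list of URLs synchronously. For example, requesting an endpoint that takes 3 seconds to respond 10 times with the strategy above takes about 30 seconds. If you want true async performance, you need to use something like aiohttp.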

— DragonBobZ
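
The concern raised in these comments can be worked around without switching HTTP libraries by pushing each blocking call onto a thread. The sketch below assumes a hypothetical blocking_get that sleeps instead of calling requests.get, and uses loop.run_in_executor with the default thread pool; an async-native client such as aiohttp is the alternative the commenters suggest.

```python
import asyncio
import time

def blocking_get(url):
    """Hypothetical stand-in for a blocking call such as requests.get(url)."""
    time.sleep(0.2)
    return "response from " + url

async def fetch_all_in_threads(urls):
    loop = asyncio.get_running_loop()
    # Each blocking call runs in the default thread pool, so the event loop stays free
    futures = [loop.run_in_executor(None, blocking_get, u) for u in urls]
    # gather returns the results in the same order as the input URLs
    return await asyncio.gather(*futures)

start = time.perf_counter()
responses = asyncio.run(fetch_all_in_threads(["u1", "u2", "u3", "u4"]))
print(len(responses), "responses in %.2fs" % (time.perf_counter() - start))
```

With the calls overlapped in threads, the four fake requests should take roughly one delay in total rather than four.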

8

I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

from simple_requests import Requests
for response in Requests().swarm(list_of_requests):
    print response.content

The docs are here: http://pythonhosted.org/simple-requests/

— Monkey Boson

@YSY Feel free to post an issue: github.com/ctheiss/simple-requests/issues; I literally use this library thousands of times a day.

— Monkey Boson

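Boston, how do you handle 404/500 errors? What about https URLs? I would appreciate a snippet that supports thousands of URLs. Can you give an example? Thanks

— YSY 2015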

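@YSY By default, 404/500 errors raise an exception. This behaviour can be overridden (see pythonhosted.org/simple-requests/…). HTTPS URLs are tricky because of the reliance on gevent, which currently has an outstanding bug on this (github.com/gevent/gevent/issues/477). There is a shim in the ticket that you can run, but it will still throw warnings for SNI servers (it will work, though). As for a snippet, I'm afraid all my usages are at my company and closed. But I assure you we execute thousands of requests over tens of jobs.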

— Monkey Boson

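The library looks sleek with respect to interaction. Is it usable with Python 3+? Sorry, I could not see any mention.

— Isaac Philip

@Jethro Absolutely right, the library would need a total rewrite because the underlying technologies are quite different in Python 3. For now, the library is "complete" but only works for Python 2.

— Monkey Boson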

4

threads=list()

for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout = 600)
    o...

— Jason Pump

4

This is "normal" requests in threads. Not a bad example, but it is off-topic.

— Nick

4

If you want to use asyncio, then requests-async provides async/await functionality for requests: https://github.com/encode/requests-async

— Tom Christie

2

Confirmed, works great. The project page says this work has been superseded by the following project: github.com/encode/httpx

— nurettin

2

I have been using Python requests for async calls against GitHub's gist API.

For an example, see the code here:

https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72

This style of Python may not be the clearest example, but I can assure you that the code works. Let me know if this is confusing to you and I will document it.

— David Watson

2

You can use httpx for that.

import asyncio
import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))

If you want a functional syntax, the gamla lib wraps this into get_async.

Then you can do


await gamla.map(gamla.get_async(10), ["http://google.com", "http://wikipedia.org"])

where 10 is the timeout in seconds.

(Disclaimer: I am its author)

— Uri
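
A per-request timeout like this can also be sketched with asyncio alone. The example below does not use httpx or gamla: slow_call and call_with_timeout are hypothetical names, asyncio.sleep stands in for the request, and asyncio.wait_for enforces the limit.

```python
import asyncio

async def slow_call(delay):
    """Hypothetical stand-in for a request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return "ok"

async def call_with_timeout(delay, timeout):
    try:
        # wait_for cancels the inner coroutine once the timeout expires
        return await asyncio.wait_for(slow_call(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"

async def main():
    # The first call finishes in time; the second exceeds its limit
    return await asyncio.gather(
        call_with_timeout(0.05, timeout=0.5),
        call_with_timeout(1.0, timeout=0.1),
    )

print(asyncio.run(main()))  # ['ok', 'timed out']
```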

And respx for mocking/testing :)

— RLAT

0

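I have also tried some things using the asynchronous methods in Python, but I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying, in Twisted.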

http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html

— Sam

Twisted is old-fashioned. Use HTTPX instead.

— AmirHossein