Download a zip file returned from a URL

84

If I have a URL that, when submitted in a web browser, pops up a dialog box to save a zip file, how would I go about catching and downloading this zip file in Python?

— 用户名

1

I tried the section "Downloading a binary file and writing it to disk" of this page, which worked like a charm.

— Zeinab Abbasimazar

32

Most people recommend using requests if it is available, and the requests documentation recommends this for downloading and saving raw data from a URL:

import requests 

def download_url(url, save_path, chunk_size=128):
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)

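Since the question asks about downloading and saving the zip file, I haven't gone into detail about reading the zip file. See one of the many answers below for possibilities.

If for some reason you don't have access to requests, you can use urllib.request instead. It may not be quite as robust as the above.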

import urllib.request

def download_url(url, save_path):
    with urllib.request.urlopen(url) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())
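
The urllib.request version above reads the whole response into memory with dl_file.read(). A minimal sketch of a streaming variant, assuming only the standard library: shutil.copyfileobj copies the response to disk in fixed-size blocks. The name download_url_streaming and the 64 KiB default are illustrative choices, not part of the original answer.

```python
import shutil
import urllib.request


def download_url_streaming(url, save_path, block_size=64 * 1024):
    """Copy the response body to save_path block by block (hypothetical helper)."""
    with urllib.request.urlopen(url) as response, open(save_path, "wb") as out_file:
        # copyfileobj reads at most block_size bytes at a time,
        # so the whole archive never has to fit in memory.
        shutil.copyfileobj(response, out_file, block_size)
    return save_path
```

Calling download_url_streaming(url, 'archive.zip') should leave the archive on disk, ready for zipfile.ZipFile.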

Finally, if you are still using Python 2, you can use urllib2.urlopen.

import urllib2
from contextlib import closing

def download_url(url, save_path):
    with closing(urllib2.urlopen(url)) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())

— 哨兵

Could you also add a sample code snippet? That would be very kind of you.

203

As far as I can tell, the proper way to do this is:

import requests, zipfile, StringIO
r = requests.get(zip_file_url, stream=True)
z = zipfile.ZipFile(StringIO.StringIO(r.content))
z.extractall()

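Of course, you'd want to check that the GET was successful with r.ok.

For Python 3+, the StringIO module has been replaced by the io module, and you use BytesIO instead of StringIO. The release notes mention this change.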

import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/path/to/destination_directory")
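
A failed request often still returns a body (an HTML error page, say), which ZipFile then rejects with BadZipFile. Here is a minimal sketch of a guard, assuming the body is already in hand as bytes (r.content in the code above). The helper name unzip_bytes is hypothetical.

```python
import io
import zipfile


def unzip_bytes(data, dest_dir):
    """Extract a zip archive held in memory (e.g. r.content) into dest_dir."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        # An HTML error page or a truncated download lands here.
        raise ValueError("downloaded data is not a zip archive")
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        z.extractall(dest_dir)
        # Return the member names so the caller can see what was unpacked.
        return z.namelist()
```

With the snippet above this would be called as unzip_bytes(r.content, "/path/to/destination_directory").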

— yoavram

Thanks for this answer. I used it to solve my issue with getting a zip file with requests.

— gr1zzly be4r

yoavram, in your code, where do I enter the URL of the webpage?

— newGIS '16

25

If you'd like to save the downloaded file in a different location, replace z.extractall() with z.extractall("/path/to/destination_directory")

— user799188 '16

1

If you just want to save the file from the URL, you can do: urllib.request.urlretrieve(url, filename).

— yoavram '18
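
For reference, a minimal sketch of that one-liner wrapped in a function. save_url is a hypothetical name. urlretrieve returns a (filename, headers) tuple, and the Python documentation lists it among the legacy interfaces, so treat this as a convenience rather than a recommendation.

```python
import urllib.request


def save_url(url, filename):
    """Save the resource at url to filename and return the local path."""
    # urlretrieve returns (local_filename, headers); only the path is used here.
    path, _headers = urllib.request.urlretrieve(url, filename)
    return path
```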

3

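To help others connect the dots that took me 60 minutes too long, you can then use pd.read_table(z.open('filename')) with the above. This is useful if you have a zip URL that contains multiple files and you're only interested in loading one of them.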

— Frikster
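
The same trick works without pandas. A minimal sketch, assuming the archive is already in memory as bytes: ZipFile.open gives a file-like object for a single member, so nothing else has to be extracted. read_member is a hypothetical helper name.

```python
import io
import zipfile


def read_member(zip_bytes, member_name):
    """Return the raw bytes of one archive member without extracting the rest."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        # z.open raises KeyError if member_name is not in the archive.
        with z.open(member_name) as member:
            return member.read()
```

The returned bytes can be wrapped in io.BytesIO again if a parser expects a file-like object.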

12

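With the help of this blog post, I've got it working with just requests. The point of the odd-looking stream option is that we don't need to call content on large requests, which would require the whole response to be processed at once and clog up memory. stream avoids this by iterating through the data one chunk at a time.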

import requests

url = 'https://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_02_tract_500k.zip'
target_path = 'alaska.zip'

response = requests.get(url, stream=True)
# The with block closes the file even if a write fails.
with open(target_path, "wb") as handle:
    for chunk in response.iter_content(chunk_size=512):
        if chunk:  # filter out keep-alive new chunks
            handle.write(chunk)

— Jeremiah England

2

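Answers should not rely on links for the bulk of their content. Links can stop working, or the content at the other end can change so that it no longer answers the question. Please edit your answer to include a summary or explanation of the information your link points to.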

— mypetlion

7

This is what I got to work in Python 3:

import zipfile, urllib.request, shutil

url = 'http://www....myzipfile.zip'
file_name = 'myzip.zip'

with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
# Open the archive only after out_file is closed, so all bytes are flushed to disk.
with zipfile.ZipFile(file_name) as zf:
    zf.extractall()

— Webucator

Hello. How can I avoid this error: urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.?

— Victor M Herasme Perez

@VictorHerasmePerez, an HTTP 302 response status code means that the page has been moved. I think the issue you're facing is addressed here: stackoverflow.com/questions/32569934/…

— Webucator

5

You can either use urllib2.urlopen, or you could try the excellent Requests module and avoid the urllib2 headaches:

import requests
results = requests.get('url')
#pass results.content onto secondary processing...

— Aravenel

1

But how do you parse results.content into a zip?

— 0atman '12

Use the zipfile module: zip = zipfile.ZipFile(io.BytesIO(results.content)). Then just work through the files using ZipFile.namelist(), ZipFile.open(), or ZipFile.extractall()

— aravenel

5

I came here searching for how to save a .bzip2 file. Let me paste the code for others who might come looking for this.

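import requests

url = "http://api.mywebsite.com"
filename = "swateek.tar.gz"
headers = {}  # placeholder: add whatever headers the API expects

response = requests.get(url, headers=headers, auth=('myusername', 'mypassword'), timeout=50)
if response.status_code == 200:
    # Write the body unchanged, without decompressing it.
    with open(filename, 'wb') as f:
        f.write(response.content)

I just wanted to save the file as is.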

— Swateek

3

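Thanks to @yoavram for the above solution. My URL path linked to a zipped folder, and I ran into a BadZipfile error (file is not a zip file). Strangely, if I tried several times it would suddenly retrieve the URL and unzip everything, so I amended the solution a little, using the is_zipfile method as described here: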

import io, zipfile
import requests

r = requests.get(url)
# Re-request until the response body is a valid zip archive.
while not zipfile.is_zipfile(io.BytesIO(r.content)):
    r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()

— 印度洋
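
A loop like the one above never ends if the server keeps returning something that is not a zip. A minimal sketch of a bounded variant follows. The fetch argument is a hypothetical zero-argument callable that returns the response body as bytes, and the attempt limit and delay are assumptions to tune.

```python
import io
import time
import zipfile


def fetch_zip_with_retries(fetch, max_attempts=5, delay=1.0):
    """Call fetch() until it returns zip data or the attempts run out.

    fetch is a hypothetical zero-argument callable returning bytes,
    for example: lambda: requests.get(url).content
    """
    for attempt in range(1, max_attempts + 1):
        data = fetch()
        if zipfile.is_zipfile(io.BytesIO(data)):
            return zipfile.ZipFile(io.BytesIO(data))
        if attempt < max_attempts:
            time.sleep(delay)  # brief pause before asking the server again
    raise RuntimeError(f"no valid zip after {max_attempts} attempts")
```

Raising after the last attempt makes a persistent server-side problem visible instead of hanging the script.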