How to save an S3 object to a file using boto3

131

I'm trying to do a "hello world" with the new boto3 client for AWS.

My use case is fairly simple: get an object from S3 and save it to a file.

In boto 2.X I would do it like this:

import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')

In boto 3, I can't find a clean way to do the same thing, so I'm manually iterating over the "Streaming" object:

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:  # binary mode: read() returns bytes
    chunk = key['Body'].read(1024*8)
    while chunk:
        f.write(chunk)
        chunk = key['Body'].read(1024*8)

or

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:  # binary mode: read() returns bytes
    for chunk in iter(lambda: key['Body'].read(4096), b''):
        f.write(chunk)

And it works fine. I was wondering whether there is any "native" boto3 function that will do the same task?

— Vor

215

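There is a customization that went into Boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client, and can be used like this: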

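import boto3

s3_client = boto3.client('s3')

# Create a local file to upload (open in write mode and close it afterwards)
with open('hello.txt', 'w') as f:
    f.write('Hello, world!')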

# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')

# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
print(open('hello2.txt').read())

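These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.

Note that s3_client.download_file won't create a directory. It can be created with pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).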
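
The note about directories can be sketched as a small wrapper. This is only an illustration: download_to_path is a hypothetical helper name (not part of boto3), and the client is passed in as an argument on the assumption that it exposes download_file(Bucket, Key, Filename) the way the boto3 S3 client does.

```python
from pathlib import Path


def download_to_path(s3_client, bucket, key, destination):
    """Download s3://<bucket>/<key> to destination, creating parent dirs first.

    Hypothetical helper, not part of boto3. `s3_client` is assumed to be a
    boto3 S3 client such as boto3.client('s3').
    """
    destination = Path(destination)
    # download_file does not create missing directories, so create them here.
    destination.parent.mkdir(parents=True, exist_ok=True)
    s3_client.download_file(bucket, key, str(destination))
    return destination
```

It would be called as download_to_path(boto3.client('s3'), 'MyBucket', 'hello-remote.txt', '/tmp/some/dir/hello2.txt').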

— Daniel

1

@Daniel: Thanks for your reply. Can you also answer how to upload a file using multipart upload in boto3?

— Rahul KP, 2015

1

@RahulKumarPatle The upload_file method will automatically use multipart uploads for large files.

— Daniel

4

How do you pass your credentials using this approach?

— JHowIX '02

1

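@JHowIX You can either configure the credentials globally (e.g. see boto3.readthedocs.org/en/latest/guide/…) or pass them when creating the client. See boto3.readthedocs.org/en/latest/reference/core/… for more information on the available options!

— Daniel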

2

@VladNikiporoff "Upload from source to destination", "Download from source to destination"

— jkdev

59

boto3 now has a nicer interface than the client:

resource = boto3.resource('s3')
my_bucket = resource.Bucket('MyBucket')
my_bucket.download_file(key, local_filename)

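By itself this isn't tremendously better than the client in the accepted answer (although the docs say that it does a better job of retrying uploads and downloads on failure). But considering that resources are generally more ergonomic (for example, the s3 bucket and object resources are nicer than the client methods), this does allow you to stay at the resource layer without having to drop down.

Resources can generally be created in the same way as clients, and they take all or most of the same arguments and just forward them to their internal clients.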

— 法定代表人

1

Great example, and to add, since the original question asks about saving objects: the relevant method here is my_bucket.upload_file() (or my_bucket.upload_fileobj() if you have a BytesIO object).
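
A minimal sketch of the upload_fileobj() variant, assuming a Bucket resource such as boto3.resource('s3').Bucket('MyBucket'). The helper name upload_bytes is hypothetical; the bucket is passed in so the snippet itself needs no credentials.

```python
import io


def upload_bytes(bucket, key, data):
    """Upload in-memory bytes as the object <key> in the given bucket.

    Hypothetical helper, not part of boto3. `bucket` is assumed to be a
    boto3 Bucket resource.
    """
    # upload_fileobj expects a binary file-like object, so wrap the bytes.
    bucket.upload_fileobj(io.BytesIO(data), key)
```

For a file that already exists on disk, my_bucket.upload_file(local_filename, key) is the more direct call.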

— SMX

Where exactly do the docs say that resource does a better job at retrying? I couldn't find any such indication.

— Acumenus

42

For those of you who would like to simulate boto2 methods such as set_contents_from_string, you can try

import boto3
from cStringIO import StringIO

s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)

# notice if you do fake_handle.read() it reads like a file handle
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())

For Python3:

In python3, both the StringIO and cStringIO modules are gone. Import StringIO like this instead:

from io import StringIO

To support both versions:

try:
   from StringIO import StringIO
except ImportError:
   from io import StringIO
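
On Python 3 the StringIO round trip can also be skipped by encoding the string yourself. The following is a sketch under assumptions: save_string is a hypothetical helper name, and s3_client is assumed to be a boto3 S3 client such as the s3c object above.

```python
def save_string(s3_client, bucket, key, text, encoding="utf-8"):
    """Store text as the body of s3://<bucket>/<key>.

    Hypothetical helper, not part of boto3. `s3_client` is assumed to be a
    boto3 S3 client, e.g. boto3.client('s3').
    """
    # Encode explicitly so the object body is always sent as bytes.
    return s3_client.put_object(Bucket=bucket, Key=key, Body=text.encode(encoding))
```

With the names from the answer this would be save_string(s3c, target_bucket, target_file, contents).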

— csseller

15

That's the answer. Here's the question: "How do you save a string to an S3 object using boto3?"

— jkdev

For python3, I had to use import io; fake_handle = io.StringIO(content)

— Felix

16

# Preface: File is json with contents: {"name": "Android", "status": "ERROR"}

import io
import json

import boto3

s3 = boto3.resource('s3')

obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)

# data now holds the raw bytes of the object; decode and parse them into a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))

print(new_dict['status'])
# Should print "ERROR"
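
For small objects, a possible alternative is to read the body returned by get() directly instead of going through a BytesIO buffer. This is a sketch: load_json_object is a hypothetical helper name, and s3_object is assumed to be an Object resource like obj above, whose get() returns a dict with a readable 'Body'.

```python
import json


def load_json_object(s3_object, encoding="utf-8"):
    """Fetch an S3 object and parse its contents as JSON.

    Hypothetical helper, not part of boto3. `s3_object` is assumed to be a
    boto3 Object resource, e.g. boto3.resource('s3').Object(bucket, key).
    """
    # get() returns a dict; its 'Body' entry is a stream with a read() method.
    raw = s3_object.get()["Body"].read()
    return json.loads(raw.decode(encoding))
```

Note that this reads the whole object into memory, so it is only suited to objects that comfortably fit there.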

— Lord Sumner

14

Never put your AWS_ACCESS_KEY_ID or your AWS_SECRET_ACCESS_KEY in your code. These should be defined with the awscli aws configure command, and they will then be found automatically by botocore.

— Miles Erickson

3

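When you want to read a file with a configuration other than the default one, feel free to either use mpu.aws.s3_download(s3path, destination) directly or copy and paste the code: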

import os

import boto3


def s3_download(source, destination,
                exists_strategy='raise',
                profile_name=None):
    """
    Copy a file from an S3 source to a local destination.

    Parameters
    ----------
    source : str
        Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
    destination : str
    exists_strategy : {'raise', 'replace', 'abort'}
        What is done when the destination already exists?
    profile_name : str, optional
        AWS profile

    Raises
    ------
    botocore.exceptions.NoCredentialsError
        Botocore is not able to find your credentials. Either specify
        profile_name or add the environment variables AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
        See https://boto3.readthedocs.io/en/latest/guide/configuration.html
    """
    exists_strategies = ['raise', 'replace', 'abort']
    if exists_strategy not in exists_strategies:
        raise ValueError('exists_strategy \'{}\' is not in {}'
                         .format(exists_strategy, exists_strategies))
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    bucket_name, key = _s3_path_split(source)
    if os.path.isfile(destination):
        # Compare strings with ==; `is` checks identity and is unreliable here.
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)

from collections import namedtuple

S3Path = namedtuple("S3Path", ["bucket_name", "key"])


def _s3_path_split(s3_path):
    """
    Split an S3 path into bucket and key.

    Parameters
    ----------
    s3_path : str

    Returns
    -------
    splitted : (str, str)
        (bucket, key)

    Examples
    --------
    >>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
    S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(
            "s3_path is expected to start with 's3://', " "but was {}"
            .format(s3_path)
        )
    bucket_key = s3_path[len("s3://"):]
    bucket_name, key = bucket_key.split("/", 1)
    return S3Path(bucket_name, key)

— Martin Thoma

Doesn't work. NameError: name '_s3_path_split' is not defined

— Dave Liu

@DaveLiu Thank you for the hint; I've adjusted the code. The package should have worked before, though.

— Martin Thoma

1

Note: I'm assuming you have configured authentication separately. The code below downloads a single object from an S3 bucket.

import boto3

# Create the S3 resource
s3 = boto3.resource('s3')

# Download the object to a local file
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')

— Tushar Niras