Resuming an Interrupted S3 Multipart Upload

Date: 2022-01-12 23:01:34

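Uploading a large file will sometimes fail partway through, and starting over from the beginning is wasteful. This post shows how to use boto to resume an interrupted multipart upload.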

Example: uploading a large file

import math, os
import boto
import boto.s3.connection
from filechunkio import FileChunkIO

CONS_AK = 'RC35MU8KM1PMEQ4EFD46'
CONS_SK = 'Fg1KO10FS4uIPzSoKenmKAR2YHt052rM9u8VDik9'

# Connect to S3
c = boto.connect_s3(
    aws_access_key_id=CONS_AK,
    aws_secret_access_key=CONS_SK,
    host='yhg-2',
    port=80,
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat()
)

# b = c.get_bucket('mybucket')
b = c.create_bucket('mybucket')

# Local file path
source_path = './local-50M-file'
source_size = os.stat(source_path).st_size

# Create a multipart upload request
mul_key = 'HEHE'
header = {
    'x-amz-meta-gang': 'Yang Honggang'
}
mp = b.initiate_multipart_upload(mul_key, headers=header)
# Record the upload id; it is needed to resume the upload later
upload_id = mp.id
print("upload_id: %s" % upload_id)

# Use a chunk size of 20 MiB (feel free to change this)
chunk_size = 20971520
chunk_count = int(math.ceil(source_size / float(chunk_size)))

# Send the file parts, using FileChunkIO to create a file-like object
# that points to a certain byte range within the original file. We
# set bytes to never exceed the original file size.
for i in range(chunk_count):
    offset = chunk_size * i
    bytes = min(chunk_size, source_size - offset)
    with FileChunkIO(source_path, 'r', offset=offset,
                     bytes=bytes) as fp:
        mp.upload_part_from_file(fp, part_num=i + 1)

print("before complete")
# Finish the upload
mp.complete_upload()
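
The offset arithmetic in the loop above is easy to check on its own. The following is a minimal sketch that needs neither boto nor a file on disk; the helper name part_ranges is hypothetical and is not part of boto or FileChunkIO.

```python
import math


def part_ranges(source_size, chunk_size):
    """Return (part_num, offset, length) tuples that cover the whole file.

    part_ranges is a hypothetical helper; it only mirrors the arithmetic
    used in the upload loop above.
    """
    chunk_count = int(math.ceil(source_size / float(chunk_size)))
    ranges = []
    for i in range(chunk_count):
        offset = chunk_size * i
        # The last part may be shorter than chunk_size
        length = min(chunk_size, source_size - offset)
        ranges.append((i + 1, offset, length))
    return ranges


# A 50 MiB file split into 20 MiB chunks gives two full parts and one 10 MiB part
for part_num, offset, length in part_ranges(50 * 1024 * 1024, 20 * 1024 * 1024):
    print("%d %d %d" % (part_num, offset, length))
```

Part numbers are 1-based, which is why the loop passes part_num=i + 1. Note that S3 requires every part except the last to be at least 5 MiB, so chunk_size should not be set below that.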

Resuming after an interruption

If the upload is interrupted partway through, how do we resume it? The first thing we need is the upload_id.

import boto
import boto.s3.connection
from boto.s3.multipart import MultiPartUpload

CONS_AK = 'RC35MU8KM1PMEQ4EFD46'
CONS_SK = 'Fg1KO10FS4uIPzSoKenmKAR2YHt052rM9u8VDik9'

# Connect to S3
c = boto.connect_s3(
    aws_access_key_id=CONS_AK,
    aws_secret_access_key=CONS_SK,
    host='yhg-2',
    port=80,
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat()
)

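bucket_name = 'mybucket'
b = c.get_bucket(bucket_name)
# Must be the same key name that was passed to initiate_multipart_upload()
mul_key = 'my-multi-obj'

# The upload id recorded when the upload was initiated
upload_id = '2~QfkgBbGqlzGDNPbGCvTyREOudXl4YY4'
# Rebuild the MultiPartUpload object from the saved state
mp = MultiPartUpload(b)
mp.key_name = mul_key
mp.bucket_name = bucket_name
mp.id = upload_id

# Upload the remaining parts here with mp.upload_part_from_file()

# Finish the upload
mp.complete_upload()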
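
The script above leaves the "upload the remaining parts" step open. One way to fill it in, sketched here under assumptions, is to ask the server which parts it already holds and send only the rest. Iterating over a boto MultiPartUpload lists the parts stored on the server, each with a part_number attribute. The helper names missing_parts and resume_upload are hypothetical, and source_path, source_size, chunk_size and chunk_count are assumed to have the same values as in the original upload script.

```python
def missing_parts(uploaded_part_numbers, chunk_count):
    """Return the 1-based part numbers that still have to be uploaded."""
    done = set(uploaded_part_numbers)
    return [n for n in range(1, chunk_count + 1) if n not in done]


def resume_upload(mp, source_path, source_size, chunk_size, chunk_count):
    """Upload only the parts the server does not have yet, then complete.

    Hypothetical helper. mp is assumed to be a boto MultiPartUpload
    rebuilt from a saved upload id, and chunk_size must match the value
    used by the original upload so that part boundaries line up.
    """
    # Imported here so that missing_parts() can be used without filechunkio
    from filechunkio import FileChunkIO

    # Iterating over mp asks the server for the parts it already holds
    uploaded = [part.part_number for part in mp]
    for part_num in missing_parts(uploaded, chunk_count):
        offset = chunk_size * (part_num - 1)
        length = min(chunk_size, source_size - offset)
        with FileChunkIO(source_path, 'r', offset=offset,
                         bytes=length) as fp:
            mp.upload_part_from_file(fp, part_num=part_num)
    mp.complete_upload()
```

A part whose transfer was cut off midway does not appear in the server's list, so it is simply sent again in full.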

With rgw (the Ceph RADOS Gateway), interrupted uploads can also be inspected with the following command:

# rados -p .rgw.buckets.extra ls --cluster yhg
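
If the upload id was never written down, it can usually be recovered through the S3 API as well, since boto's Bucket.get_all_multipart_uploads() returns the uploads that are still in progress. The sketch below assumes a bucket object obtained as in the scripts above; the helper name find_upload_ids is hypothetical.

```python
def find_upload_ids(bucket, key_name):
    """Return the ids of in-progress multipart uploads for key_name.

    Hypothetical helper. bucket is assumed to be a boto Bucket; each
    entry returned by get_all_multipart_uploads() carries key_name and
    id attributes.
    """
    return [upload.id
            for upload in bucket.get_all_multipart_uploads()
            if upload.key_name == key_name]
```

For example, find_upload_ids(b, mul_key) would return every upload id still open for that key. More than one id can come back if the upload was initiated several times, in which case the one whose parts were actually uploaded is the one to resume.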