How to efficiently copy all files from one directory to another in an Amazon S3 bucket using boto?

Time: 2020-12-14 16:02:50

I need to copy all keys from '/old/dir/' to '/new/dir/' in an amazon S3 bucket. I came up with this script (quick hack):


import boto

s3 = boto.connect_s3()
thebucket = s3.get_bucket("bucketname")
# iterate over every key under the old prefix
keys = thebucket.list('/old/dir')
for k in keys:
    # build the destination key name by swapping the prefix
    newkeyname = '/new/dir' + k.name.partition('/old/dir')[2]
    print('new key name: ' + newkeyname)
    # server-side copy within the same bucket
    thebucket.copy_key(newkeyname, k.bucket.name, k.name)

For now it is working, but it is much slower than what I can do manually in the graphical management console just by copy/pasting with the mouse. Very frustrating, and there are lots of keys to copy...


Do you know any quicker method? Thanks.


Edit: maybe I can do this with concurrent copy processes. I'm not really familiar with boto's copy-key methods or how many concurrent requests I can send to Amazon.


Edit2: i'm currently learning Python multiprocessing. Let's see if I can send 50 copy operations simultaneously...


Edit 3: I tried 30 concurrent copies using the Python multiprocessing module. Copying was much faster than in the console and less error-prone. There is a new issue with large files (>5GB): boto raises an exception. I need to debug this before posting the updated script.


1 solution

#1



Regarding your issue with files over 5GB: S3 doesn't support uploading files over 5GB using the PUT method, which is what boto tries to do (see boto source, Amazon S3 documentation).


Unfortunately I'm not sure how you can get around this, apart from downloading the file and re-uploading it as a multi-part upload. I don't think boto supports a multi-part copy operation yet (if such a thing even exists).
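For what it's worth, later boto 2.x releases do expose a server-side multi-part copy via `MultiPartUpload.copy_part_from_key`. Assuming such a version, a rough sketch for a >5GB object might look like this (bucket and key names are placeholders, and the 1 GB part size is an arbitrary choice well under S3's 5 GB per-part-copy cap):

```python
def part_ranges(size, part_size=1024 ** 3):
    """Split a byte count into inclusive (start, end) byte ranges of
    at most part_size bytes each, as S3's Upload Part - Copy expects."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]


def multipart_copy(bucket_name, src_key_name, dst_key_name):
    # assumes a boto 2.x version that provides copy_part_from_key
    import boto
    bucket = boto.connect_s3().get_bucket(bucket_name)
    size = bucket.get_key(src_key_name).size
    mp = bucket.initiate_multipart_upload(dst_key_name)
    try:
        for num, (start, end) in enumerate(part_ranges(size), start=1):
            # server-side copy of one byte range; nothing is downloaded
            mp.copy_part_from_key(bucket_name, src_key_name, num,
                                  start=start, end=end)
        mp.complete_upload()
    except Exception:
        mp.cancel_upload()  # don't leave orphaned parts behind
        raise
```

Cancelling on failure matters because incomplete multi-part uploads keep accruing storage charges until they are aborted.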

