Does Amazon S3 support batch uploads? I have a nightly job that needs to upload about 100K files. They can be up to 1 GB each, but the sizes are strongly skewed toward small files (90% are under 100 bytes and 99% are under 1000 bytes).
Does the S3 API support uploading multiple objects in a single HTTP call?
All the objects must be available in S3 as individual objects. I cannot host them anywhere else (FTP, etc.) or in another format (database, EC2 local drive, etc.). That is an external requirement that I cannot change.
4 Answers
#1
26
Does the S3 API support uploading multiple objects in a single HTTP call?
No, the S3 PUT operation only supports uploading one object per HTTP request.
You could install S3 Tools (s3cmd) on the machine you want to synchronize with the remote bucket, and run the following command:
s3cmd sync localdirectory s3://bucket/
Then you could place this command in a script and create a scheduled job to run it each night.
This should do what you want.
The tool synchronizes files based on MD5 hashes and file size, so collisions should be rare. If you really want to, you could use the "s3cmd put" command instead to force blind overwriting of objects in your target bucket.
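The size-plus-MD5 comparison described above can be sketched in plain Java. This is only an illustration of the idea, not s3cmd's actual implementation: the class `SyncCheck` and the method `needsUpload` are hypothetical names, and the sketch assumes the remote object's MD5 is already known as a hex string (for objects uploaded with a single PUT and no KMS encryption, the S3 ETag is normally that MD5).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SyncCheck {
    /** Returns the MD5 digest of the given bytes as a lowercase hex string. */
    public static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical check: upload only if the size or the MD5 differs from the remote copy. */
    public static boolean needsUpload(byte[] local, long remoteSize, String remoteMd5Hex) {
        if (local.length != remoteSize) {
            return true; // cheap size check first, no hashing needed
        }
        return !md5Hex(local).equalsIgnoreCase(remoteMd5Hex);
    }

    public static void main(String[] args) {
        byte[] file = "hello".getBytes(StandardCharsets.UTF_8);
        String remoteMd5 = md5Hex(file); // pretend the bucket already holds this content
        System.out.println("unchanged file needs upload: " + needsUpload(file, 5, remoteMd5));
        System.out.println("resized file needs upload: " + needsUpload(file, 6, remoteMd5));
    }
}
```

With 99% of the files under 1000 bytes, hashing is cheap compared with the per-object request, so a check like this mainly saves requests for files that did not change since the previous night.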
EDIT: Also make sure that you read the S3 Tools documentation. Different flags are needed depending on whether you want files deleted locally to also be deleted from the bucket, or ignored, etc.
#2
32
Alternatively, you can upload to S3 with the AWS CLI tool using the sync command.
aws s3 sync local_folder s3://bucket-name
You can use this method to batch upload files to S3 very fast.
#3
0
One file (or part of a file) = one HTTP request, but the Java API now supports efficient multi-file upload through TransferManager, so you do not have to write the multithreading yourself.
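To illustrate what TransferManager saves you from writing, here is a minimal sketch of the do-it-yourself approach: one upload call per file, fanned out over a fixed thread pool. The `Uploader` interface, the `ParallelUploader` class, and the pool size are hypothetical and not part of the AWS SDK; in real use `Uploader.upload` would wrap a single S3 PUT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelUploader {
    /** Hypothetical stand-in for a single-object upload (one HTTP request). */
    public interface Uploader {
        void upload(String key) throws Exception;
    }

    /** Uploads every key with up to `threads` concurrent workers; returns the number of successes. */
    public static int uploadAll(List<String> keys, Uploader uploader, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per object; each task performs exactly one upload call.
            List<Future<?>> pending = new ArrayList<>();
            for (String key : keys) {
                pending.add(pool.submit(() -> {
                    uploader.upload(key);
                    return null;
                }));
            }
            // Wait for all tasks and count the ones that finished without an exception.
            int succeeded = 0;
            for (Future<?> f : pending) {
                try {
                    f.get();
                    succeeded++;
                } catch (ExecutionException e) {
                    System.err.println("Upload failed: " + e.getCause());
                }
            }
            return succeeded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger requests = new AtomicInteger();
        List<String> keys = Arrays.asList("a.txt", "b.txt", "c.txt");
        // The lambda only counts calls; a real uploader would send the file to S3 here.
        int ok = uploadAll(keys, key -> requests.incrementAndGet(), 2);
        System.out.println(ok + " uploaded using " + requests.get() + " requests");
    }
}
```

The number of requests still equals the number of files; the pool only overlaps their latency, which is what matters when most files are tiny.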
#4
0
If you want to do it from a Java program, you can do the following:
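import java.io.File;
import com.amazonaws.services.s3.transfer.MultipleFileUpload;

// Assumes a TransferManager field named transferManager (created as shown further down).
public void uploadFolder(String bucket, String path, boolean includeSubDirectories) {
    File dir = new File(path);
    // An empty key prefix places the files at the root of the bucket.
    MultipleFileUpload upload = transferManager.uploadDirectory(bucket, "", dir, includeSubDirectories);
    try {
        // Block until every file in the directory has been transferred.
        upload.waitForCompletion();
    } catch (InterruptedException e) {
        // Restore the interrupt flag so callers can react to the interruption.
        Thread.currentThread().interrupt();
    }
}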
If you wish to test against a local S3, the S3 client and transfer manager can be created as below:
// The second argument to BasicAWSCredentials is the secret access key.
AWSCredentials credentials = new BasicAWSCredentials(accessKey, token);
// This constructor is deprecated, but you can also create the client using the standard beans provided by Spring/AWS.
AmazonS3 s3Client = new AmazonS3Client(credentials);
// Only needed if you wish to connect to a local S3 using MinIO etc.
s3Client.setEndpoint("http://127.0.0.1:9000");
TransferManager transferManager = TransferManagerBuilder.standard().withS3Client(s3Client).build();