How to change the limit of the AWS S3 V2 Java API when listing objects [for a bucket with more than 1 billion objects]?

Time: 2022-03-01 17:16:00

I am working on a project where I need to download the keys from an Amazon S3 bucket that holds more than 1 billion objects. I wrote code using the Java V2 API, but it doesn't help because it fetches only 1000 keys at a time. It takes days to get the list of all keys from this bucket. Is there a faster way to get the full list of keys?

I have checked other answers related to this topic and they didn't help.

Thanks

1 Answer

#1



We had the same issue with a large number of objects.

We used a naming pattern that puts a timestamp, in fixed increments, at the start of each object name. It looks like this:

s3://bucket-name/timestamp/actualobject.extension

E.g.,
s3://mys3bucket/1506237300/datafile001.json

When iterating, I ran parallel threads, one for each timestamp prefix in 15-minute increments, and everything was read very quickly.

The key to solving this is to work out the pattern you used when storing those objects, then list the object names by prefix based on that pattern.
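
To make the idea concrete, here is a minimal sketch of listing by timestamp prefix in parallel. It is not our production code: the class name `PrefixParallelLister`, the `prefixes` and `listAll` helpers, and the pluggable `lister` function are all hypothetical names chosen for illustration. The sketch uses only the JDK so it runs without AWS credentials. In real use the `lister` would presumably wrap a paginated S3 call such as `S3Client.listObjectsV2Paginator` with the request's `prefix` set, and the 900-second step assumes the 15-minute key layout described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class PrefixParallelLister {

    /** Builds one key prefix per time slot, e.g. "1506237300/", from start (inclusive) to end (exclusive). */
    public static List<String> prefixes(long startEpochSec, long endEpochSec, long stepSec) {
        List<String> result = new ArrayList<>();
        for (long t = startEpochSec; t < endEpochSec; t += stepSec) {
            result.add(t + "/");
        }
        return result;
    }

    /**
     * Runs the lister once per prefix on a fixed-size thread pool and
     * concatenates the results in prefix order.
     * The lister is an assumption: it should return every key under one prefix
     * (for S3, by paging through the list results for that prefix).
     */
    public static List<String> listAll(List<String> prefixes,
                                       Function<String, List<String>> lister,
                                       int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String prefix : prefixes) {
                Callable<List<String>> task = () -> lister.apply(prefix);
                futures.add(pool.submit(task));
            }
            List<String> keys = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                keys.addAll(future.get()); // blocks until that prefix is fully listed
            }
            return keys;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in lister that fabricates one key per prefix; replace with a real S3 listing.
        Function<String, List<String>> fakeLister =
                prefix -> List.of(prefix + "datafile001.json");
        List<String> slots = prefixes(1506237300L, 1506240000L, 900L);
        for (String key : listAll(slots, fakeLister, 4)) {
            System.out.println(key);
        }
    }
}
```

Each prefix is still paged 1000 keys at a time, so the speed-up comes only from listing many prefixes concurrently. For a billion keys you would likely want to write results to disk per prefix rather than collect them in one in-memory list.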

Hope it helps.

