Transferring S3 files with Apache Camel based on S3 event notifications

Date: 2022-07-12 10:34:00

I have an application (running on an EC2 instance) that parses new log files arriving in an S3 bucket. I have set up S3 notifications on this bucket to put a message in an SQS queue on PUT requests. The aim is to transfer a newly arrived log file from S3 to the EC2 instance.

I think Apache Camel would be an ideal candidate for this problem. I have the following questions/observations about the solution and would greatly appreciate some feedback:

  1. We can have a Camel route read messages from SQS and then extract the object key from the JSON message. Is it possible to use this object key in another Camel route (using the S3 component) that copies the object from S3 to the local FS? At first glance it seems the Camel S3 component can only process new objects in a bucket, since it takes only the bucket name in the URI, so it might not be possible to copy a specific key from an S3 bucket. I am also not sure whether dynamic URIs are possible in Camel, since the object name is different each time.

  2. If the Camel S3 component is indeed capable of processing new objects arriving in an S3 bucket, then we don't need S3 notifications at all. But this has a problem (since it's stateless): if for some reason the application goes down, it will miss the objects that arrived during the downtime.

  3. Another possibility is to use Camel only for reading messages from SQS, and then process those messages and copy the object from S3 using a normal Java class. But that doesn't fully utilize the strengths of Apache Camel.

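Regarding point 1: dynamic endpoints are possible in Camel. In particular, `pollEnrich` evaluates its endpoint URI per exchange, so a key extracted from the SQS message can decide which object gets fetched. A rough sketch of such a route, assuming the `camel-aws2-sqs`, `camel-aws2-s3`, and `camel-jsonpath` components; the queue name, bucket name, paths, and the `s3Key` header are made-up placeholders:

```java
from("aws2-sqs://log-events?deleteAfterRead=true")
    // The S3 event notification arrives as JSON; pull out the object key.
    // Note: S3 URL-encodes the key in the event, so it may need decoding first.
    .setHeader("s3Key", jsonpath("$.Records[0].s3.object.key"))
    // pollEnrich resolves its URI per message, which allows a dynamic key.
    .pollEnrich()
        .simple("aws2-s3://my-log-bucket?fileName=${header.s3Key}&deleteAfterRead=false")
        .timeout(10000)
    .to("file:/var/log/app/incoming");
```

Alternatively, the producer side of the S3 component supports `operation=getObject`, which can be combined with `toD(...)` for a dynamic URI without a second consumer.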
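For point 3, extracting the object key from the S3 event notification message can be done in a plain Java class. A minimal, dependency-free sketch; the class name is just for illustration, and in real code a JSON library such as Jackson would be more robust than this regex:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class S3KeyExtractor {

    // Grabs the first "key" field in the event JSON. Good enough for a sketch;
    // a proper JSON parser should be preferred in production code.
    private static final Pattern KEY =
            Pattern.compile("\"key\"\\s*:\\s*\"([^\"]+)\"");

    public static String extractKey(String eventJson) {
        Matcher m = KEY.matcher(eventJson);
        if (!m.find()) {
            throw new IllegalArgumentException("no object key in message");
        }
        // S3 URL-encodes keys in event notifications ('+' for space, %2F for '/')
        return URLDecoder.decode(m.group(1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String msg = "{\"Records\":[{\"s3\":{\"bucket\":{\"name\":\"logs\"},"
                + "\"object\":{\"key\":\"2022%2F07%2Fapp+server.log\"}}}]}";
        System.out.println(extractKey(msg)); // 2022/07/app server.log
    }
}
```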

1 Answer

#1



Using SNS (or, as in your setup, SQS) to decouple the file processing and the file transfer is good, but do you really need it? It adds complexity to your application. If you just want to process files arriving in the S3 bucket without missing any, the simplest way is to create one bucket, say "incoming", and another called "processed". Your application can scan the incoming bucket each time and process the files, and once done, put each file into the processed bucket.

One thing you need to take care of is flagging a file that is under processing, to avoid conflicts. You can append a special suffix to it.
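If you do end up using Camel, the incoming/processed-bucket pattern above maps directly onto the aws2-s3 consumer's move-after-read options, so no manual flagging code is needed. A sketch, with the bucket names and target path as placeholders:

```java
from("aws2-s3://incoming?moveAfterRead=true"
        + "&destinationBucket=processed"
        + "&deleteAfterRead=true")
    .to("file:/var/log/app/incoming");
```

With these options the consumer copies each object to the processed bucket and deletes it from incoming after the exchange completes, which also covers the "missed during downtime" concern: anything still in incoming is by definition unprocessed.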

AWS also provides a third Event Notifications destination, a Lambda function. You can take a look at that as well; it enables simple handlers without you having to write an extra application yourself.
