Exporting a DynamoDB table to S3 with client-side encryption

Time: 2021-07-14 23:06:52

I'm trying to use Data Pipeline to export data from DynamoDB to S3. However, I can't figure out how to apply client-side encryption before the file is written to S3. Is there a way to do this with Data Pipeline? I can set up everything except the client-side encryption. The ideal flow is a DynamoDB source node, an activity that encrypts, and an S3 destination node.

I also tried Elastic MapReduce, but I don't see how to write a mapper and a reducer since I'm not transforming any data; I just need to move it to an encrypted file on S3. I should be able to use EMR with a Hive program, but I am struggling to understand how to use EMR without writing custom map/reduce code. Ideally, no code is stored in S3.

Server-side encryption isn't an option; the data needs to be encrypted before it is written to S3.

I am looking for ideas on how to do this, or to hear from someone who has faced a similar challenge.

2 answers

#1


Data Pipeline doesn't currently support hooks for custom pre- or post-processing.

How large is your table? How long can the export take?

It should be possible to do this with DynamoDB parallel scan: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#QueryAndScanParallelScan

Essentially, you would write a program that uses multiple threads to process the segments of a parallel scan, encrypts the results, and stores the encrypted items in S3. Each DynamoDB scan page returns up to about 1 MB of data, so you could aggregate multiple pages before uploading to S3.
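
As a rough illustration of that approach, here is a minimal Python sketch. It is not a tested production exporter. It assumes `dynamodb` and `s3` are boto3 low-level clients (`boto3.client("dynamodb")` and `boto3.client("s3")`), and that `encrypt` is a function you supply that turns plaintext bytes into ciphertext bytes (for example a wrapper around `Fernet.encrypt` from the `cryptography` package). The function names, the S3 key layout, and the `pages_per_object` parameter are made up for this example. It also assumes the table has no binary attributes, since those would need base64 encoding before `json.dumps`.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def export_segment(dynamodb, s3, encrypt, table, bucket, prefix,
                   segment, total_segments, pages_per_object=4):
    """Scan one segment, encrypt batches of pages, upload them; return S3 keys."""
    start_key = None
    batch, pages, part, keys = [], 0, 0, []

    def flush():
        nonlocal batch, pages, part
        if not batch:
            return
        # Items stay in DynamoDB's typed JSON form, e.g. {"id": {"S": "42"}}.
        plaintext = json.dumps(batch).encode("utf-8")
        # Hypothetical key layout, chosen only for this sketch.
        key = f"{prefix}/segment-{segment:04d}/part-{part:06d}.enc"
        s3.put_object(Bucket=bucket, Key=key, Body=encrypt(plaintext))
        keys.append(key)
        batch, pages, part = [], 0, part + 1

    while True:
        kwargs = {"TableName": table, "Segment": segment,
                  "TotalSegments": total_segments}
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key
        page = dynamodb.scan(**kwargs)
        batch.extend(page.get("Items", []))
        pages += 1
        if pages >= pages_per_object:
            flush()  # aggregate several scan pages into one S3 object
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            break  # this segment is exhausted
    flush()
    return keys


def export_table(dynamodb, s3, encrypt, table, bucket, prefix, total_segments=4):
    """Run one worker thread per scan segment and collect all uploaded keys."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(export_segment, dynamodb, s3, encrypt, table,
                        bucket, prefix, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [key for future in futures for key in future.result()]
```

The clients and the cipher are passed in as arguments so the scan and upload logic can be exercised with stand-ins before pointing it at a real table. The number of segments and pages per object would need tuning against the table's read capacity.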

To restore the data, you would download the S3 files, decrypt them, and write the items back to DynamoDB.
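
The restore direction could look something like the sketch below. It assumes each S3 object holds an encrypted JSON array of items in DynamoDB's typed format, that `dynamodb` and `s3` are boto3 low-level clients, and that `decrypt` is your own function that reverses whatever cipher was used for the export. The function name `restore_objects` and its parameters are hypothetical.

```python
import json


def restore_objects(dynamodb, s3, decrypt, table, bucket, keys):
    """Download, decrypt, and write back each exported object; return item count."""
    restored = 0
    for key in keys:
        ciphertext = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        items = json.loads(decrypt(ciphertext).decode("utf-8"))
        # BatchWriteItem accepts at most 25 put requests per call.
        for start in range(0, len(items), 25):
            chunk = items[start:start + 25]
            pending = {table: [{"PutRequest": {"Item": item}} for item in chunk]}
            # Resend whatever DynamoDB reports as unprocessed (e.g. when
            # throttled). A real tool should sleep with backoff between tries.
            while pending:
                response = dynamodb.batch_write_item(RequestItems=pending)
                pending = response.get("UnprocessedItems") or {}
        restored += len(items)
    return restored
```

The list of keys could come from paginating `list_objects_v2` over the export prefix.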

#2


If this is acceptable for your use case, you can do client-side encryption before writing your data to DynamoDB. You could then use Data Pipeline to export the already-encrypted data to S3.

I have a similar setup for my application using a client-side encryption library provided by aws-labs. We export the tables daily to keep backups. Restoring the data works as long as the encryption metadata is exported with it.
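
To make the idea concrete, here is a simplified Python sketch of encrypting attributes before an item is written. It is not how the aws-labs library works internally and does not reproduce its API or its metadata attributes; it only illustrates the principle. `encrypt_item`, `decrypt_item`, `key_names`, and the `encrypt`/`decrypt` callables are all hypothetical names, and the sketch assumes the item uses DynamoDB's typed format with no binary attributes.

```python
import json


def encrypt_item(item, key_names, encrypt):
    """Return a copy of a typed DynamoDB item with non-key attributes encrypted."""
    protected = {}
    for name, value in item.items():
        if name in key_names:
            # Key attributes stay in plaintext so the item can still be looked up.
            protected[name] = value
        else:
            plaintext = json.dumps(value, sort_keys=True).encode("utf-8")
            # Store the ciphertext as a Binary ("B") attribute.
            protected[name] = {"B": encrypt(plaintext)}
    return protected


def decrypt_item(item, key_names, decrypt):
    """Reverse encrypt_item, restoring the original typed attribute values."""
    restored = {}
    for name, value in item.items():
        if name in key_names:
            restored[name] = value
        else:
            restored[name] = json.loads(decrypt(value["B"]).decode("utf-8"))
    return restored
```

The encrypted item would then be passed to `put_item` as usual, so anything exported from the table afterwards only ever contains ciphertext for the protected attributes.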
