We have a DynamoDB table in production that is continuously updated, and we want to load all of its records from DynamoDB into Redshift.
We tried using the COPY command, but because new records are continuously being inserted into the table, the COPY command runs forever.
We want to know the best way to load data from a live DynamoDB table into Redshift.
2 solutions
#1
Consider a solution based on DynamoDB Streams. Streams provides an ordered log of the data-plane events occurring on each DynamoDB partition, so the events for each primary key are strictly ordered. You can use the Kinesis Client Library (KCL) with the DynamoDB Streams Kinesis Adapter to consume the stream and write the changes to Redshift.
When this answer was written, DynamoDB Streams was in preview; it has since become generally available.
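The KCL adapter is a Java library that handles shard discovery, leases, and checkpointing for you. As a much-simplified illustration of what it does underneath, here is a sketch that reads a stream directly through boto3's low-level `dynamodbstreams` client (`describe_stream`, `get_shard_iterator`, `get_records`). This is a swapped-in technique, not the adapter itself: it keeps no checkpoints, reads shards in listing order rather than parent-before-child, and caps polling of open shards with a hypothetical `max_polls_per_shard` parameter. The client is passed in so the logic can be tried without AWS.

```python
from typing import Any, Dict, Iterator, List


def list_shards(streams_client: Any, stream_arn: str) -> List[Dict[str, Any]]:
    """Collect all shards of a stream, following describe_stream pagination."""
    shards: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"StreamArn": stream_arn}
    while True:
        desc = streams_client.describe_stream(**kwargs)["StreamDescription"]
        shards.extend(desc.get("Shards", []))
        last_shard_id = desc.get("LastEvaluatedShardId")
        if not last_shard_id:
            return shards
        kwargs["ExclusiveStartShardId"] = last_shard_id


def iter_stream_records(streams_client: Any, stream_arn: str,
                        max_polls_per_shard: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield change records from every shard, oldest retained record first.

    Simplifications vs. the KCL adapter: no checkpointing, no leases, and
    shards are read in listing order rather than parent-before-child.
    """
    for shard in list_shards(streams_client, stream_arn):
        iterator = streams_client.get_shard_iterator(
            StreamArn=stream_arn,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",  # start at the oldest retained record
        )["ShardIterator"]
        polls = 0
        # A closed shard eventually stops returning NextShardIterator; an open
        # shard never does, so the number of polls is capped.
        while iterator and polls < max_polls_per_shard:
            resp = streams_client.get_records(ShardIterator=iterator, Limit=1000)
            yield from resp.get("Records", [])
            iterator = resp.get("NextShardIterator")
            polls += 1
```

Against real AWS you would pass `boto3.client("dynamodbstreams")` and the table's `LatestStreamArn` (returned by DynamoDB's `describe_table`). Stream records are retained for only 24 hours, so a production consumer must keep up and checkpoint its position, which is exactly what the KCL adapter provides.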
#2
You can use the following pattern:
DynamoDB Streams --> AWS Lambda --> Amazon Kinesis Firehose --> Amazon Redshift.
This pattern is illustrated by a diagram in the AWS article "DynamoDB Streams Use Cases and Design Patterns".
Please also see the answer to "AWS DynamoDB Stream into Redshift".
A DynamoDB stream is effectively the same as a Kinesis data stream, except that it is generated automatically from new and changed items in DynamoDB (deletions are captured too). This lets applications be notified whenever data is added to a DynamoDB table or changed.
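To make "new/changed data" concrete: each stream record carries an `eventName` (`INSERT`, `MODIFY`, or `REMOVE`) and, under `dynamodb`, the item's `Keys` plus `NewImage` and/or `OldImage` depending on the stream's view type. Values arrive in DynamoDB's typed attribute-value JSON (`{"S": "u1"}`, `{"N": "42"}`), so something has to convert them before Redshift can use them. Below is a minimal hand-written converter, assuming the sample record shown (which is made up); binary types are deliberately not handled. If boto3 is available, `boto3.dynamodb.types.TypeDeserializer` does the same job.

```python
from decimal import Decimal
from typing import Any, Dict


def from_attribute_value(av: Dict[str, Any]) -> Any:
    """Convert one typed DynamoDB value, e.g. {"N": "42"}, into a Python value."""
    (tag, value), = av.items()  # each attribute value has exactly one type tag
    if tag in ("S", "BOOL"):
        return value
    if tag == "N":
        return Decimal(value)  # numbers arrive as strings; Decimal keeps precision
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: from_attribute_value(v) for k, v in value.items()}
    if tag == "L":
        return [from_attribute_value(v) for v in value]
    if tag == "SS":
        return set(value)
    if tag == "NS":
        return {Decimal(v) for v in value}
    raise ValueError(f"attribute type not handled in this sketch: {tag}")


def image_to_dict(image: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Convert a whole NewImage/OldImage map into a plain dict."""
    return {name: from_attribute_value(av) for name, av in image.items()}


# Hypothetical stream record, shaped like the ones a Lambda trigger receives.
SAMPLE_RECORD = {
    "eventName": "INSERT",
    "dynamodb": {
        "Keys": {"id": {"S": "u1"}},
        "NewImage": {
            "id": {"S": "u1"},
            "age": {"N": "42"},
            "tags": {"L": [{"S": "a"}, {"S": "b"}]},
            "meta": {"M": {"active": {"BOOL": True}}},
        },
        "StreamViewType": "NEW_IMAGE",
    },
}

if __name__ == "__main__":
    print(SAMPLE_RECORD["eventName"], image_to_dict(SAMPLE_RECORD["dynamodb"]["NewImage"]))
```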
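A Kinesis Data Firehose delivery stream can automatically load its data into Redshift (among other destinations). It does this by staging the records in an Amazon S3 bucket and then running a Redshift COPY from there, so you never issue the COPY yourself.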
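As a sketch of the Firehose side, the function below builds the keyword arguments for boto3's `firehose.create_delivery_stream` with Redshift as the destination. Every argument value (stream name, role ARN, JDBC URL, table, credentials, bucket) is a hypothetical placeholder, and `JSON 'auto'` assumes each record is one JSON object whose keys match the table's column names. The buffering values are examples, not recommendations.

```python
from typing import Any, Dict


def redshift_delivery_stream_config(name: str, role_arn: str, jdbc_url: str,
                                    table: str, username: str, password: str,
                                    bucket_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for firehose.create_delivery_stream (Redshift destination)."""
    return {
        "DeliveryStreamName": name,
        # DirectPut: producers (here, a Lambda function) call PutRecord/PutRecordBatch.
        "DeliveryStreamType": "DirectPut",
        "RedshiftDestinationConfiguration": {
            "RoleARN": role_arn,
            "ClusterJDBCURL": jdbc_url,
            "CopyCommand": {
                "DataTableName": table,
                # Appended to the COPY statement Firehose runs against the staged files.
                "CopyOptions": "JSON 'auto'",
            },
            "Username": username,
            "Password": password,
            # Intermediate bucket: Firehose writes batches here, then COPYs them into Redshift.
            "S3Configuration": {
                "RoleARN": role_arn,
                "BucketARN": bucket_arn,
                # Flush to S3 (and COPY) every 5 MB or 300 seconds, whichever comes first.
                "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
            },
        },
    }
```

With credentials configured, `boto3.client("firehose").create_delivery_stream(**redshift_delivery_stream_config(...))` would create the delivery stream. The plain-text password is only for illustration.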
AWS Lambda runs code without you having to provision or manage servers. You pay only for the compute time you consume, and there is no charge when your code is not running. You can run code for virtually any type of application or backend service with no server administration.
Lambda is useful for inspecting data as it flows through a stream. For example, it can change the data format or skip records that are not needed.
Putting it all together: when data is added or modified in DynamoDB, a record describing the change is written to the table's DynamoDB stream. An AWS Lambda function triggered by that stream inspects each record and transforms or drops it. It then forwards the remaining data to Kinesis Data Firehose, which automatically inserts it into Amazon Redshift.
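Here is a minimal sketch of the Lambda function in the middle of that pipeline. It assumes the stream's view type includes new images, that items contain only scalar attributes (string, number, boolean, null), and that a delivery stream named `dynamodb-to-redshift` (hypothetical) already exists. It drops `REMOVE` events, turns each `NewImage` into one newline-terminated JSON row, and forwards rows with `PutRecordBatch`, which accepts at most 500 records per call (the sketch does not enforce the separate 4 MiB per-call size limit). The Firehose client is an optional argument so the logic can be exercised without AWS.

```python
import json
from decimal import Decimal
from typing import Any, Dict, Optional

DELIVERY_STREAM = "dynamodb-to-redshift"  # hypothetical delivery stream name
BATCH_LIMIT = 500  # PutRecordBatch accepts at most 500 records per call


def _scalar(av: Dict[str, Any]) -> Any:
    """Convert a scalar DynamoDB attribute value (S, N, BOOL, NULL) to a JSON-friendly value."""
    (tag, value), = av.items()
    if tag == "N":
        d = Decimal(value)
        return int(d) if d == d.to_integral_value() else float(d)
    if tag == "NULL":
        return None
    if tag in ("S", "BOOL"):
        return value
    raise ValueError(f"attribute type not handled in this sketch: {tag}")


def to_row(record: Dict[str, Any]) -> Optional[bytes]:
    """Return one newline-terminated JSON row, or None to drop the record."""
    if record.get("eventName") == "REMOVE":
        return None  # this sketch only forwards inserts and updates
    image = record.get("dynamodb", {}).get("NewImage")
    if image is None:
        return None  # stream view type does not include new images
    row = {name: _scalar(av) for name, av in image.items()}
    return (json.dumps(row) + "\n").encode("utf-8")


def handler(event: Dict[str, Any], context: Any, firehose: Any = None) -> Dict[str, int]:
    if firehose is None:
        import boto3  # bundled with the Lambda Python runtime
        firehose = boto3.client("firehose")
    rows = [row for row in (to_row(rec) for rec in event.get("Records", [])) if row is not None]
    for start in range(0, len(rows), BATCH_LIMIT):
        batch = [{"Data": data} for data in rows[start:start + BATCH_LIMIT]]
        resp = firehose.put_record_batch(DeliveryStreamName=DELIVERY_STREAM, Records=batch)
        if resp.get("FailedPutCount", 0):
            # Raising makes Lambda retry the whole stream batch (at-least-once delivery).
            raise RuntimeError(f"{resp['FailedPutCount']} records failed to reach Firehose")
    return {"forwarded": len(rows)}
```

Because a failed batch is retried as a whole, the same row can reach Redshift more than once; if that matters, deduplicate on the table's primary key on the Redshift side.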