Can we copy a table directly from one Redshift cluster to another Redshift cluster?
I know a table can be copied by using S3 as temporary storage (i.e., UNLOAD from the first cluster to S3, then COPY from S3 into the other cluster).
#1
So the answer is no. The following is the reply I got from AWS Support.
Hello, thank you very much for contacting AWS Support. With Amazon Redshift, we do not have a mechanism to directly copy data from a table in one Redshift cluster to a table in another Redshift cluster. The usual procedure to achieve a similar result would be:
(1) UNLOAD to S3, then COPY from S3
With this approach, you use S3 as the intermediate storage. First you UNLOAD the data from the source cluster to S3, then COPY the data from S3 into the destination cluster. This is the method you are already familiar with, and it is also the method we recommend. Redshift was designed to work with S3 and can do this efficiently at relatively low cost. For more information about the UNLOAD and COPY operations in Redshift, please refer to the following AWS documentation:
http://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html
http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
(2) Using a cluster snapshot
An alternative approach would be to create a snapshot of the source cluster, restore that snapshot as the destination cluster, and then drop the unnecessary tables from the destination cluster. The drawback is that if you only need a small portion of the data on the destination cluster (for example, one of ten tables), you might end up running a (relatively) big cluster for a (relatively) small application.
For more information about managing Redshift cluster snapshots, please refer to the following AWS documentation:
http://docs.aws.amazon.com/redshift/latest/mgmt/managing-snapshots-console.html
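A similarly hedged sketch of approach (2), again not part of the AWS reply, assuming the AWS SDK for Python (boto3): `client` would be `boto3.client("redshift")`, and the cluster and snapshot identifiers are hypothetical. The second helper only generates the DROP TABLE statements you would run on the restored cluster.

```python
"""Sketch of the snapshot-based alternative.

Assumption: `client` is a boto3 Redshift client, e.g. boto3.client("redshift").
Cluster, snapshot and table names are hypothetical placeholders.
"""
from typing import Iterable, List


def clone_cluster_via_snapshot(client, source_id: str, snapshot_id: str, target_id: str) -> None:
    """Snapshot `source_id`, then restore that snapshot as new cluster `target_id`."""
    client.create_cluster_snapshot(
        SnapshotIdentifier=snapshot_id, ClusterIdentifier=source_id
    )
    # Block until the manual snapshot has finished.
    client.get_waiter("snapshot_available").wait(
        ClusterIdentifier=source_id, SnapshotIdentifier=snapshot_id
    )
    # The restored cluster contains every table that was in the source.
    client.restore_from_cluster_snapshot(
        ClusterIdentifier=target_id, SnapshotIdentifier=snapshot_id
    )
    # Block until the new cluster has finished restoring.
    client.get_waiter("cluster_restored").wait(ClusterIdentifier=target_id)


def drop_unneeded_tables(all_tables: Iterable[str], keep: Iterable[str]) -> List[str]:
    """Return DROP TABLE statements for every table not listed in `keep`."""
    keep_set = set(keep)
    return [f"DROP TABLE {t};" for t in all_tables if t not in keep_set]
```

The client is passed in rather than created inside the function, so the sequence of calls can be checked against a stub without touching AWS. Note that this still leaves you with a cluster sized like the source, which is the drawback described above.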
In summary, we prefer the UNLOAD and COPY process, which is quite straightforward and cost-effective.