The question looks fairly simple, but in my case there are some intricacies. Here is the situation:
- There is one SQL database hosted on-premises and one database hosted in Azure SQL. We need to keep both databases in sync. These databases contain 50 tables.
- The Azure database is not updated by any application, but the on-premises database is updated frequently. So we need modified/inserted data from the on-premises database to be moved to the Azure database. We are using Azure Data Factory (ADF) for this.
- All database tables contain a column called LastModifiedDate indicating when the record was last modified.
- Currently we have created staging tables corresponding to all 50 tables. We maintain one watermark table which contains each table name and its highest LastModifiedDate.
- One activity in the ADF job executes a stored procedure that takes, from every table, the records whose LastModifiedDate is greater than the corresponding LastModifiedDate in the watermark table and dumps them into the matching staging table.
- When execution of this stored procedure completes, all data from the staging tables is synced with the Azure database tables. Finally, the watermark table is updated with the new LastModifiedDate for each table. All staging tables are then flushed.
- This process repeats periodically, so that whenever data on-premises is updated, the Azure database is updated as well.
Problems with current approach:
Creating a staging table corresponding to each table doesn't look like a good idea. If the number of tables in the database increases, we need that many more staging tables.
Question:
Is there a better approach to handle this scenario using ADF, without creating a huge number of staging tables?
1 Solution
#1
1
You can try using SQL Data Sync instead, configured to sync in one direction only, from on-premises to Azure SQL Database. When configuring SQL Data Sync, choose "To the hub" under "Sync Directions", as shown in the image below.