I have implemented delta detection while loading a data warehouse from transaction systems, using an identity column or a date-time column in the source transaction tables. On each subsequent extraction, the maximum date-time value extracted the previous time is used in the extraction query's filter to identify new or changed records. This worked well enough, except when multiple transactions occurred in the same millisecond.
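A minimal sketch of the failure mode described above, assuming a hypothetical in-memory "source table" of `(id, modified_at)` rows and a hypothetical `extract_since` helper that applies a strict `modified_at > watermark` filter. A row stamped with the same millisecond as the watermark, but committed after the previous extraction ran, is never picked up:

```python
from datetime import datetime
from typing import List, Optional, Tuple

Row = Tuple[int, datetime]  # hypothetical (id, modified_at) source row

def extract_since(rows: List[Row], watermark: Optional[datetime]):
    """Return rows modified strictly after the watermark, plus the new watermark."""
    batch = [r for r in rows if watermark is None or r[1] > watermark]
    new_watermark = max((r[1] for r in batch), default=watermark)
    return batch, new_watermark

t = datetime(2009, 1, 1, 12, 0, 0, 123000)  # 12:00:00.123

source: List[Row] = [(1, t)]
batch1, wm = extract_since(source, None)     # first run loads row 1, watermark = t

# A second transaction stamped with the same millisecond commits after that run.
source.append((2, t))
batch2, wm = extract_since(source, wm)       # filter is modified_at > t

print([r[0] for r in batch1], [r[0] for r in batch2])  # [1] []
```

Switching the filter to `>=` would pick up row 2, but it would also re-extract row 1, so the load would then need its own duplicate handling.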
But now SQL Server 2008 has Change Data Capture (CDC), which introduces something new called the LSN (Log Sequence Number), a binary value of length 10. Now I am confused: which value should be stored for windowing purposes, the LSN or the date-time? Of course, the LSN eliminates the need to store additional date-time values in large transaction tables, but does it have any disadvantages? Which one should I use? I feel that mapping the LSN to a date-time and then storing the date-time is not a reliable method. What is your opinion?
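To see why an LSN can serve as a watermark where a date-time cannot, here is a small sketch under stated assumptions: two hypothetical transactions committed in the same millisecond, each carrying a made-up 10-byte LSN (only their relative order matters). The helper `lsn_to_int` is hypothetical and just reads the 10 bytes as an unsigned big-endian integer:

```python
from datetime import datetime

def lsn_to_int(lsn: bytes) -> int:
    """Interpret a 10-byte LSN as an unsigned big-endian integer."""
    if len(lsn) != 10:
        raise ValueError("an LSN is exactly 10 bytes")
    return int.from_bytes(lsn, "big")

t = datetime(2009, 1, 1, 12, 0, 0, 123000)

# Two hypothetical transactions committed in the same millisecond.
# The LSN values are invented for illustration.
tx1 = {"id": 1, "commit_time": t, "lsn": bytes.fromhex("0000001A000000F80003")}
tx2 = {"id": 2, "commit_time": t, "lsn": bytes.fromhex("0000001A000000F80009")}

# The date-time cannot tell them apart ...
assert tx1["commit_time"] == tx2["commit_time"]
# ... but the LSNs are distinct and ordered. Equal-length Python bytes compare
# lexicographically, which matches the unsigned numeric order.
assert tx1["lsn"] < tx2["lsn"]
assert lsn_to_int(tx1["lsn"]) < lsn_to_int(tx2["lsn"])
```

Because the bytes are compared as fixed-length values, storing the last processed LSN as raw bytes (rather than converting it to a number or a time) is enough to keep the ordering.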
PS: To non-BI professionals: sorry.
3 answers
#1
See Improving Incremental Loads with Change Data Capture for information on using CDC with SSIS.
#2
After a long wait, I don't see any further answers here. I have used the LSN for windowing in my current project, and I find it better than date-time values because it is more precise and the process is simple. I recommend using the LSN. If anyone disagrees, please let me know.
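A sketch of what LSN-based windowing can look like, simulated in memory so it runs without a database. In T-SQL the same loop is usually built from `sys.fn_cdc_get_max_lsn()`, `sys.fn_cdc_increment_lsn()` and `cdc.fn_cdc_get_all_changes_<capture_instance>(@from_lsn, @to_lsn, N'all')`; the Python functions below (`increment_lsn`, `get_changes`, `run_window`) are hypothetical stand-ins, and treating "increment" as adding 1 to the 80-bit value is a simplifying assumption:

```python
from typing import List, Optional, Tuple

LSN_BYTES = 10
Change = Tuple[bytes, str]  # hypothetical (commit LSN, description) change row

def lsn(n: int) -> bytes:
    return n.to_bytes(LSN_BYTES, "big")

def increment_lsn(value: bytes) -> bytes:
    """Hypothetical stand-in for sys.fn_cdc_increment_lsn: next 10-byte value."""
    return lsn(int.from_bytes(value, "big") + 1)

def get_changes(log: List[Change], from_lsn: bytes, to_lsn: bytes) -> List[Change]:
    """Stand-in for the change function: the range is inclusive on both ends."""
    return [row for row in log if from_lsn <= row[0] <= to_lsn]

def run_window(log: List[Change], last_lsn: Optional[bytes]):
    """Extract one window and return (rows, new high-water mark)."""
    if not log:
        return [], last_lsn
    to_lsn = max(row[0] for row in log)        # like sys.fn_cdc_get_max_lsn()
    if last_lsn is None:
        from_lsn = min(row[0] for row in log)  # first load: start of the change data
    else:
        from_lsn = increment_lsn(last_lsn)     # skip what was already loaded
    if from_lsn > to_lsn:
        return [], last_lsn                    # nothing new since the last run
    return get_changes(log, from_lsn, to_lsn), to_lsn

log: List[Change] = [(lsn(100), "insert order 1"), (lsn(101), "insert order 2")]
rows1, mark = run_window(log, None)
log.append((lsn(102), "update order 1"))       # commits after the first run
rows2, mark = run_window(log, mark)
print([r[1] for r in rows1])  # ['insert order 1', 'insert order 2']
print([r[1] for r in rows2])  # ['update order 1']
```

The only state kept between runs is the last `to_lsn`, and because the change range is inclusive, the next window starts just after it; no date-time value is stored at all.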
#3
If you set up CDC, a system table named cdc.lsn_time_mapping is added to your database. It maps LSNs to transaction times, so you can use either.
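One caveat when "using either": the mapping is not one-to-one when several transactions end in the same millisecond, which is exactly the concern raised in the question. The sketch below assumes a hypothetical, reduced copy of `cdc.lsn_time_mapping` as `(start_lsn, tran_end_time)` pairs ordered by LSN, with end times non-decreasing in LSN order. The functions `map_lsn_to_time` and `map_time_to_lsn_le` are rough analogues of `sys.fn_cdc_map_lsn_to_time` and `sys.fn_cdc_map_time_to_lsn('largest less than or equal', ...)`, not their real implementations:

```python
import bisect
from datetime import datetime

t = datetime(2009, 1, 1, 12, 0, 0, 123000)
t_next = datetime(2009, 1, 1, 12, 0, 0, 124000)

def lsn(n: int) -> bytes:
    return n.to_bytes(10, "big")

# Hypothetical mapping rows; two transactions share the same end time.
mapping = [(lsn(100), t), (lsn(101), t), (lsn(102), t_next)]

def map_lsn_to_time(value: bytes) -> datetime:
    """Exact-match lookup of the end time for a given LSN."""
    return dict(mapping)[value]

def map_time_to_lsn_le(when: datetime) -> bytes:
    """Largest LSN whose end time is less than or equal to `when`."""
    times = [row[1] for row in mapping]
    i = bisect.bisect_right(times, when)
    if i == 0:
        raise LookupError("no LSN at or before that time")
    return mapping[i - 1][0]

# Suppose only LSN 100 had been loaded, and its time was stored as the watermark.
stored_time = map_lsn_to_time(lsn(100))
resumed_from = map_time_to_lsn_le(stored_time)
print(resumed_from == lsn(101))  # True: the round trip lands past LSN 100
```

Resuming just after `resumed_from` would silently skip LSN 101, so storing the LSN itself and using the time mapping only for reporting or for choosing an initial start point avoids this ambiguity.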