如何从两个实时dbs创建和同步仅报告的组合数据库？

I need to quickly implement a read-only database containing data pulled from two identically structured live databases.

我需要快速实现一个只读数据库,其中包含从两个结构相同的实时数据库中提取的数据。

The live dbs are actually company dbs from a Dynamics accounting system so I'm happy for any Dynamics specific advice but this is mostly a SQL question. It's a fairly old version of Dynamics from before Great Plains was acquired by Microsoft. This is on SQL Server 2000.

实时dbs实际上是来自Dynamics会计系统的公司dbs,因此我对任何Dynamics特定建议感到高兴,但这主要是一个SQL问题。在Great Plains被微软收购之前,这是一个相当古老的Dynamics版本。这是在SQL Server 2000上。

We have reports and applications which access the Dynamics data. These apps are designed to look at one company db. Now we need to add another. It's appropriate that most of these reports and apps see combined data. They don't really care which company an order or invoice exists in. They only look at a small number of the tables.

我们有访问Dynamics数据的报告和应用程序。这些应用程序旨在查看一个公司数据库。现在我们需要添加另一个。这些报告和应用程序中的大多数都可以看到组合数据。他们并不关心订单或发票存在于哪家公司。他们只关注少数几个表。

It seems to me that the simplest solution is to create a reports only db with combined data. Preferably, we need an efficient way to update this db with changes several times a day.

在我看来,最简单的解决方案是创建一个仅包含数据报告的数据库。最好是,我们需要一种有效的方法来更新此数据库,每天更改几次。

I'm a developer, not a db expert but here's my plan:

我是开发人员,不是数据库专家,但这是我的计划:

Create the combined reporting db with the required tables initially with the same table structure as the live dbs.

最初使用与实时dbs相同的表结构创建具有所需表的组合报告数据库。

All Dynamics tables seem to have an int identity column called DEX_ROW_ID. I'm not sure what it's used for, (it's not indexed) but that seems like the obvious generic way to uniquely identify rows. On the reporting db I will change it to a normal int (not an identity). I will create a unique index on DEX_ROW_ID in all dbs.

所有Dynamics表似乎都有一个名为DEX_ROW_ID的int标识列。我不确定它用于什么,(它没有索引),但这似乎是唯一标识行的通用方法。在报告db上,我将其更改为普通int(不是标识)。我将在所有dbs中的DEX_ROW_ID上创建一个唯一索引。

Dynamics does not have timestamps so I will add a timestamp column to tables in the live dbs and a corresponding binary(8) column in the reporting db. I'm assuming and hoping that Dynamics won't be upset by the additional index and column.

Dynamics没有时间戳,因此我将向live dbs中的表添加timestamp列,并在报告db中添加相应的二进制(8)列。我假设并希望Dynamics不会被额外的索引和列所打扰。

Add an int CompanyId column to the reporting db tables and add it to the end of any unique indexes. Most data will be naturally unique even without that. ie, order and invoice numbers etc will be different for the two live dbs. We may need to make some minor changes to the applications but I'm not expecting to do much other than point them to the new reporting db.

将int CompanyId列添加到报告db表,并将其添加到任何唯一索引的末尾。即使没有它,大多数数据也将是自然独特的。即两个实时dbs的订单和发票号码等不同。我们可能需要对应用程序进行一些小的更改,但除了将它们指向新的报告数据库之外,我不希望做太多其他事情。

Assuming my reporting db is called Reports, the live dbs are Live1 and Live2, the timestamp column is called TS and all dbs are on the same server ... here's my first attempt at an update script for copying the changes in one table called MyTable in Live1 to the reporting db.

假设我的报告数据库名为Reports,实时数据库是Live1和Live2,时间戳列称为TS,所有数据库都在同一台服务器上...这是我第一次尝试更新脚本,用于在一个名为MyTable的表中复制更改在Live1中报告数据库。

USE Reports

CREATE TABLE #Changes
(
ReportId int,
LiveId int
)

/* Collect in a temp table the ids or rows which have been deleted or changed
in the live db L.DEX_ROW_ID will be null if the row has been deleted */

INSERT INTO #Changes
SELECT R.DEX_ROW_ID, L.DEX_ROW_ID
FROM MyTable R LEFT OUTER JOIN Live1.dbo.MyTable L ON L.DEX_ROW_ID = R.DEX_ROW_ID
WHERE R.CompanyId = 1 AND L.DEX_ROW_ID IS NULL OR L.TS <> R.TS

/* Delete rows that have been deleted or changed on the live db 
I wonder if using join syntax would run better than the subquery. */
DELETE FROM MyTable
WHERE CompanyId = 1 AND DEX_ROW_ID IN (SELECT ReportId FROM #Changes)

/* Recopy rows that have changed in the live db */
INSERT INTO MyTable
SELECT 1 AS CompanyId, * FROM Live1.dbo.MyTable L
WHERE L.DEX_ROW_ID IN (SELECT ReportId FROM #Changes WHERE LiveId IS NOT NULL)

/* Copy the rows that are new in the live db */
INSERT INTO MyTable
SELECT 1 AS CompanyId, * FROM Live1.dbo.MyTable
WHERE DEX_ROW_ID > (SELECT MAX(DEX_ROW_ID) FROM MyTable WHERE CompanyId = 1)

Then do the same for the Live2 db. Repeat for every table in Reports. I know I should use a parameter @CompanyId instead of the literal but I can't do that for the live db name some I might generate these dynamically with a C# program or something.

然后为Live2 db做同样的事情。对Reports中的每个表重复此操作。我知道我应该使用参数@CompanyId而不是文字,但我不能为实时数据库名称执行此操作,我可能会使用C#程序或其他动态生成这些名称。

I'm looking for any advice, suggestions or critique on what I'm doing here. I know it won't be atomic. Things could be happening on the live db while this script runs. I think we can live with that. We'll probably do a full copy either nightly or weekly when nothing is happening on the live dbs.

我正在寻找关于我在这里做什么的任何建议,建议或批评。我知道它不会是原子的。当这个脚本运行时,实时数据库上可能会发生一些事情。我想我们可以忍受这一点。我们可能会在每晚或每周都做一个完整的副本,因为现场dbs什么也没发生。

We need to favor performance over elegance or perfection. Some initial testing has the first query with the TS comparisons running at about 30 seconds for the biggest table so I'm optimistic that this is going to work but I'd also like to know if I'm missing something obvious or not seeing the forest for the trees.

我们需要优雅表现而不是优雅或完美。一些初始测试有第一个查询,TS比较大约30秒运行最大的表,所以我很乐观这是可行的,但我也想知道我是否遗漏了一些明显或没有看到的东西森林里的树木。

We don't really want to deal with log files on the reporting db. Can we just set that to simple recovery model and forget about logs?

我们真的不想处理报告数据库上的日志文件。我们可以将其设置为简单的恢复模型并忘记日志吗?

Thanks

2 个解决方案

#1

I think there are a couple open questions here.

我想这里有几个未解决的问题。

Do you need these reports to be near-real-time? Or is this this sort of reporting that could live with daily updates? But assume you need up-to-the-minute data.

您是否需要这些报告几乎是实时的?或者这种报告可以与每日更新一起生活吗?但假设您需要最新的数据。
Have you considered querying the databases directly and merging the data per-report on the fly? You'll have to do a lot of reporting to duplicate the effort that's going to go into designing, creating, and supporting a real-time merged replicated database.

您是否考虑过直接查询数据库并动态合并每个报告的数据?您将不得不进行大量报告,以复制设计,创建和支持实时合并复制数据库的工作量。
Thirty seconds is (IMHO) unacceptable for any single query against a production database. There could be any number of tuning-related reasons for taking this long, but it at least means you're going to need serious professional SQL Server optimization resources (i.e. people). And if this is a problem for the queries for reports, it doesn't bode well for the queries to maintain a separate database for reporting.

对于生产数据库的任何单个查询,三十秒(恕我直言)是不可接受的。花费这么长时间可能有许多与调优相关的原因,但这至少意味着您将需要严格的专业SQL Server优化资源(即人员)。如果这对于报告查询是个问题,那么查询维护单独的数据库以进行报告就不是好兆头。
Tuck into the back of your mind the consideration that, if you need to consolidate to a single database, it's worth considering whether you should make it an OLAP database rather than a mirror. The mirror will be quicker and easier, but the OLAP would be far more flexible and powerful in the long term; and it might be well to go the whole way from the beginning.

如果您需要整合到单个数据库,那么考虑是否应该将其设置为OLAP数据库而不是镜像是值得考虑的问题。镜子将更快更容易,但从长远来看,OLAP将更加灵活和强大;从一开始就走完全路可能会很好。

#2

The last thing I'd want to do is write a custom update script. Try these bulletproof methods first:

我要做的最后一件事是编写自定义更新脚本。首先尝试这些防弹方法:

Let's hope your production databases are backed up. Restore those backups every night to the reporting server. You can automate restores with the RESTORE command, which will work with a file on a network server.

我们希望您的生产数据库得到备份。每晚将这些备份还原到报告服务器。您可以使用RESTORE命令自动执行还原,该命令将与网络服务器上的文件一起使用。

Use SQL Server replication to push data from the live servers to the backend.

使用SQL Server复制将数据从实时服务器推送到后端。

Schedule a DTS package every night to import the entire production database.

每晚安排一个DTS包导入整个生产数据库。

This might seem like brute force. But since you're copying a 2000-era database, brute force cannot be a problem with today's hardware. As an added advantage, these methods can be supported by a sysadmin instead of a developer.

这可能看起来像蛮力。但是,由于您正在复制2000年代的数据库,因此对于今天的硬件来说,暴力破解不是问题。作为一个额外的优点,sysadmin而不是开发人员可以支持这些方法。

Method 1 has the added added advantage of serving as backup verification. :)

方法1具有作为备份验证的额外附加优势。 :)

#1