Copying relational data from one database to another

Time: 2020-11-26 16:30:41

Edit: Let me completely rephrase this, because I'm not sure there's an XML way like I was originally describing.

Yet another edit: This needs to be a repeatable process, and it must be possible to set it up so that it can be called from C# code.

In database A, I have a set of tables, related by PKs and FKs. A parent table, with child and grandchild tables, let's say.

I want to copy a set of rows from database A to database B, which has identically named tables and fields. For each table, I want to insert into the same table in database B. But I can't be constrained to use the same primary keys. The copy routine must create new PKs for each row in database B, and must propagate those to the child rows. I'm keeping the same relations between the data, in other words, but not the same exact PKs and FKs.

How would you solve this? I'm open to suggestions. SSIS isn't completely ruled out, but it doesn't look to me like it'll do this exact thing. I'm also open to a solution in LINQ, or using typed DataSets, or using some XML thing, or just about anything that'll work in SQL Server 2005 and/or C# (.NET 3.5). The best solution wouldn't require SSIS, and wouldn't require writing a lot of code. But I'll concede that this "best" solution may not exist.

(I didn't make this task up myself, nor the constraints; this is how it was given to me.)

11 solutions

#1


2  

I think the SQL Server utility tablediff.exe might be what you are looking for.

See also this thread.

#2


1  

First, let me say that SSIS is your best bet. But, to answer the question you asked...

I don't believe you can get away with simply creating new IDs all around. You could, but you would need to carry the original IDs along to use for lookups.

The best you can get is one insert statement per table. Here is an example of the code to do SELECTs to get you the data from your XML sample:

declare @xml xml 
set @xml='<People Key="1" FirstName="Bob" LastName="Smith">
  <PeopleAddresses PeopleKey="1" AddressesKey="1">
    <Addresses Key="1" Street="123 Main" City="St Louis" State="MO" ZIP="12345" />
  </PeopleAddresses>
</People>
<People Key="2" FirstName="Harry" LastName="Jones">
  <PeopleAddresses PeopleKey="2" AddressesKey="2">
    <Addresses Key="2" Street="555 E 5th St" City="Chicago" State="IL" ZIP="23456" />
  </PeopleAddresses>
</People>
<People Key="3" FirstName="Sally" LastName="Smith">
  <PeopleAddresses PeopleKey="3" AddressesKey="1">
    <Addresses Key="1" Street="123 Main" City="St Louis" State="MO" ZIP="12345" />
  </PeopleAddresses>
</People>
<People Key="4" FirstName="Sara" LastName="Jones">
  <PeopleAddresses PeopleKey="4" AddressesKey="2">
    <Addresses Key="2" Street="555 E 5th St" City="Chicago" State="IL" ZIP="23456" />
  </PeopleAddresses>
</People>
'

select t.b.value('./@Key', 'int') PeopleKey,
    t.b.value('./@FirstName', 'nvarchar(50)') FirstName,
    t.b.value('./@LastName', 'nvarchar(50)') LastName
from @xml.nodes('//People') t(b)

select t.b.value('../../@Key', 'int') PeopleKey,
    t.b.value('./@Street', 'nvarchar(50)') Street,
    t.b.value('./@City', 'nvarchar(50)') City,
    t.b.value('./@State', 'char(2)') [State],
    t.b.value('./@ZIP', 'char(5)') Zip
from 
@xml.nodes('//Addresses') t(b)

This takes nodes from the XML and parses out the data. To get the related key from People, we use ../../ to walk up two levels from Addresses.

#3


0  

Dump the XML approach and use the import wizard / SSIS.

#4


0  

By far the easiest way is Red Gate's SQL Data Compare. You can set it up to do just what you described in a minute or two.

#5


0  

I love Red Gate's SQL Compare and Data Compare too but it won't meet his requirements for the changing primary keys as far as I can tell.

If cross-database queries or linked servers are an option, you could do this with a stored procedure that copies the parent and child records from DB A into temporary tables on DB B. Add a column for the new primary key to the temp child table, and update it after inserting the headers.

My question is: if the records don't have the same primary key, how do you tell whether it's a new record? Is there some other candidate key? If these are new tables, why can't they have the same primary key?

#6


0  

I have created the same thing with a set of stored procedures.

Database B will have its own primary keys, but will also store Database A's primary keys for debugging purposes. It means I can have more than one Database A!

Data is copied via a linked server. Not too fast; SSIS is faster. But SSIS is not for beginners, and it is not easy to code something that works with changing source tables.

And it is easy to call a stored procedure from C#.

#7


0  

I'd script it in a stored procedure, using INSERTs to do the hard work. Your code will pick up the new PKs for Table A (presumably via SCOPE_IDENTITY()) - I assume that the PK for Table A is an identity field?

You could use temporary tables, cursors or you might prefer to use the CLR - it might lend itself to this kind of operation.

I'd be surprised to find a tool that could do this off the shelf with either a) pre-determined keys, or b) identity fields (clearly Tables B & C don't have them).

#8


0  

Are you clearing the destination tables each time and then starting again? That will make a big difference to the solution you need to implement. If you are doing a complete re-import each time then you could do something like the following:

Create a temporary table or table variable to record the old and new primary keys for the parent table.

Insert the parent table data into the destination and use the OUTPUT clause to capture the new IDs, storing them alongside the old IDs in the temp table. NOTE: Using the OUTPUT clause is efficient and allows you to do the insert in bulk without cycling through each record to be inserted.

Insert the child table data. Join to the temp table to retrieve the new foreign key required.

The above process could be done using T-SQL Script, C# code or SSIS. My preference would be for SSIS.

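The three steps in this answer can be sketched in T-SQL roughly as follows. This is only a sketch under stated assumptions, not a tested implementation: the database names SourceDb and DestDb, the Parent and Child tables, and all of their columns are hypothetical, and both databases are assumed to be reachable from the same server. One caveat: in SQL Server 2005 the OUTPUT clause of an INSERT can return only columns of the inserted row, not columns of the source query. The sketch therefore assumes the destination Parent table carries an extra SourceParentId column holding the old key. If the destination must stay strictly identical to the source, the old-to-new mapping has to be built another way, for example by joining on a candidate key.

```sql
-- Sketch only. All object names are hypothetical.
-- Assumes DestDb.dbo.Parent.ParentId is an IDENTITY column and that
-- DestDb.dbo.Parent has an extra SourceParentId column for the old key.
SET XACT_ABORT ON;  -- roll back the open transaction on a run-time error

-- Step 1: table variable that records old and new parent keys.
DECLARE @ParentMap TABLE (
    OldParentId int NOT NULL PRIMARY KEY,
    NewParentId int NOT NULL
);

BEGIN TRANSACTION;

-- Step 2: bulk-insert the parents and capture the generated keys.
INSERT INTO DestDb.dbo.Parent (Name, SourceParentId)
OUTPUT inserted.SourceParentId, inserted.ParentId
    INTO @ParentMap (OldParentId, NewParentId)
SELECT p.Name, p.ParentId
FROM SourceDb.dbo.Parent AS p;

-- Step 3: insert the children, swapping the old foreign key for the new one.
INSERT INTO DestDb.dbo.Child (ParentId, Detail)
SELECT m.NewParentId, c.Detail
FROM SourceDb.dbo.Child AS c
JOIN @ParentMap AS m ON m.OldParentId = c.ParentId;

COMMIT TRANSACTION;
```

A grandchild level would repeat steps 2 and 3 with a second mapping table for the child keys.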

#9


0  

If you are adding each time then you may need to keep a permanent table to track the relationship between source database primary keys and destination database primary keys (at least for the parent table). If you needed to keep this kind of data out of the destination database, you could get SSIS to store/retrieve it from some kind of logging database or even a flat file.

You could probably avoid the above scenario if there is a combination of fields in the parent table that can be used to uniquely identify that record and therefore "find" the primary key for that record in the destination database.

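Both options in this answer come down to a lookup from source key to destination key. A rough T-SQL sketch follows, with hypothetical names throughout: a permanent table dbo.ParentKeyMap, databases SourceDb and DestDb, and a Parent.Name column that is assumed to be unique.

```sql
-- Sketch only. All object names are hypothetical.

-- Option 1: a permanent table that remembers which source rows were already
-- copied and which destination key each one received.
CREATE TABLE dbo.ParentKeyMap (
    SourceParentId int NOT NULL PRIMARY KEY,
    DestParentId   int NOT NULL
);

-- Source parents that have not been copied in any earlier run.
SELECT p.ParentId, p.Name
FROM SourceDb.dbo.Parent AS p
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.ParentKeyMap AS m
    WHERE m.SourceParentId = p.ParentId
);

-- Option 2: no mapping table; derive old-to-new keys by joining on a
-- candidate key. This assumes Name is unique in both databases.
SELECT s.ParentId AS OldParentId,
       d.ParentId AS NewParentId
FROM SourceDb.dbo.Parent AS s
JOIN DestDb.dbo.Parent AS d ON d.Name = s.Name;
```

Option 1 works even when no candidate key exists, at the cost of maintaining the extra table on every run.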

#10


0  

I think most likely what I'm going to use is typed datasets. It won't be a generalized solution; we'll have to regenerate them if any of the tables change. But based on what I've been told, that's not a problem; the tables aren't expected to change much.

Datasets will make it reasonably easy to loop through the data hierarchically and refresh PKs from the database after insert.

#11


0  

When dealing with similar tasks I simply created a set of stored procedures to do the job.

As the task that you specified is pretty custom, you are not likely to find a ready-to-use solution.

Just to give you some hints:

  • If the databases are on different servers, use linked servers so you can access both source and destination tables directly through T-SQL

In the stored procedure:

  • Identify the parent items that need to be copied. You said that the primary keys are different, so you need to match on unique constraints instead (you should be able to define them if the tables are normalised)

  • Identify the child items that need to be copied, based on the identified parents. To check whether some of them are already in the destination DB, use the unique constraints approach again

  • Identify the grandchild items (same logic as with parent-child)

  • Copy data over starting with the lowest level (grandchildren, children, parents)

There is no need for cursors etc. Simply store the intermediate results in a temporary table (or a table variable if working within one stored procedure).

That approach worked for me pretty well.

You can of course add a parameter to the main stored procedure so you can either copy all new records or only the ones that you specify.

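As a rough illustration of the identification steps and the optional parameter, a procedure along these lines might work. It is a sketch only: the procedure name, the SourceDb and DestDb databases, the Parent and Child tables, and their columns are all hypothetical, and Name stands in for whatever columns the unique constraint covers. It stops at selecting the rows and leaves the actual inserts out.

```sql
-- Sketch only. All object names are hypothetical; Name stands in for the
-- columns covered by the unique constraint.
CREATE PROCEDURE dbo.SelectParentsToCopy
    @SourceParentId int = NULL  -- NULL: consider every source parent
AS
BEGIN
    SET NOCOUNT ON;

    -- Parents in scope for this run that are not yet in the destination.
    SELECT p.ParentId, p.Name
    INTO #ParentsToCopy
    FROM SourceDb.dbo.Parent AS p
    WHERE (@SourceParentId IS NULL OR p.ParentId = @SourceParentId)
      AND NOT EXISTS (
          SELECT 1 FROM DestDb.dbo.Parent AS d WHERE d.Name = p.Name
      );

    -- Children that belong to the selected parents.
    SELECT c.ChildId, c.ParentId, c.Detail
    INTO #ChildrenToCopy
    FROM SourceDb.dbo.Child AS c
    JOIN #ParentsToCopy AS p ON p.ParentId = c.ParentId;

    -- The actual inserts would go here; this sketch only returns the selection.
    SELECT * FROM #ParentsToCopy;
    SELECT * FROM #ChildrenToCopy;
END
```

Calling it with no argument selects every parent not yet present in the destination; passing @SourceParentId restricts the run to a single parent and its children.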

Let me know if that is of any help.

