Bulk insert from XML RAW data into SQL Server 2008 R2

I have an XML structure like the following:

<tables>
  <table name="tableName1">
    <row ID="34" col1="data" col2="dom" />
    <row ID="35" col1="data2" col2="dom2" />
  </table>
  <table name="tableName2">
    <row ID="1" col1="data" col2="dom" col3="item1" />
    <row ID="3" col1="data2" col2="dom2" col3="item2" />
    <row ID="7" col1="data4" col3="item3" />
  </table>
  ...
</tables>

Basically, each table node contains the rows produced by a SELECT with FOR XML RAW.

Now I wish to do the reverse: read the XML and insert the data into the respective tables of a SQL Server 2008 R2 database. However, I want the loading process to be robust, meaning I do not want to touch the loader if column names or table names change in the future. The process must read the table names from the @name attributes of the table nodes and insert data into the columns named by the attributes of the <row> nodes. I am thinking of a stored procedure that takes the XML as input and does the rest.

The data comprises approx. 70 tables ranging from 10 to 30,000 rows each, altogether no more than 100,000 rows. I need to do this as efficiently as possible; bulk loading would be best.

The process does not need to handle foreign keys: the tables in the XML are ordered so that FK constraints can stay in place when the tables are loaded one after the other.

However, there are identity columns in each table, so I must run

SET IDENTITY_INSERT <table> ON and SET IDENTITY_INSERT <table> OFF

before and after processing each table. I also need to reseed each table after inserting all rows. Oh, and I need to do the whole shebang in a transaction so that I can roll back if something goes wrong.

Which way do you suggest I go: should I stay with T-SQL or try to write the SP as a SQL CLR procedure? Should I use XQuery, or can I use some bulk insert method?

Thanks for all the help!

2 Answers

#1

As you are dealing with fairly big XML documents, I recommend using a .NET shredder at this point. You can do that in a CLR procedure or in an external tool. You could also use SQL Server's built-in XQuery support, but that will be slow.
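For comparison, here is a minimal sketch of what the built-in XQuery route could look like for the sample document in the question. The column types (int, nvarchar(30)) are assumptions made for illustration; the real tables may differ, and this is not tuned for large documents.

```sql
-- Sketch only: shred the sample XML with the xml type's nodes()/value() methods.
DECLARE @x xml = N'
<tables>
  <table name="tableName1">
    <row ID="34" col1="data" col2="dom" />
    <row ID="35" col1="data2" col2="dom2" />
  </table>
</tables>';

SELECT t.n.value('@name', 'nvarchar(128)') AS table_name,  -- attribute of the parent <table>
       r.n.value('@ID',   'int')           AS ID,          -- assumed type
       r.n.value('@col1', 'nvarchar(30)')  AS col1,        -- assumed type
       r.n.value('@col2', 'nvarchar(30)')  AS col2         -- NULL when the attribute is absent
FROM @x.nodes('/tables/table') AS t(n)
CROSS APPLY t.n.nodes('row') AS r(n);
```

Each `<row>` becomes one result row; the outer `nodes()` call supplies the table name, the inner one the row attributes.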

However, looking at this and your previous question (Dump data into single XML file from MS SQL Server 2008 R2), I think you might be better off using something like the bcp utility or even replication. What are your exact requirements?

#2

Basically, you will have to loop through your XML and build the queries from the result set.

Try this to start:

declare @i int;   -- document handle returned by sp_xml_preparedocument
declare @x xml;

SELECT @x = N'
<tables>
  <table name="tableName1">
    <row ID="34" col1="data" col2="dom" />
    <row ID="35" col1="data2" col2="dom2" />
  </table>
  <table name="tableName2">
    <row ID="1" col1="data" col2="dom" col3="item1" />
    <row ID="3" col1="data2" col2="dom2" col3="item2" />
    <row ID="7" col1="data4" col3="item3" />
  </table>
</tables>';

-- Parse the XML and obtain a handle to the in-memory document
exec sp_xml_preparedocument @i output, @x;

-- Shred every <row>; attributes are matched by name to the columns in the WITH clause
select ID, col1, col2
from OpenXml(@i, '/tables/table/row')
with (ID int, col1 nvarchar(30), col2 nvarchar(30));

-- Release the memory held by the parsed document
exec sp_xml_removedocument @i;

This returns the row data you need to insert (the table names sit one level up in the XML; adjust the query to fetch them):

34  data    dom
35  data2   dom2
1   data    dom
3   data2   dom2
7   data4   NULL
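As a sketch of that adjustment, the WITH clause of OPENXML accepts an XPath pattern per column, so the parent's name attribute can be reached with `../@name`. The column name TableName and the nvarchar(128) type are choices made here for illustration, not part of the original answer.

```sql
-- Sketch only: same shredding as above, plus the owning table's name per row.
declare @i int;
declare @x xml;

select @x = N'
<tables>
  <table name="tableName1">
    <row ID="34" col1="data" col2="dom" />
  </table>
  <table name="tableName2">
    <row ID="7" col1="data4" col3="item3" />
  </table>
</tables>';

exec sp_xml_preparedocument @i output, @x;

select TableName, ID, col1, col2
from OpenXml(@i, '/tables/table/row')
with (TableName nvarchar(128) '../@name',   -- attribute of the parent <table> element
      ID        int           '@ID',
      col1      nvarchar(30)  '@col1',
      col2      nvarchar(30)  '@col2');

exec sp_xml_removedocument @i;
```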

What you need to do next is write the statements by looping over this result set.
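One way such a loop might look is sketched below, using dynamic SQL and the xml type's methods rather than OPENXML. It rests on several assumptions that are not in the question: all target tables live in the dbo schema, every attribute name matches a column name, each table has an identity column that appears among the attributes, and every value converts implicitly from nvarchar to the column type. The demo table is hypothetical.

```sql
-- Demo setup (hypothetical table; omit when running against real tables).
CREATE TABLE dbo.tableName1 (ID int IDENTITY(1,1) PRIMARY KEY,
                             col1 nvarchar(30) NULL, col2 nvarchar(30) NULL);
GO
DECLARE @x xml = N'<tables>
  <table name="tableName1">
    <row ID="34" col1="data" col2="dom" />
    <row ID="35" col1="data2" />
  </table>
</tables>';

DECLARE @pos int = 1, @n int, @t xml, @target nvarchar(300),
        @cols nvarchar(max), @vals nvarchar(max), @sql nvarchar(max), @msg nvarchar(2048);
DECLARE @attr TABLE (name nvarchar(128));

SET @n = @x.value('count(/tables/table)', 'int');

BEGIN TRY
    BEGIN TRANSACTION;
    WHILE @pos <= @n
    BEGIN
        -- Walk the <table> elements by position so they load in document order.
        SET @t = @x.query('/tables/table[position() = sql:variable("@pos")]');
        SET @target = N'dbo.' + QUOTENAME(@t.value('(/table/@name)[1]', 'nvarchar(128)'));
        IF OBJECT_ID(@target, 'U') IS NULL
            RAISERROR(N'Unknown table %s', 16, 1, @target);

        -- The distinct attribute names of this table's rows form the column list.
        DELETE FROM @attr;
        INSERT INTO @attr (name)
        SELECT DISTINCT a.n.value('local-name(.)', 'nvarchar(128)')
        FROM @t.nodes('/table/row/@*') AS a(n);

        -- Build "[ID], [col1], ..." and the matching list of value() expressions.
        SELECT @cols = STUFF((SELECT N', ' + QUOTENAME(name) FROM @attr ORDER BY name
                              FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N''),
               @vals = STUFF((SELECT N', r.n.value(''@' + name + N''', ''nvarchar(max)'')'
                              FROM @attr ORDER BY name
                              FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'');

        IF @cols IS NOT NULL   -- skip <table> elements without rows
        BEGIN
            SET @sql = N'SET IDENTITY_INSERT ' + @target + N' ON;
INSERT INTO ' + @target + N' (' + @cols + N')
SELECT ' + @vals + N' FROM @t.nodes(''/table/row'') AS r(n);
SET IDENTITY_INSERT ' + @target + N' OFF;
DBCC CHECKIDENT (''' + REPLACE(@target, N'''', N'''''') + N''') WITH NO_INFOMSGS;';
            EXEC sp_executesql @sql, N'@t xml', @t = @t;
        END;

        SET @pos = @pos + 1;
    END;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    SET @msg = ERROR_MESSAGE();
    RAISERROR(N'%s', 16, 1, @msg);   -- THROW is not available on SQL Server 2008 R2
END CATCH;
```

The statement is set-based per table (one INSERT ... SELECT for all of its rows), so the loop runs once per table rather than once per row. An attribute missing from a row simply yields NULL for that column.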

FYI, you don't need to embed the XML in the script; you can read it from a file like this:

SELECT @x = xCol.BulkColumn FROM OPENROWSET (BULK 'c:\Update.xml', SINGLE_BLOB) AS xCol;
