Best way to shred XML data into SQL Server database columns

Date: 2021-07-30 01:53:32

What is the best way to shred XML data into various database columns? So far I have mainly been using the nodes and value functions like so:


INSERT INTO some_table (column1, column2, column3)
SELECT
    Rows.n.value('(@column1)[1]', 'varchar(20)'),
    Rows.n.value('(@column2)[1]', 'nvarchar(100)'),
    Rows.n.value('(@column3)[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)

However, I find that this is getting very slow even for moderate-size xml data.

8 Solutions

#1


46  

I stumbled across this question whilst having a very similar problem: I'd been running a query processing a 7.5MB XML file (approx. 10,000 nodes) for around 3.5-4 hours before finally giving up.

However, after a little more research I found that, having typed the XML using a schema and created an XML index (I'd bulk inserted the XML into a table), the same query completed in ~0.04ms.

How's that for a performance improvement!


Code to create a schema:


IF EXISTS ( SELECT * FROM sys.xml_schema_collections where [name] = 'MyXmlSchema')
DROP XML SCHEMA COLLECTION [MyXmlSchema]
GO

DECLARE @MySchema XML
SET @MySchema = 
(
    SELECT * FROM OPENROWSET
    (
        BULK 'C:\Path\To\Schema\MySchema.xsd', SINGLE_CLOB 
    ) AS xmlData
)

CREATE XML SCHEMA COLLECTION [MyXmlSchema] AS @MySchema 
GO

Code to create the table with a typed XML column:


CREATE TABLE [dbo].[XmlFiles] (
    [Id] [uniqueidentifier] NOT NULL,

    -- Data from CV element 
    [Data] xml(CONTENT dbo.[MyXmlSchema]) NOT NULL,

CONSTRAINT [PK_XmlFiles] PRIMARY KEY NONCLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Code to create the index:


CREATE PRIMARY XML INDEX PXML_Data
ON [dbo].[XmlFiles] (Data)
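
If the remaining cost is in value() lookups, secondary XML indexes built on top of the primary one can help further. This is a hedged sketch, not something from the original answer; the index names are illustrative, and which of PATH/VALUE/PROPERTY pays off depends on the query shape:

```sql
-- Secondary XML indexes require the primary XML index (PXML_Data above).
-- FOR PATH helps path expressions in exist()/value(); FOR VALUE helps
-- when values are more selective than paths. Index names are illustrative.
CREATE XML INDEX IXML_Data_Path
ON [dbo].[XmlFiles] (Data)
USING XML INDEX PXML_Data FOR PATH

CREATE XML INDEX IXML_Data_Value
ON [dbo].[XmlFiles] (Data)
USING XML INDEX PXML_Data FOR VALUE
```

Secondary indexes add write and storage cost, so it's worth checking the plan actually uses them before keeping both.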

There are a few things to bear in mind though. SQL Server's schema implementation doesn't support xsd:include. This means that if you have a schema which references other schemas, you'll have to copy all of them into a single schema and add that.

Also, I would get an error:

XQuery [dbo.XmlFiles.Data.value()]: Cannot implicitly atomize or apply 'fn:data()' to complex content elements, found type 'xs:anyType' within inferred type 'element({http://www.mynamespace.fake/schemas}:SequenceNumber,xs:anyType) ?'.

if I tried to navigate above the node I had selected with the nodes function, e.g.:

SELECT
     C.value('CVElementId[1]', 'INT') AS [CVElementId]
    ,C.value('../SequenceNumber[1]', 'INT') AS [Level]
FROM 
    [dbo].[XmlFiles]
CROSS APPLY
    [Data].nodes('/CVSet/Level/CVElement') AS T(C)

I found that the best way to handle this was to use OUTER APPLY to perform, in effect, an "outer join" on the XML.

SELECT
     C.value('CVElementId[1]', 'INT') AS [CVElementId]
    ,B.value('SequenceNumber[1]', 'INT') AS [Level]
FROM 
    [dbo].[XmlFiles]
CROSS APPLY
    [Data].nodes('/CVSet/Level') AS T(B)
OUTER APPLY
    B.nodes('CVElement') AS S(C)

Hope that helps someone, as that's pretty much been my day.

#2


5  

In my case, I'm running SQL Server 2005 SP2 (9.0).

The only thing that helped was adding OPTION ( OPTIMIZE FOR ( @your_xml_var = NULL ) ). The explanation is at the link below.

Example:


INSERT INTO @tbl (Tbl_ID, Name, Value, ParamData)
SELECT     1,
    tbl.cols.value('name[1]', 'nvarchar(255)'),
    tbl.cols.value('value[1]', 'nvarchar(255)'),
    tbl.cols.query('./paramdata[1]')
FROM @xml.nodes('//root') as tbl(cols) OPTION ( OPTIMIZE FOR ( @xml = NULL ) )

https://connect.microsoft.com/SQLServer/feedback/details/562092/an-insert-statement-using-xml-nodes-is-very-very-very-slow-in-sql2008-sp1


#3


3  

I'm not sure what the best method is. I used the OPENXML construct:

DECLARE @DocHandle int  -- OPENXML takes a document handle, not the raw XML
EXEC sp_xml_preparedocument @DocHandle OUTPUT, @XmlDocument

INSERT INTO Test
SELECT Id, Data 
FROM OPENXML (@DocHandle, '/Root/blah', 2)
WITH (Id   int         '@ID',
      Data varchar(10) '@DATA')

EXEC sp_xml_removedocument @DocHandle  -- free the in-memory document

To speed it up, you can create XML indexes. You can create an index specifically to optimize value() performance. You can also use typed xml columns, which perform better.

#4


3  

We had a similar issue here. Our DBA (SP, you the man) took a look at my code, made a little tweak to the syntax, and we got the speed we had been expecting. It was unusual because my select from XML was plenty fast, but the insert was way slow. So try this syntax instead:


INSERT INTO some_table (column1, column2, column3)
    SELECT 
        Rows.n.value(N'(column1/text())[1]', 'varchar(20)'), 
        Rows.n.value(N'(column2/text())[1]', 'nvarchar(100)'), 
        Rows.n.value(N'(column3/text())[1]', 'int')
    FROM @xml.nodes('//Rows') Rows(n) 

So specifying the text() node really seems to make a difference in performance (note that text() selects element content, not attributes). It took our insert of 2K rows from 'I must have written that wrong - let me stop it' to about 3 seconds, which was 2x faster than the raw insert statements we had been running through the connection.

#5


2  

I wouldn't claim this is the "best" solution, but I've written a generic SQL CLR procedure for this exact purpose: it takes a "tabular" XML structure (such as that returned by FOR XML RAW) and outputs a result set.

It does not require any customization / knowledge of the structure of the "table" in the Xml, and turns out to be extremely fast / efficient (although this wasn't a design goal). I just shredded a 25MB (untyped) xml variable in under 20 seconds, returning 25,000 rows of a pretty wide table.


Hope this helps someone: http://architectshack.com/ClrXmlShredder.ashx


#6


0  

This isn't an answer, more an addition to this question: I have just come across the same problem, and I can give figures, as edg asks for in the comments.

My test has xml which results in 244 records being inserted - so 244 nodes.


The code that I am rewriting takes on average 0.4 seconds to run (10 test runs, spread from .344 secs to .56 secs). Performance is not the main reason the code is being rewritten, but the new code needs to perform as well or better. The old code loops over the xml nodes, calling an sp to insert once per loop.

The new code is pretty much just a single sp; pass the xml in; shred it.


Tests with the new code switched in show the new sp takes on average 3.7 seconds - almost 10 times slower.


My query is in the form posted in this question:

INSERT INTO some_table (column1, column2, column3)
SELECT
    Rows.n.value('(@column1)[1]', 'varchar(20)'),
    Rows.n.value('(@column2)[1]', 'nvarchar(100)'),
    Rows.n.value('(@column3)[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)

The execution plan appears to show that for each column, SQL Server is doing a separate "Table Valued Function [XML Reader]" returning all 244 rows, then joining everything back up with Nested Loops (Inner Join). So in my case, where I am shredding from / inserting into about 30 columns, this appears to happen separately 30 times.
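
For what it's worth, one commonly suggested tweak for this shape of query is to replace the '//Rows' descendant-axis search with an explicit path, so each XML reader pass doesn't have to scan the whole document. This is a hedged sketch rather than a tested fix, and '/Root/Rows' is an assumed path, since the real root element isn't shown:

```sql
-- Same shred as above, but with an explicit path instead of '//'.
-- '/Root/Rows' is hypothetical; substitute the document's actual root.
INSERT INTO some_table (column1, column2, column3)
SELECT
     Rows.n.value('(@column1)[1]', 'varchar(20)')
    ,Rows.n.value('(@column2)[1]', 'nvarchar(100)')
    ,Rows.n.value('(@column3)[1]', 'int')
FROM @xml.nodes('/Root/Rows') Rows(n)
```

Whether this removes the per-column passes depends on the plan, so compare the estimated plans before and after.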

I am going to have to dump this code; I don't think any optimisation is going to get over this method being inherently slow. I am going to try the sp_xml_preparedocument/OPENXML method and see if the performance is better. If anyone comes across this question from a web search (as I did), I would highly advise you to do some performance testing before using this type of shredding in SQL Server.

#7


0  

There is an XML Bulk load COM object (.NET Example)


From MSDN:


You can insert XML data into a SQL Server database by using an INSERT statement and the OPENXML function; however, the Bulk Load utility provides better performance when you need to insert large amounts of XML data.


#8


0  

My current solution for large XML sets (> 500 nodes) is to use SQL bulk copy (System.Data.SqlClient.SqlBulkCopy), using a DataSet to load the XML into memory and then passing the table to SqlBulkCopy (defining an XML schema helps).

Obviously there are pitfalls, such as needlessly using a DataSet and loading the whole document into memory first. I would like to go further in the future and implement my own IDataReader to bypass the DataSet method, but currently the DataSet is "good enough" for the job.

Basically, I never found a solution to my original question regarding the slow performance of that type of XML shredding. It could be slow because typed xml queries are inherently slow, or it could be something to do with transactions and the SQL Server log. I guess the typed xml functions were never designed to operate on non-trivial node counts.

XML Bulk Load: I tried this and it was fast, but I had trouble getting the COM DLL to work in 64-bit environments, and I generally try to avoid COM DLLs that no longer appear to be supported.

sp_xml_preparedocument/OPENXML: I never went down this road, so I'd be interested to see how it performs.
