T-SQL that queries an XML column using XPath is very slow - how can it be improved or replaced?

Time: 2021-11-26 04:18:36

I have a table that contains an XML data type column. Right now the approach is to use XPath to query values inside the XML. Unfortunately, this method is turning out to be extremely slow.

The table has about 500,000 rows. It is actually a staging table that receives new data every day, so applying an XML index to the column is not practical: with the index in place, the daily INSERT operation takes hours to complete. Without indexing, it finishes in about a minute.

Are there any alternative ways to query this XML data that would be much faster?

2 Solutions

#1


7  

How many of the items inside the XML do you need to query on a regular basis? Just a few?

Facing the same issue, the solution we chose was this:

  • create a stored function that takes an XML parameter as its input

  • in that function, extract the information you need from the XML using XQuery/XPath

  • create a computed persisted column on your table that references that function

That way, we pull out the three or four most frequently used items of information (often just an INT, in our case) and make them available as columns on the base table. Since they're persisted, they aren't recalculated on every access, only when the XML contents change. And because they're persisted, you can also put a regular nonclustered index on them if need be.

Example:

We have a function that extracts a BIT from the XML, telling us whether or not a given contract has a VPN connection:

CREATE FUNCTION dbo.GetVPNFlag(@Data XML)
RETURNS BIT
WITH SCHEMABINDING
AS BEGIN
  DECLARE @VPNFlag BIT

  SELECT  
    @VPNFlag = ISNULL(@Data.value('(EntryIP/VPNOption)[1]', 'bit'), 0)

  RETURN @VPNFlag
END

Given an XML value, this picks out the VPN flag and returns it. Next, we created a computed persisted column on our base table:

ALTER TABLE dbo.ContractData
  ADD IsVPN AS dbo.GetVPNFlag(XmlData) PERSISTED

Here, we're passing the XmlData contents from the ContractData table into the function. We get back a BIT, which is stored as the IsVPN column on the ContractData table.
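
Before relying on the column, it may be worth sanity-checking the function and asking SQL Server whether it treats the computed column as deterministic and indexable. The following is only a sketch: the two sample XML documents are made up for illustration, and only the EntryIP/VPNOption path and the object names come from the example above.

```sql
-- Made-up sample documents; only the EntryIP/VPNOption path is taken from the function above.
SELECT dbo.GetVPNFlag(CAST('<EntryIP><VPNOption>1</VPNOption></EntryIP>' AS XML)) AS WithFlag,    -- element present
       dbo.GetVPNFlag(CAST('<EntryIP />' AS XML))                                 AS WithoutFlag; -- element missing, ISNULL falls back to 0

-- Ask SQL Server how it classifies the computed column (1 = yes, 0 = no).
SELECT COLUMNPROPERTY(OBJECT_ID('dbo.ContractData'), 'IsVPN', 'IsDeterministic') AS IsDeterministic,
       COLUMNPROPERTY(OBJECT_ID('dbo.ContractData'), 'IsVPN', 'IsIndexable')     AS IsIndexable;
```

If IsIndexable comes back as 0, a regular index on the column cannot be created until the cause (typically a non-deterministic function or unsuitable SET options) is fixed.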

We can now easily get all contracts with VPN like this:

SELECT (list of fields) 
FROM dbo.ContractData
WHERE IsVPN = 1
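
Because IsVPN is persisted, the "regular nonclustered index" mentioned earlier could be added roughly as follows. This is a minimal sketch, not part of the original setup: the index name IX_ContractData_IsVPN is made up, and it assumes the function is deterministic and that the session uses the SET options SQL Server requires for indexes on computed columns (for example QUOTED_IDENTIFIER and ANSI_NULLS set to ON).

```sql
-- Hypothetical index name; the table and column come from the example above.
CREATE NONCLUSTERED INDEX IX_ContractData_IsVPN
    ON dbo.ContractData (IsVPN);
```

Since a BIT column has only two possible values, such an index tends to pay off mainly when the value being searched for is comparatively rare. For the INT values mentioned above, an index of the same shape is usually more selective.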

#2


0  

We had the same situation and amount of data. After tuning, we ended up with an INSERT and UPDATE trigger that copies the data into "data warehouse tables". This makes the insert slower, but it is workable for our users.
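
A trigger of that kind might look roughly like the sketch below. Everything in it is an assumption for illustration: the side table dbo.ContractDataExtract, the trigger name, and the ContractId key column are hypothetical, and the table, XML column, and VPN path are borrowed from the first answer rather than from this poster's actual schema.

```sql
-- Hypothetical side table holding values shredded out of the XML.
CREATE TABLE dbo.ContractDataExtract
(
    ContractId INT NOT NULL PRIMARY KEY,  -- assumed key column of dbo.ContractData
    IsVPN      BIT NOT NULL
);
GO

-- Hypothetical trigger: keeps the side table in sync on every INSERT or UPDATE.
CREATE TRIGGER dbo.trg_ContractData_Extract
ON dbo.ContractData
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove previously extracted rows for the contracts touched by this statement.
    DELETE e
    FROM dbo.ContractDataExtract AS e
    JOIN inserted AS i
      ON i.ContractId = e.ContractId;

    -- Shred the XML once, at write time, so readers never have to run XPath.
    INSERT INTO dbo.ContractDataExtract (ContractId, IsVPN)
    SELECT i.ContractId,
           ISNULL(i.XmlData.value('(EntryIP/VPNOption)[1]', 'bit'), 0)
    FROM inserted AS i;
END;
GO
```

The trade-off is the one described above: the XML is parsed during the load, so inserts get slower, while later queries read plain relational columns.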