I'm working on a logging database in SQL Server 2008. It'll consist mainly of one table something like this:
StepLog
----------------
StepLogID
ClientID
LogContent XML
CreateDate
Basically what will happen in this table is that various clients will log certain activities to this table. The LogContent field will be XML - untyped because we don't know what clients want to log.
To allow the LogContent field to be searched, the current plan is to shred out the LogContent field programmatically. The metadata for shredding would be in a table something like the following:
XPathAttribute
----------------
XPathAttributeID
AttributeName
AttributeDescription
XPath
Upon insert of a record into StepLog, we would have a stored procedure that would take all the XPaths defined in XPathAttribute, evaluate them against the new record's LogContent, and write the results out to another table, XPathAttributeValue:
XPathAttributeValue
----------------
XPathAttributeValueID
StepLogID
AttributeID
AttributeValue
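For illustration, here is a rough sketch of what such a shredding procedure might look like. The procedure name dbo.ShredStepLog, the dbo schema, the data types, and the assumption that XPathAttributeValueID is an IDENTITY column are placeholders, not part of the design above. One thing worth knowing up front: the xml value() and exist() methods only accept a string literal as their XQuery argument, so XPaths stored in a table have to be spliced in through dynamic SQL.

```sql
-- Hypothetical sketch: shred one StepLog row into XPathAttributeValue.
-- Assumes XPathAttributeValue.XPathAttributeValueID is an IDENTITY column
-- and AttributeValue is NVARCHAR(4000); neither is stated in the design.
CREATE PROCEDURE dbo.ShredStepLog
    @StepLogID INT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @xml XML,
            @AttributeID INT,
            @XPath NVARCHAR(500),
            @escaped NVARCHAR(1000),
            @sql NVARCHAR(MAX);

    SELECT @xml = LogContent
    FROM dbo.StepLog
    WHERE StepLogID = @StepLogID;

    DECLARE attr_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT XPathAttributeID, XPath
        FROM dbo.XPathAttribute;

    OPEN attr_cursor;
    FETCH NEXT FROM attr_cursor INTO @AttributeID, @XPath;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Double any single quotes so the XPath cannot break out of the literal.
        SET @escaped = REPLACE(@XPath, N'''', N'''''');

        -- value() and exist() need a literal XQuery string, hence dynamic SQL.
        -- Only the first match of each XPath is stored; rows with no match are skipped.
        SET @sql = N'INSERT INTO dbo.XPathAttributeValue (StepLogID, AttributeID, AttributeValue) '
                 + N'SELECT @StepLogID, @AttributeID, '
                 + N'@xml.value(''(' + @escaped + N')[1]'', ''nvarchar(4000)'') '
                 + N'WHERE @xml.exist(''' + @escaped + N''') = 1;';

        EXEC sp_executesql @sql,
             N'@StepLogID INT, @AttributeID INT, @xml XML',
             @StepLogID = @StepLogID,
             @AttributeID = @AttributeID,
             @xml = @xml;

        FETCH NEXT FROM attr_cursor INTO @AttributeID, @XPath;
    END

    CLOSE attr_cursor;
    DEALLOCATE attr_cursor;
END;
```

Since the XPath text ends up inside dynamic SQL, whoever can write to XPathAttribute can influence what gets executed, so that table would need to be locked down accordingly.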
My original idea, when looking at this design, was "why not just use the XML indexes, both primary and secondary?" That would avoid lots of work on our side, and use built-in functionality.
I don't have a lot of experience with XML indexes, and the original designer had some poor experiences with them (poor performance) in SQL Server 2005. That's how this design originated.
Feedback would be very much appreciated!
thanks, Sylvia
2 solutions
#1
3
XML indexes help in particular scenarios, as described in Secondary XML Indexes:
Following are some guidelines for creating one or more secondary indexes:
- If your workload uses path expressions significantly on XML columns, the PATH secondary XML index is likely to speed up your workload. The most common case is the use of the exist() method on XML columns in the WHERE clause of Transact-SQL.
- If your workload retrieves multiple values from individual XML instances by using path expressions, clustering paths within each XML instance in the PROPERTY index may be helpful. This scenario typically occurs in a property bag scenario when properties of an object are fetched and its primary key value is known.
- If your workload involves querying for values within XML instances without knowing the element or attribute names that contain those values, you may want to create the VALUE index. This typically occurs with descendant axes lookups, such as //author[last-name="Howard"], where <author> elements can occur at any level of the hierarchy. It also occurs in wildcard queries, such as /book [@* = "novel"], where the query looks for <book> elements that have some attribute having the value "novel".
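To make the guidelines above concrete, here is roughly what the indexes would look like on the StepLog table from the question. The index names and the dbo schema are made up, and the sketch assumes StepLogID is the clustered primary key, since a primary XML index requires one. The XPath in the sample query is purely hypothetical, as the actual shape of LogContent is not known.

```sql
-- The primary XML index must exist before any secondary index can be created.
CREATE PRIMARY XML INDEX PXML_StepLog_LogContent
    ON dbo.StepLog (LogContent);

-- PATH: helps path expressions, typically exist() in a WHERE clause.
CREATE XML INDEX IXML_StepLog_LogContent_Path
    ON dbo.StepLog (LogContent)
    USING XML INDEX PXML_StepLog_LogContent FOR PATH;

-- PROPERTY: helps fetching several values from one instance by known key.
CREATE XML INDEX IXML_StepLog_LogContent_Property
    ON dbo.StepLog (LogContent)
    USING XML INDEX PXML_StepLog_LogContent FOR PROPERTY;

-- VALUE: helps value lookups where the element or attribute name is unknown.
CREATE XML INDEX IXML_StepLog_LogContent_Value
    ON dbo.StepLog (LogContent)
    USING XML INDEX PXML_StepLog_LogContent FOR VALUE;

-- The kind of query the PATH index is meant for (hypothetical document shape).
SELECT StepLogID, ClientID, CreateDate
FROM dbo.StepLog
WHERE LogContent.exist('/log/step[@status = "failed"]') = 1;
```

Each secondary index adds storage and insert cost, so on a write-heavy logging table it is probably worth creating only the ones that the actual queries turn out to need.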
As you can see, each type of index is appropriate for a particular scenario. With an open-ended approach like your project, it is hard to tell which indexes are going to be helpful and which are not.
Another thing to consider is that XML performs much better if you can declare an XML schema for the column, but the nature of your project does not allow this.
So overall I'd say... measure and see. Shredding the XML and storing the values in relational tables is very likely to boost performance over raw XML access. But that applies only if you know the schema and shred out a specific set of information that you then index properly. Right now, even though you shred out some information, you shred it out into what is basically an EAV structure, which is difficult both to query and to optimize. I also recommend you read up on Best Practices for Semantic Data Modeling for Performance and Scalability for some discussion of the EAV shortcomings and how to avoid some of the problems.
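As a sketch of what "measure and see" could look like in practice, the two approaches can be run side by side with I/O and timing statistics turned on. Everything specific here is an assumption for the sake of the example: the attribute names Status and Module, the document shape /log/status and /log/module, and the dbo schema. The second query also shows why EAV gets awkward: every additional attribute in the filter costs another pair of joins.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: query the XML column directly (benefits from XML indexes, if any).
SELECT s.StepLogID
FROM dbo.StepLog AS s
WHERE s.LogContent.exist('/log[status = "failed"][module = "billing"]') = 1;

-- Variant 2: the same filter against the shredded EAV table.
-- One join pair per attribute being filtered on.
SELECT s.StepLogID
FROM dbo.StepLog AS s
JOIN dbo.XPathAttributeValue AS v1
    ON v1.StepLogID = s.StepLogID
JOIN dbo.XPathAttribute AS a1
    ON a1.XPathAttributeID = v1.AttributeID
   AND a1.AttributeName = N'Status'
JOIN dbo.XPathAttributeValue AS v2
    ON v2.StepLogID = s.StepLogID
JOIN dbo.XPathAttribute AS a2
    ON a2.XPathAttributeID = v2.AttributeID
   AND a2.AttributeName = N'Module'
WHERE v1.AttributeValue = N'failed'
  AND v2.AttributeValue = N'billing';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the logical reads and elapsed times from the Messages tab, on a realistic data volume, should give a better answer than either of us guessing.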
#2
2
I basically agree with what @Remus has said.
Which is to say, by all means use the built-in XML indexes. SQL Server handles huge XML collections remarkably well (IMHO). The time saving over rolling your own will be immeasurable.
One thing I would mention — adding a schema hurt performance in my case. I'd hoped it would help the query optimizer, but it didn't, so I just left it out. (You said it was untyped, so this shouldn't come up.)