Why use the XML type to store XML data in SQL Server?

Date: 2022-06-30 16:27:08

I'm playing around with Microsoft SQL Server and learning to use it. I want to store XML documents in a table. Parts of an XML document won't be modified within the table; any change will be made by replacing the whole XML document in that cell.

From what I can see, I can store the XML documents either in a column of type xml or in a varchar(MAX) column.

What are the pros and cons of each?

6 answers

#1 (score: 4)

Yes, you can.

Now, keep reading the documentation, in particular the part about better searching of XML: you can put an index on an XML column, and because an XML column parses the XML internally, it supports far more XML-specific query syntax than a text column does.
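
As a rough analogy only (Python's standard xml.etree.ElementTree stands in for SQL Server here, and the sample document is made up), the sketch below contrasts searching raw text, which is all a text column offers, with a structural query over a parsed tree. The name "Brown" appears only inside an XML comment, so the substring search reports a false match while the structural query does not.

```python
import xml.etree.ElementTree as ET

# Made-up sample: "Brown" appears only inside an XML comment.
RAW = """<orders>
  <!-- legacy note: customer Brown was merged into Smith -->
  <order id="1"><customer>Jones</customer></order>
  <order id="2"><customer>Smith</customer></order>
</orders>"""


def text_search(raw: str, needle: str) -> bool:
    # Treats the document as an opaque string, like LIKE '%...%' on a text column.
    return needle in raw


def orders_for_customer(raw: str, name: str) -> list:
    # Parses first (ElementTree's default parser drops comments),
    # then asks a structural question: which <order> has this <customer>?
    root = ET.fromstring(raw)
    return [
        order.get("id")
        for order in root.findall("order")
        if order.findtext("customer") == name
    ]


print(text_search(RAW, "Brown"))          # True: matches the comment text
print(orders_for_customer(RAW, "Brown"))  # []: no order has that customer
```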

#2 (score: 6)

XML datatype supports:

Besides, using the XML type makes it harder to make the typical mistakes junior developers make when handling XML: treating it as a string, mixing up or ignoring encodings such as UTF-8 and UTF-16, ignoring namespaces, confusing or ignoring processing instructions, etc.
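
A small Python sketch of two of these pitfalls, using Python's ElementTree rather than SQL Server and a made-up namespace URI (urn:example:books): the same document encoded as UTF-8 and as UTF-16 gives different bytes, so string or byte comparisons fail, yet both parse to the same content; and a lookup that ignores the default namespace finds nothing.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:books"  # hypothetical namespace URI


def make_doc(encoding: str) -> bytes:
    # The same logical document, serialized with a matching encoding declaration.
    text = f'<?xml version="1.0" encoding="{encoding}"?><r xmlns="{NS}"><title>Café</title></r>'
    return text.encode(encoding)  # Python's "UTF-16" codec also writes a BOM


def title_of(data: bytes):
    # The parser decodes using the declaration/BOM; the lookup is namespace-qualified.
    root = ET.fromstring(data)
    return root.find(f"{{{NS}}}title").text


def naive_title_of(data: bytes):
    # Ignores the default namespace, so <title> is not found.
    found = ET.fromstring(data).find("title")
    return None if found is None else found.text


utf8_doc = make_doc("UTF-8")
utf16_doc = make_doc("UTF-16")
print(utf8_doc == utf16_doc)                          # False: different bytes
print(title_of(utf8_doc) == title_of(utf16_doc))      # True: same parsed content
print(naive_title_of(utf8_doc))                       # None
```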

Please read "XML Best Practices for Microsoft SQL Server 2005".

#3 (score: 3)

Quoted from the following Stack Overflow post: "Microsoft SQL Server 2005/2008: XML vs text/varchar data type"

If you store XML in an xml-typed column, the data will not be stored as simple text, as in the nvarchar case; it will be stored as some sort of parsed data tree, which in turn will be smaller than the unparsed XML. This not only decreases the database size but also gives you other advantages, such as validation and easy manipulation (even if you're not using any of these now, they are there for future use).

On the other hand, the server has to parse the data on insertion, which will probably slow your database down, so you have to trade speed against size.
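
To make the trade-off concrete, here is a toy Python model (the TextColumn and XmlColumn classes are hypothetical, not a SQL Server API): a text column stores any string as-is, while an XML-like column parses on every insert, which costs time but rejects input that is not well-formed. ElementTree only accepts a single root element, which is stricter than an untyped xml column, and the sketch does not attempt the schema validation a typed xml column performs.

```python
import xml.etree.ElementTree as ET


class TextColumn:
    """Hypothetical stand-in for an nvarchar(max) column: stores any string as-is."""

    def __init__(self):
        self.rows = []

    def insert(self, value: str) -> None:
        self.rows.append(value)


class XmlColumn:
    """Hypothetical stand-in for an xml column: parses on every insert."""

    def __init__(self):
        self.rows = []

    def insert(self, value: str) -> None:
        # Parsing is the extra insert-time cost; it raises ET.ParseError
        # for input that is not well-formed, so bad data never gets stored.
        self.rows.append(ET.fromstring(value))


broken = "<order><customer>Smith</order>"  # mismatched tags
text_col, xml_col = TextColumn(), XmlColumn()
text_col.insert(broken)  # accepted silently
try:
    xml_col.insert(broken)
except ET.ParseError as err:
    print("rejected:", err)
```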

Personally, I think data should be stored in the database as XML only when it has a structure that is hard to implement in a relational model, e.g. layouts, style descriptions, etc. Usually that means there won't be much data and speed is not an issue, so the added XML features, such as data validation and manipulation (and, last but not least, the ability to click a value in Management Studio and see formatted XML, which I really love!), outweigh the costs.

I don't have direct experience storing large amounts of XML in the database, and I wouldn't do it if I had the choice, since it is almost always slower than a relational model. But if you have to, I'd recommend profiling both options and choosing the balance of size and speed that best suits your needs.

#4 (score: 0)

1. It is based on a standard, SQL/XML, so you can expect other major databases to have similar capabilities.

2. Queries can use standards such as XPath.

3. You can index the data.

4. If you have a schema, the data takes less storage and queries are optimized using type information.

#5 (score: 0)

Cons: if you store structured XML data in an xml column, replication currently will NOT sync changes made to different parts of the same XML value between publisher and subscriber.

For example, if the subscriber changes one XML element and the publisher changes a different element in the same XML column, there will be a conflict: one side's change will lose, and you will have to recover the missing data manually.
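
A hypothetical Python model of the conflict described above, not SQL Server's actual conflict resolver (real behavior depends on the replication type and configuration). It assumes the whole XML value is the unit of conflict detection and simplifies the document to {element name: text}. Resolving at the column level keeps one side wholesale, so the other side's edit is lost; a finer-grained merge would keep both.

```python
# Hypothetical model: one row's XML column, simplified to {element name: text}.


def column_level_resolve(publisher: dict, subscriber: dict, winner: str = "publisher") -> dict:
    # The whole XML value is the unit of conflict, so one side replaces the other.
    return dict(publisher if winner == "publisher" else subscriber)


def element_level_merge(base: dict, publisher: dict, subscriber: dict) -> dict:
    # Hypothetical finer-grained merge: keep each side's change to the elements
    # it touched; if both changed the same element, the publisher's value is kept.
    merged = dict(base)
    for name, original in base.items():
        if publisher.get(name) != original:
            merged[name] = publisher.get(name)
        elif subscriber.get(name) != original:
            merged[name] = subscriber.get(name)
    return merged


base = {"title": "Old title", "price": "10"}
publisher = {"title": "New title", "price": "10"}    # publisher edits <title>
subscriber = {"title": "Old title", "price": "12"}   # subscriber edits <price>

print(column_level_resolve(publisher, subscriber))       # {'title': 'New title', 'price': '10'}
print(element_level_merge(base, publisher, subscriber))  # {'title': 'New title', 'price': '12'}
```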

Pros: many web and desktop applications store their data as XML, which maps easily to the SQL Server xml data type.

#6 (score: 0)

I ran some tests to compare insert performance between untyped XML, typed XML, and NVARCHAR(MAX). I found that untyped XML was the fastest and used the least disk storage. The test inserted 7,936,510 rows and used the XSD at https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd.

I ran the typed XML test twice. The first run took 01:23:26.1355961 and the second took 01:15:15.5957446. The size on disk was 57,520,685,056 bytes.

The untyped XML test took 00:48:48.6290364, and the size on disk was 36,515,610,624 bytes.

The NVARCHAR(MAX) test took 00:50:22.1841067, and the size on disk was 72,620,179,456 bytes.

Note: I dropped and recreated the database for each test.
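
Normalizing these figures per row and relative to untyped XML makes them easier to compare. This Python sketch assumes the disk sizes are in bytes and uses the faster (second) typed XML run; the numbers are copied from this answer and nothing new is measured.

```python
ROWS = 7_936_510

# (elapsed time, size on disk) as reported above; sizes assumed to be bytes.
# For typed XML, the second (faster) of the two runs is used.
RESULTS = {
    "typed XML": ("01:15:15.5957446", 57_520_685_056),
    "untyped XML": ("00:48:48.6290364", 36_515_610_624),
    "NVARCHAR(MAX)": ("00:50:22.1841067", 72_620_179_456),
}


def to_seconds(hms: str) -> float:
    """Convert 'hh:mm:ss.fffffff' to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def summarize(results: dict, rows: int) -> dict:
    """Bytes per row, plus time and size relative to untyped XML."""
    base_time, base_size = results["untyped XML"]
    base_secs = to_seconds(base_time)
    return {
        name: {
            "bytes_per_row": size / rows,
            "time_vs_untyped": to_seconds(elapsed) / base_secs,
            "size_vs_untyped": size / base_size,
        }
        for name, (elapsed, size) in results.items()
    }


for name, stats in summarize(RESULTS, ROWS).items():
    print(f"{name:14} {stats['bytes_per_row']:8.0f} B/row "
          f"time x{stats['time_vs_untyped']:.2f} size x{stats['size_vs_untyped']:.2f}")
```

With these inputs, untyped XML comes out at roughly 4,601 bytes per row, typed XML at about 7,248 and NVARCHAR(MAX) at about 9,150, so NVARCHAR(MAX) used about 1.99 times the space of untyped XML while taking only about 3% longer.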

My takeaway is that it's better to use untyped XML than NVARCHAR(MAX) because it uses a lot less disk space. If you used non-Unicode VARCHAR instead, the difference might be smaller; I suspect NVARCHAR uses two bytes to store each character. The files also contain a lot of whitespace, which wastes storage and may have contributed to the difference.
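
Both hunches are easy to check on a single record. NVARCHAR stores text as UTF-16, so ASCII text takes twice the bytes it would in UTF-8. The Python sketch below uses a simplified, made-up record loosely shaped like MARCXML (without its namespace), drops whitespace-only text between elements, and compares sizes; it says nothing about the xml type's internal storage format.

```python
import xml.etree.ElementTree as ET


def strip_insignificant_whitespace(xml_text: str) -> str:
    """Drop whitespace-only text between elements (indentation, newlines)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


def sizes(text: str) -> dict:
    # UTF-16 (what NVARCHAR uses) versus UTF-8 for the same characters.
    return {
        "utf16_bytes": len(text.encode("utf-16-le")),  # -le: no byte-order mark
        "utf8_bytes": len(text.encode("utf-8")),
    }


# Hypothetical, indented record.
PRETTY = """<record>
    <leader>00000nam a2200000 a 4500</leader>
    <datafield tag="245">
        <subfield code="a">Example title</subfield>
    </datafield>
</record>
"""

compact = strip_insignificant_whitespace(PRETTY)
print(sizes(PRETTY))
print(sizes(compact))
```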

I'm not sure how much of the extra slowness of typed XML compared with untyped XML is due to validation, or whether there are other differences. If I remember correctly, I once read that the data is stored relationally in hidden tables, but I'm not sure whether that applies to both typed and untyped XML.

I haven't tested query performance yet. I assume it would be faster for typed XML.

Also, I declared the typed XML as DOCUMENT rather than the default, CONTENT.
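
For reference, SQL Server's DOCUMENT facet requires exactly one top-level element, while the default CONTENT allows a fragment (zero or more top-level elements, with text allowed at the top level). The Python sketch below approximates the two rules with ElementTree's well-formedness check; the function names are hypothetical, and the CONTENT check simply wraps the input in a dummy root, so it does not handle an XML declaration in the input.

```python
import xml.etree.ElementTree as ET


def is_well_formed_document(text: str) -> bool:
    """Roughly the DOCUMENT rule: a single well-formed top-level element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_well_formed_content(text: str) -> bool:
    """Roughly the CONTENT rule: a well-formed fragment, approximated by
    wrapping it in a dummy root element before parsing."""
    return is_well_formed_document(f"<wrapper>{text}</wrapper>")


print(is_well_formed_document("<a><b/></a>"))  # True
print(is_well_formed_document("<a/><b/>"))     # False: two top-level elements
print(is_well_formed_content("<a/><b/>"))      # True
```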
