I have heard from several sources that storing XML in a database is "bad", but I have never seen/heard an actual explanation of why that is. Is it true? If it is true, can you explain why? Moreover, can you tell me what a "good" case for storing XML in a database is?
6 answers
#1
19
It's not bad at all. Microsoft SQL Server has an XML data type. One use case for storing XML is a situation we found ourselves in. For each row in a particular table, we needed to store a variable number of attributes related to that row. And the number of these attributes can change over time, and with each row. We found it more efficient to store these attributes, and their values in an XML format. In the future, each time we adjust the number of attributes, we don't need to make schema changes.
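As a rough illustration of that use case, here is a minimal sketch of keeping a variable set of attributes in a single XML column. It uses Python's built-in sqlite3 module with a plain TEXT column as a stand-in for SQL Server's XML data type, and the table name (item), column name (attributes_xml), and helper functions are all made up for this example, not taken from the answer.

```python
import sqlite3
import xml.etree.ElementTree as ET


def attrs_to_xml(attrs):
    """Serialize a dict of attribute names/values into a small XML document."""
    root = ET.Element("attributes")
    for name, value in attrs.items():
        ET.SubElement(root, "attribute", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_attrs(xml_text):
    """Parse the XML document back into a dict (values come back as strings)."""
    root = ET.fromstring(xml_text)
    return {el.get("name"): el.text for el in root.findall("attribute")}


conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per item, all variable attributes in one column.
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, attributes_xml TEXT)")
conn.execute("INSERT INTO item VALUES (?, ?)",
             (1, attrs_to_xml({"color": "red", "size": "L"})))
# A different row can carry a different set of attributes, with no schema change.
conn.execute("INSERT INTO item VALUES (?, ?)",
             (2, attrs_to_xml({"weight": 12})))

attrs1 = xml_to_attrs(
    conn.execute("SELECT attributes_xml FROM item WHERE id = 1").fetchone()[0])
attrs2 = xml_to_attrs(
    conn.execute("SELECT attributes_xml FROM item WHERE id = 2").fetchone()[0])
```

Note the trade-off this sketch makes visible: adding a new attribute needs no schema change, but every read has to parse the document, and the database cannot filter on an attribute without XML-aware query support.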
#2
19
There are some really stupid answers here - just because a database supports a data type does not mean you should be using it. These things are invariably added in as features because the competition have them, not because they are the right thing to do. Global variables? Triggers? Would anyone like to defend them too just because you can use them and they're there?
If you have multiple attributes, the best way to handle them in a relational database is with a one to many relationship. Parse out your useful data from the XML overhead. You then just store the ID (primary key) of the parent record with each of the rows stored in a second table, one row per attribute. You can have any number of attributes per parent record. It's database design 101, nothing clever. Storing it as unstructured XML just to store a variable number of attributes is not the way to go, it's a sledgehammer to crack a peanut. A one to many relationship between two tables is simpler, easier to understand, much faster to query, much less effort coding, and less storage (which means faster queries). Everyone wins, apart from the storage vendors.
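To make the one-to-many suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (parent, parent_attribute, attr_name, attr_value) are assumptions chosen for illustration; the answer does not name any.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per attribute, pointing back at the parent's primary key.
    CREATE TABLE parent_attribute (
        parent_id  INTEGER NOT NULL REFERENCES parent(id),
        attr_name  TEXT NOT NULL,
        attr_value TEXT,
        PRIMARY KEY (parent_id, attr_name)
    );
""")

conn.execute("INSERT INTO parent VALUES (1, 'widget')")
conn.execute("INSERT INTO parent VALUES (2, 'gadget')")
conn.executemany(
    "INSERT INTO parent_attribute VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "size", "L"), (2, "color", "blue")],
)

# Filtering on an attribute is an ordinary join; no XML parsing is involved.
red_parents = [row[0] for row in conn.execute(
    "SELECT p.name FROM parent p "
    "JOIN parent_attribute a ON a.parent_id = p.id "
    "WHERE a.attr_name = ? AND a.attr_value = ? "
    "ORDER BY p.name",
    ("color", "red"),
)]

attr_count = conn.execute(
    "SELECT COUNT(*) FROM parent_attribute WHERE parent_id = 1").fetchone()[0]
```

Each parent can have any number of attribute rows, and a new attribute is just another INSERT rather than a schema change.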
XML is a data transfer format; as GolezTrol rightly said, "It is a way to export (and import) data" - i.e. it is simply an overhead used to facilitate the communication of the structure of the data between different systems. Once received, the tags should be stripped out and the data (and only the data) stored in your database engine of choice, whatever that might be. Not the XML itself. The overhead for XML can be ~10x that of the data it's describing, depending on how verbose the tags are. Want to tell your boss why that 100GB of data is occupying 1TB of space on your hyper expensive SAN? Or taking all night to back up over a saturated network link? Or causing performance problems in production? If you don't parse out the data from the now pointless tags, you will just push the problem and ongoing, daily support costs onto operational support for the next ten years. Sloppy, sloppy, sloppy. This keeps vendors like EMC in business.
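The size of that overhead depends heavily on tag names and nesting, so it is worth measuring on your own documents rather than trusting a single figure. Below is a small sketch that compares the total size of an XML document with the size of the values it carries. The function name markup_overhead and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET


def markup_overhead(xml_text):
    """Return (total_bytes, payload_bytes) for an XML document.

    Payload counts element text and attribute values only; tag names,
    angle brackets, and whitespace between tags count as overhead.
    Tail text (mixed content) is ignored to keep the sketch short.
    """
    root = ET.fromstring(xml_text)
    payload = 0
    for el in root.iter():
        if el.text and el.text.strip():
            payload += len(el.text.strip().encode("utf-8"))
        for value in el.attrib.values():
            payload += len(value.encode("utf-8"))
    return len(xml_text.encode("utf-8")), payload


# Hypothetical sample with long tag names and short values.
sample = (
    "<order>"
    "<customerIdentifier>42</customerIdentifier>"
    "<orderQuantity>3</orderQuantity>"
    "</order>"
)
total, payload = markup_overhead(sample)
ratio = total / payload  # how many stored bytes per byte of actual data
```

Short values wrapped in long tag names push the ratio up sharply, while documents dominated by long text fields sit much closer to 1.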
XML is metadata. Nothing clever, just a schema descriptor. Once it's transferred and parsed it's lost its usefulness and is just clutter that clogs up whatever database you use. Get rid of it, unless you're compulsively addicted to hoarding yesterday's pointless crappy description metadata, stored many times over. Wake up. It's typical "Emperor's New Clothes" syndrome; stop being conned by something simple and disposable. It's only metadata and it should not be stored or worshipped; it's junk once it's parsed. And what's better? To parse it once, or to uselessly parse it every time you need data from it? The answer's pretty darned obvious to me.
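The "parse it once" approach can be sketched as follows: the incoming XML is parsed a single time at ingestion, only the values are stored, and later queries never touch XML again. The document layout, the product table, and the shred helper are all assumptions made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incoming document from another system.
incoming = """<products>
  <product id="1"><name>widget</name><price>9.99</price></product>
  <product id="2"><name>gadget</name><price>24.50</price></product>
</products>"""


def shred(xml_text):
    """Parse the document once and return plain (id, name, price) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (int(p.get("id")), p.findtext("name"), float(p.findtext("price")))
        for p in root.findall("product")
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
# Only the data is stored; the tags are discarded after this point.
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", shred(incoming))

# Later reads are plain SQL, with no parsing cost per query.
cheap = conn.execute(
    "SELECT name FROM product WHERE price < 10 ORDER BY name").fetchall()
```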
#3
11
Storing XML, JSON, YAML, comma-separated lists, binary blobs, or anything else in a database is not bad ... per se.
It can indicate a lack of understanding of what a database is for (storing data that is related to other data) and conjures up visions of databases with single-column tables called data1, data2, etc. ... with each table row holding a 5+ MB entry of XML-encoded relational data.
On the other hand, there are many valid cases that can be made for such a structure -- rapidly changing configurations might be represented in JSON and stored in a two column table structured like this:
dbo.good_table
ApplicationID (bigint)
Configuration (varchar(max))
The difference between the above table and a table like this:
dbo.bad_table
ApplicationID (bigint)
ApplicationMembers (xml)
is that good_table is enabling rapid access to a piece of data (the configuration), while the bad_table is using the database as an ofttimes expensive (and slow) hard disk.
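A minimal sketch of the good_table pattern, assuming SQLite (via Python's sqlite3 module) in place of SQL Server and a TEXT column in place of varchar(max). The table and column names come from the answer; the save_config and load_config helpers are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE good_table ("
    "  ApplicationID INTEGER PRIMARY KEY,"
    "  Configuration TEXT NOT NULL"
    ")"
)


def save_config(app_id, config):
    """Store the whole configuration as one JSON document, replacing any old one."""
    conn.execute(
        "INSERT OR REPLACE INTO good_table (ApplicationID, Configuration) "
        "VALUES (?, ?)",
        (app_id, json.dumps(config)),
    )


def load_config(app_id):
    """Fetch and decode the configuration by key, or None if there is no row."""
    row = conn.execute(
        "SELECT Configuration FROM good_table WHERE ApplicationID = ?",
        (app_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


save_config(7, {"theme": "dark", "retries": 3})
# The shape of the configuration can change freely without touching the schema.
save_config(7, {"theme": "light", "retries": 5, "beta": True})
config = load_config(7)
```

The document is always read and written as a whole, looked up by its key. The database never needs to look inside it, which is what separates this from stuffing relational data into an XML column.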
#4
4
XML is itself a kind of storage file. It is most practically used for transporting data, because it provides a common mechanism for structuring data. There are fixed rules for reading and writing XML that allow XML files to be read by anyone. Validation and transformation to other output formats are also relatively easy (using XSLT). XML, however, is not the best way to store data. Reading XML files is time-consuming, and they take up a relatively large amount of space. It is best to store your data in a structured manner in your database, and export the data from certain queries to XML if you need them in reports, on a website, or to pass them to other parties.
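As a sketch of that "store structured, export to XML on demand" idea, the following uses Python's sqlite3 and xml.etree.ElementTree modules. The product table, its columns, and the element names in the output are assumptions made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "widget", 9.99), (2, "gadget", 24.5)],
)


def export_products(conn):
    """Build an XML document from a query result, only when it is needed."""
    root = ET.Element("products")
    rows = conn.execute("SELECT id, name, price FROM product ORDER BY id")
    for pid, name, price in rows:
        el = ET.SubElement(root, "product", id=str(pid))
        ET.SubElement(el, "name").text = name
        ET.SubElement(el, "price").text = f"{price:.2f}"
    return ET.tostring(root, encoding="unicode")


xml_out = export_products(conn)
exported = ET.fromstring(xml_out).findall("product")
```

The XML exists only for the duration of the export, while the database keeps the data in typed columns that can be indexed and queried directly.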
There are XML databases, but they also don't store their data in XML. They merely provide a way to save and load hierarchical data (XML is a hierarchical structure), instead of the standard table structure.
So it is right to say that storing XML content in a blob in a database is generally not the right way to go, but there are always exceptions, of course.
XML is, in contrast with what others say here, not a way to display data. It is a way to export (and import) data. It is a logical choice for transporting data, because you are totally flexible in how you export it and it can easily be transformed to other formats. For example, if you have a webshop and you want to export prices and product information to other parties, you could choose XML. These other parties can write easy rules to transform this data to their needs. Neither party has to know how prices are stored on the other side, and neither party has to write a complex tool to parse some hard-to-read binary that someone else has made up.
#5
3
No, it is not.
Actually several databases already have data types for storing XML documents.
#6
2
I think storing XML in a database could be bad for speed reasons (parsing etc.). However, a good case would be data that fits the semi-structured model; there are some advantages of this listed here.