Why do we need text inside end tags in XML? I understand why end tags are present, but the text inside them is redundant, since software can recognize that the most recently opened tag is ending when it encounters `</>`. By removing the text inside end tags we could save approximately 1/4 of the data consumed by the files. This would result in saving billions of bytes globally.
Why can't we use this format
```xml
<CD>
<TITLE>Empire Burlesque</>
</>
```
instead of
```xml
<CD>
<TITLE>Empire Burlesque</TITLE>
</CD>
```
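Under the current rules the proposed shorthand is not well-formed XML, so a conforming parser rejects it outright. A quick check with Python's standard `xml.etree.ElementTree` (which uses the expat parser) shows this; the helper name `is_well_formed` is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

standard = "<CD>\n<TITLE>Empire Burlesque</TITLE>\n</CD>"
proposed = "<CD>\n<TITLE>Empire Burlesque</>\n</>"


def is_well_formed(text: str) -> bool:
    """Return True if ElementTree accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(standard))  # True
print(is_well_formed(proposed))  # False: "</>" is not a legal XML end-tag
```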
I take the question to be "Why did the designers of XML not allow empty end-tags? The element type name in the end-tag is redundant; why is it required?"
Yes, it's redundant. I believe the designers of XML chose to require the element type name in the end-tag because both of the obvious alternatives seemed to have problems of their own.
- Requiring the use of empty end-tags (which would have the form `</>`, following the syntax of SGML) would lead to confusion and errors whenever the start- and end-tag were more than a few lines apart, as Kevin Brown has already pointed out in a comment. (This was certainly my experience in ten years of using SGML, and my recollection is that others reported similar views.)
- Making the element type name optional in the end-tag would make the spec more complex. Not very much more complex, but perceptibly.
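Both points can be illustrated with a toy stack-based tag matcher. This is a hypothetical sketch, not how any real XML parser is written: the tokenizer handles only bare start-tags, named end-tags and SGML-style `</>`, and the `ARTIST` element is added purely for the example. With named end-tags a forgotten `</TITLE>` is reported at the very tag where things go wrong; with empty end-tags the same slip surfaces only as an unclosed element at the end of input, and one extra `</>` makes the document balance with a different structure. The `name is not None` test marks the extra branch that optional names would add.

```python
import re

# Toy tokenizer (hypothetical, for illustration only): recognizes bare
# start-tags <NAME>, named end-tags </NAME>, and SGML-style empty end-tags </>.
# Attributes, comments, CDATA, etc. are not handled.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)?>")


def check(text: str) -> list[str]:
    """Match tags with a stack and return a list of problems found."""
    stack: list[str] = []
    problems: list[str] = []
    for m in TAG.finditer(text):
        is_end, name = m.group(1) == "/", m.group(2)
        if not is_end:
            if name is not None:  # ignore a stray "<>"
                stack.append(name)
            continue
        if not stack:
            problems.append(f"end-tag with nothing open at offset {m.start()}")
            continue
        open_name = stack.pop()
        # Extra branch needed once names are optional: an empty end-tag
        # closes whatever is open, so there is nothing to compare.
        if name is not None and name != open_name:
            problems.append(f"</{name}> does not match <{open_name}> at offset {m.start()}")
    problems.extend(f"<{n}> never closed" for n in stack)
    return problems


# The same slip in both styles: the author forgets to close TITLE.
named = "<CD><TITLE>Empire Burlesque<ARTIST>Bob Dylan</ARTIST></CD>"
empty = "<CD><TITLE>Empire Burlesque<ARTIST>Bob Dylan</></>"

print(check(named))          # mismatch reported at the </CD> tag itself
print(check(empty))          # ['<CD> never closed'], only at end of input
print(check(empty + "</>"))  # [], balanced, but ARTIST is now inside TITLE
```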
Also, the drawback of the extra bytes just did not (and does not) seem important. One of the initial design principles for XML (see the spec) was:
- Terseness in XML markup is of minimal importance.
I think you overestimate the cost of the syntactic rule in question. Using empty end-tags will save 1/4 of the bytes in an XML document only in cases where about half the bytes in the document are start- or end-tags and none of the elements have attributes; if any of the elements have attributes, the markup will need to be more than half of the document size. There are documents like that, but in my experience they are rather rare. Even in the example data you give, using empty end-tags would not save 1/4 of the bytes (7 out of 44 is roughly 1/6, not 1/4).
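The arithmetic for the question's example can be checked directly. This sketch counts characters, which equals bytes here because the text is pure ASCII; the 44-byte figure matches the example written with CRLF line endings (presumably how it was counted), while LF endings give 42 bytes. Either way the saving is 7 bytes, about 1/6.

```python
import re

LINES = ["<CD>", "<TITLE>Empire Burlesque</TITLE>", "</CD>"]


def savings(eol: str) -> tuple[int, int]:
    """Return (bytes saved by emptying every end-tag, original size)."""
    standard = eol.join(LINES)
    # Replace each named end-tag such as "</TITLE>" with the empty form "</>".
    proposed = re.sub(r"</[^>]+>", "</>", standard)
    return len(standard) - len(proposed), len(standard)


for eol in ("\n", "\r\n"):
    saved, total = savings(eol)
    print(repr(eol), saved, total, round(saved / total, 3))
# '\n' 7 42 0.167
# '\r\n' 7 44 0.159
```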
If file size were really important, and worth making serious efforts to minimize, then word-processor formats like Word, or rendering formats like PDF, would be much less popular than they are, because a typical human-readable document will be two to ten times larger as a Word file or PDF file than as an XML document. Are Word and PDF dying out because they make documents so much larger than they would be in a more compact format like XML?
Given the relative unimportance of file size in a world where disk capacity continues to grow faster than anything else in computing, and the obvious utility of the redundancy in helping diagnose syntactic errors or data corruption in an XML data stream, the designers of XML made a choice that seemed reasonable to them at the time. It does not seem any less reasonable now.
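As a small illustration of that diagnostic value: if one byte of an end-tag name is corrupted, a conforming parser notices immediately because the name no longer matches the open element. The sketch below uses Python's `xml.etree.ElementTree`; the helper `parse_error` and the simulated `TITLF` corruption are illustrative assumptions. Under an empty-end-tag scheme there would be no name to check against.

```python
import xml.etree.ElementTree as ET


def parse_error(text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


good = "<CD><TITLE>Empire Burlesque</TITLE></CD>"
# Simulate a one-character corruption in the end-tag name: TITLE -> TITLF.
bad = good.replace("</TITLE>", "</TITLF>")

print(parse_error(good))  # None
print(parse_error(bad))   # a "mismatched tag" message with line and column
```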