What problems was XML invented to solve? From what I can tell, it seems like it specifies a uniform syntax for things that may have vastly different semantics. Unlike, for example, an HTML file, a Java source file, or a .docx document, one cannot write a program to extract any kind of high-level meaning from an XML file without lots of additional information. What is the value of having the syntax rigidly specified by some standards committee even when the semantic meaning is completely unspecified? What advantages does XML have over just rolling your own ad-hoc format that does exactly what you need and nothing more? In short, what does XML accomplish and why is it so widely used?
5 answers
#1
15
XML forces your data to be well-structured, so that a program which does not understand the semantics of your data will still be able to understand its syntax. This allows things like XSLT, which will transform one well-formed XML document into another. It means that you can manipulate data without having to interpret it. You can see the document is well-formed and valid according to its DTD without needing to understand the contents.
This was a huge step forward for data storage, interoperability, and machine-readability in general.
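To make "understand the syntax without the semantics" concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `tag_counts` helper and both sample documents are made up for illustration; the point is only that one generic routine can walk documents with unrelated meanings.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def tag_counts(xml_text: str) -> Counter:
    """Count how often each element name occurs, knowing nothing about its meaning."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())

# Hypothetical documents with completely unrelated semantics.
recipe = "<recipe><step>etch</step><step>rinse</step></recipe>"
invoice = "<invoice><item>bolt</item><item>nut</item><total>3</total></invoice>"

print(tag_counts(recipe))
print(tag_counts(invoice))
```

The same function works on both inputs because it relies only on the shared syntax, which is the property tools like XSLT processors build on.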
#2
7
I personally find XML useful because I find writing parsers to be a pain. If you invent your own data format, you wind up spending a lot of your time writing parsing code, checking for correct input in what could be a lot of user data. Then, after you get all the input and validity checking code completed for your parser, you have the joy of developing documentation for your file format for anyone else who wants to use it, plus the further joy of finding bugs in your input validation code after they start sending data your way.
With XML the parsing mechanics are well defined, and with XML Schema or DTDs you can specify the formats you are willing to accept. XML parsers are available for almost every major programming language, so the amount of code you have to write, maintain, and document is greatly reduced.
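As a rough illustration of how little code this leaves you to write, the sketch below leans on the parser in Python's standard library to reject malformed input. The `is_well_formed` name is my own. Note that it checks well-formedness only: the standard library parser does not validate against a DTD or schema, so that step would need a validating library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML; the parser does all the input checking."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<order><id>42</id></order>"))  # properly nested
print(is_well_formed("<order><id>42</order>"))       # <id> is never closed
```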
#3
6
xml lets you be non-standard in a standard way :). It's ugly, it's verbose, it takes up a lot of space and it's absolutely invaluable for interoperability. Basically, xml is nice because it gives you a standard way of describing your data so that a single type of parser can handle data from disparate sources.
To use a more concrete example, I used to work in the semiconductor tool industry in the days before xml. Each tool used a recipe to describe how to process a particular wafer. Every one of those tools used a different format for its recipes. Now, pity the poor person (me!) who had to take several of those tools and integrate them into a single processing system. I had to write a different parser for each recipe type and convert recipes from a common store into the format appropriate for a particular tool. It was just a nightmare. If xml had been available, all those recipes could have been defined via xml and any conversions or transformations handled with simple XSLT scripts. It would have saved me literally months of development effort just for that portion of the integration code.
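To give a feel for that kind of conversion, here is a small sketch. It is not XSLT: Python's standard library has no XSLT processor, so this uses `xml.etree.ElementTree` as a stand-in. The recipe layout, the `program`/`op` target layout, and the `to_tool_format` name are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe as it might sit in a common store.
common = "<recipe><step name='etch' seconds='30'/><step name='rinse' seconds='10'/></recipe>"

def to_tool_format(xml_text: str) -> str:
    """Rewrite <step name= seconds=> into a made-up tool layout: <op duration=>name</op>."""
    src = ET.fromstring(xml_text)
    dst = ET.Element("program")
    for step in src.iter("step"):
        op = ET.SubElement(dst, "op", duration=step.get("seconds"))
        op.text = step.get("name")
    return ET.tostring(dst, encoding="unicode")

print(to_tool_format(common))
```

No custom parser is involved on either side; only the mapping between the two layouts is tool-specific.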
#4
4
Ad hoc solutions work fine within the confines of your own system, but when you need the ability to communicate with 1...N other systems, XML is a good foundation that all parties can rely on to work, at a minimum, in a certain way. Yes, the data has no semantic meaning, but you're assured that the TRANSFER and CONVERSION of data will still be successful. There's many more reasons, but that's one of the most important I've always thought.
This is a very primitive example but think of when systems used to communicate with flatfile data. You could have had a string that other parties had built communication around such as AAABBBCCCDDD. Other systems knew that they would get AAA "data" in the first 3 characters etc... Now someone changes something on your side and accidentally starts sending BBB AAA CCC DDD. Boom, everything is broken.
With XML you could have both:
<xml>
<a>AAA</a>
<b>BBB</b>
<c>CCC</c>
<d>DDD</d>
</xml>
AND
<xml>
<b>BBB</b>
<a>AAA</a>
<c>CCC</c>
<d>DDD</d>
</xml>
without breaking someone else's system, as long as that system looks up elements by name rather than by position.
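A quick sketch of that point in Python, using the standard `xml.etree.ElementTree` module. The `read_fields` helper is my own name, and it assumes the consumer looks fields up by element name:

```python
import xml.etree.ElementTree as ET

original = "<xml><a>AAA</a><b>BBB</b><c>CCC</c><d>DDD</d></xml>"
reordered = "<xml><b>BBB</b><a>AAA</a><c>CCC</c><d>DDD</d></xml>"

def read_fields(xml_text: str) -> dict:
    """Map each child element's name to its text, ignoring document order."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

# Name-based lookup sees the same data in both documents.
print(read_fields(original) == read_fields(reordered))

# The flat-file consumer reads by position and gets the wrong field.
print("AAABBBCCCDDD"[:3], "BBBAAACCCDDD"[:3])
```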
#5
3
The answer is in your own question. "From what I can tell, it seems like it specifies a uniform syntax for things that may have vastly different semantics." Having a uniform syntax solves part of the problem for things that have vastly different semantics, and it's not a trivial problem in the slightest.
Similarly, text encoding is used in markup (including XML), computer programs, human-readable documents, and many more tasks with vastly different semantics. Would you like to reinvent Unicode every single time? Would you even know enough about all the issues to have a chance of doing so, or even a chance of reinventing a passable ASCII? ASCII only seems simple these days because so many of the complicated features of its control codes are no longer used; old-school uses of ASCII were often way more complicated than Unicode.
Numbers are used all over the place in computing, and we still have four different internal syntaxes in use (two endian styles, two complement styles) though the details are generally hidden these days.
As well as doing one chunk of the format creator's work for them, and ensuring that one chunk of the producer's or consumer's work is something they are already familiar with (and hence may already have tools for), it completely eliminates one chunk of the work for a producer-consumer that reads in one format and writes in another.