What is the difference between XML data and XML metadata?

Time: 2021-08-23 13:23:01

I'm rebuilding some XML feeds, so I am researching when to use elements and when to use attributes with XML.

Several sites have said "Data goes in elements, metadata in attributes."

So, what is the difference between the two?

Let's take an example from W3Schools:

<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Should the date stay as an attribute of the note element? Or does it make more sense to go into its own element?

<date>12/11/2002</date>

Or, does it make sense for it to be separated into multiple elements?

<date>
  <day>12</day>
  <month>11</month>
  <year>2002</year>
</date>

3 answers

#1


2  

Following the "Data goes in elements, metadata in attributes" rule, I would have made the date a child element. You don't need to break it down into day, month, and year, because an XSD can declare that an element must be of type xs:date, whose lexical form is YYYY-MM-DD. I think an example of "metadata" here would be a noteID field or maybe a noteType. Example:

<note id="NID0001234" type="reminder">
  <date>2002-11-12</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

UPDATE: As many others have pointed out, this can be rather subjective. I try to separate the two by how they will be used: data will usually be presented to the user, while metadata controls the presentation and may be used internally for other purposes. But there are always exceptions...

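To make the date-type point above concrete, here is a minimal Python sketch that reads the `<date>` child and parses it as an ISO date. It does not run XSD validation (the standard library has no XSD validator); it only assumes the value uses the YYYY-MM-DD shape that xs:date uses. The helper name `read_note_date` is made up for this illustration.

```python
from datetime import date
import xml.etree.ElementTree as ET

NOTE = """<note id="NID0001234" type="reminder">
  <date>2002-11-12</date>
  <to>Tove</to>
</note>"""


def read_note_date(xml_text: str) -> date:
    """Return the <date> child of a note as a datetime.date."""
    root = ET.fromstring(xml_text)
    text = root.findtext("date")
    if text is None:
        raise ValueError("note has no <date> child")
    # fromisoformat accepts YYYY-MM-DD; xs:date also allows an optional
    # timezone suffix, which this sketch does not handle.
    return date.fromisoformat(text.strip())


note_date = read_note_date(NOTE)
print(note_date.year, note_date.month, note_date.day)
```

A value such as 12/11/2002 is rejected by `date.fromisoformat` with a `ValueError`, which is one practical reason to prefer the ISO form: the day/month order is never ambiguous.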

#2


2  

The distinction between data and metadata is almost entirely subjective. One man's data is another's metadata. The "metadata in attributes" rule grew out of the markup world, where the rule of thumb was that if you remove all of the markup and leave just the text, what remains should still be a reasonable document. This meant attributes should be discardable and elements essential. If you display XML in a browser that doesn't understand it, it will be treated this way.

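The "remove the markup and keep the text" test can be tried directly. The sketch below uses Python's `xml.etree.ElementTree` on the W3Schools note from the question; `text_only` is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

NOTE = """<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def text_only(xml_text: str) -> str:
    """Join the character data of a document, ignoring tags and attributes."""
    root = ET.fromstring(xml_text)
    # itertext() walks element text and tails only; attribute values
    # are never part of what it yields.
    pieces = (piece.strip() for piece in root.itertext())
    return " ".join(piece for piece in pieces if piece)


print(text_only(NOTE))
```

The element content survives, while the `date` attribute disappears, which is exactly the "attributes are discardable" behaviour the rule of thumb assumes.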

But your XML (and most XML these days) likely won't be displayed to the user in an uncomprehending browser, so you can use better rules for how to design your XML.

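For example, an element can contain multiple child elements with the same name, but it cannot carry two attributes with the same name. Whitespace is also treated differently: a parser normalizes attribute values (literal tabs and line breaks become spaces, and values of non-CDATA types are collapsed further), whereas whitespace in element content is passed through as written.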

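Both differences can be observed with a stock parser. This is a small sketch using Python's `xml.etree.ElementTree` (backed by expat); the `parses` helper and the sample strings are invented for the illustration, and no DTD is involved, so every attribute is treated as CDATA.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Repeated child elements are fine; a repeated attribute name is a
# well-formedness error.
print(parses("<note><to>Tove</to><to>Jani</to></note>"))
print(parses('<note to="Tove" to="Jani"/>'))

# The same text, containing a literal newline and tab, stored both ways.
doc = ET.fromstring(
    '<note body="line one\n\tline two"><body>line one\n\tline two</body></note>'
)
print(repr(doc.get("body")))       # attribute value after normalization
print(repr(doc.findtext("body")))  # element content, as written
```

In the attribute, the newline and the tab each come back as a space, while the element keeps them. A line break can still be forced into an attribute with a character reference such as `&#10;`, but that is easy to get wrong by hand.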

#3


1  

There are differing views on the principles to use when deciding whether to use an attribute or an element for a piece of data. For example, see this old article from IBM, which lays out a bunch of proposed principles, and then decorates the whole article with a giant caveat that says "there are lots of exceptions and these principles are not intended to be prescriptive" (essentially).

I think the main thing is to be internally consistent. Be consistent within your own world, however large that is. Your "world" could be a single schema, within which you should be consistent in your approach: every element within that schema should be philosophically consistent. Or your world could be a set of related schemas, or all XML documents emitted by a particular company, or even all XML schemas used by an industry or technology group.

Now, regarding the sample you offered:

<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend! Remember what happened last time you forgot!!!</body>
</note>

...this seems internally inconsistent because only one piece of data is factored out, and there doesn't seem to be a good reason to do so.

It would be better if all the items were attributes or all were elements. One exception: the longish body should probably always be an element. This feels right to me:

<note date="12/11/2002" to="Tove" from="Jani" heading="Reminder">
  <body>Don't forget me this weekend! Remember what happened last time you forgot!!!</body>
</note>

Putting the body into an attribute hurts readability, which argues for putting it into an element.

Keep in mind that whitespace in attribute values gets normalized and can be collapsed (source: the IBM article I cited). The hard rule that follows is: if whitespace is meaningful, use an element.

Now, if the heading in that fragment of XML is something like an email subject, I'd probably factor that out into an element as well, since subjects can be lengthy.

As for your question regarding the month/day/year of the date: yes, factor those out if you need easy access to the individual values in tools that process the XML. It's easier to search for all notes from before 2009 with an XPath expression that does not have to do string parsing followed by string-to-number conversion. On the other hand, if your use of the XML does not require selects or searches on those individual values (month, day, year), then keep them consolidated in a human-readable form, as in your original.

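Here is a rough Python sketch of that trade-off, using two made-up `<notes>` documents. With a full XPath 1.0 engine, the split form could be queried with something like `//note[date/year < 2009]`; `xml.etree.ElementTree` only supports a subset of XPath without numeric comparisons, so the comparison is done in Python here. Both helper functions are hypothetical, and the combined form assumes the year is always the last slash-separated field.

```python
import xml.etree.ElementTree as ET

SPLIT = """<notes>
  <note><date><day>12</day><month>11</month><year>2002</year></date></note>
  <note><date><day>3</day><month>5</month><year>2010</year></date></note>
</notes>"""

COMBINED = """<notes>
  <note date="12/11/2002"/>
  <note date="3/5/2010"/>
</notes>"""


def before_year_split(xml_text: str, year: int) -> list:
    """Select notes by reading the dedicated <year> element."""
    root = ET.fromstring(xml_text)
    return [n for n in root.iter("note") if int(n.findtext("date/year")) < year]


def before_year_combined(xml_text: str, year: int) -> list:
    """Select notes by picking the year out of a combined date string."""
    root = ET.fromstring(xml_text)
    # String parsing step: relies on the year being the last field.
    return [
        n for n in root.iter("note")
        if int(n.get("date").rsplit("/", 1)[1]) < year
    ]


print(len(before_year_split(SPLIT, 2009)))
print(len(before_year_combined(COMBINED, 2009)))
```

Both versions find the same note, but the second one silently depends on the field order of the date string, which is the kind of hidden parsing the split form avoids.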

tl;dr: There are few firm rules. As long as your use of elements and attributes is consistent, it will be easy for other developers and tools to understand and use.
