How do I remove characters from HTML attribute values?

Date: 2022-01-13 18:41:17

According to the author of htmlcompressor.com, return characters inside attribute values cannot be removed because they have semantic meaning.

Here is the particular example:

<meta name='description' content='Foo lets you save and share all your 
  web bookmarks / favorites in one place. It is free with no advertising for life, and 
  has straight forward privacy controls.'>

Removing the return characters gives:

<meta name='description' content='Foo lets you save and share all your web bookmarks / favorites in one place. It is free with no advertising for life, and has straight forward privacy controls.'>

This is a single line, which is what I want to send to the browser.

I want to do this for all my HTML using some string manipulation. Is this possible, or are there other cases where a return character has meaning? If so, is there a way to tell them apart?

2 Answers

#1


2  

According to the HTML 4.01 specification ( http://www.w3.org/TR/html4/struct/global.html#h-7.4.4.2 ), the content attribute of the <meta> element is CDATA, which means that line breaks and surrounding whitespace in its value are not significant:

CDATA is a sequence of characters from the document character set and may include character entities. User agents should interpret attribute values as follows:

  • Replace character entities with characters,
  • Ignore line feeds,
  • Replace each carriage return or tab with a single space.
  • User agents may ignore leading and trailing white space in CDATA attribute values (e.g., " myval " may be interpreted as "myval"). Authors should not declare attribute values with leading or trailing white space.
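
The quoted rules can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: normalizeCdataAttribute is a made-up name, entity replacement is left out, and the optional trimming of leading and trailing white space is always applied.

```javascript
// Hypothetical helper (not from any library): applies the quoted
// HTML 4.01 CDATA rules to a raw attribute value.
// Assumption: character entities have already been replaced elsewhere.
function normalizeCdataAttribute(raw) {
  return raw
    .replace(/\n/g, '')       // "Ignore line feeds"
    .replace(/[\r\t]/g, ' ')  // "Replace each carriage return or tab with a single space"
    .trim();                  // optional rule: drop leading and trailing white space
}

console.log(JSON.stringify(normalizeCdataAttribute(' my\r\nval\tue ')));
// prints "my val ue"
```

Note that, read literally, a bare line feed is dropped rather than turned into a space, so a value that breaks lines with "\n" alone keeps its words apart only if there is other white space (such as indentation) around the break.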

So it looks like the author of htmlcompressor.com is wrong.

Anyway, despite dire warnings to the contrary, you can probably get away with using a regular expression to fix this.

I've forgotten the syntax to combine "match only this group, and replace in this sub-region" in regex, but this hack works:

This simple regex will capture the content of the content="" attribute:

<meta[^>]+content='([^']*)'>

Once you've got the content, you can do a straightforward replacement of each '\r', '\n', and run of spaces with a single ' '.
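
As a rough sketch of that capture-then-replace idea in JavaScript (the function name collapseMetaContent and the sample input are made up for illustration): it assumes single-quoted content attributes, no ">" inside earlier attribute values, and that collapsing every whitespace run to one space is acceptable, which is simpler than following the spec rules to the letter.

```javascript
// Hypothetical helper: collapses whitespace runs inside single-quoted
// content='...' attributes of <meta> tags and leaves everything else alone.
function collapseMetaContent(html) {
  // [^>]*? and [^']* also match line breaks (unlike a plain "."),
  // so attribute values that span several lines are still captured.
  const pattern = /(<meta\b[^>]*?\bcontent=')([^']*)(')/gi;
  return html.replace(pattern, (match, before, value, after) =>
    before + value.replace(/\s+/g, ' ').trim() + after);
}

const input = "<meta name='description' content='Foo lets you save\n  and share bookmarks.'>";
console.log(collapseMetaContent(input));
// prints <meta name='description' content='Foo lets you save and share bookmarks.'>
```

Using a replacer callback sidesteps the "replace only inside this group" problem: the regex finds the attribute, and the inner replace touches only the captured value.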

#2


0  

Even if the specification is right that the content attribute is CDATA, a webmaster may read the value of any attribute via JavaScript, such as the "content" of the "meta" tag in the given example, and compressing the attribute value would alter the expected result.

So the author of htmlcompressor.com is correct: for the purposes of compression, line breaks in attribute values do have semantic meaning.

<meta id="m1" name="item1" content="Sample stuff:

  1. This text is multiline on purpose.
  2. And the author expects it to remain this way after compression.

  So yes, it does matter...">

The same meta tag compressed:

<meta id="m2" name="item2" content="Sample stuff: 1. This text is multiline on purpose. 2. And the author expects it to remain this way after compression. So yes, it does matter...">

And to show the difference:

<script>
  alert('"'
      + document.getElementById('m1').content
      + '"\n\n---------------\n\n"'
      + document.getElementById('m2').content + '"'
  );
</script>

AFAIK, the goal of that site is to compress documents without altering the resulting layout or functionality.

Live example: http://jsfiddle.net/7Qb74/
