Getting an HTML attribute value with JavaScript

Time: 2022-11-27 10:38:43

I have a website where I feed information to an analytics engine via a meta tag, like this:

<meta property="analytics-track" content="Hey&nbsp;There!">

I am trying to write a JavaScript script (no libraries) that reads the content attribute and retrieves its value as is. In other words, the result should include the HTML entity rather than transform or strip it.

The reason is that I am using PhantomJS to find which pages have HTML entities in their metadata so I can remove them, because they skew my analytics data. (For example, I'll have entries for both Hey There! and Hey&nbsp;There! when in fact they are the same page and should not be two separate data points.)

The simplest JS I have is this:

document.getElementsByTagName('meta')[4].getAttribute("content")

When I run it in the console, it returns the text in the following format:

"Hey There!"

What I would like it to return is:

"Hey&nbsp;There!"

How can I ensure that the returned data keeps the HTML entity? If that's not possible, is there a way to detect the HTML entity via JavaScript? I tried:

document.getElementsByTagName('meta')[4].getAttribute("content").includes('&nbsp;')

But it returns false.

3 solutions

#1


4  

Use querySelector to select the element whose property attribute is "analytics-track", outerHTML to get the element as a string, and match with a regex to extract the unparsed value of the content attribute.

document.querySelector('[property=analytics-track]').outerHTML.match(/content="([^"]*)"/)[1];

See http://jsfiddle.net/sjmcpherso/mz63fnjg/
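
A minimal sketch of the same idea as a reusable function, so it can be tried on a plain string outside the browser. The helper name `extractRawContent` and the sample string are assumptions for illustration, not part of the original answer. Note that outerHTML is a re-serialization of the DOM, not the original page source: in browsers that follow the HTML serialization rules, a literal non-breaking space character in the source is also written back as &nbsp;.

```javascript
// Hypothetical helper: pulls the still-escaped content value out of a
// serialized tag, such as the string returned by element.outerHTML.
function extractRawContent(serializedTag) {
  // [^"]* stops at the closing quote, so later attributes are not swallowed.
  var match = /\scontent="([^"]*)"/.exec(serializedTag);
  return match ? match[1] : null;
}

// In a page context this string would come from
// document.querySelector('[property=analytics-track]').outerHTML
var sample = '<meta property="analytics-track" content="Hey&nbsp;There!" data-x="1">';
console.log(extractRawContent(sample)); // Hey&nbsp;There!
```

Returning null when there is no content attribute avoids the TypeError that indexing a failed match with [1] would throw.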


#2


2  

You can't; that &nbsp; isn't really there. It's just an encoding for a non-breaking space. To the document, the DOM, the web page, to everything, it looks like:

Hey There!

Except the character between the y and the T isn't the kind of space you'd get by hitting the space bar; it's a completely different character.

Observe:


<span id='a' data-a='Hey&nbsp;There!'></span>
<span id='a1' data-a='Hey&nbsp;There!'></span>
<span id='b' data-b='Hey There!'></span>

var a = document.getElementById('a').getAttribute('data-a')
var a1 = document.getElementById('a1').getAttribute('data-a')
var b = document.getElementById('b').getAttribute('data-b')
console.log(a,b,a==b)
console.log(a,a1,a==a1)

Gives:


Hey There! Hey There! false
Hey There! Hey There! true
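
The two strings print identically but differ in one character code. The sketch below makes that visible and shows one way to detect the parsed character, which is why searching the attribute value for the text '&nbsp;' finds nothing. The helper name `hasNonBreakingSpace` is an assumption for illustration; indexOf is used rather than includes so the code also runs in older engines.

```javascript
// "\u00A0" is the character the HTML parser produces for &nbsp; in an attribute value.
var parsed = 'Hey\u00A0There!'; // what getAttribute returns for "Hey&nbsp;There!"
var plain = 'Hey There!';       // same text with an ordinary space (U+0020)

console.log(parsed.charCodeAt(3)); // 160
console.log(plain.charCodeAt(3));  // 32

// Hypothetical helper: true when the value contains a non-breaking space.
function hasNonBreakingSpace(value) {
  return value.indexOf('\u00A0') !== -1;
}

console.log(hasNonBreakingSpace(parsed)); // true
console.log(hasNonBreakingSpace(plain));  // false
```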

Instead, consider changing your notion of 'equality' so that a space and a non-breaking space are treated as equal:

// Match a non-breaking space character (U+00A0) or a literal "&nbsp;" entity
var re = /\u00A0|&nbsp;/g;
x = x.replace(re, ' ');
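
As a rough sketch of how that normalization could be applied before two values are compared or recorded, the helpers below wrap it in functions. The names `normalizeSpaces` and `sameTitle` are hypothetical and chosen only for this example.

```javascript
// Hypothetical helper: replaces every non-breaking space character (U+00A0)
// and every literal "&nbsp;" with an ordinary space.
function normalizeSpaces(text) {
  return text.replace(/\u00A0|&nbsp;/g, ' ');
}

// Hypothetical helper: compares two values after normalization.
function sameTitle(a, b) {
  return normalizeSpaces(a) === normalizeSpaces(b);
}

console.log(sameTitle('Hey\u00A0There!', 'Hey There!')); // true
console.log(sameTitle('Hey&nbsp;There!', 'Hey There!')); // true
console.log(sameTitle('Hey There!', 'Hi There!'));       // false
```

Normalizing both sides means it does not matter whether a value arrived as the parsed character or as unparsed entity text.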

#3


1  

To get the HTML of the meta tag as is, use outerHTML:


document.getElementsByTagName('meta')[4].outerHTML

Working Snippet:


console.log(document.getElementsByTagName('meta')[0].outerHTML);
<meta property="analytics-track" content="Hey&nbsp;There!">
<h3>Check your console</h3>

Element.outerHTML - Web APIs | MDN



Update 1:


To extract just the meta content, use the following:

metaInfo.match(/content="(.*)">/)[1];  // assuming that content attribute is always at the end of the meta tag

Working Snippet:


var metaInfo = document.getElementsByTagName('meta')[0].outerHTML;

console.log(metaInfo);

console.log('Meta Content = ' + metaInfo.match(/content="(.*)">/)[1]);
<meta property="analytics-track" content="Hey&nbsp;There!">
<h3>Check your console</h3>
