I'm building a scraper and have come across some HTML I don't know how to parse. The markup looks like this:
<div>
<span>SomeHeader</span>
"Some text"
<span>SomeOtherHeader</span>
"More text"
</div>
Using JavaScript or jQuery, I want to find "SomeHeader" and get the "Some text" that follows it, without the "More text".
Any help is appreciated!
3 solutions
#1
1
You can use the :contains() selector to find elements that contain some text, but it matches substrings rather than the exact text. For example, $("span:contains(Text)") selects both of the spans below.
<span>Text</span>
<span>Text text</span>
You need to use the .filter( function ) method to compare each element's text exactly and keep only the matching element. Once you have that element, use its nextSibling property to get the text node that follows it.
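// Keep only the span whose text is exactly "SomeHeader".
var targetSpan = $("div > span").filter(function() {
  return $(this).text() === "SomeHeader";
});
// The text node right after the span holds the wanted text.
var text = targetSpan[0].nextSibling.nodeValue.trim();
console.log(text);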
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div>
<span>SomeHeader</span>
"Some text"
<span>SomeOtherHeader</span>
"More text"
</div>
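If the real page is less regular than the sample (for example, a comment or whitespace-only node sits between the header and its text), nextSibling may not be the text node you want. Here is a minimal sketch of one way to handle that. The helper name textAfter is hypothetical, not part of jQuery or the DOM API. It walks the following siblings, skips anything that is not a non-empty text node, and gives up when it reaches the next element.

```javascript
// Hypothetical helper for illustration; not a jQuery or DOM built-in.
// Returns the trimmed text that follows `node` before the next element,
// or null if no non-empty text node is found first.
function textAfter(node) {
  var ELEMENT_NODE = 1;
  var TEXT_NODE = 3;
  for (var sib = node.nextSibling; sib; sib = sib.nextSibling) {
    if (sib.nodeType === ELEMENT_NODE) {
      return null; // reached the next header without finding any text
    }
    if (sib.nodeType === TEXT_NODE && sib.nodeValue.trim() !== "") {
      return sib.nodeValue.trim();
    }
  }
  return null;
}
```

With the code above it could be called as textAfter(targetSpan[0]). It only relies on the standard nodeType, nodeValue and nextSibling properties, so it does not need jQuery itself.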
#2
1
Once you have a reference to the DIV element, you can read its textContent property (it is a property, not a method) to get all the text in the DIV element and its children. Then it's just a matter of locating what you're looking for in that string. You could use a regular expression such as /SomeHeader([\s\S]*?)SomeOtherHeader/, which matches "SomeHeader", captures everything up to "SomeOtherHeader", and leaves the text you want in the first capture group.
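The regular-expression idea could look roughly like the sketch below. The function names textBetween and escapeRegExp are made up for this example, and the function works on a plain string, which is assumed to be the DIV's text content. Note that it matches substrings, so it can pick the wrong spot if the header text also appears elsewhere in the string.

```javascript
// Hypothetical helpers for illustration only.

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the trimmed text found between startHeader and endHeader
// in allText, or null if the two headers are not found in that order.
function textBetween(allText, startHeader, endHeader) {
  var pattern = new RegExp(
    escapeRegExp(startHeader) + "([\\s\\S]*?)" + escapeRegExp(endHeader)
  );
  var match = pattern.exec(allText);
  return match ? match[1].trim() : null;
}
```

In the browser it might be called as textBetween(div.textContent, "SomeHeader", "SomeOtherHeader"), where div is the element you already have a reference to.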
#3
1
You could try something like this:
$('div')
  .contents()
  .each(function () {
    // contents() also returns text nodes, so compare each node's text
    if ($(this).text() === "SomeHeader") {
      alert(this.nextSibling.nodeValue);
    }
  });
Example: https://jsfiddle.net/DinoMyte/bko2wsbu/1/