I saw an answer on this forum that comes close to what I need, but not close enough ("Regexp to capture string between delimiters").
My question is: I have an HTML page and I would like to get only the src of every "img" tag on the page and put them all in one array, without using cheerio (I'm using Node.js).
The problem is that I would prefer to exclude the delimiters. How can I solve this?
1 solution
#1
Yes, this is possible with regex, but it would be much easier (and probably faster, but don't quote me on that) to use a native DOM method. Let's start with the regex approach. We can use a capture group to easily parse the src of an img tag:
var html = `test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >`;
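var srcs = [];
// replace() is used here only to visit every match; the capture group ($1) holds the src value without its quotes
html.replace(/<img[^<>]*src=['"](.*?)['"][^<>]*>/gm, (m, $1) => { srcs.push($1); });
console.log(srcs);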
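The same capture-group idea can also be written without leaning on replace() for its side effects. What follows is only a sketch, assuming a Node.js version that supports String.prototype.matchAll (12 or later); the helper name extractImgSrcs and the variable names are made up for illustration and are not part of the original answer. The pattern is slightly stricter than the one above: it requires the closing quote to be the same character as the opening one, and it requires whitespace before src so that an attribute such as data-src is not picked up by mistake.

```javascript
// Hypothetical helper (not from the original answer): collect the src value
// of every <img> tag in an HTML string, without the surrounding quotes.
function extractImgSrcs(html) {
  // Group 1 captures the opening quote; \1 requires the same quote to close it.
  // Group 2 is the src value itself, so the delimiters are excluded.
  const pattern = /<img\b[^<>]*?\ssrc\s*=\s*(['"])(.*?)\1[^<>]*>/gi;
  // matchAll yields one match object per <img> tag; keep only group 2.
  return Array.from(html.matchAll(pattern), (match) => match[2]);
}

const sampleHtml = `test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >`;

const extractedSrcs = extractImgSrcs(sampleHtml);
console.log(extractedSrcs); // [ 'first', 'second', 'third' ]
```

Like any regex over HTML, this is likely to miss edge cases (unquoted src values, a ">" inside another attribute value, img tags inside comments), so it is best treated as good enough for simple, well-formed pages only.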
However, the better way would be to use getElementsByTagName (note that the following will return URLs resolved against some parent domain, since the srcs are relative/fake, but you get the idea):
var srcs = [].slice.call(document.getElementsByTagName('img')).map(img => img.src);
console.log(srcs);
test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >
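The "parent domain url" effect mentioned above comes from the img.src property, which returns the URL already resolved against the page's address, whereas img.getAttribute('src') returns the attribute exactly as written. A small sketch of that resolution step, which also runs in Node.js without a DOM because the URL class is a global there; the base URL below is an assumption chosen purely for illustration:

```javascript
// Assumed base URL, for illustration only; in a browser this would be
// the address of the page that contains the <img> tags.
const baseUrl = 'https://example.com/gallery/';

// Raw src attribute values, as they appear in the markup above.
const rawSrcs = ['first', 'second', 'third'];

// Resolve each relative src against the base, roughly what img.src reports.
const resolvedSrcs = rawSrcs.map((src) => new URL(src, baseUrl).href);
console.log(resolvedSrcs);
// [ 'https://example.com/gallery/first',
//   'https://example.com/gallery/second',
//   'https://example.com/gallery/third' ]
```

If the raw values are what you are after, reading the attribute with getAttribute('src') in the browser avoids the resolution altogether.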