How do I match the numbers inside a given HTML tag?

I would like to match the numbers inside an HTML tag such as:

Sometext<sometag><htmltag>123123</htmltag></sometag>

I would like to create a regex that finds the number that is inside the HTML tag of my choice, for example the 123123 inside <htmltag>.

2 solutions

#1

No, you don't need to "match", you need to extract an HTML node. Use an HTML parser. An HTML parser is simpler to use, more robust against changes, and easier to extend (e.g. grabbing more parts of the same document). A regular expression, on the other hand, is just the wrong tool, because HTML is not a regular language.
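
To make the parser suggestion concrete, here is a minimal sketch using Python's standard-library `html.parser`. The class name `TagTextExtractor` and the helper `numbers_in_tag` are made up for illustration, and Python itself is an assumption, since the question does not name a language.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collects the text found inside every element named target_tag."""

    def __init__(self, target_tag):
        super().__init__()
        self.target_tag = target_tag.lower()  # HTMLParser reports tag names in lowercase
        self.depth = 0       # number of target tags currently open
        self.results = []    # one stripped text entry per outermost matched element
        self._buffer = []    # text chunks of the element being read

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            if self.depth == 0:
                self._buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        # Text may arrive in several chunks, so buffer it until the tag closes.
        # Note: text of child elements nested in the target tag is included too.
        if self.depth > 0:
            self._buffer.append(data)


def numbers_in_tag(html, tag):
    """Return the integers that make up the whole text of each <tag> element."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    # Keep only entries consisting purely of ASCII digits, like [0-9]+ would.
    return [int(t) for t in parser.results if t.isascii() and t.isdigit()]


print(numbers_in_tag("Sometext<sometag><htmltag>123123</htmltag></sometag>", "htmltag"))
# prints [123123]
```

Unlike a fixed pattern, this keeps working if the tag gains attributes or the number is surrounded by whitespace.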

#2

If the only thing between those two tags is the number, with no whitespace or anything else, you can simply use this regex:

/<htmltag>([0-9]+)<\/htmltag>/

Or this if there might be whitespace:

/<htmltag>\s*([0-9]+)\s*<\/htmltag>/
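
As a rough illustration of how the whitespace-tolerant pattern could be applied, here is a sketch in Python (an assumption, since the answer only shows the bare pattern). The function name `find_numbers` is hypothetical.

```python
import re


def find_numbers(text, tag="htmltag"):
    """Return every integer wrapped directly in <tag>...</tag>.

    Python equivalent of /<htmltag>\\s*([0-9]+)\\s*<\\/htmltag>/,
    with the tag name made configurable.
    """
    name = re.escape(tag)  # guard against regex metacharacters in the tag name
    pattern = re.compile(r"<{0}>\s*([0-9]+)\s*</{0}>".format(name))
    return [int(number) for number in pattern.findall(text)]


sample = "Sometext<sometag><htmltag>123123</htmltag></sometag>"
print(find_numbers(sample))
# prints [123123]
```

The pattern only matches the bare opening tag: as soon as the tag carries an attribute, for example `<htmltag class="x">`, nothing is found.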
